Search Java Code Snippets


#Java - Code Snippets for '#Crawler4j' - 2 code snippet(s) found

 Sample 1. Web Crawler using crawler4j - Crawler Controller

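import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

// Folder where crawler4j keeps its intermediate crawl data
String crawlStorageFolder = "/data/crawl/t2";

// Number of concurrent crawler threads
int numberOfCrawlers = 5;

// Configure the crawl; the storage folder has to be set on the config
CrawlConfig config = new CrawlConfig();
config.setCrawlStorageFolder(crawlStorageFolder);

// Instantiate the controller for this crawl (the constructor can throw Exception)
PageFetcher pageFetcher = new PageFetcher(config);
RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);

// Add at least one seed URL before starting.
// The URL below is a placeholder and is not part of the original snippet.
controller.addSeed("https://www.example.com/");

// Start the crawler threads. This call blocks until the crawl finishes,
// so the program can exit once it returns.
controller.start(MyCrawler.class, numberOfCrawlers);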
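
Sample 1 starts MyCrawler, which the snippet does not show. In crawler4j such a class typically extends WebCrawler and overrides shouldVisit and visit. The URL check that shouldVisit usually performs can be sketched in plain Java without any crawler4j dependency. The class name CrawlUrlFilter, its accepts method, the extension list and the example.com prefix are all assumptions made for illustration; none of them come from the original snippet or from the library.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical helper: decides whether a discovered URL is worth fetching. */
final class CrawlUrlFilter {

    // Extensions that usually point at binary or static assets rather than HTML pages.
    private static final Pattern SKIPPED_EXTENSIONS = Pattern.compile(
            ".*\\.(css|js|gif|jpe?g|png|mp3|mp4|zip|gz|pdf)$");

    // Lower-cased URL prefix that every accepted URL must start with.
    private final String allowedPrefix;

    CrawlUrlFilter(String allowedPrefix) {
        this.allowedPrefix = allowedPrefix.toLowerCase(Locale.ROOT);
    }

    /** Returns true when the URL stays under the allowed prefix and is not a skipped asset. */
    boolean accepts(String url) {
        if (url == null) {
            return false;
        }
        String href = url.toLowerCase(Locale.ROOT);
        return href.startsWith(allowedPrefix)
                && !SKIPPED_EXTENSIONS.matcher(href).matches();
    }
}
```

A MyCrawler implementation could hold one CrawlUrlFilter instance and return filter.accepts(url.getURL()) from its shouldVisit override. Keeping the rule in a separate class makes it testable without starting a crawl.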


 Sample 2. Method to get Text / html info from edu.uci.ics.crawler4j.crawler.Page

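import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.parser.HtmlParseData;

private void crawlPageInfo(Page page) {
    try {
        String url = page.getWebURL().getURL();

        if (page.getParseData() instanceof HtmlParseData) {
            HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
            // getText() is the markup-free text (getHtml() gives the raw HTML).
            // TextParser is assumed to be an application class, not part of crawler4j.
            TextParser parser = new TextParser(htmlParseData.getText());
        }
    } catch (Exception ex) {
        ex.printStackTrace(); // placeholder: the handler body is cut off in the original
    }
}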
