
#Java - Code Snippets for '#Crawler4j' - 2 code snippet(s) found

 Sample 1. Web Crawler using crawler4j - Crawler Controller

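import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

// The statements below belong inside a method declared with 'throws Exception',
// because the CrawlController constructor can throw.

// Folder where crawler4j keeps its intermediate crawl data
String crawlStorageFolder = "/data/crawl/t2";

// Number of concurrent crawler threads
int numberOfCrawlers = 5;

// Crawl configuration
CrawlConfig config = new CrawlConfig();
config.setCrawlStorageFolder(crawlStorageFolder);

config.setMaxDepthOfCrawling(1);         // follow links only one level below the seed
config.setMaxPagesToFetch(-1);           // -1 means no limit on the number of pages
config.setUserAgentString("JavaIndex");  // user agent sent with every request

// Instantiate the controller for this crawl
PageFetcher pageFetcher = new PageFetcher(config);
RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);

// Seed URL where the crawl starts
controller.addSeed("http://www.buggybread.com");

// Start the crawler threads. MyCrawler is the WebCrawler subclass (not shown here).
// start() blocks until the crawl has finished.
controller.start(MyCrawler.class, numberOfCrawlers);

// Exit once the crawl is complete
System.exit(0);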
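
Sample 1 hands MyCrawler.class to the controller, but the class itself is not shown. In crawler4j such a class extends WebCrawler and usually decides which links are worth following. The sketch below shows only that filtering decision in plain Java, with no crawler4j dependency, so it can be tried on its own. The class name CrawlUrlFilter, the extension list, and the prefix rule are assumptions made for illustration and are not part of the original snippet or of the crawler4j API.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper: the name, the extension list and the prefix rule are
// illustrative assumptions, not part of crawler4j.
public class CrawlUrlFilter {

    // File extensions that are rarely useful in a text-oriented crawl.
    private static final Pattern EXCLUDED_EXTENSIONS =
            Pattern.compile(".*\\.(css|js|gif|jpe?g|png|mp3|mp4|zip|gz|pdf)$");

    private final String allowedPrefix;

    public CrawlUrlFilter(String allowedPrefix) {
        this.allowedPrefix = allowedPrefix.toLowerCase(Locale.ROOT);
    }

    // Returns true when the URL stays under the allowed prefix and does not
    // end in one of the excluded extensions.
    public boolean shouldVisit(String url) {
        String href = url.toLowerCase(Locale.ROOT);
        return href.startsWith(allowedPrefix)
                && !EXCLUDED_EXTENSIONS.matcher(href).matches();
    }

    public static void main(String[] args) {
        CrawlUrlFilter filter = new CrawlUrlFilter("http://www.buggybread.com");
        System.out.println(filter.shouldVisit("http://www.buggybread.com/2015/01/java.html")); // true
        System.out.println(filter.shouldVisit("http://www.buggybread.com/logo.png"));          // false
        System.out.println(filter.shouldVisit("http://example.com/index.html"));               // false
    }
}
```

A crawler class along the lines of MyCrawler could call a check like this from its link-selection method, so that the crawl stays on the seed site and skips static assets.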



 Sample 2. Method to get Text / html info from edu.uci.ics.crawler4j.crawler.Page

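private void crawlPageInfo(Page page) {
    try {
        // URL of the page that was fetched
        String url = page.getWebURL().getURL();

        // Only HTML pages carry HtmlParseData; other content types are skipped
        if (page.getParseData() instanceof HtmlParseData) {
            HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
            // getText() returns the page text with the markup stripped.
            // TextParser is not defined in this snippet; it appears to be the author's own helper class.
            TextParser parser = new TextParser(htmlParseData.getText());
        }
    } catch (Exception ex) {
        System.out.println(ex.getMessage());
    }
}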
