Crawler4j

Open source web crawler for Java

Crawler4j Ranking & Summary

  • License: Apache
  • Price: FREE
  • Publisher Name: Yasser Ganjisaffar
  • Publisher web site: http://www.ics.uci.edu/~yganjisa/
  • Operating Systems: Mac OS X
  • File Size: 29 KB

Crawler4j Description

Crawler4j is a free, open source Java crawler that provides a simple interface for crawling the web. Using Crawler4j, you can set up a multi-threaded web crawler in 5 minutes!

Crawler4j is designed for efficiency and can crawl domains very quickly (e.g., it has been able to crawl 200 Wikipedia pages per second). However, crawling at that rate goes against crawling policies and puts a heavy load on servers, which might block you in response. For that reason, since version 1.3 crawler4j waits at least 200 milliseconds between requests by default. This parameter can be tuned with the "setPolitenessDelay" function in the controller. Detailed usage instructions for the Crawler4j web crawler are available in its documentation.
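
To illustrate what a politeness delay does, here is a minimal, self-contained Java sketch. It is not crawler4j's code and does not use its API: the class `PoliteThrottle` and its methods are hypothetical names chosen for this example, and the 200 millisecond value simply mirrors the default mentioned above. The idea is that every crawler thread shares one throttle and asks it for permission before sending a request.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of crawler4j): enforces a minimum
 * delay between consecutive requests issued through this object.
 */
final class PoliteThrottle {
    private final long delayMillis;
    private boolean hasRequested = false;
    private long lastRequestMillis = 0;

    PoliteThrottle(long delayMillis) {
        if (delayMillis < 0) {
            throw new IllegalArgumentException("delay must be non-negative");
        }
        this.delayMillis = delayMillis;
    }

    /** Milliseconds a caller must still wait at time nowMillis. */
    synchronized long millisToWait(long nowMillis) {
        if (!hasRequested) {
            return 0; // the first request never waits
        }
        long elapsed = nowMillis - lastRequestMillis;
        return Math.max(0, delayMillis - elapsed);
    }

    /** Records that a request was sent at time nowMillis. */
    synchronized void markRequest(long nowMillis) {
        hasRequested = true;
        lastRequestMillis = nowMillis;
    }

    /**
     * Blocks until the delay has elapsed, then records the request.
     * Sleeping while holding the lock is deliberate: it makes
     * concurrent crawler threads take turns.
     */
    synchronized void awaitTurn() throws InterruptedException {
        long wait = millisToWait(System.currentTimeMillis());
        if (wait > 0) {
            TimeUnit.MILLISECONDS.sleep(wait);
        }
        markRequest(System.currentTimeMillis());
    }
}
```

In this sketch, a worker thread would call `awaitTurn()` on a shared `new PoliteThrottle(200)` immediately before each fetch. With crawler4j itself you would not write this by hand; the description above says the delay is adjusted through "setPolitenessDelay" in the controller.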