Scrapes

A framework for crawling and scraping multi-page web sites

Scrapes Ranking & Summary

  • License: Freeware
  • Price: FREE
  • Publisher Name: Peter Jones
  • Publisher web site: http://rubyforge.org/users/pjones/
  • Operating Systems: Mac OS X
  • File Size: 38 KB

Scrapes Description

A framework for crawling and scraping multi-page web sites. Unlike other scraping frameworks, Scrapes is designed to work with “dirty” web sites, that is, web sites that were not designed to have their data extracted programmatically. Scrapes includes features for both the initial development of a scraper and the ongoing maintenance of that scraper.

NOTE: Scrapes is developed and licensed under the terms of the MIT/X Consortium License.

Here are some key features of "Scrapes":
· Rule-based selection and extraction of data using CSS selectors or pseudo-XPath expressions
· A caching system, so that during development you don’t have to repeatedly download pages from a web server while you experiment with your selectors and extractors
· A validation system that helps detect web site changes that would otherwise invalidate your extraction rules
· Support for initiating a session with the web server and passing session cookies back to it
· When all else fails, you can run a web page through the xsltproc XSLT processor to generate an XML document, which can then be fed to your rule-based parser
· A useful set of post-processing methods, such as normalize_name

What's New in This Release:
· First public release.
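
The development-time caching feature listed above can be illustrated with a small example. The following is a minimal sketch in Ruby, the language Scrapes is written for, of an on-disk page cache keyed by URL. The class `PageCache` and its `fetch` and `expire` methods are hypothetical names chosen for illustration; they are not part of the Scrapes API, and Scrapes' own cache may work differently.

```ruby
require 'digest'
require 'fileutils'

# Hypothetical on-disk page cache illustrating the caching idea.
# This is NOT the Scrapes API; names and behavior are assumptions.
class PageCache
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir) # create the cache directory if it is missing
  end

  # Returns the cached body for +url+ if one exists on disk. Otherwise it
  # yields the URL to the block (which should download the page), stores
  # the result, and returns it.
  def fetch(url)
    path = cache_path(url)
    return File.read(path) if File.exist?(path)

    body = yield(url)
    File.write(path, body)
    body
  end

  # Deletes the cached copy so the next fetch downloads the page again.
  def expire(url)
    path = cache_path(url)
    File.delete(path) if File.exist?(path)
  end

  private

  # Hashes the URL so that any URL maps to a safe file name.
  def cache_path(url)
    File.join(@dir, Digest::SHA256.hexdigest(url) + '.html')
  end
end
```

In real use, the block would perform the actual download, for example with `Net::HTTP.get(URI(url))` from Ruby's standard library. While you refine your selectors, repeated runs are then served from disk and do not hit the web server.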
