Methabot

A free web crawler and command line tool optimized for speed

Methabot Ranking & Summary


  • License: Freeware
  • Price: Free
  • Publisher Name: Emil Romanus
  • Publisher web site: http://bithack.se/
  • Operating Systems: Mac OS X
  • File Size: 479 KB


Methabot Description

Methabot supports scripted filetype parsing and a wide variety of customization options, and is easily configured to fit anyone's particular needs. It is built for extensibility and customization: it is developed with high modularity in mind and comes with JavaScript as its scripting language. Using the module system and the scripting language, users can take full or partial control of the crawling process and decide how Methabot should store web data, statistics and much more. You don't have to be a scripter, though: just by running Methabot from the command line you are able to configure custom filetypes, filtering expressions, behaviour and much more.

Methabot is portable and has been tested successfully on Mac OS X, 32-bit/64-bit Linux 2.6, 32-bit/64-bit FreeBSD 6.x/7.0, and Windows XP. It should work on almost any Unix-like OS.

Here are some key features of "Methabot":

  • It's fast, designed from the ground up with speed optimization in mind
  • Scriptable through E4X
  • User-defined filetype filtering (according to MIME type, file extension or UMEX expression)
  • Multi-threaded
  • Highly configurable from the command line
  • Extensible module system, supporting custom data parsers and filters
  • Simple yet powerful filtering of URLs through UMEX
  • Automated downloading
  • Support for automatic cookie handling when running over HTTP
  • Reliable, fault-tolerant networking

What's New in This Release:

  • Support for converting between character encodings through libiconv
  • New parser utf8conv for converting almost any character encoding to UTF-8
  • New parser entityconv, which converts HTML entities such as &auml; (ä) to the corresponding UTF-8 character
  • The configuration system has been moved to a separate library, libmethaconfig
  • Various improvements to the configuration loader, such as dynamically adding and changing classes and scopes
  • Lots of memory usage optimizations and cleanup fixes
  • The documentation available in the wiki has been copied to a texinfo file; from now on all documentation will be put in this texinfo file and made available as a manual both online and offline
  • Support for filetype attributes. Parsers can now set custom data that will be associated with a parsed file. The primary use of attributes is when you are connected to a Methanol system and want to store metadata about a URL
  • New JavaScript function set_attribute() for setting attributes for the current URL
  • API support for custom status, error/warning and target reporter functions
  • lmetha_global_setopt() is no longer available, replaced with lmetha_setopt() options
  • SpiderMonkey-1.8.0 support added
  • New global JavaScript function exec()
  • New built-in handler function writefile
  • libmetha no longer depends on libev, but instead uses pipes and epoll() for inter-thread communication and waiting for events on sockets
  • Added internal counters useful for keeping statistics
  • New filetype option 'ignore_host'
  • Setting the --external option to false can no longer be circumvented using an HTTP redirect
  • Added support for CURIE (why not?) in the built-in HTML parser
  • Bugfix: a syntax error would in some rare cases occur when parsing integer values in configuration files
  • Bugfix in the configuration file parser when reading flag values
  • Bugfix: when a JavaScript filetype parser did not return a value, the result was treated as the string "undefined" and used as a relative URL
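To give a rough idea of what the utf8conv and entityconv parsers mentioned in the release notes are described as doing, here is a minimal sketch in Python using only the standard library. It is not Methabot's code and does not use libiconv; the function names to_utf8 and decode_entities are hypothetical and chosen only for this illustration.

```python
import html


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from source_encoding and re-encode them as UTF-8.

    Hypothetical helper illustrating the kind of conversion the
    release notes attribute to utf8conv; not part of Methabot.
    """
    return raw.decode(source_encoding).encode("utf-8")


def decode_entities(text: str) -> str:
    """Replace HTML entities such as &auml; with the characters they name.

    Hypothetical helper illustrating the kind of conversion the
    release notes attribute to entityconv; not part of Methabot.
    """
    return html.unescape(text)


if __name__ == "__main__":
    # A Latin-1 encoded fragment containing both a raw byte and an entity.
    latin1_page = "J&auml;rvi caf\u00e9".encode("latin-1")
    utf8_page = to_utf8(latin1_page, "latin-1")
    print(decode_entities(utf8_page.decode("utf-8")))
```

Doing the encoding conversion first and the entity replacement second means the entity step only ever has to deal with one encoding.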

