Writing a modular aggregator and normalizer in Perl

Tags: perl, cpan, poe

I've just moved into an environment where I'm much freer to choose whatever approach I want for a project (meaning full access to CPAN and no module-approval-by-committee), but I'm a little out of touch with the new hotness, so I thought I'd solicit ideas here.

My project involves scraping multiple sources with varying formats (HTML, zipped text, CSV, etc.), normalizing them, and then processing them into some sort of datastore. The pulls need to happen at programmable intervals, and I'd like to make the back-end modular so that similar sources can share the same codebase. It also needs to be able to respond via the web with a simple status of running processes (nothing fancy). I was thinking POE might be a good fit, with several collector processes reporting to one master. Are there any specific modules in POE (or elsewhere) that I should have a look at?
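
One way to picture the modular back-end described above is a small registry that maps each source format to its own normalizer, so that similar sources share code. This is only a minimal sketch using core Perl: the names `%NORMALIZER` and `normalize`, and the `kv` format, are hypothetical, and the CSV handling is a naive split that ignores quoted fields (a module such as Text::CSV would be the usual choice for real data).

```perl
use strict;
use warnings;

# Hypothetical registry: format name => code ref that turns raw text
# into a list of hash-ref records. New formats plug in as new entries.
my %NORMALIZER = (
    csv => sub {
        my ($raw) = @_;
        my ( $header, @rows ) = grep { length } split /\r?\n/, $raw;
        my @cols = split /,/, $header;
        my @records;
        for my $row (@rows) {
            my %rec;
            # Naive split: does not understand quoted or escaped commas.
            @rec{@cols} = split /,/, $row, -1;
            push @records, \%rec;
        }
        return @records;
    },
    kv => sub {
        my ($raw) = @_;
        my @records;
        # Made-up line format for illustration: "name=foo;price=1"
        for my $line ( grep { length } split /\r?\n/, $raw ) {
            my %rec = map { split /=/, $_, 2 } split /;/, $line;
            push @records, \%rec;
        }
        return @records;
    },
);

# Dispatch to the normalizer registered for the given format.
sub normalize {
    my ( $format, $raw ) = @_;
    my $code = $NORMALIZER{$format}
        or die "No normalizer registered for format '$format'\n";
    return $code->($raw);
}
```

A collector would then call something like `normalize( csv => $downloaded_text )` and hand the resulting records to whatever writes to the datastore.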

Flowchartsman asked Aug 17 '11 18:08


1 Answer

WWW::Mechanize is a great module for getting information off web pages.
It lets you log in to websites with a username and password, submit forms, and so on.
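
As a rough illustration of the login-then-fetch flow, here is a minimal sketch. The helper `fetch_after_login` and its named arguments are hypothetical, and the URLs and form field names you pass in depend entirely on the site being scraped.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Log in through an HTML form, then return the body of a protected page.
# Every URL and form field name passed in is a placeholder (assumption).
sub fetch_after_login {
    my (%arg) = @_;

    # autocheck => 1 makes Mechanize die on any failed request.
    my $mech = WWW::Mechanize->new( autocheck => 1 );

    $mech->get( $arg{login_url} );

    # Select the form that contains these fields, fill them in, submit it.
    $mech->submit_form(
        with_fields => {
            $arg{user_field} => $arg{user},
            $arg{pass_field} => $arg{password},
        },
    );

    # The session cookie set by the login is reused for this request.
    $mech->get( $arg{data_url} );
    return $mech->content;
}
```

It would be called along the lines of `fetch_after_login( login_url => ..., user_field => 'username', pass_field => 'password', user => ..., password => ..., data_url => ... )`, with the field names taken from the actual login form.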

You can find more info at: http://metacpan.org/pod/WWW::Mechanize

Andrey answered Nov 13 '22 07:11