I've just moved into an environment where I'm much freer to choose whatever approach I want for a project (meaning full access to CPAN and no module-approval-by-committee), but I'm a little out of touch with the new hotnesses, so I thought I'd solicit ideas here.
My project involves scraping multiple sources in varying formats (HTML, zipped text, CSV, etc.), normalizing them, and then processing them into some sort of datastore. The pulls need to happen at programmable intervals, and I'd like to make the back end modular so that similar sources can share the same codebase. It also needs to be able to respond via the web with a simple status of running processes (nothing fancy). I was thinking POE might be a good fit, with several collector processes reporting to one master, but are there any specific modules in POE (or elsewhere) that anyone thinks I should have a look at?
WWW::Mechanize is a great module for getting information off web pages.
It lets you log in to websites by filling in a username and password, submit forms, follow links, and so on.
You can find more info at: http://metacpan.org/pod/WWW::Mechanize
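To make that concrete, here is a minimal sketch of a login-then-scrape pull with WWW::Mechanize. It is only an illustration: the helper name `fetch_links_after_login`, the form field names, the URLs, and the `SCRAPER_USER` / `SCRAPER_PASS` environment variables are all assumptions of mine, so adjust them to whatever your sources actually use.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper (not part of WWW::Mechanize): log in through the form
# that contains the given fields, open a data page, and return the absolute
# URLs of the links on it that match a pattern.
sub fetch_links_after_login {
    my (%args) = @_;

    # autocheck => 1 makes a failed request die instead of failing silently.
    my $mech = WWW::Mechanize->new( autocheck => 1 );

    $mech->get( $args{login_url} );

    # with_fields picks the form that has these fields and fills them in.
    $mech->submit_form(
        with_fields => {
            $args{user_field} => $args{username},
            $args{pass_field} => $args{password},
        },
    );

    $mech->get( $args{data_url} );

    return map { $_->url_abs->as_string }
        $mech->find_all_links( url_regex => $args{link_pattern} );
}

# Runs only when called as a script with two arguments, for example:
#   perl fetch.pl https://example.com/login https://example.com/reports
if ( !caller && @ARGV == 2 ) {
    my ( $login_url, $data_url ) = @ARGV;

    my @links = fetch_links_after_login(
        login_url    => $login_url,
        data_url     => $data_url,
        user_field   => 'login',              # assumed form field name
        pass_field   => 'password',           # assumed form field name
        username     => $ENV{SCRAPER_USER},   # assumed environment variable
        password     => $ENV{SCRAPER_PASS},   # assumed environment variable
        link_pattern => qr/\.(?:csv|zip)$/i,
    );

    print "$_\n" for @links;
}
```

Keeping the fetch logic in one sub that takes the source-specific details as arguments is one way to get the modular back end you describe: each similar source could then be just a different set of arguments.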