I need to scrape about 100 websites whose content is very similar.
My first question: is it possible to write one generic script that scrapes all 100 websites, or do scraping techniques only allow scripts written for a particular website? (Maybe a dumb question.) Perhaps I should ask which option is easier, since writing 100 different scripts, one per website, would be hard.
Second question: my primary language is PHP, but after searching here on Stack Overflow I found that one of the most advanced scrapers is "Beautiful Soup" in Python. Is it possible to call "Beautiful Soup" from PHP, or would it be better to write the whole script in Python?
Please give me some pointers on how I should proceed.
Sorry for my weak English.
Best Regards,
Because I prefer PHP to Python, I once used phpQuery to scrape data from websites. It works pretty well, and I came up with a scraper pretty quickly, using CSS selectors (with the help of SelectorGadget) to select elements and read each one's ->text().
But I found it to be a bit slow (since I had to scrape thousands of pages), so in the end I changed it to use regex to scrape data instead. D:
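To give a rough idea of how one generic script could cover many similar sites with the regex approach, here is a minimal sketch in Python using only the standard library. The site names, field names, and patterns are all made up for illustration; the real ones depend on each site's markup. The idea is that the extraction code stays the same and only a small table of patterns changes per site.

```python
import re

# Hypothetical per-site rules: one compiled pattern per field.
# Every site name and pattern here is an assumption for illustration.
SITE_RULES = {
    "site-a.example": {
        "title": re.compile(r"<h1[^>]*>(.*?)</h1>", re.S | re.I),
        "price": re.compile(r'<span class="price">\s*([\d.,]+)\s*</span>', re.I),
    },
    "site-b.example": {
        "title": re.compile(r'<div id="name">(.*?)</div>', re.S | re.I),
        "price": re.compile(r"Price:\s*([\d.,]+)", re.I),
    },
}


def extract(site, html):
    """Apply one site's patterns to a page; fields that do not match are None."""
    record = {}
    for field, pattern in SITE_RULES[site].items():
        match = pattern.search(html)
        record[field] = match.group(1).strip() if match else None
    return record


if __name__ == "__main__":
    # Inline sample page so the sketch runs without any network access.
    page = '<h1>Blue Widget</h1><span class="price"> 19.90 </span>'
    print(extract("site-a.example", page))
    # prints {'title': 'Blue Widget', 'price': '19.90'}
```

Adding another site then means adding one more entry to the table, not writing another script. Keep in mind that regex patterns break easily when a site changes its HTML, so this trades robustness for speed.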