 

Web Crawler For Competitive Pricing [closed]

I am thinking of writing an application that will pseudo-track competing websites to ensure that our prices stay competitive. I looked at using the Google Shopping Search API, but I felt that it lacks flexibility, and not all of our competitors are fully listed or regularly updated there.

My question is: where is a good place to start with a PHP-based web crawler? I obviously want a crawler that is respectful (even to our competitors), so it should obey robots.txt and throttle its requests. (To be fair, I am even thinking of hosting this on a third-party server and having it crawl our own websites too, to show no bias.) I looked around via Google and couldn't find any mature packages, only some poorly written SourceForge scripts that haven't been maintained in over a year, despite being labeled as beta or alpha.

Looking for ideas or suggestions. Thanks

Brandon0 asked Nov 14 '22 04:11



1 Answer

A crawler in itself isn't that complicated. You just load up the site then evaluate and follow the links you find.
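As a rough illustration of the "evaluate and follow the links" step, here is a minimal sketch. It is in Python rather than PHP purely for brevity; the same steps map onto PHP's DOMDocument and parse_url. The names `LinkCollector` and `same_site_links` and the shop.example URLs are made up for the example, and fetching the page is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def same_site_links(base_url, html):
    """Return the links found in `html` that stay on the host of `base_url`."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    return [link for link in collector.links if urlparse(link).netloc == host]
```

A real crawler would also keep a set of already visited URLs so it never requests the same page twice.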

What you might do in order to be "friendly" is to purpose-build a crawler for each site you plan on crawling. In other words, pick one site and see how it is structured. Code your GET requests and HTML parsing around that structure. Rinse and repeat for the other sites.

If they are using a common shopping cart software (anything is possible here) then obviously you have a bit of reuse.
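One way to organise the per-site code is a small table that maps each host to its own parsing function, so sites running the same cart software can share one. This is only a sketch in Python (a PHP version would be an array of callables): the host names, both page layouts, and the function names are invented, and a regular expression is a fragile stand-in for a real HTML parser.

```python
import re
from decimal import Decimal
from urllib.parse import urlparse


def parse_price_span(html):
    """Hypothetical layout A: <span class="price">$1,299.00</span>."""
    match = re.search(r'<span class="price">\s*\$?([\d,]+\.\d{2})\s*</span>', html)
    return Decimal(match.group(1).replace(",", "")) if match else None


def parse_price_meta(html):
    """Hypothetical layout B: <meta itemprop="price" content="1299.00">."""
    match = re.search(r'<meta itemprop="price" content="([\d.]+)"', html)
    return Decimal(match.group(1)) if match else None


# One entry per competitor site; all hosts here are placeholders.
SITE_PARSERS = {
    "shop-a.example": parse_price_span,
    "shop-b.example": parse_price_meta,
    "shop-c.example": parse_price_meta,  # same cart software as shop-b, so reuse
}


def extract_price(url, html):
    """Pick the parser registered for the URL's host; None if the site is unknown."""
    parser = SITE_PARSERS.get(urlparse(url).netloc)
    return parser(html) if parser else None
```

When a site changes its markup, only its own function needs to be touched.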

When crawling, you might want to hit their sites during off-peak hours (when those are is going to be a guess). Also, don't execute 500 requests a second. Throttle it down quite a bit.
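A sketch of the throttling side, combined with the robots.txt check the question asks about, using Python's standard `urllib.robotparser`. The class name `PoliteGate`, the 5-second default, and the `price-bot` user agent are assumptions for illustration only. The robots.txt text is passed in as a string so the example does not depend on a network call.

```python
import time
from urllib.robotparser import RobotFileParser


class PoliteGate:
    """Checks robots.txt rules and spaces requests out (hypothetical helper)."""

    def __init__(self, robots_txt, user_agent, min_delay=5.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.rules = RobotFileParser()
        self.rules.parse(robots_txt.splitlines())
        self.user_agent = user_agent
        # Use the site's Crawl-delay when it is stricter than our own minimum.
        site_delay = self.rules.crawl_delay(user_agent)
        self.delay = max(min_delay, site_delay or 0)
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def allowed(self, url):
        """True if robots.txt permits this user agent to fetch `url`."""
        return self.rules.can_fetch(self.user_agent, url)

    def wait(self):
        """Block until at least `delay` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Usage would be to call `gate.allowed(url)` before queuing a URL and `gate.wait()` immediately before each request, with one gate per competitor site.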

One optional thing you might even consider would be to contact these other sites and see if they want to participate in some direct data sharing. The ideal would be for everyone to have an RSS feed for their products.

Of course, depending on who you are selling to, this might be considered price fixing, so proceed with caution.

NotMe answered Dec 18 '22 10:12
