Simple web crawler speed issue

I have created a very simple web crawler in PHP, where I crawl some soccer sites for match results.

But each page takes about 0.5 to 1 second to crawl, so a long list of URLs takes a long time to get through.

This is how my code starts crawling a site:

$doc = new DOMDocument();
$doc->loadHTMLFile("http://resultater.dai-sport.dk/tms/Turneringer-og-resultater/Pulje-Stilling.aspx?PuljeId=229");
$xpath = new DOMXPath($doc);

I wrote the crawler myself, so maybe there is a better or quicker way to do this? Or are my expectations about the speed too high?

Andreas Baran asked Jan 22 '26 07:01


1 Answer

Take a look at this library for an asynchronous implementation of your crawler. It is built on generators ("yield"), which were introduced in PHP 5.5: https://github.com/icicleio/Icicle

You will find usage examples among the library's examples.
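If you would rather not pull in a library, the same idea (starting all requests at once instead of waiting for each one in turn) can be sketched with PHP's built-in curl_multi functions. This is a different technique from the library above and does not use its API. It is a minimal sketch, assuming the curl extension is enabled; the helper names fetchAll and toXPath are hypothetical. Since most of the 0.5 to 1 second per page is likely spent waiting on the network, overlapping the requests is where the gain would come from.

```php
<?php
// Hypothetical helper: download several URLs concurrently with curl_multi.
// Returns an array mapping each URL to its body, or null if the transfer failed.
function fetchAll(array $urls, int $timeoutSeconds = 10): array
{
    $multi = curl_multi_init();
    $handles = [];

    // One easy handle per distinct URL, all attached to the same multi handle.
    foreach (array_unique($urls) as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none of them is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            // Wait for network activity instead of spinning.
            curl_multi_select($multi, 1.0);
        }
    } while ($running && $status === CURLM_OK);

    // Collect the handles whose transfer ended with an error.
    $failed = [];
    while (($info = curl_multi_info_read($multi)) !== false) {
        if ($info['result'] !== CURLE_OK) {
            $failed[] = $info['handle'];
        }
    }

    $bodies = [];
    foreach ($handles as $url => $ch) {
        $bodies[$url] = in_array($ch, $failed, true)
            ? null
            : curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }

    return $bodies;
}

// Hypothetical helper: turn a non-empty HTML string into a DOMXPath,
// silencing the warnings that real-world markup usually triggers.
function toXPath(string $html): DOMXPath
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return new DOMXPath($doc);
}
```

To use it, pass the full list of result pages to fetchAll() once, skip any entry that is null or empty, and call toXPath() on each remaining body in place of loadHTMLFile(). Keep the number of simultaneous requests to one site modest (for example by calling fetchAll() on small batches), so the crawler does not overload the server.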

Anton answered Jan 23 '26 20:01