What does it mean to say a web crawler is I/O bound and not CPU bound?

I've seen this in some answers on Stack Overflow, where the point is made that the programming language doesn't matter as much for a crawler, so C++ is overkill compared with, say, Python. Can someone please explain this in layman's terms, so that there's no ambiguity about what is implied? Clarification of the underlying assumption would also be appreciated.

Thanks

algorithmicCoder asked May 21 '11 00:05



2 Answers

It means that I/O is the bottleneck here. The act of going out to the net to retrieve a page (I/O) is slower than analysing the page (CPU).

So, making the CPU bit ten times faster will have little effect on the overall time taken. On the other hand, doubling the I/O speed will have a very beneficial effect, right up to the point where CPU starts being the bottleneck.
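To put rough numbers on that, here is a minimal sketch of the arithmetic. The per-page costs (200 ms waiting on the network, 5 ms parsing) are made-up figures for illustration, not measurements, and `total_time` is a hypothetical helper that assumes fetching and parsing happen one after the other.

```python
def total_time(io_seconds, cpu_seconds, io_speedup=1.0, cpu_speedup=1.0):
    """Time per page when the fetch (I/O) and the parse (CPU) run back to back."""
    return io_seconds / io_speedup + cpu_seconds / cpu_speedup

# Assumed per-page costs, chosen only to illustrate the point.
IO, CPU = 0.200, 0.005

baseline = total_time(IO, CPU)
faster_cpu = total_time(IO, CPU, cpu_speedup=10)  # e.g. a much faster language
faster_io = total_time(IO, CPU, io_speedup=2)     # e.g. a faster connection

print(f"baseline:       {baseline * 1000:.1f} ms")    # 205.0 ms
print(f"10x faster CPU: {faster_cpu * 1000:.1f} ms")  # 200.5 ms
print(f"2x faster I/O:  {faster_io * 1000:.1f} ms")   # 105.0 ms
```

With these assumed figures, a tenfold CPU speedup saves about 2% of the time per page, while doubling the I/O speed nearly halves it.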

paxdiablo answered Oct 13 '22 10:10



It means that the program spends more time reading and writing (to disk or over the network) than it does actually running the algorithms in the code. I/O is orders of magnitude slower than CPU work, so a program that does a lot of it spends most of its time waiting.
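One way to see this for yourself is to compare wall-clock time with CPU time. The sketch below makes no real network calls: `fetch` is a hypothetical stand-in that sleeps for an assumed 50 ms to imitate network latency, and `parse` is a toy word count standing in for page analysis.

```python
import time

def fetch(url):
    """Stand-in for a network request: the process simply waits."""
    time.sleep(0.05)  # assumed latency; nothing is actually downloaded
    return "<html>" + "word " * 1000 + "</html>"

def parse(page):
    """Stand-in for page analysis: count whitespace-separated tokens."""
    return len(page.split())

wall_start = time.perf_counter()  # elapsed real time
cpu_start = time.process_time()   # CPU time used by this process (excludes sleep)

for n in range(10):
    parse(fetch(f"http://example.com/{n}"))

wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start
print(f"wall-clock: {wall:.2f} s, CPU: {cpu:.2f} s")
```

The wall-clock figure should come out at roughly half a second, while the CPU figure stays close to zero: the program is waiting, not computing, which is what "I/O bound" means.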

Ignacio Vazquez-Abrams answered Oct 13 '22 10:10
