 

Download multiple pages concurrently?

I'd like to write a Python script that grabs URLs from a database and downloads the web pages concurrently to speed things up, instead of waiting for each page to download one after the other.

According to this thread, Python doesn't allow this because of something called the Global Interpreter Lock, which prevents launching the same script multiple times.

Before investing time learning the Twisted framework, I'd like to make sure there isn't an easier way to do what I need to do above.

Thank you for any tips.

Gulbahar asked Feb 22 '26 09:02



1 Answer

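Don't worry about the GIL. In your case it doesn't matter: downloading pages is I/O-bound, and Python releases the GIL while a thread waits on the network, so several threads can download at the same time.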

The easiest way to do what you want is to create a thread pool, using the threading module and one of the thread pool implementations from ASPN. Each thread in the pool can use httplib to download your web pages.
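The thread pool idea can be sketched roughly as follows. This is only a minimal illustration, not one of the ASPN recipes: it swaps in the standard library's `concurrent.futures.ThreadPoolExecutor` for a hand-rolled pool and `urllib.request` for httplib. The names `fetch_page` and `download_all`, the pool size of 8 and the 10-second timeout are all assumptions made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Download one page and return its body as bytes (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetch=fetch_page, max_workers=8):
    """Fetch every URL on a pool of worker threads.

    Returns a dict mapping each URL to its body, or to the exception
    raised while fetching it, so one bad URL does not abort the batch.
    """
    urls = list(urls)  # allow any iterable, e.g. rows read from a database

    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception as exc:  # record the failure and keep going
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs safe_fetch concurrently and yields results
        # in the same order as the input URLs.
        return dict(zip(urls, pool.map(safe_fetch, urls)))
```

Calling `download_all(urls)` with the URLs read from your database should give back one entry per URL. Check each value with `isinstance(value, Exception)` to tell failed downloads from page bodies. The `fetch` parameter is there only so that a different download function can be plugged in.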

Another option is to use the PyCURL module. It supports parallel downloads natively, so you don't have to implement them yourself.

Bartosz answered Feb 23 '26 23:02
