I'd like to write a Python script that grabs URLs from a database and downloads the web pages concurrently, to speed things up instead of waiting for each page to download one after the other.
According to this thread, Python doesn't allow this because of something called the Global Interpreter Lock, which prevents launching the same script multiple times.
Before investing time learning the Twisted framework, I'd like to make sure there isn't an easier way to do what I need to do above.
Thank you for any tips.
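Don't worry about the GIL. In your case it doesn't matter: downloading pages is I/O-bound, and Python releases the GIL while a thread waits on the network, so other threads keep running in the meantime.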
The easiest way to do what you want is to create a thread pool, using the threading module and one of the thread pool implementations from ASPN. Each thread in the pool can use httplib to download your web pages.
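As a rough sketch of the thread pool idea, here is one way it might look. Note that this swaps in the standard library's concurrent.futures.ThreadPoolExecutor for an ASPN recipe and urllib.request for httplib, which is an assumption about your Python version (3.x). The names fetch_page and download_all, the timeout, and the pool size of 8 are all made up for illustration, and the URL list is assumed to have already been read from your database.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Download one page and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetch=fetch_page, max_workers=8):
    """Fetch every URL on a pool of worker threads.

    Returns a dict mapping each URL to its body, or to the exception
    raised while fetching it, so one bad URL does not stop the rest.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all downloads up front; the pool runs up to
        # max_workers of them at the same time.
        futures = {url: pool.submit(fetch, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = exc
    return results
```

You would call it as something like pages = download_all(urls_from_db), then check each value to see whether it is page content or an exception. The fetch parameter is only there so the download function can be replaced, for example when testing without network access.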
Another option is to use the PyCURL module. It supports parallel downloads natively, so you don't have to implement them yourself.