I have a list of more than 100,000 URLs (different domains) that I want to download and save in a database for further processing and tinkering.
Would it be wise to use Scrapy instead of Python's multiprocessing / multithreading? If yes, how do I write a standalone script to do the same?
Also, feel free to suggest other awesome approaches that come to your mind.
Scrapy does not seem relevant here if you already know exactly which URLs to fetch (there is no crawling involved).
The easiest way that comes to mind would be to use Requests.
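A minimal sequential sketch with Requests might look like this; `urls` and `save_to_db` are placeholders for your own URL list and database layer:

```python
import requests

def save_to_db(url, body):
    ...  # hypothetical helper: insert the response body into your database

def fetch_all(urls):
    with requests.Session() as session:  # reuse connections where possible
        for url in urls:
            try:
                response = session.get(url, timeout=10)
                response.raise_for_status()
                save_to_db(url, response.text)
            except requests.RequestException as exc:
                print(f"failed: {url} ({exc})")
```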
However, querying each URL sequentially and blocking while waiting for each answer wouldn't be efficient, so you could consider GRequests to send batches of requests asynchronously.
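For example, a sketch with GRequests (assumes `pip install grequests`); the pool size and the `save_to_db` helper are illustrative assumptions, not part of the library:

```python
import grequests

def save_to_db(url, body):
    ...  # hypothetical helper: insert the response body into your database

def fetch_concurrently(urls, pool_size=100):
    pending = (grequests.get(u, timeout=10) for u in urls)
    # grequests.map runs up to `size` requests concurrently via gevent;
    # failed requests come back as None
    for response in grequests.map(pending, size=pool_size):
        if response is not None and response.ok:
            save_to_db(response.url, response.text)
```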