I get the following error:
multiprocessing.pool.MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x7f758760d6a0>'. Reason: 'TypeError("cannot serialize '_io.BufferedReader' object",)'
When running this code:
from operator import itemgetter
from multiprocessing import Pool
import wget
def f(args):
print(args[1])
wget.download(args[1], "tests/" + target + '/' + str(args[0]), bar=None)
if __name__ == "__main__":
a = Pool(2)
a.map(f, list(enumerate(urls))) #urls is a list of urls.
What does the error mean and how can I fix it?
First couple of advices:
wget package is not.Now, to the issue.
Apparently wget uses urllib.request for making request. After some testing, I concluded that it doesn't handle all HTTP status codes. More specifically, it somehow breaks when HTTP status is, for example, 304. This is why you have to use libraries with higher level interface. Even the urllib.request says this in official documentation:
The Requests package is recommended for a higher-level HTTP client interface.
So, without further ado, here is the working snippet.
You can just update with where you want to save files.
from multiprocessing import Pool
import shutil
import requests
def f(args):
print(args)
req = requests.get(args[1], stream=True)
with open(str(args[0]), 'wb') as f:
shutil.copyfileobj(req.raw, f)
if __name__ == "__main__":
a = Pool(2)
a.map(f, enumerate(urls)) # urls is a list of urls.
shutil lib is used for file manipulation. In this case, to stream the data to a file object.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With