I'm trying to parse websites that contain a car's properties (154 kinds of properties). I have a huge list (named liste_test) that consists of 280,000 used-car announcement URLs.
import requests

def araba_cekici(liste_test, headers, engine):
    for link in liste_test:
        try:
            page = requests.get(link, headers=headers)
            .....
            .....
When I run my code like this:
araba_cekici(liste_test, headers, engine)
It works and gets results, but in roughly one hour I could only obtain the properties for about 1,500 URLs. It is very slow, so I must use multiprocessing. I found a multiprocessing solution here and applied it to my code, but unfortunately it doesn't work:
import sys
import numpy as np
import multiprocessing as multi

def chunks(n, page_list):
    """Splits the list into n chunks"""
    return np.array_split(page_list, n)

cpus = multi.cpu_count()
workers = []
page_bins = chunks(cpus, liste_test)
for cpu in range(cpus):
    sys.stdout.write("CPU " + str(cpu) + "\n")
    # Process that will send the corresponding list of pages
    # to the function araba_cekici
    worker = multi.Process(name=str(cpu),
                           target=araba_cekici,
                           args=(page_bins[cpu], headers, engine))
    worker.start()
    workers.append(worker)
for worker in workers:
    worker.join()
And it gives:
TypeError: can't pickle _thread.RLock objects
I found some responses regarding this error, but none of them works (at least I couldn't apply them to my code). I also tried Python's multiprocessing Pool, but unfortunately it gets stuck in Jupyter Notebook; the code seems to run forever.
Late answer, but since this question turns up when searching on Google: multiprocessing sends the data to the worker processes via a multiprocessing.Queue, which requires all data/objects sent to be picklable.
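For illustration, here is a minimal sketch that reproduces the error, assuming the spawn start method (the default on Windows and macOS), where a bare threading.RLock stands in for any object that holds one:

import multiprocessing as multi
import threading

def work(obj):
    print(obj)

if __name__ == "__main__":
    multi.set_start_method("spawn")  # force arguments to be pickled
    lock = threading.RLock()         # stand-in for any object holding an RLock
    p = multi.Process(target=work, args=(lock,))
    p.start()  # raises: TypeError: can't pickle _thread.RLock objects
    p.join()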
In your code, you try to pass headers and engine, whose implementations you don't show. (Since headers holds the HTTP request headers, I suspect that engine is the issue here.) To solve your issue, you either have to make engine picklable, or only instantiate engine within the worker process.
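Concretely, here is a minimal sketch of the second option. It assumes engine is a SQLAlchemy engine (an assumption; its construction isn't shown in the question) and passes each worker only a picklable connection string, db_url, instead of the engine itself:

import multiprocessing as multi
import numpy as np
import requests
from sqlalchemy import create_engine  # assumption: engine is a SQLAlchemy engine

def araba_cekici(liste_test, headers, db_url):
    # Build the unpicklable engine inside the worker, so only the
    # picklable string db_url crosses the process boundary.
    engine = create_engine(db_url)
    for link in liste_test:
        try:
            page = requests.get(link, headers=headers)
            # ... parse the page and store the results via engine ...
        except requests.RequestException:
            continue

if __name__ == "__main__":
    liste_test = [...]                       # your 280,000 announcement URLs
    headers = {"User-Agent": "Mozilla/5.0"}  # placeholder headers
    db_url = "sqlite:///cars.db"             # placeholder connection string
    cpus = multi.cpu_count()
    page_bins = np.array_split(liste_test, cpus)
    workers = []
    for cpu in range(cpus):
        worker = multi.Process(name=str(cpu),
                               target=araba_cekici,
                               args=(page_bins[cpu], headers, db_url))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()

The same pattern works with multiprocessing.Pool by passing an initializer function that creates the engine once per worker; and in Jupyter, Pool typically hangs when the worker function is defined in the notebook itself rather than in an importable module, so moving araba_cekici into a separate .py file usually helps.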