I am using dask as in "How to parallelize many (fuzzy) string comparisons using apply in Pandas?". Basically, I do some computations (without writing anything to disk) that invoke Pandas and Fuzzywuzzy (which apparently may not release the GIL, if that helps), and I run something like:
import dask.dataframe as dd
import dask.multiprocessing

dmaster = dd.from_pandas(master, npartitions=4)
dmaster = dmaster.assign(my_value=dmaster.original.apply(lambda x: helper(x, slave), meta=('my_value', 'object')))
dmaster.compute(get=dask.multiprocessing.get)
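For context, `helper`, `master`, and `slave` are not shown above. A hypothetical stand-in (using the stdlib's `difflib` in place of Fuzzywuzzy, purely so the sketch is self-contained and runnable) might look like:

```python
import difflib

import pandas as pd

# Hypothetical stand-ins for the `master`/`slave` data and the `helper`
# function referenced above -- the real ones are not shown in the question.
slave = pd.DataFrame({'name': ['apple inc', 'alphabet inc', 'amazon.com']})
master = pd.DataFrame({'original': ['aple', 'amazon', 'alphabt']})

def helper(x, slave):
    # Return the slave name most similar to x. difflib.SequenceMatcher
    # stands in for fuzzywuzzy's scorer here, purely for illustration.
    scores = slave['name'].apply(
        lambda s: difflib.SequenceMatcher(None, x, s).ratio())
    return slave['name'].iloc[scores.idxmax()]

matched = master['original'].apply(lambda x: helper(x, slave))
print(matched.tolist())  # → ['apple inc', 'amazon.com', 'alphabet inc']
```

Since `helper` scans the whole `slave` frame for every row of `master`, the work per row is O(len(slave)), which matters for the performance discussion below.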
However, a variant of the code has been running for 10 hours now and is not over yet. In the Windows Task Manager I notice that:

- RAM utilization is pretty low, corresponding to the size of my data
- CPU usage bounces from 0% up to 5% every 2-3 seconds or so
- there are about 20 Python processes of 100 MB each, plus one Python process, likely holding the data, that is 30 GB in size (I have a 128 GB machine with an 8-core CPU)

The question is: is that behavior expected? Am I obviously setting some dask options terribly wrong here?

Of course, I understand the specifics depend on what exactly I am doing, but maybe the patterns above can already tell that something is horribly wrong?
Many thanks!!
> Of course, I understand the specifics depend on what exactly I am doing, but maybe the patterns above can already tell that something is horribly wrong?
This is pretty spot on. Identifying performance issues is tricky, especially when parallel computing comes into play. Here are some things that come to mind.
`helper` could be doing something oddly.

Generally, a good way to pin down these problems is to create a minimal, complete, verifiable example that others can reproduce and play with easily. Often, when creating such an example, you find the solution to your problem anyway. If that doesn't happen, you can at least pass the buck on to the library maintainers. Until such an example is created, most library maintainers won't spend their time on it; there are almost always too many details specific to the problem at hand to warrant free service.