
dask read_csv timeout on Amazon s3 with big files


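import s3fs
import dask.dataframe as dd
from dask.distributed import Client

s3fs.S3FileSystem.read_timeout = 5184000  # 60 days in seconds (one day is 86400)
s3fs.S3FileSystem.connect_timeout = 5184000  # 60 days in seconds (one day is 86400)

client = Client('a_remote_scheduler_ip_here:8786')

df = dd.read_csv('s3://dask-data/nyc-taxi/2015/*.csv')
len(df)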

len(df) raises a timeout exception. If the file is small, it works fine.

I think we need a way to set s3fs.S3FileSystem.read_timeout on the remote workers rather than in the local code, but I have no idea how to do it.

Here is a part of the stack trace:

File "/opt/conda/lib/python3.6/site-packages/dask/bytes/utils.py", line 238, in read_block
File "/opt/conda/lib/python3.6/site-packages/s3fs/core.py", line 1333, in read
File "/opt/conda/lib/python3.6/site-packages/s3fs/core.py", line 1303, in _fetch
File "/opt/conda/lib/python3.6/site-packages/s3fs/core.py", line 1520, in _fetch_range
File "/opt/conda/lib/python3.6/site-packages/botocore/response.py", line 81, in read
botocore.exceptions.ReadTimeoutError: Read timeout on endpoint URL: "None"

Võ Trường Duy asked Dec 01 '25 21:12



1 Answer

Setting the timeouts using the class attribute seems like a reasonable thing to do, but you are using a client talking to workers in other processes/machines. Therefore, you would need to set the attribute on the copy of the class in each worker for your method to take effect.
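
One way to do that might be to ship a small function to every worker with Client.run. The following is only a sketch: the helper names set_s3_timeouts and configure_cluster and the timeout values are made up for illustration, and it assumes s3fs is installed on the workers and still reads these class attributes when it opens a connection.

```python
def set_s3_timeouts(read_timeout, connect_timeout, fs_class=None):
    """Set the s3fs timeout class attributes in the current process."""
    if fs_class is None:
        # Imported lazily so the name resolves inside each worker process.
        import s3fs
        fs_class = s3fs.S3FileSystem
    fs_class.read_timeout = read_timeout
    fs_class.connect_timeout = connect_timeout
    return fs_class.read_timeout, fs_class.connect_timeout


def configure_cluster(scheduler_address, read_timeout=600, connect_timeout=60):
    """Hypothetical helper: apply the timeouts on all current workers."""
    from dask.distributed import Client

    client = Client(scheduler_address)
    # Client.run executes the function once on every connected worker.
    client.run(set_s3_timeouts, read_timeout, connect_timeout)
    # Also set them locally, for any reads done by the client process.
    set_s3_timeouts(read_timeout, connect_timeout)
    return client
```

Calling configure_cluster('a_remote_scheduler_ip_here:8786') before dd.read_csv would apply the values to the workers connected at that moment. Workers that join or restart later would not see them, so the call would need repeating.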

Better, perhaps, would be to set the blocksize used by read_csv (64MB by default) to a smaller number. I assume that you are on a slower network, and that this is why you are getting timeouts. If you need values below 5MB, the default read-ahead size in s3fs, then you should also pass default_block_size among the storage_options passed to read_csv.
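
As a rough sketch of that suggestion, the keyword arguments could be built like this. The helper names read_csv_kwargs and load_taxi are hypothetical, and the 5MB threshold is simply the s3fs default mentioned above, hard-coded here as an assumption.

```python
S3FS_DEFAULT_BLOCK = 5 * 2**20  # assumed s3fs read-ahead default, 5MB


def read_csv_kwargs(blocksize_bytes):
    """Build keyword arguments for dd.read_csv with a smaller blocksize."""
    kwargs = {"blocksize": blocksize_bytes}
    if blocksize_bytes < S3FS_DEFAULT_BLOCK:
        # Below the read-ahead size, shrink the s3fs block as well so each
        # request does not fetch more than the partition needs.
        kwargs["storage_options"] = {"default_block_size": blocksize_bytes}
    return kwargs


def load_taxi(blocksize_bytes):
    """Hypothetical helper: read the question's dataset in smaller blocks."""
    import dask.dataframe as dd

    return dd.read_csv(
        's3://dask-data/nyc-taxi/2015/*.csv',
        **read_csv_kwargs(blocksize_bytes),
    )
```

For example, load_taxi(16 * 2**20) would ask for 16MB partitions and leave the s3fs block size alone, while load_taxi(2 * 2**20) would lower both. Smaller blocks mean more tasks, so the right value depends on the network.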

Note, finally, that both s3fs and dask allow for retries, on connection errors or general task errors. That may be enough to help you if you only get this for the occasional laggy read.
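
On the dask side, a minimal sketch of task-level retries could look like the following. The helper name count_rows_with_retries and the retry count of 3 are illustrative assumptions. It relies on the retries keyword of Client.compute and counts rows per partition instead of calling len(df) directly.

```python
def count_rows_with_retries(client, df, retries=3):
    """Count rows in a dask dataframe, retrying failed tasks.

    `client` is a dask.distributed Client and `df` a dask dataframe.
    """
    # One length per partition, summed lazily into a single scalar.
    total = df.map_partitions(len).sum()
    # Each task that raises (for example on a read timeout) is re-run
    # up to `retries` times before the error is reported.
    future = client.compute(total, retries=retries)
    return future.result()
```

This only papers over occasional slow reads. If every block times out, the retries will fail too, and a smaller blocksize or longer timeout is still needed.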

mdurant answered Dec 03 '25 12:12



