This is a bit of a head-scratcher for me because I've done this same procedure with the same types of documents multiple times. There were no issues before, and the last refresh was as recent as October.
I have a project that involves downloading and parsing a large batch of PDFs quarterly. There are about 1.1 million at each release. Each PDF is 3-4 pages, so the files are pretty small. I do everything with them locally, then push them from the external SSD to S3 once they're downloaded to have a backup.
I'm currently attempting to refresh. It gets about 50 files in and then I get the error:
fatal error: [Errno 24] Too many open files
None of the files are open.
It's pretty straightforward, but this is what I'm doing in the terminal:
> screen
> aws s3 sync '/Volumes/G-DRIVE slim SSD USB-C/DOF PDFs/NPV_all' s3://dof.taxdocs/NPV
Does anyone have any thoughts?
Edit:
I tried setting ulimit to 65536/200000, which did not work. I also tried reverting to an earlier AWS CLI 2.x, which also did not work. Finally, I hooked the external SSD up to my laptop, and it's syncing fine from there. On that machine ulimit is set to 256, which tells me that is not the issue. However, it is running aws-cli/1.11.28 Python/2.7.10 Darwin/18.7.0 botocore/1.4.85. The OS is Mojave 10.14.6 (the same as on the desktop that is encountering the problem). I strongly suspect this is a bug in the more recent versions of the AWS CLI. If anyone else runs into this in the future, I would first try the most recent AWS CLI (hopefully no bugs by then) and then revert to a 1.x version.
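For anyone following along, these are the kinds of commands involved; the limit value and version pin here are just illustrative, adjust them for your own setup:
> aws --version            # confirm which CLI version is actually running
> ulimit -n                # check the soft open-file limit in the current shell
> ulimit -n 65536          # raise it for this shell / screen session only
> pip install 'awscli<2'   # one way to fall back to a 1.x CLI as a workaround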
Ensure your awscli is up to date with the latest version, e.g. brew upgrade awscli.
Check aws configure get s3.max_concurrent_requests for the current limit; maybe you set it to something like 200 in the past and forgot (like I did).
Set it back to a sensible value: aws configure set s3.max_concurrent_requests 10
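Putting those together, a minimal check-and-reset sequence (10 is just the CLI's default; pick whatever suits your connection):
> aws configure get s3.max_concurrent_requests     # empty output means you never overrode the default
> aws configure set s3.max_concurrent_requests 10  # drop back to 10 parallel transfers
> aws s3 sync '/Volumes/G-DRIVE slim SSD USB-C/DOF PDFs/NPV_all' s3://dof.taxdocs/NPV   # retry the sync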