Errno 22 When downloading multiple files from S3 bucket "sub-folder"

I've been trying to use the AWS CLI to download all files from a sub-folder in S3; however, after the first few files download, it fails to download the rest. I believe this is because it adds an extension to the filename, which it then sees as an invalid filepath.

I'm using the following command:

aws s3 cp s3://my_bucket/sub_folder /tmp/ --recursive

It gives me the following error for almost all of the files in the sub-folder:

[Errno 22] Invalid argument: 'C:\\tmp\\2019-08-15T16:15:02.tif.deDBF2C2

I think this is because of the .deDBF2C2 extension it seems to add to the files when downloading, though I don't know why it does. The filenames all end with .tif in the actual bucket.

Does anyone know what causes this?

Update: The command worked once I executed it from a Linux machine. The problem seems to be specific to Windows.

asked Sep 04 '19 by Ozymandias
2 Answers

This is an oversight by AWS: the log file names use characters that are reserved on Windows. When you execute the command it creates all the directories, but any logs with :: in the name fail to download.

The issue is discussed here: https://github.com/aws/aws-cli/issues/4543

Frustrated, I came up with a workaround: execute a "dry run", which prints the expected log output, and redirect that to a text file, e.g.:

>aws s3 cp s3://config-bucket-7XXXXXXXXXXX3 c:\temp --recursive --dryrun > c:\temp\aScriptToDownloadFilesAndReplaceNames.txt

The output file is filled with AWS log entries that we can turn into AWS CLI commands:

(dryrun) download: s3://config-bucket-7XXXXXXXXXXX3/AWSLogs/7XXXXXXXXXXX3/Config/ap-southeast-2/2019/10/1/ConfigHistory/7XXXXXXXXXXX3_Config_ap-southeast-2_ConfigHistory_AWS::RDS::DBInstance_20191001T103223Z_20191001T103223Z_1.json.gz to \AWSLogs\7XXXXXXXXXXX3\Config\ap-southeast-2\2019\10\1\ConfigHistory\703014955993_Config_ap-southeast-2_ConfigHistory_AWS::RDS::DBInstance_20191001T103223Z_20191001T103223Z_1.json.gz

In Notepad++ or another text editor, replace (dryrun) download: with aws s3 cp.


You will then see lines containing the command aws s3 cp, the bucket file, and the local file path. We need to remove the :: in the local file path on the right-hand side of the to:

aws s3 cp s3://config-bucket-7XXXXXXXXXXX3/AWSLogs/7XXXXXXXXXXX3/Config/ap-southeast-2/2019/10/1/ConfigHistory/7XXXXXXXXXXX3_Config_ap-southeast-2_ConfigHistory_AWS::RDS::DBInstance_20191001T103223Z_20191001T103223Z_1.json.gz to AWSLogs\7XXXXXXXXXXX3\Config\ap-southeast-2\2019\10\1\ConfigHistory\7XXXXXXXXXXX3_Config_ap-southeast-2_ConfigHistory_AWS::RDS::DBInstance_20191001T103223Z_20191001T103223Z_1.json.gz

We can replace the :: with - in local paths only, not in the S3 bucket paths, using the regex (.*)::, which matches up to the last occurrence of :: on each line:

Here I've replaced the ::s with hyphens ($1-) by clicking 'Replace All' twice.


Next, remove the to (it should be replaced with nothing):
FIND: json.gz to AWSLogs
REPLACE: json.gz AWSLogs
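The three find-and-replace steps above can also be scripted instead of done by hand. Here is a sketch using sed, assuming a Unix-like shell (e.g. Git Bash or WSL); the function name is my own, and it assumes the files end in .json.gz as in the dry-run output above:

```shell
# Sketch: turn one dry-run line into a runnable "aws s3 cp" command.
dryrun_to_cmd() {
  sed -E \
    -e 's/^\(dryrun\) download: /aws s3 cp /' \
    -e 's/\.json\.gz to /.json.gz /' \
    -e 's/(.*)::/\1-/' \
    -e 's/(.*)::/\1-/'
  # Each of the last two expressions rewrites the LAST "::" on the line,
  # so only the local path (which comes last) is changed, not the S3 key.
}
```

You could then pipe the whole dry-run file through it, e.g. `dryrun_to_cmd < aScriptToDownloadFilesAndReplaceNames.txt > download_commands.txt`, and run the result.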


Finally, select all the lines and copy/paste them into a command prompt to download all the files whose names contain reserved characters.

UPDATE:

If you have WSL (Windows Subsystem for Linux) you should be able to download the files there, then do a simple file rename replacing the ::s before copying to the mounted Windows folder.
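That rename step might look like this: a minimal bash sketch, where the function name and directory argument are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Sketch: rename every file in a directory whose name contains "::",
# so the names become valid before copying to the Windows filesystem.
rename_reserved() {
  local dir="$1" f
  for f in "$dir"/*::*; do
    [ -e "$f" ] || continue       # no matches: glob stays literal, skip
    mv -- "$f" "${f//::/-}"       # replace every "::" with "-"
  done
}
```

For example, `rename_reserved /tmp/logs` before `cp -r /tmp/logs /mnt/c/temp/`.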

answered Oct 16 '22 by Jeremy Thompson


I tried from my Raspberry Pi and it worked. It seems to only be an issue with Windows.

answered Oct 16 '22 by Ozymandias