
gsutil cp: copy files with -I option to matching subdirectories

Tags:

gsutil

I would like to copy a list of files to a bucket while keeping the directory structure.

test.txt:

a/b/1.jpg
a/c/23.jpg
a/d/145.jpg

gsutil command:

cat test.txt | gsutil -m cp -I 'gs://my-bucket/'

This copies the files but ignores the subdirectories. Is there a way to solve my problem? Thanks a lot!

Markus Steinhauer asked Nov 20 '22 21:11



1 Answer

I came across this question because I had a very similar case. There still isn't a great way to do this, but I recently found a tip: use gsutil rsync and turn its -x flag, which normally excludes paths matching a regular expression, into an inclusion filter by wrapping the pattern in a negative lookahead.

For example, the command below copies all json files found in the current directory or any of its subdirectories, preserving their paths in the bucket:

gsutil -m rsync -r -x '^(?!.*\.json$).*' . gs://mybucket

This can be extended to cover several extensions. For example, this command copies all json, yaml and yml files it finds:

gsutil -m rsync -r -x '^(?!.*\.(json|yaml|yml)$).*' . gs://mybucket
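To see which paths such a pattern lets through without touching a bucket, the -x filter can be approximated locally. This is only a sketch under a few assumptions: kept_by_rsync_x is a made-up helper name, it needs a grep with -P (PCRE) support such as GNU grep, and gsutil itself evaluates the pattern with Python's regular expression engine, which should agree with PCRE for simple lookaheads like these but is not guaranteed to be identical in general.

```shell
# kept_by_rsync_x PATTERN (hypothetical helper, not part of gsutil):
# read candidate paths on stdin and print the ones PATTERN does NOT match,
# i.e. the paths that an exclusion filter with this pattern would keep.
# Assumes GNU grep with -P (PCRE) support.
kept_by_rsync_x() {
  grep -vP -- "$1" || true   # grep exits 1 when nothing is printed; ignore that
}

# Example: only paths ending in .json, .yaml or .yml survive the filter.
printf '%s\n' conf/app.json notes.txt deploy/ci.yml deploy/ci.yaml.bak |
  kept_by_rsync_x '^(?!.*\.(json|yaml|yml)$).*'
# prints:
# conf/app.json
# deploy/ci.yml
```

The real check is still a dry run with gsutil rsync -n, since that is the only place the pattern is evaluated exactly as gsutil sees it.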

By itself this is not very useful when you have a specific list of files, but it can be built on. Let's use the youtube-dl repo (https://github.com/ytdl-org/youtube-dl.git) as an example.

Let's take all md files from the repo and pretend they are our file list. The last file is in a subdirectory:

find * -name "*.md"
CONTRIBUTING.md
README.md
docs/supportedsites.md

We use find * rather than find . so that the printed paths have no leading ./ and need less processing.

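# Note: these commands appear to be written for zsh. In bash, read at the
# end of a pipeline runs in a subshell, so flist would be empty afterwards.

# Read file paths into var
# For a file containing the path list, use
# cat file|read -d '' flist
find * -name "*.md"|read -d '' flist

# Join the paths with | inside a negative lookahead, so that the -x parameter
# excludes everything except the listed paths
rx="^(?\!($(echo $flist|tr '\n' '|')$)).*"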

# Preview rx variable (just for clarity)
echo $rx
^(?!(CONTRIBUTING.md|README.md|docs/supportedsites.md|$)).*

# Run sync in dry mode
gsutil -m rsync -n -r -x "$rx" . gs://mybucket
...
Would copy file://./CONTRIBUTING.md to gs://mybucket/CONTRIBUTING.md
Would copy file://./README.md to gs://mybucket/README.md
Would copy file://./docs/supportedsites.md to gs://mybucket/docs/supportedsites.md

While a little involved, this does allow use of the -m flag for speed while preserving paths.

With some more processing it should be possible to:

  • remove the trailing newline from the find result (it is what produces the empty |$ alternative in the pattern)
  • handle paths beginning with ./
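One way that extra processing might look, as a rough sketch rather than something verified against a real bucket: build_rx is a hypothetical helper name, and it relies only on sed and paste. Besides dropping blank lines and a leading ./, it also escapes regex metacharacters such as the dot and anchors the whole alternation with $, so that only exact paths are kept. Both of those go beyond what the commands above do.

```shell
# build_rx (hypothetical helper): read newline-separated paths on stdin and
# print a pattern for rsync -x that matches every path EXCEPT the listed ones.
# First sed: drop blank lines, strip a leading "./", escape regex
# metacharacters. paste: join the lines with "|". Last sed: wrap the
# alternation in an anchored negative lookahead.
build_rx() {
  sed -e '/^$/d' \
      -e 's|^\./||' \
      -e 's/[][\.|$(){}?+*^]/\\&/g' |
    paste -sd '|' - |
    sed -e 's/^/^(?!(/' -e 's/$/)$).*/'
}

# Example with the md files from above, plus a leading ./ and a blank line.
printf '%s\n' './CONTRIBUTING.md' 'README.md' '' 'docs/supportedsites.md' | build_rx
# prints: ^(?!(CONTRIBUTING\.md|README\.md|docs/supportedsites\.md)$).*

# Dry run (assumes the list is stored in test.txt, one path per line):
#   gsutil -m rsync -n -r -x "$(build_rx < test.txt)" . gs://mybucket
```

Because the function only reads stdin and writes stdout, it should behave the same in bash and zsh, which sidesteps the read-in-a-pipeline difference between the two shells.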
Jedrzej G answered Dec 20 '22 18:12
