I am running a Python script and using the os library to execute a gsutil command, which is typically executed in the command prompt on Windows. I have a file on my local computer and I want to put it into a Google bucket, so I do:
import os
command = 'gsutil -m cp myfile.csv gs://my/bucket/myfile.csv'
os.system(command)
I get a message like:
==> NOTE: You are uploading one or more large file(s), which would run significantly faster if you enable parallel composite uploads. This feature can be enabled by editing the "parallel_composite_upload_threshold" value in your .boto configuration file. However, note that if you do this large files will be uploaded as 'composite objects https://cloud.google.com/storage/docs/composite-objects'_, which means that any user who downloads such objects will need to have a compiled crcmod installed (see "gsutil help crcmod"). This is because without a compiled crcmod, computing checksums on composite objects is so slow that gsutil disables downloads of composite objects.
I want to get rid of this message, either by hiding it if it's irrelevant or by actually doing what it suggests, but I can't find the .boto file. What should I do?
The Parallel Composite Uploads section of the gsutil documentation describes how to resolve this (assuming, as the warning specifies, that this content will be used by clients with the crcmod module available):
gsutil -o GSUtil:parallel_composite_upload_threshold=150M cp bigfile gs://your-bucket
Doing this safely from Python would look like:
import subprocess

filename = 'myfile.csv'
gs_bucket = 'my/bucket'
parallel_threshold = '150M'  # minimum size for parallel upload; 0 to disable
# Pass the command as a list so no shell parses the filename
subprocess.check_call([
    'gsutil',
    '-o', 'GSUtil:parallel_composite_upload_threshold=%s' % (parallel_threshold,),
    'cp', filename, 'gs://%s/%s' % (gs_bucket, filename)
])
Note that here you're explicitly providing argument vector boundaries, and not relying on a shell to do this for you; this prevents a malicious or buggy filename from performing undesired operations.
If you don't know that the clients accessing content in this bucket will have the crcmod module, consider setting parallel_threshold='0' above, which will disable this support.
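As a rough sketch, the call above could be wrapped in a small helper so that the threshold is easy to switch between '150M' and '0'. The names build_gsutil_cp_args and upload are hypothetical, not part of gsutil, and the sketch assumes gsutil is on your PATH. Resolving the executable with shutil.which is an extra precaution, on the assumption that on Windows gsutil is installed as a .cmd wrapper that a bare 'gsutil' may not find without a shell.

```python
import shutil
import subprocess


def build_gsutil_cp_args(filename, bucket_path, parallel_threshold="150M", gsutil="gsutil"):
    """Return the argument vector for the gsutil cp call (hypothetical helper)."""
    return [
        gsutil,
        "-o", "GSUtil:parallel_composite_upload_threshold=%s" % parallel_threshold,
        "cp", filename, "gs://%s/%s" % (bucket_path, filename),
    ]


def upload(filename, bucket_path, parallel_threshold="150M"):
    """Run the upload without a shell; raises CalledProcessError if gsutil fails."""
    # shutil.which honours PATHEXT on Windows, so it can resolve e.g. gsutil.cmd
    exe = shutil.which("gsutil")
    if exe is None:
        raise FileNotFoundError("gsutil was not found on PATH")
    subprocess.check_call(
        build_gsutil_cp_args(filename, bucket_path, parallel_threshold, exe)
    )
```

Calling upload('myfile.csv', 'my/bucket', parallel_threshold='0') would then run the same copy with composite uploads disabled.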
Another way is to set the option that the message mentions in a configuration file on the BOTO_PATH, usually $HOME/.boto:
[GSUtil]
parallel_composite_upload_threshold = 150M
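If you can't find the file, running gsutil version -l should list the config path(s) gsutil is actually reading. To check what a given file currently sets, something like the following might work. It is only a sketch: read_threshold is a hypothetical helper, and it assumes the file is plain INI that Python's configparser can parse, which may not hold for every .boto file.

```python
import configparser
import os


def read_threshold(path):
    """Return the configured threshold as a string, or None if it is not set."""
    # interpolation=None keeps any '%' characters in the file literal;
    # strict=False tolerates duplicate sections or options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read(path)  # a missing file is silently skipped
    return parser.get("GSUtil", "parallel_composite_upload_threshold", fallback=None)


if __name__ == "__main__":
    # Assumed default location; adjust to whatever gsutil reports on your machine.
    default_path = os.path.join(os.path.expanduser("~"), ".boto")
    print(default_path, "->", read_threshold(default_path))
```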
For maximum speed, install the compiled (C) version of the crcmod library.