Compute Engine: using gsutil to download a tgz file fails with a crcmod error

I've found that if I create a Compute Engine machine (CentOS or Debian) and use gsutil to download (cp) a tgz file, the download fails with a crcmod error:

$ gsutil cp gs://mybucket/data.tgz .
Copying gs://mybucket/data.tgz...
CommandException:
Downloading this composite object requires integrity checking with CRC32c, but
your crcmod installation isn't using the module's C extension, so the the hash
computation will likely throttle download performance. For help installing the
extension, please see:
  $ gsutil help crcmod
To download regardless of crcmod performance or to skip slow integrity checks,
see the "check_hashes" option in your boto config file.

For now, I set "check_hashes = never" in my boto config to bypass the check:

$ vi /etc/boto.cfg
[GSUtil]
default_project_id = 429100748693
default_api_version = 2
check_hashes = never
...

But what is the root cause, and is there a good way to solve the problem?

user3585766 asked Apr 29 '14 14:04




1 Answer

The object you're trying to download is a composite object, which means it was uploaded as several chunks in parallel that were then composed into a single object. gsutil does this automatically when uploading objects larger than 150M (a configurable threshold), to improve upload performance.
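To make the "configurable threshold" concrete, here is a minimal Python sketch of the size check. It assumes the threshold comes from the `parallel_composite_upload_threshold` option in the `[GSUtil]` section of the boto config, that size suffixes such as `M` are binary (150M = 150 * 1024 * 1024 bytes), and that a threshold of 0 turns composite uploads off. The helpers `parse_size` and `uses_parallel_composite_upload` are hypothetical illustrations, not gsutil's own functions.

```python
import re

# Assumption: binary size suffixes, so "150M" means 150 MiB.
_UNIT_EXPONENTS = {"": 0, "K": 1, "M": 2, "G": 3, "T": 4}


def parse_size(text: str) -> int:
    """Convert a size string such as "150M" or "2GiB" into a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMGT]?)(?:i?B)?\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(number) * 1024 ** _UNIT_EXPONENTS[unit.upper()]


def uses_parallel_composite_upload(file_size: int, threshold: str = "150M") -> bool:
    """Hypothetical helper: would a file of this size be uploaded as a composite object?"""
    limit = parse_size(threshold)
    # Assumption: a threshold of 0 disables parallel composite uploads entirely.
    return limit > 0 and file_size > limit


if __name__ == "__main__":
    print(uses_parallel_composite_upload(200 * 1024 * 1024))  # above the 150M threshold
    print(uses_parallel_composite_upload(100 * 1024 * 1024))  # below the 150M threshold
```

Any file that crosses the threshold at upload time ends up as a composite object, which is what later triggers the crc32c requirement on download.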

Composite objects have only a crc32c checksum (no MD5), so to validate data integrity when downloading a composite object, gsutil has to compute a crc32c checksum. Unfortunately, the libraries distributed with Python don't include a compiled crc32c implementation, so unless you install one, the only implementation available to gsutil is a pure-Python one that's quite slow. By default gsutil refuses to fall back to that slow path, which is why your download failed; the message is there to tell you the performance problem can be fixed. Please run:

gsutil help crcmod

and follow the instructions there to install a compiled crcmod. It's pretty easy to do, and well worth the effort.
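The sketch below illustrates two points from this answer in Python: what a pure-Python crc32c computation looks like (and why it is slow), and how to tell whether the installed crcmod is actually using its C extension. It assumes that the `crcmod.crcmod` module sets an internal, undocumented `_usingExtension` flag when its compiled extension loads; treat that as an assumption. The base64 helper assumes Cloud Storage's convention of reporting crc32c as the base64 encoding of the big-endian 4-byte checksum. All function names are hypothetical. From the command line, `gsutil version -l` reports the same status on its `compiled crcmod` line.

```python
import base64
import importlib


def crcmod_status() -> str:
    """Report whether crcmod is missing, pure Python, or using its C extension.

    Assumption: crcmod.crcmod sets the internal flag _usingExtension to True
    only when its compiled extension was imported successfully.
    """
    try:
        crcmod_impl = importlib.import_module("crcmod.crcmod")
    except ImportError:
        return "not installed"
    if getattr(crcmod_impl, "_usingExtension", False):
        return "compiled"
    return "pure-python"


def _make_crc32c_table() -> list:
    """Build the 256-entry lookup table for CRC-32C (Castagnoli, reflected)."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            if crc & 1:
                # 0x82F63B78 is the bit-reversed Castagnoli polynomial 0x1EDC6F41.
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c_pure_python(data: bytes, crc: int = 0) -> int:
    """Table-driven CRC-32C; pass a previous result as crc to continue a stream."""
    crc ^= 0xFFFFFFFF
    # One interpreted loop iteration per byte: this is what makes it slow on big files.
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def gcs_crc32c_b64(crc: int) -> str:
    """Encode a checksum as Cloud Storage reports crc32c (big-endian, base64)."""
    return base64.b64encode(crc.to_bytes(4, "big")).decode("ascii")


if __name__ == "__main__":
    print("crcmod:", crcmod_status())
    checksum = crc32c_pure_python(b"123456789")
    print(hex(checksum), gcs_crc32c_b64(checksum))  # standard CRC-32C check value 0xe3069283
```

Because the checksum can be fed back in as `crc`, a downloader can hash the file chunk by chunk as it streams; the compiled extension does the same work in C instead of one Python iteration per byte.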

One other note: I strongly recommend against setting check_hashes = never in your boto config file. That will disable integrity checking, which means it's possible your download could get corrupted and you wouldn't know it. You want data integrity checking enabled to ensure you're working with correct data.
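After installing a compiled crcmod, you can confirm that integrity checking is back on by reading the `check_hashes` option out of your boto config. This is a minimal sketch using Python's standard `configparser`; the helper name is hypothetical, and the accepted values (`if_fast_else_fail`, `if_fast_else_skip`, `always`, `never`, with `if_fast_else_fail` as the default) are taken from the comments gsutil writes into a generated boto config, so verify them against your gsutil version.

```python
import configparser

# Values described in gsutil's boto config template (verify for your version).
VALID_CHECK_HASHES = {"if_fast_else_fail", "if_fast_else_skip", "always", "never"}
DEFAULT_CHECK_HASHES = "if_fast_else_fail"


def effective_check_hashes(config_text: str) -> str:
    """Return the check_hashes value gsutil would see in this boto config text."""
    # interpolation=None so values containing '%' are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # Falls back to the default when the option or the [GSUtil] section is absent.
    value = parser.get("GSUtil", "check_hashes", fallback=DEFAULT_CHECK_HASHES).strip()
    if value not in VALID_CHECK_HASHES:
        raise ValueError(f"unknown check_hashes value: {value!r}")
    return value


if __name__ == "__main__":
    sample = "\n".join([
        "[GSUtil]",
        "default_api_version = 2",
        "check_hashes = never",
    ])
    setting = effective_check_hashes(sample)
    if setting == "never":
        print("Integrity checking is disabled; remove 'check_hashes = never'.")
    else:
        print("check_hashes =", setting)
```

Simply deleting the `check_hashes = never` line restores the default, which, once crcmod's C extension is in place, checks composite downloads quickly instead of failing them.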

Mike Schwartz answered Oct 23 '22 03:10