Download multiple files from Google Cloud Storage using Python

I am trying to download multiple files from a Google Cloud Storage folder. I can download a single file, but not multiple files. I based my code on a reference link, but it does not seem to work. The code is as follows:

# [download multiple files]
bucket_name = 'bigquery-hive-load'
# The "folder" where the files you want to download are
folder="/projects/bigquery/download/shakespeare/"

# Create this folder locally
if not os.path.exists(folder):
    os.makedirs(folder)

# Retrieve all blobs with a prefix matching the folder
    bucket=storage_client.get_bucket(bucket_name)
    print(bucket)
    blobs=list(bucket.list_blobs(prefix=folder))
    print(blobs)
    for blob in blobs:
        if(not blob.name.endswith("/")):
            blob.download_to_filename(blob.name)

# [End download to multiple files]

Is there any way to download multiple files that match a name pattern, or some other way to select them? Since I am exporting the files from BigQuery, the file names look like this:

shakespeare-000000000000.csv.gz
shakespeare-000000000001.csv.gz
shakespeare-000000000002.csv.gz
shakespeare-000000000003.csv.gz

For reference, here is working code that downloads a single file:

# [download to single files]

edgenode_destination_uri = '/projects/bigquery/download/shakespeare-000000000000.csv.gz'
bucket_name = 'bigquery-hive-load'
gcs_bucket = storage_client.get_bucket(bucket_name)
blob = gcs_bucket.blob("shakespeare.csv.gz")
blob.download_to_filename(edgenode_destination_uri)
logging.info('Downloaded {} to {}'.format(
    gcs_bucket, edgenode_destination_uri))

# [end download to single files]

asked Jul 06 '18 06:07

Sandeep Singh

2 Answers

After some trial and error, I solved this myself and am posting the solution here as well.

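import os
from google.cloud import storage

storage_client = storage.Client()

bucket_name = 'mybucket'
folder = '/projects/bigquery/download/shakespeare/'  # local destination directory
delimiter = '/'
file = 'shakespeare'  # object-name prefix shared by the exported files

# Make sure the local destination directory exists.
os.makedirs(folder, exist_ok=True)

# Retrieve all blobs with a prefix matching the file.
bucket = storage_client.get_bucket(bucket_name)
# The delimiter excludes objects nested inside "folders" in the bucket.
blobs = bucket.list_blobs(prefix=file, delimiter=delimiter)
for blob in blobs:
    print(blob.name)
    destination_uri = os.path.join(folder, blob.name)
    blob.download_to_filename(destination_uri)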
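
If a plain prefix is too broad, the names can also be filtered with a shell-style pattern before downloading. The following is only a minimal sketch, not an official recipe: the helper names `select_names` and `download_matching` are hypothetical, and `bucket` is assumed to be a `google.cloud.storage` bucket object such as the one returned by `storage_client.get_bucket(...)`.

```python
import fnmatch
import os


def select_names(names, pattern):
    """Return the object names that match a shell-style pattern."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


def download_matching(bucket, prefix, pattern, dest_dir):
    """Download blobs under `prefix` whose full name matches `pattern`.

    Hypothetical helper: `bucket` is assumed to offer list_blobs(prefix=...)
    and its blobs download_to_filename(path), as google.cloud.storage does.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Index the listed blobs by name so the pattern filter works on strings.
    blobs = {blob.name: blob for blob in bucket.list_blobs(prefix=prefix)}
    saved = []
    for name in select_names(blobs, pattern):
        # Keep only the last path component as the local file name.
        target = os.path.join(dest_dir, os.path.basename(name))
        blobs[name].download_to_filename(target)
        saved.append(target)
    return saved
```

With the bucket from the question, this might be called as `download_matching(bucket, 'shakespeare-', 'shakespeare-*.csv.gz', '/projects/bigquery/download/shakespeare')`. Note that `*` in an `fnmatch` pattern also matches `/`, so the pattern is tested against the whole object name.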

answered Nov 06 '22 17:11

Sandeep Singh

It looks like you may simply have the wrong level of indentation in your Python code. The block beginning with # Retrieve all blobs with a prefix matching the folder is inside the scope of the if above it, so it is never executed when the folder already exists.

Try this:

# [download multiple files]
import os
from google.cloud import storage

storage_client = storage.Client()
bucket_name = 'bigquery-hive-load'
# The "folder" where the files you want to download are
folder = "/projects/bigquery/download/shakespeare/"

# Create this folder locally
if not os.path.exists(folder):
    os.makedirs(folder)

# Retrieve all blobs with a prefix matching the folder.
# This block is no longer nested under the `if`, so it always runs.
bucket = storage_client.get_bucket(bucket_name)
print(bucket)
blobs = list(bucket.list_blobs(prefix=folder))
print(blobs)
for blob in blobs:
    # Skip "directory" placeholder objects whose names end with a slash
    if not blob.name.endswith("/"):
        blob.download_to_filename(blob.name)

# [End download to multiple files]
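
The code above uses the same string as both the bucket prefix and the local directory. If the two differ in your setup, it may help to keep them separate and derive each local path from the object name. This is a rough sketch under that assumption: `local_path_for` and `download_prefix` are hypothetical names, and `bucket` is assumed to be a `google.cloud.storage` bucket object.

```python
import os


def local_path_for(blob_name, prefix, dest_dir):
    """Map an object name to a path under dest_dir, dropping the prefix."""
    relative = blob_name[len(prefix):] if blob_name.startswith(prefix) else blob_name
    relative = relative.lstrip("/")
    if not relative:
        # The prefix was the full object name; fall back to its last component.
        relative = os.path.basename(blob_name)
    return os.path.join(dest_dir, *relative.split("/"))


def download_prefix(bucket, prefix, dest_dir):
    """Download every blob under `prefix`, mirroring sub-"folders" locally.

    Hypothetical helper: `bucket` is assumed to offer list_blobs(prefix=...)
    and its blobs download_to_filename(path), as google.cloud.storage does.
    """
    saved = []
    for blob in bucket.list_blobs(prefix=prefix):
        if blob.name.endswith("/"):
            continue  # skip "directory" placeholder objects
        target = local_path_for(blob.name, prefix, dest_dir)
        parent = os.path.dirname(target)
        if parent:
            # download_to_filename needs the parent directory to exist.
            os.makedirs(parent, exist_ok=True)
        blob.download_to_filename(target)
        saved.append(target)
    return saved
```

Creating the parent directory per file, rather than once up front, also covers objects that sit in nested "folders" below the prefix.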

answered Nov 06 '22 16:11

Robert Jordan


Tags: python, python-3.x, google-cloud-platform, google-cloud-storage
