
Google Cloud Storage: How to get list of new files in bucket/folder using gsutil

I have a bucket/folder into which a lot of files arrive every minute. How can I read only the new files, based on the file timestamp?

e.g. list all files with timestamp > my_timestamp

Remis Haroon - رامز asked May 17 '17 06:05



2 Answers

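You could use some bash-fu to find the newest object in the bucket:

gsutil ls -l gs://<your-bucket-name> | grep -v '^TOTAL:' | sort -k2,2 | tail -n1 | awk 'END {$1=$2=""; sub(/^[ \t]+/, ""); print }'

Breaking that down:

# grab detailed list of objects in bucket (size, timestamp, URL)
gsutil ls -l gs://<your-bucket-name>

# drop the trailing "TOTAL: ..." summary line
grep -v '^TOTAL:'

# sort on the timestamp field (ISO 8601 timestamps sort correctly as plain text)
sort -k2,2

# grab the last row returned, i.e. the newest object
tail -n1

# delete first two cols (size and date) and ltrim to remove whitespace
awk 'END {$1=$2=""; sub(/^[ \t]+/, ""); print }'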

Tested with Google Cloud SDK v186.0.0, gsutil v4.28
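
If you want every object newer than a given timestamp rather than only the latest one, the same idea can be pushed into awk. This is a minimal sketch, not something I have run against a real bucket: newer_than is a hypothetical helper name, and it assumes the listing is in the usual gsutil ls -l layout (size, ISO 8601 UTC timestamp, URL).

```shell
# Print object URLs whose timestamp is strictly later than $1.
# Reads `gsutil ls -l` output on stdin. Assumes ISO 8601 UTC timestamps
# (e.g. 2017-05-17T06:05:00Z), which compare correctly as plain strings.
newer_than() {
  awk -v since="$1" '
    # Object rows have a numeric size in column 1; this skips the
    # trailing "TOTAL:" summary and any prefix ("directory") rows.
    # Appending "" forces a string comparison of the two timestamps.
    $1 ~ /^[0-9]+$/ && ($2 "") > (since "") {
      $1 = $2 = ""          # drop size and timestamp
      sub(/^[ \t]+/, "")    # trim the leading whitespace left behind
      print
    }'
}

# Hypothetical usage (bucket name and timestamp are placeholders):
# gsutil ls -l gs://your-bucket-name/folder | newer_than 2017-05-17T06:00:00Z
```

Note that this still lists the whole bucket/folder on every run and filters client-side, so it gets slower as the number of objects grows.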

Jujhar Singh answered Sep 24 '22 08:09



This is not a feature that gsutil or the GCS API provides, as there is no way to list objects by timestamp.

Instead, you could subscribe to new objects using the GCS Cloud Pub/Sub feature.
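
As a rough sketch of what that setup could look like from the command line: gsutil notification create attaches a Pub/Sub notification config to a bucket. The function name watch_new_objects and the bucket, topic, and prefix values are hypothetical placeholders, and this assumes the Pub/Sub API is enabled for your project.

```shell
# Sketch: publish a Pub/Sub message each time an object is written
# under a prefix. Bucket, topic, and prefix are placeholders.
watch_new_objects() {
  bucket="$1"
  topic="$2"
  prefix="$3"
  # -e OBJECT_FINALIZE limits notifications to newly written objects
  #    (created or overwritten);
  # -p restricts them to object names starting with the given prefix;
  # -f json sends the object metadata as the message payload.
  gsutil notification create -t "$topic" -f json \
    -e OBJECT_FINALIZE -p "$prefix" "gs://$bucket"
}

# Hypothetical usage:
# watch_new_objects your-bucket-name new-files-topic folder/
```

A subscriber on that topic then receives one message per new object, so there is no need to re-list the bucket and compare timestamps.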

jterrace answered Sep 26 '22 08:09
