I have a bucket/folder into which a lot of files arrive every minute. How can I read only the new files, based on the file timestamp?
e.g. list all files with timestamp > my_timestamp
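You could use some bash-fu to get the most recently added object:
gsutil ls -l gs://<your-bucket-name> | grep -v '^TOTAL:' | sort -k2,2 | tail -n1 | awk 'END {$1=$2=""; sub(/^[ \t]+/, ""); print }'
Breaking that down:
# grab detailed list of objects in bucket
gsutil ls -l gs://<your-bucket-name>
# drop the TOTAL summary line that gsutil prints at the end
grep -v '^TOTAL:'
# sort on the date field (ISO 8601 timestamps sort correctly as plain text; a numeric sort would only compare the year)
sort -k2,2
# grab the last row returned, i.e. the most recent object
tail -n1
# blank out the first two cols (size and date) and ltrim to remove whitespace
awk 'END {$1=$2=""; sub(/^[ \t]+/, ""); print }'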
Tested with Google Cloud SDK v186.0.0, gsutil v4.28
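To get closer to what the question asks (all objects newer than a given timestamp, not just the latest one), the same listing can be filtered on the client side. This is only a sketch: `newer_than` is a hypothetical helper name, and it assumes the three-column `size  date  url` layout that `gsutil ls -l` prints, with the timestamp passed in the same format (e.g. `2018-01-30T12:00:00Z`).

```shell
# newer_than TIMESTAMP  (hypothetical helper, not part of gsutil)
# Reads `gsutil ls -l` output on stdin and prints the URLs of objects whose
# timestamp column is strictly later than TIMESTAMP.
newer_than() {
  awk -v ts="$1" '
    # skip the summary line at the end of the listing
    $1 == "TOTAL:" { next }
    # force a string comparison; ISO 8601 UTC timestamps order correctly as text
    ($2 "") > (ts "") {
      # strip the size and date columns, keeping spaces inside the object name
      sub(/^[ \t]*[0-9]+[ \t]+[^ \t]+[ \t]+/, "")
      print
    }'
}
```

Usage would look like `gsutil ls -l gs://<your-bucket-name> | newer_than 2018-01-30T12:00:00Z`. Note that this still lists the whole bucket on every run and filters afterwards, so it may get slow for large buckets.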
This is not a feature that gsutil or the GCS API provides, as there is no way to list objects by timestamp.
Instead, you could subscribe to notifications about new objects using the Cloud Pub/Sub notifications feature of GCS.
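A rough sketch of how that could be set up from the command line follows. The bucket, topic and subscription names are placeholders I made up, and the commands assume that `gsutil` and `gcloud` are installed and authorised for the project. The code is wrapped in functions so that nothing runs until you call them.

```shell
# Placeholder names: replace with your own bucket, topic and subscription.
BUCKET="gs://your-bucket-name"
TOPIC="new-object-events"
SUBSCRIPTION="new-object-events-sub"

# One-time setup: publish a message to TOPIC whenever an object is
# successfully written to the bucket (the OBJECT_FINALIZE event),
# then create a subscription to read those messages from.
setup_new_object_notifications() {
  gsutil notification create -t "$TOPIC" -f json -e OBJECT_FINALIZE "$BUCKET" &&
    gcloud pubsub subscriptions create "$SUBSCRIPTION" --topic="$TOPIC"
}

# Pull up to 10 pending notifications and acknowledge them.
pull_new_object_notifications() {
  gcloud pubsub subscriptions pull "$SUBSCRIPTION" --auto-ack --limit=10
}
```

Each notification describes a single new object, so a consumer only sees files that arrived after the subscription was created and does not have to re-list the bucket.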