Is there a way to grep through the text documents stored in Google Cloud Storage?
I am storing over 10 thousand documents (txt files) on a VM, and they are using up disk space. Before the disk reaches its limit, I want to move the documents to an alternative location. Currently, I am considering moving them to Google Cloud Storage on GCP.
I sometimes need to grep the documents for specific keywords. Is there any way I can grep through the documents uploaded to Google Cloud Storage? I checked the gsutil docs, and it seems ls, cp, mv, and rm are supported, but I don't see grep.
gsutil is a Python application that lets you access Cloud Storage from the command line. You can use gsutil to do a wide range of bucket and object management tasks, including: Creating and deleting buckets. Uploading, downloading, and deleting objects.
ls - List providers, buckets, or objects.
Unfortunately, gsutil has no grep command.
The only similar command is gsutil cat.
I suggest creating a small VM in the cloud and running grep from there; it will be faster and cheaper than pulling the data down to a machine outside the cloud.
gsutil cat gs://bucket/*.txt | grep "what you want to grep"
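One drawback of piping gsutil cat into grep is that the output does not say which object a matching line came from. As a rough sketch of an alternative, the Python below downloads each object with the google-cloud-storage client library and reports the object name and line number for every match. The function names grep_texts and iter_bucket_texts are made up for illustration, and the code assumes the library is installed and Application Default Credentials are configured.

```python
import re
from typing import Iterable, Iterator, Tuple


def grep_texts(
    docs: Iterable[Tuple[str, str]], pattern: str
) -> Iterator[Tuple[str, int, str]]:
    """Yield (name, line_number, line) for every line matching the regex."""
    regex = re.compile(pattern)
    for name, text in docs:
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield name, lineno, line


def iter_bucket_texts(bucket_name: str, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Download each object under the prefix and yield (object_name, text)."""
    # Imported lazily so grep_texts can be used without the library installed.
    from google.cloud import storage  # pip install google-cloud-storage

    client = storage.Client()  # assumes Application Default Credentials
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        # Every object is downloaded in full; this is still a client-side scan.
        yield blob.name, blob.download_as_text()
```

To use it, pass the bucket iterator to the matcher, for example grep_texts(iter_bucket_texts("my-bucket", "docs/"), "keyword"), where "my-bucket" and "docs/" are placeholder names. Like the gsutil cat approach, this reads every object on each search, so it is best run from a VM close to the bucket.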
@howie's answer is good. I just want to mention that Google Cloud Storage is a product intended to store files, and it does not care about their contents. It is also designed to be massively scalable, and the operation you are asking for is computationally expensive, so it is very unlikely to be supported natively in the future.
In your case, I would consider creating an index of the text files and triggering an update to it every time a new file is uploaded to GCS.
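To make the index idea concrete, here is a minimal sketch of an in-memory inverted index that maps each word to the set of file names containing it. The class name KeywordIndex and its methods are hypothetical, and the part that calls add_document when a new object arrives in the bucket (for example, a handler triggered by the upload) is assumed and not shown.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Words are runs of letters, digits, and underscores.
TOKEN = re.compile(r"\w+")


class KeywordIndex:
    """Maps each lowercased word to the names of the files that contain it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add_document(self, name: str, text: str) -> None:
        """Index one file; call this whenever a new file is uploaded."""
        for word in TOKEN.findall(text.lower()):
            self._postings[word].add(name)

    def search(self, keyword: str) -> Set[str]:
        """Return the names of all files containing the keyword."""
        # .get avoids creating empty entries for unknown words.
        return set(self._postings.get(keyword.lower(), set()))
```

A lookup then only touches the index instead of every file. This sketch matches whole words only, keeps everything in memory, and does not remove stale entries when a file is overwritten or deleted, so a real setup would need persistent storage and a way to handle updates.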
I have another suggestion. You might want to consider using Google Cloud Dataflow to process the documents. You can use it simply to move them, but more importantly, you can also transform the documents with Dataflow.