
Is there a way to grep through text documents stored in Google Cloud Storage?

Question

Is there a way to grep through the text documents stored in Google Cloud Storage?

Background

I am storing over 10 thousand documents (txt files) on a VM, and they are using up disk space. Before the disk reaches its limit, I want to move the documents to another location. Currently, I am considering moving them to Google Cloud Storage on GCP.

Issues

I sometimes need to grep the documents for specific keywords. Is there any way to grep through the documents once they are uploaded to Google Cloud Storage? I checked the gsutil docs, and ls, cp, mv, and rm are supported, but I don't see grep.

tetsushi awano asked Mar 05 '19 02:03




3 Answers

Unfortunately, gsutil has no command like grep.

The only similar command is gsutil cat.

I suggest creating a small VM and running grep from there; doing the work in the cloud will be faster and cheaper.

gsutil cat gs://bucket/* | grep "what you want to grep"
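
One limitation of the pipe above is that all objects are concatenated into one stream, so a match no longer shows which file it came from. A minimal sketch of one way around that is to grep each object separately and prefix every match with the object name. The function name grep_objects and the CAT_CMD variable are hypothetical names made up for this sketch, not part of gsutil, and gs://bucket is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: grep each object on its own so matches keep their object name.
# CAT_CMD is an assumption of this sketch. It defaults to "gsutil cat" and can be
# overridden (for example with "cat") to try the loop on local files.
CAT_CMD="${CAT_CMD:-gsutil cat}"

grep_objects() {
  local pattern="$1" object line
  # Object names (or local paths) arrive on stdin, one per line.
  while IFS= read -r object; do
    # $CAT_CMD is unquoted on purpose so that "gsutil cat" splits into two words.
    # stdin is redirected so the command cannot swallow the remaining object names.
    $CAT_CMD "$object" < /dev/null \
      | grep -- "$pattern" \
      | while IFS= read -r line; do
          printf '%s:%s\n' "$object" "$line"
        done
  done
  return 0
}
```

Assuming gsutil is installed and authenticated, it could be called as: gsutil ls 'gs://bucket/**' | grep_objects "keyword". This starts one gsutil process per object, so it will likely be slow for thousands of files; the single pipe above is the better choice when file names do not matter.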
howie answered Oct 29 '22 05:10


@howie's answer is good. I just want to mention that Google Cloud Storage is a product intended to store files, and it does not care about their contents. It is also designed to be massively scalable, and the operation you are asking for is computationally expensive, so it is very unlikely to be supported natively in the future.

In your case, I would consider creating an index of the text files and triggering an update to it every time a new file is uploaded to GCS.
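
As a rough illustration of the index idea, the sketch below keeps a plain tab-separated file with one "word, object" pair per line, so a keyword lookup only reads the index instead of every document. The function names, the CAT_CMD variable, and the index file layout are all assumptions of this sketch; how the update gets triggered on upload is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a keyword index: one "word<TAB>object" line per pair.
# CAT_CMD and the function names are assumptions of this sketch, not part of gsutil.
CAT_CMD="${CAT_CMD:-gsutil cat}"

# Print the index lines for one object: its words, lowercased and deduplicated.
index_object() {
  local object="$1"
  # $CAT_CMD is unquoted on purpose so that "gsutil cat" splits into two words.
  $CAT_CMD "$object" \
    | tr -cs '[:alnum:]' '\n' \
    | tr '[:upper:]' '[:lower:]' \
    | sort -u \
    | awk -v name="$object" 'NF { print $0 "\t" name }'
}

# List the objects whose text contains the given word, reading only the index file.
lookup_word() {
  local word="$1" index_file="$2"
  awk -F '\t' -v w="$word" '$1 == tolower(w) { print $2 }' "$index_file"
}
```

When a new file arrives, its entries could be appended with something like index_object gs://bucket/new.txt >> index.tsv, and a search then becomes lookup_word keyword index.tsv. This only matches whole words; substring or regex searches would still need the documents themselves.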

llompalles answered Oct 29 '22 05:10


I have another suggestion. You might want to consider using Google Dataflow to process the documents. You can use it simply to move them, but more importantly, you can also transform the documents with Dataflow.

Jay answered Oct 29 '22 06:10