
How can I use a Google Cloud Function to push a file from a Cloud Storage bucket into an instance?

I have had a task assigned to me to think of a way to set up a cloud function in GCP that does the following:

  • Monitors a Google Cloud Storage bucket for new files

  • Triggers when it detects a new file in the bucket

  • Copies that file to a directory inside a Compute Instance (Ubuntu)

I've been doing some research and am coming up empty. I know I can easily set up a cron job that syncs the bucket/directory every minute or something like that, but one of the design philosophies of the system we are building is to operate off triggers rather than timers.

Is what I am asking possible?

Cam asked Dec 24 '22 01:12

1 Answer

You can trigger a Cloud Function from a Google Cloud Storage bucket: by selecting Finalize/Create as the Event Type, the Cloud Function will be called each time a file is uploaded to the bucket.

Each time a new object is created in the bucket, the Cloud Function will receive a notification in the Cloud Storage object format.
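For reference, deploying a 1st-gen function with that trigger could look roughly like the following (a sketch only; the function name hello_gcs, the bucket name YOUR_BUCKET and the runtime are placeholders you would adjust):

    gcloud functions deploy hello_gcs \
        --runtime python39 \
        --entry-point hello_gcs \
        --trigger-resource YOUR_BUCKET \
        --trigger-event google.storage.object.finalize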

Now, on to the second step: I could not find any API that uploads files from Cloud Storage directly into a VM instance. However, I did the following as a workaround, assuming that your VM instance runs a web server that can receive HTTP requests (for example, Apache or Nginx):

main.py

import requests
from google.cloud import storage

def hello_gcs(data, context):
    """Background Cloud Function to be triggered by Cloud Storage.  
    Args:
        data (dict): The Cloud Functions event payload.
        context (google.cloud.functions.Context): Metadata of triggering event.
    Returns:
        None; the file is sent as a request to 
    """
    print('Bucket: {}'.format(data['bucket']))
    print('File: {}'.format(data['name']))

    client = storage.Client()
    bucket = client.get_bucket(data['bucket'])
    blob = bucket.get_blob(data['name'])

    contents = blob.download_as_string()

    # Send the contents as a form field named 'data'; this matches what the
    # Flask endpoint on the instance reads (request.form['data']).
    payload = {'data': contents}

    response = requests.post('https://your-instance-server/endpoint-to-download-files', data=payload)
    print('Instance responded with status code {}'.format(response.status_code))
    return "Request sent to your instance with the data of the object"

requirements.txt

google-cloud-storage
requests

Most likely, it would be better to just send the object name and the bucket name to your server endpoint, and from there download the files using the Cloud Client Library.
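If you go that route, a minimal sketch of the instance-side endpoint could look like the code below. Note that this is an assumption-heavy example: the /download route, the bucket/name form fields and the download directory are made up, and it presumes google-cloud-storage is installed on the VM and that its service account can read the bucket.

import os

from flask import Flask, request
from google.cloud import storage

app = Flask(__name__)

# Assumed directory inside the instance where the objects should end up.
DOWNLOAD_DIR = '/home/<user_name>/app/uploads'

@app.route('/download', methods=['POST'])
def download_object():
    # The Cloud Function only sends the bucket and object names.
    bucket_name = request.form['bucket']
    object_name = request.form['name']

    # Fetch the object with the Cloud Client Library and save it locally.
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    destination = os.path.join(DOWNLOAD_DIR, os.path.basename(object_name))
    blob.download_to_filename(destination)

    return 'Downloaded {} to {}'.format(object_name, destination)

In that case the Cloud Function would simply call requests.post(url, data={'bucket': data['bucket'], 'name': data['name']}) instead of downloading and forwarding the contents itself.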

Now you may ask...

How do you make a Compute Engine instance handle the request?

  1. Create a Compute Engine VM instance. Make sure it is in the same region as the Cloud Function and, when creating it, allow HTTP connections to it (see the documentation). I used a debian-9 image for this test; example gcloud commands are sketched after step 3.

  2. SSH into the instance, and run the following commands:

    • Install the Apache server and mod_wsgi:

      sudo apt-get update
      sudo apt-get install apache2
      sudo apt-get install libapache2-mod-wsgi
      
    • Install these Python libraries as well:

      sudo apt-get install python-pip
      sudo pip install flask
      
  3. Set up the environment for your application:

    cd ~/
    mkdir app
    sudo ln -sT ~/app /var/www/html/app
    

The last command links the app folder into Apache's document root, i.e. the directory it serves the index.html file from.
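As a rough sketch of step 1, creating the instance and opening HTTP traffic from the command line could look like this (the instance name, zone and firewall rule name are only examples; ticking "Allow HTTP traffic" in the Console has the same effect as the firewall rule):

    gcloud compute instances create file-receiver \
        --zone=us-central1-a \
        --image-family=debian-9 \
        --image-project=debian-cloud \
        --tags=http-server

    gcloud compute firewall-rules create default-allow-http \
        --allow=tcp:80 \
        --target-tags=http-server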

  4. Create your app in /home/<user_name>/app:

main.py

from flask import Flask, request
app = Flask(__name__)

@app.route('/', methods=['POST'])
def receive_file():
    file_content = request.form['data']
    # TODO
    # Implement process to save this data onto a file
    return 'Hello from Flask!'

if __name__ == '__main__':
    app.run()
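
As one possible way to fill in the TODO above (again just a sketch; the uploads directory and the fallback file name are assumptions), the handler could write the posted contents to a file inside the instance:

import os

from flask import Flask, request

app = Flask(__name__)

# Assumed target directory inside the instance.
UPLOAD_DIR = '/home/<user_name>/app/uploads'

@app.route('/', methods=['POST'])
def receive_file():
    file_content = request.form['data']
    # 'name' is only present if the Cloud Function also sends the object
    # name; 'incoming.txt' is a placeholder fallback.
    file_name = request.form.get('name', 'incoming.txt')
    destination = os.path.join(UPLOAD_DIR, os.path.basename(file_name))
    with open(destination, 'w') as f:
        f.write(file_content)
    return 'Saved {} to {}'.format(file_name, destination)

if __name__ == '__main__':
    app.run()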

  5. Create the WSGI server entry point in the same directory:

main.wsgi

import sys
sys.path.insert(0, '/var/www/html/app')

from main import app as application

  6. Add the following lines to /etc/apache2/sites-enabled/000-default.conf, after the DocumentRoot directive:

        WSGIDaemonProcess flaskapp threads=5
        WSGIScriptAlias / /var/www/html/app/main.wsgi
    
        <Directory /var/www/html/app>
                WSGIProcessGroup flaskapp
                WSGIApplicationGroup %{GLOBAL}
                Order deny,allow
                Allow from all
        </Directory>
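
Before restarting Apache in the next step, you can check the configuration for syntax errors with Apache's built-in check:

    sudo apachectl configtest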
    
  7. Run sudo apachectl restart. You should now be able to send POST requests to your application at the IP of the VM instance (you can find it in the Console, in the Compute Engine section). Note that the Cloud Function can only reach the instance's internal IP if it is connected to your VPC network (for example through a Serverless VPC Access connector); otherwise, use the external IP and restrict the firewall rule accordingly. Once you have the IP, change the request line in your Cloud Function to:

    response = requests.post('http://<INTERNAL_INSTANCE_IP>/', data=payload)
    
    return "Request sent to your instance with the data of the object"
    
Joan Grau Noël answered Jan 13 '23 14:01