 

SSL error using multiprocessing in Python with Google Cloud services

In my Flask application I use multiprocessing to handle a batch of files: the user uploads a .zip containing many PDF files. After the upload, a new entity is created in the database for each file. A thread is then started that calls a multiprocessing pool, so each file is handled by a worker process that interacts with Google Cloud services such as Google Storage and Google Datastore.

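import threading
import multiprocessing
import sys

# File is the application's own model class (not shown in the question).

class ProcessMulti(threading.Thread):
    """Background thread that fans the uploaded files out to a process pool."""

    def __init__(self, files_ids):
        super().__init__()
        self.files_ids = files_ids

    def run(self):
        total = len(self.files_ids)
        with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
            # imap_unordered yields each result as soon as its worker finishes
            for i, _ in enumerate(pool.imap_unordered(process_one, self.files_ids), 1):
                sys.stderr.write('\rdone {0:%}'.format(i / total))


def process_one(file_id):
    # Runs inside a worker process; File talks to Google Storage / Datastore
    print("Process started by {}".format(file_id))
    file = File(file_id)
    file.process()
    print("Process finished by {}".format(file_id))

    return file.id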

Inside the File object there are trivial interactions with Google Datastore and Google Storage, for example reading files from a bucket or modifying data. Everything works smoothly locally, but in production, over an SSL connection, the following error is thrown as soon as a process starts, and nothing else happens:

Process started by 5377634535997440
E1004 15:49:32.711329522   32255 ssl_transport_security.cc:476] Corruption detected.
E1004 15:49:32.711356181   32255 ssl_transport_security.cc:452] error:100003fc:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_RECORD_MAC
E1004 15:49:32.711361146   32255 secure_endpoint.cc:208]     Decryption error: TSI_DATA_CORRUPTED

Does anyone have a clue what is causing this error? I did some research and found reports of errors related to overloading the SSL socket, but I have no idea how to fix that, or which alternatives to multiprocessing offer similar performance. Thank you.
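One plausible explanation, not confirmed in this thread, is that the default "fork" start method on Linux copies any gRPC/SSL connections the parent has already opened into every worker, so several processes end up using the same TLS session; interleaved records from different processes would fail integrity checks, which would be consistent with SSLV3_ALERT_BAD_RECORD_MAC. Assuming that diagnosis, a minimal sketch of a workaround is to start workers with "spawn" and let each worker create its own clients in a pool initializer. `make_clients` and the dict it returns are hypothetical placeholders for real `storage.Client()` / `datastore.Client()` calls.

```python
import multiprocessing
import os

# Per-worker state, filled in by init_worker() inside each worker process.
_clients = None


def make_clients():
    # Hypothetical placeholder: the real app would create fresh Google Cloud
    # clients here (e.g. storage.Client(), datastore.Client()).
    return {"pid": os.getpid()}


def init_worker():
    # Runs once when each worker process starts, so every worker opens its
    # own connections instead of reusing ones created in the parent.
    global _clients
    _clients = make_clients()


def process_one(file_id):
    # Hypothetical work function; real code would use _clients here.
    return file_id, _clients["pid"]


def run_batch(file_ids):
    # "spawn" starts each worker as a fresh interpreter, so no open sockets
    # or SSL state are inherited from the parent process.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(initializer=init_worker) as pool:
        return list(pool.imap_unordered(process_one, file_ids))
```

With "spawn", worker functions must live at module level and the script's entry point should be guarded by `if __name__ == "__main__":`. The trade-off is slower worker start-up than "fork", which matters little when each task does network I/O.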

Kenny Aires asked Jun 28 '26 15:06


2 Answers

An alternative solution that worked for us was to set the GRPC_POLL_STRATEGY environment variable to 'poll' in the parent process:

import os

# Set in the parent process, before the worker threads/processes are started
os.environ['GRPC_POLL_STRATEGY'] = 'poll'

We were getting the "Decryption error: TSI_DATA_CORRUPTED" error while using multi-threading with Firebase.

Source: https://github.com/grpc/grpc/issues/28557
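Setting the variable once in the parent is enough because child processes inherit the parent's environment. A small standard-library sketch illustrates this; `read_poll_strategy` and `pool_poll_strategies` are hypothetical helpers, and in the real app the variable should presumably be set before any Google Cloud (gRPC) client is created.

```python
import multiprocessing
import os

# Set in the parent, before any worker process is started.
os.environ["GRPC_POLL_STRATEGY"] = "poll"


def read_poll_strategy(_):
    # Runs in a worker process; its environment was copied from the parent.
    return os.environ.get("GRPC_POLL_STRATEGY")


def pool_poll_strategies(n=2):
    # Ask n pool tasks which poll strategy their worker process sees.
    with multiprocessing.Pool(2) as pool:
        return pool.map(read_poll_strategy, range(n))
```

Calling `pool_poll_strategies()` from a script's `if __name__ == "__main__":` block returns `['poll', 'poll']`.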

Daniel Danciu answered Jun 30 '26 09:06


I ended up replacing the multiprocessing and threading operations with Celery task queues, as there were some thread-safety concerns when connecting to Google Cloud services that I couldn't overcome. The Celery implementation has been a good solution for the many asynchronous tasks in my app.

# Import the Celery instance, with the Flask app context already set up
from main_app import celery

@celery.task
def process_one(file_id):

    print("Process started by {}".format(file_id))
    file = File(file_id)
    file.process()
    print("Process finished by {}".format(file_id))

    return file.id

Kenny Aires answered Jun 30 '26 11:06


