 

Google translate api timeout

I have approximately 20,000 pieces of text to translate, each averaging around 100 characters in length. I am using the multiprocessing library to speed up my API calls. The code looks like this:

import os
import multiprocessing as mp
from time import sleep

from google.cloud.translate_v2 import Client
from tqdm.notebook import tqdm

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = cred_file  # path to service-account JSON
translate_client = Client()

def trans(text, MAX_TRIES=5):
    res = None
    sleep_time = 1
    for i in range(MAX_TRIES):
        try:
            res = translate_client.translate(text, target_language="en", model="nmt")
        except Exception:
            pass

        if res is None:
            sleep(sleep_time)  # back off before retrying; the wait doubles each attempt
            sleep_time *= 2
        else:
            break

    return res["translatedText"]

src_text = ...  # e.g. ["this is a sentence"] * 20000
with mp.Pool(mp.cpu_count()) as pool:
    translated = list(tqdm(pool.imap(trans, src_text), total=len(src_text)))

The above code unfortunately fails at around iteration 2828 ± 5 every single time (HTTP Error 503: Service Unavailable). I was hoping that the doubling sleep time would let it recover and carry on as normal. The weird thing is that if I restart the loop straight away, it runs again without issue, even though fewer than 2^4 seconds have passed since the code finished executing. So the questions are:

  1. Am I doing the try/except bit wrong?
  2. Is the multiprocessing somehow affecting the API?
  3. General thoughts?

I need the multiprocessing because otherwise I would be waiting for around 3 hours for the whole thing to finish.
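A side note: fewer, larger requests might sidestep the need for multiprocessing entirely. If I have read the google-cloud-translate docs right, `translate()` also accepts a list of strings, so the 20,000 texts could be sent in a few hundred requests instead of one each (the 128-segments-per-request chunk size below is my reading of the docs, not something I have verified). A minimal chunking helper:

```python
def chunked(seq, size=128):
    """Yield successive fixed-size slices of seq."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

# Hypothetical usage, assuming translate() accepts a list and returns
# one result dict per input string:
# translated = []
# for batch in chunked(src_text, 128):
#     results = translate_client.translate(batch, target_language="en", model="nmt")
#     translated.extend(r["translatedText"] for r in results)
```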

asked Jun 26 '20 by sachinruk
1 Answer

Some thoughts: the Google APIs I have tried before can only handle a certain number of concurrent requests, and when that limit is reached the service returns HTTP 503 "Service Unavailable" (or HTTP 403 if the daily limit or the per-user rate limit is exceeded).

Try implementing retries with exponential backoff: retry the operation with an exponentially increasing wait time, up to a maximum number of attempts. This improves bandwidth usage and maximizes request throughput in concurrent environments.

And review the Quotas and Limits page.

  • Exponential backoff
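A minimal sketch of backoff with jitter (the helper name and parameters are illustrative, not from any Google library). The jitter matters in this case: with a pool of workers, plain doubling makes every process retry at the same moments, so the retries themselves arrive as a burst.

```python
import random
import time

def call_with_backoff(fn, max_tries=5, base_delay=1.0, max_delay=32.0):
    """Retry fn() with exponential backoff plus jitter.

    Re-raises the last exception once max_tries is exhausted, so a
    persistent failure is not silently swallowed.
    """
    for attempt in range(max_tries):
        try:
            return fn()
        except Exception:
            if attempt == max_tries - 1:
                raise
            # Double the wait each attempt, cap it, and add random
            # jitter so concurrent workers do not retry in lockstep.
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay + random.uniform(0, delay))
```

In the original `trans()`, wrapping the `translate_client.translate(...)` call with a helper like this would also fix the silent `None` return when all retries fail.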
answered Oct 18 '22 by a_e