Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cloud Natural Language API returning socket.gaierror: nodename nor servname provided after performing Sentiment Analysis every now and then

I'm running the code on Jupyter notebook, I modified the code from this link so it takes it from Jupyter notebook instead of console and iterates over a list of files.

"""Demonstrates how to make a simple call to the Natural Language API."""

import argparse
import requests
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types


def print_result(annotations, movie_review_filename):


    score = annotations.document_sentiment.score
    magnitude = annotations.document_sentiment.magnitude


    file_path_split = movie_review_filename.split("/")
    fileName = file_path_split[len(file_path_split) - 1][:-4]

    sentencelist = []  
    statuslist = []

    for index, sentence in enumerate(annotations.sentences):
        sentence_sentiment = sentence.sentiment.score
        singlesentence = [fileName, sentence.text.content, sentence.sentiment.magnitude, sentence_sentiment]
        sentencelist.append(singlesentence)


    outputdf = pd.DataFrame(sentencelist, columns = ['status_id', 'sentence', 'sentence_magnitude', 'sentence_sentiment'])        

    outputdf.to_csv("/Users/abhi/Desktop/RetrySentenceCSVs/" + fileName + ".csv", index = False)

    return 0


def analyze(movie_review_filename):
    """Run a sentiment analysis request on text within a passed filename."""
    client = language.LanguageServiceClient()

    with open(movie_review_filename, 'r') as review_file:
        # Instantiates a plain text document.
        content = review_file.read()

    document = types.Document(
        content=content,
        type=enums.Document.Type.PLAIN_TEXT)
    annotations = client.analyze_sentiment(document=document)

    # Print the results
    print_result(annotations, movie_review_filename)


if __name__ == '__main__':

    import glob
    csv_file_list = glob.glob("/Users/abhi/Desktop/mytxtfilepath/*.txt")
    for file in csv_file_list: #Iterate through a list of file paths

        analyze(file)

The code is running fine for 10% of the set of text files (I have 687), but after a while it starts to throw errors:

ERROR:root:AuthMetadataPluginCallback "<google.auth.transport.grpc.AuthMetadataPlugin object at 0x113b76588>" raised exception!
Traceback (most recent call last):
  File "/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 171, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/anaconda3/lib/python3.6/site-packages/urllib3/util/connection.py", line 56, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/anaconda3/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 343, in _make_request
    self._validate_conn(conn)
  File "/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 849, in _validate_conn
    conn.connect()
  File "/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 314, in connect
    conn = self._new_conn()
  File "/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 180, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x113b840b8>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda3/lib/python3.6/site-packages/requests/adapters.py", line 445, in send
    timeout=timeout
  File "/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/anaconda3/lib/python3.6/site-packages/urllib3/util/retry.py", line 398, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='accounts.google.com', port=443): Max retries exceeded with url: /o/oauth2/token (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x113b840b8>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda3/lib/python3.6/site-packages/google/auth/transport/requests.py", line 120, in __call__
    **kwargs)
  File "/anaconda3/lib/python3.6/site-packages/requests/sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "/anaconda3/lib/python3.6/site-packages/requests/sessions.py", line 622, in send
    r = adapter.send(request, **kwargs)
  File "/anaconda3/lib/python3.6/site-packages/requests/adapters.py", line 513, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='accounts.google.com', port=443): Max retries exceeded with url: /o/oauth2/token (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x113b840b8>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/anaconda3/lib/python3.6/site-packages/grpc/_plugin_wrapping.py", line 77, in __call__
    callback_state, callback))
  File "/anaconda3/lib/python3.6/site-packages/google/auth/transport/grpc.py", line 77, in __call__
    callback(self._get_authorization_headers(context), None)
  File "/anaconda3/lib/python3.6/site-packages/google/auth/transport/grpc.py", line 65, in _get_authorization_headers
    headers)
  File "/anaconda3/lib/python3.6/site-packages/google/auth/credentials.py", line 122, in before_request
    self.refresh(request)
  File "/anaconda3/lib/python3.6/site-packages/google/oauth2/service_account.py", line 322, in refresh
    request, self._token_uri, assertion)
  File "/anaconda3/lib/python3.6/site-packages/google/oauth2/_client.py", line 145, in jwt_grant
    response_data = _token_endpoint_request(request, token_uri, body)
  File "/anaconda3/lib/python3.6/site-packages/google/oauth2/_client.py", line 106, in _token_endpoint_request
    method='POST', url=token_uri, headers=headers, body=body)
  File "/anaconda3/lib/python3.6/site-packages/google/auth/transport/requests.py", line 124, in __call__
    six.raise_from(new_exc, caught_exc)
  File "<string>", line 3, in raise_from
google.auth.exceptions.TransportError: HTTPSConnectionPool(host='accounts.google.com', port=443): Max retries exceeded with url: /o/oauth2/token (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x113b840b8>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))
ERROR:root:AuthMetadataPluginCallback "<google.auth.transport.grpc.AuthMetadataPlugin object at 0x113b76588>" raised exception!
Traceback (most recent call last):
  File "/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 171, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/anaconda3/lib/python3.6/site-packages/urllib3/util/connection.py", line 56, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/anaconda3/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 343, in _make_request
    self._validate_conn(conn)
  File "/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 849, in _validate_conn
    conn.connect()
  File "/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 314, in connect
    conn = self._new_conn()
  File "/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 180, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x113b84470>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:
...

The error repeats itself, then runs the SentimentAnalysis on the file, and then shows up against multiple times, then runs the SentimentAnalysis on the file and then finally stops with RendezVous error (forgot to capture this message) What I'm wondering is, how is it that the code worked for certain set of files for some time and threw error messages, worked a little more, threw error messages and then completely stopped working after a point?

I reran the code, only to find that, it is returning socket.gaierror after some random number of files in a folder. So one can see with a reasonable level of confidence that it is not the file contents that is the issue.

EDIT1: The file is simply any .txt files that have words in it. Can someone help me resolve this? I can also assure you, all the text I have in all the 680 files accounts for a total of 1400 requests, I've been very meticulous in its calculation based on the definition of what a request is according to Cloud Natural API. so I am WELL within my limits.

EDIT2: I've tried sleep(10) which seems to work fine for a while but again begins throwing errors..

like image 242
imperialgendarme Avatar asked Aug 09 '18 20:08

imperialgendarme


1 Answers

I figured it out. You're going to have to not read in all 600 files at once, but instead try to read it in batches of 50 files. (Create 12 folders with 50 files each), and manually run the code every time it is done scanning the folder. I'm not sure WHY this seems to work out, but it just works.

like image 85
imperialgendarme Avatar answered Oct 26 '22 21:10

imperialgendarme