I'm running the code on Jupyter notebook, I modified the code from this link so it takes it from Jupyter notebook instead of console and iterates over a list of files.
"""Demonstrates how to make a simple call to the Natural Language API."""
import argparse
import requests
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types
def print_result(annotations, movie_review_filename):
score = annotations.document_sentiment.score
magnitude = annotations.document_sentiment.magnitude
file_path_split = movie_review_filename.split("/")
fileName = file_path_split[len(file_path_split) - 1][:-4]
sentencelist = []
statuslist = []
for index, sentence in enumerate(annotations.sentences):
sentence_sentiment = sentence.sentiment.score
singlesentence = [fileName, sentence.text.content, sentence.sentiment.magnitude, sentence_sentiment]
sentencelist.append(singlesentence)
outputdf = pd.DataFrame(sentencelist, columns = ['status_id', 'sentence', 'sentence_magnitude', 'sentence_sentiment'])
outputdf.to_csv("/Users/abhi/Desktop/RetrySentenceCSVs/" + fileName + ".csv", index = False)
return 0
def analyze(movie_review_filename):
"""Run a sentiment analysis request on text within a passed filename."""
client = language.LanguageServiceClient()
with open(movie_review_filename, 'r') as review_file:
# Instantiates a plain text document.
content = review_file.read()
document = types.Document(
content=content,
type=enums.Document.Type.PLAIN_TEXT)
annotations = client.analyze_sentiment(document=document)
# Print the results
print_result(annotations, movie_review_filename)
if __name__ == '__main__':
import glob
csv_file_list = glob.glob("/Users/abhi/Desktop/mytxtfilepath/*.txt")
for file in csv_file_list: #Iterate through a list of file paths
analyze(file)
The code is running fine for 10% of the set of text files (I have 687), but after a while it starts to throw errors:
ERROR:root:AuthMetadataPluginCallback "<google.auth.transport.grpc.AuthMetadataPlugin object at 0x113b76588>" raised exception!
Traceback (most recent call last):
File "/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 171, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/anaconda3/lib/python3.6/site-packages/urllib3/util/connection.py", line 56, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/anaconda3/lib/python3.6/socket.py", line 745, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 849, in _validate_conn
conn.connect()
File "/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 314, in connect
conn = self._new_conn()
File "/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 180, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x113b840b8>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/anaconda3/lib/python3.6/site-packages/requests/adapters.py", line 445, in send
timeout=timeout
File "/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/anaconda3/lib/python3.6/site-packages/urllib3/util/retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='accounts.google.com', port=443): Max retries exceeded with url: /o/oauth2/token (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x113b840b8>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/anaconda3/lib/python3.6/site-packages/google/auth/transport/requests.py", line 120, in __call__
**kwargs)
File "/anaconda3/lib/python3.6/site-packages/requests/sessions.py", line 512, in request
resp = self.send(prep, **send_kwargs)
File "/anaconda3/lib/python3.6/site-packages/requests/sessions.py", line 622, in send
r = adapter.send(request, **kwargs)
File "/anaconda3/lib/python3.6/site-packages/requests/adapters.py", line 513, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='accounts.google.com', port=443): Max retries exceeded with url: /o/oauth2/token (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x113b840b8>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/anaconda3/lib/python3.6/site-packages/grpc/_plugin_wrapping.py", line 77, in __call__
callback_state, callback))
File "/anaconda3/lib/python3.6/site-packages/google/auth/transport/grpc.py", line 77, in __call__
callback(self._get_authorization_headers(context), None)
File "/anaconda3/lib/python3.6/site-packages/google/auth/transport/grpc.py", line 65, in _get_authorization_headers
headers)
File "/anaconda3/lib/python3.6/site-packages/google/auth/credentials.py", line 122, in before_request
self.refresh(request)
File "/anaconda3/lib/python3.6/site-packages/google/oauth2/service_account.py", line 322, in refresh
request, self._token_uri, assertion)
File "/anaconda3/lib/python3.6/site-packages/google/oauth2/_client.py", line 145, in jwt_grant
response_data = _token_endpoint_request(request, token_uri, body)
File "/anaconda3/lib/python3.6/site-packages/google/oauth2/_client.py", line 106, in _token_endpoint_request
method='POST', url=token_uri, headers=headers, body=body)
File "/anaconda3/lib/python3.6/site-packages/google/auth/transport/requests.py", line 124, in __call__
six.raise_from(new_exc, caught_exc)
File "<string>", line 3, in raise_from
google.auth.exceptions.TransportError: HTTPSConnectionPool(host='accounts.google.com', port=443): Max retries exceeded with url: /o/oauth2/token (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x113b840b8>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))
ERROR:root:AuthMetadataPluginCallback "<google.auth.transport.grpc.AuthMetadataPlugin object at 0x113b76588>" raised exception!
Traceback (most recent call last):
File "/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 171, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/anaconda3/lib/python3.6/site-packages/urllib3/util/connection.py", line 56, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/anaconda3/lib/python3.6/socket.py", line 745, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "/anaconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 849, in _validate_conn
conn.connect()
File "/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 314, in connect
conn = self._new_conn()
File "/anaconda3/lib/python3.6/site-packages/urllib3/connection.py", line 180, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x113b84470>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known
During handling of the above exception, another exception occurred:
...
The error repeats itself, then runs the SentimentAnalysis on the file, and then shows up against multiple times, then runs the SentimentAnalysis on the file and then finally stops with RendezVous
error (forgot to capture this message) What I'm wondering is, how is it that the code worked for certain set of files for some time and threw error messages, worked a little more, threw error messages and then completely stopped working after a point?
I reran the code, only to find that, it is returning socket.gaierror after some random number of files in a folder. So one can see with a reasonable level of confidence that it is not the file contents that is the issue.
EDIT1: The file is simply any .txt
files that have words in it.
Can someone help me resolve this? I can also assure you, all the text I have in all the 680 files accounts for a total of 1400 requests, I've been very meticulous in its calculation based on the definition of what a request is according to Cloud Natural API. so I am WELL within my limits.
EDIT2: I've tried sleep(10)
which seems to work fine for a while but again begins throwing errors..
I figured it out. You're going to have to not read in all 600 files at once, but instead try to read it in batches of 50 files. (Create 12 folders with 50 files each), and manually run the code every time it is done scanning the folder. I'm not sure WHY this seems to work out, but it just works.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With