I have a Python script that continuously stores tweets matching tracked keywords to a file. However, the script crashes repeatedly with the error appended below. How do I edit the script so that it restarts automatically? I've seen numerous solutions, including this one (Restarting a program after exception), but I'm not sure how to implement it in my script.
import sys
import tweepy
import json
import os

consumer_key=""
consumer_secret=""
access_key = ""
access_secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)

# directory that you want to save the json file
os.chdir("C:\Users\json_files")
# name of json file you want to create/open and append json to
save_file = open("12may.json", 'a')

class CustomStreamListener(tweepy.StreamListener):
    def __init__(self, api):
        self.api = api
        super(tweepy.StreamListener, self).__init__()

        # self.list_of_tweets = []

    def on_data(self, tweet):
        print tweet
        save_file.write(str(tweet))

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream
        print "Stream restarted"

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream
        print "Stream restarted"

sapi = tweepy.streaming.Stream(auth, CustomStreamListener(api))
sapi.filter(track=["test"])
===========================================================================
Traceback (most recent call last):
  File "C:\Users\tweets_to_json.py", line 41, in <module>
    sapi.filter(track=["test"])
  File "C:\Python27\lib\site-packages\tweepy-2.3-py2.7.egg\tweepy\streaming.py", line 316, in filter
    self._start(async)
  File "C:\Python27\lib\site-packages\tweepy-2.3-py2.7.egg\tweepy\streaming.py", line 235, in _start
    self._run()
  File "C:\Python27\lib\site-packages\tweepy-2.3-py2.7.egg\tweepy\streaming.py", line 165, in _run
    self._read_loop(resp)
  File "C:\Python27\lib\site-packages\tweepy-2.3-py2.7.egg\tweepy\streaming.py", line 206, in _read_loop
    for c in resp.iter_content():
  File "C:\Python27\lib\site-packages\requests-1.2.3-py2.7.egg\requests\models.py", line 541, in generate
    chunk = self.raw.read(chunk_size, decode_content=True)
  File "C:\Python27\lib\site-packages\requests-1.2.3-py2.7.egg\requests\packages\urllib3\response.py", line 171, in read
    data = self._fp.read(amt)
  File "C:\Python27\lib\httplib.py", line 543, in read
    return self._read_chunked(amt)
  File "C:\Python27\lib\httplib.py", line 603, in _read_chunked
    value.append(self._safe_read(amt))
  File "C:\Python27\lib\httplib.py", line 660, in _safe_read
    raise IncompleteRead(''.join(s), amt)
IncompleteRead: IncompleteRead(0 bytes read, 1 more expected)
How do I restart the Python shell? To execute a file in IDLE, press the F5 key, or select Run → Run Module from the menu bar. Either option restarts the Python interpreter and then runs your code in the fresh interpreter.
If you need more than 100 Tweets, you have to use the paginator method and specify the limit, i.e. the total number of Tweets that you want. Replace limit=1000 with the maximum number of Tweets you want.
But keep in mind that Twitter imposes a rate limit on the number of requests made to the Twitter API. To be precise, 900 requests per 15 minutes are allowed; anything above that returns an error.
Streaming with Tweepy involves three objects: Stream, StreamListener, and OAuthHandler. The last of these simply handles API authentication and requires the unique keys generated when you created your Twitter app. The StreamListener class defines how each incoming tweet should be handled.
Figured out how to incorporate the while/try loop by writing a new function for the stream:
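def start_stream():
    while True:
        try:
            sapi = tweepy.streaming.Stream(auth, CustomStreamListener(api))
            # Note: "note" "3" has no comma between the two strings, so Python
            # joins them into the single keyword "note3".
            sapi.filter(track=["Samsung", "s4", "s5", "note" "3", "HTC", "Sony", "Xperia", "Blackberry", "q5", "q10", "z10", "Nokia", "Lumia", "Nexus", "LG", "Huawei", "Motorola"])
        except:
            # A bare except also catches KeyboardInterrupt, so interrupting
            # the program restarts the stream instead of stopping it.
            continue

start_stream()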
I tested the auto restart by manually interrupting the program with CMD + C. Nonetheless, happy to hear of better ways to test such functionality.
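One way to test the restart logic without interrupting the program by hand is to pass the "connect" step into the loop as a function, so that a fake which fails on purpose can stand in for the real stream. The sketch below is only an illustration in Python 3 syntax: run_with_restart and FlakyStream are hypothetical names, not part of Tweepy, and in the real script the callable would wrap the sapi.filter(...) call. It catches Exception rather than using a bare except, which assumes you want Ctrl+C to stop the program.

```python
import time


def run_with_restart(connect, max_restarts=None, delay=0.0):
    """Call connect() until it returns normally, restarting after errors.

    Returns the number of restarts that were needed. If max_restarts is
    set and exhausted, the last exception is re-raised.
    """
    restarts = 0
    while True:
        try:
            connect()
            return restarts
        except Exception as exc:  # KeyboardInterrupt is not an Exception subclass
            if max_restarts is not None and restarts >= max_restarts:
                raise
            restarts += 1
            print("Stream failed (%r); restart #%d" % (exc, restarts))
            time.sleep(delay)


class FlakyStream(object):
    """Hypothetical stand-in for the real stream: fails a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise IOError("simulated dropped connection")


flaky = FlakyStream(failures=3)
restarts = run_with_restart(flaky)
print("restarts:", restarts)
```

Because the fake fails a known number of times, the restart count can be checked directly instead of being observed by eye.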
I ran into this problem recently and wanted to share more detailed information about it.
The error occurs because the chosen streaming filter, "test", is too broad. As a result, you receive tweets faster than you can process them, which causes an IncompleteRead error.
This can be fixed either by refining the search or by catching a more specific exception:
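from http.client import IncompleteRead  # Python 3; on Python 2 use: from httplib import IncompleteRead
...
try:
    sapi = tweepy.streaming.Stream(auth, CustomStreamListener(api))
    sapi.filter(track=["test"])
except IncompleteRead:
    # The connection was cut off mid-response; ignore it instead of crashing
    pass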
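On its own, that try/except runs only once, so the stream is not reconnected after the exception is swallowed. A possible way to combine the specific exception with a restart loop is sketched below, with a growing pause between reconnects so a struggling connection is not hammered. This is an assumption-laden sketch in Python 3, not Tweepy's own mechanism: stream_with_backoff, base_delay, max_delay and the injectable sleep parameter are all hypothetical names, and start_stream stands for whatever function wraps sapi.filter(...).

```python
import time
from http.client import IncompleteRead


def stream_with_backoff(start_stream, base_delay=1.0, max_delay=60.0,
                        sleep=time.sleep):
    """Run start_stream(), reconnecting only when IncompleteRead is raised.

    Any other exception propagates to the caller. Returns the list of
    delays (in seconds) waited before each reconnect.
    """
    delays = []
    delay = base_delay
    while True:
        try:
            start_stream()
            return delays  # the stream ended without an error
        except IncompleteRead:
            delays.append(delay)
            sleep(delay)
            delay = min(delay * 2, max_delay)  # double the wait, up to a cap
```

Catching only IncompleteRead means unrelated bugs (a typo in the listener, a full disk) still surface as a crash instead of being retried forever.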