Error while fetching Tweets with Tweepy

Tags: python, mongodb, twitter, tweepy

I have a Python script that fetches tweets. In the script I use the Tweepy library with valid authentication parameters. After running the script, some tweets are stored in my MongoDB and some are rejected by the if statement. But I still get this error:

requests.packages.urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read, 2457 more expected)'

My question is: which part of the script can I improve so that I do not get the error above?

This is my script:

from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import time
import json
from pymongo import MongoClient

#Mongo Settings
client = MongoClient()
db = client.Sentiment
Tweets = db.Tweet

#Twitter Credentials
ckey ='myckey'
csecret ='mycsecret'
atoken = 'myatoken'
asecret = 'myasecret'

class listener(StreamListener):

    def on_data(self, data):
        try:
            tweet = json.loads(data)

            # Non-tweet messages (e.g. delete notices) have no "lang" key
            if tweet.get("lang") == "nl":
                print(tweet["id"])
                Tweets.insert(tweet)

            return True
        except Exception as e:
            print('failed on_data, ' + str(e))
            time.sleep(5)

    def on_error(self, status):
        print(status)

auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter( track=["geld lenen"
                            ,"lening"
                            ,"Defam"
                            ,"DEFAM"
                            ,"Credivance"
                            ,"CREDIVANCE"
                            ,"Alpha Credit"
                            ,"ALPHA CREDIT"
                            ,"Advanced Finance"
                            ,"krediet"
                            ,"KREDIET"
                            ,"private lease"
                            ,"ing"
                            ,"Rabobank"
                            ,"Interbank"
                            ,"Nationale Nederlanden"
                            ,"Geldshop"
                            ,"Geldlenen"
                            ,"ABN AMRO"
                            ,"Independer"
                            ,"DGA adviseur"
                            ,"VDZ"
                            ,"vdz"
                            ,"Financieel Attent"
                            ,"Anderslenen"
                            ,"De Nederlandse Kredietmaatschappij"
                            ,"Moneycare"
                            ,"De Financiele Makelaar Kredieten"
                            ,"Finanplaza"
                            ,"Krediet"
                            ,"CFSN Kredietendesk"
                            ,"De Graaf Assurantien en Financieel Adviseurs"
                            ,"AMBTENARENLENING"
                            ,"VDZ Geldzaken"
                            ,"Financium Primae"
                            ,"SNS"
                            ,"AlfamConsumerCredit"
                            ,"GreenLoans"
                            ], languages=["nl"]  # a list of codes; a bare string is joined per character
                     )

I hope you can help me...

asked Feb 25 '15 11:02

Erik hoeven

1 Answer

This IncompleteRead error generally occurs when your consumption of incoming tweets falls behind the stream, which makes sense in your case given your long list of terms to track. The general approach most people seem to take (myself included) is simply to suppress this error and continue collecting.

I can't completely remember if IncompleteRead will close your connection (I think it might, because my personal solution reconnects my stream), but you may consider something like the following (I'm just going to wing it, it probably needs reworking for your situation):

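# from httplib import IncompleteRead # Python 2
from http.client import IncompleteRead # Python 3
# The traceback in the question shows the urllib3 wrapper, so catch that too
# (older requests versions expose it as requests.packages.urllib3.exceptions)
from urllib3.exceptions import ProtocolError
...
while True:
    try:
        # Connect/reconnect the stream (pass a listener instance, not the class)
        stream = Stream(auth, listener())
        # DON'T run this approach async or you'll just create a ton of streams!
        stream.filter(track=terms)  # terms: your list of keywords to track
    except (IncompleteRead, ProtocolError):
        # Oh well, reconnect and keep trucking
        continue
    except KeyboardInterrupt:
        # Or however you want to exit this loop
        stream.disconnect()
        break
...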

Again, I'm just winging it there, but the moral of the story is that the general approach taken here is to suppress the error and continue.
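For anyone who wants to try the suppress-and-reconnect idea without a live Twitter connection, here is a minimal, library-independent sketch. The names `run_with_reconnect` and `connect` are hypothetical: `connect` stands for any zero-argument callable that opens the stream and blocks while reading it. The retry cap and the short pause between attempts are additions of this sketch, not part of the loop above.

```python
import time
from http.client import IncompleteRead


def run_with_reconnect(connect, retry_on=(IncompleteRead,), max_retries=5,
                       base_delay=1.0, sleep=time.sleep):
    """Call connect() until it returns normally, retrying on the given errors.

    connect is a hypothetical zero-argument callable that opens the stream
    and blocks while consuming it. Returns the number of reconnects needed.
    """
    retries = 0
    while True:
        try:
            connect()
            return retries
        except retry_on:
            retries += 1
            if retries > max_retries:
                # Too many failures in a row: let the caller see the error
                raise
            # Wait a little longer after each failure before reconnecting
            sleep(base_delay * 2 ** (retries - 1))
```

With Tweepy, `connect` could be a small function that builds the `Stream` and calls `filter(track=...)`, and `retry_on` could include whatever exception class your traceback actually reports.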

EDIT (10/11/2016): Just a useful tidbit for anyone dealing with very large volumes of tweets - one way to handle this case without losing connection time or tweets would be to drop your incoming tweets into a queuing solution (RabbitMQ, Kafka, etc.) to be ingested/processed by an application reading from that queue.

This moves the bottleneck from the Twitter API to your queue, which should have no problem waiting for you to consume the data.

This is more of a "production" software solution, so if you don't care about losing tweets or reconnecting, the above solution is still perfectly valid.
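To make the queuing idea concrete, here is a rough sketch that uses Python's in-process `queue.Queue` as a stand-in for RabbitMQ or Kafka (that substitution is an assumption for illustration only). The stream callback does nothing but enqueue the raw payload, while a separate worker thread parses and stores it. `make_on_data`, `consume`, and `store` are hypothetical names, and the JSON strings are made-up sample payloads.

```python
import json
import queue
import threading


def make_on_data(buffer):
    """Return a callback that only enqueues the raw payload, so it returns fast."""
    def on_data(raw):
        buffer.put(raw)
        return True
    return on_data


def consume(buffer, store, wanted_lang="nl"):
    """Drain the buffer until a None sentinel arrives; parse and store matches."""
    while True:
        raw = buffer.get()
        if raw is None:
            break
        tweet = json.loads(raw)
        # .get() avoids a KeyError on messages without a "lang" field
        if tweet.get("lang") == wanted_lang:
            store(tweet)


buffer = queue.Queue()
stored = []  # stand-in for the MongoDB collection
worker = threading.Thread(target=consume, args=(buffer, stored.append))
worker.start()

on_data = make_on_data(buffer)
on_data('{"id": 1, "lang": "nl"}')
on_data('{"id": 2, "lang": "en"}')
buffer.put(None)  # tell the worker to stop
worker.join()
print([t["id"] for t in stored])  # prints [1]
```

Unlike a real message broker, this in-memory queue is unbounded and is lost if the process dies, so treat it as a way to see the shape of the design and not as a replacement for one.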

answered Sep 25 '22 01:09

dbernard
