
How to halt, kill, stop or close a PycURL request on a stream (example given using the Twitter stream)

I'm currently cURLing the Twitter API stream (http://stream.twitter.com/1/statuses/sample.json), so I am constantly receiving data. I wish to stop cURLing the stream once I have retrieved X objects from it (in the example I use 10 as an arbitrary number).

You can see how I have attempted to close the connection in the code below. The code after curling.perform() never executes, because the stream is continuous and perform() never returns. So I attempted to close the stream in body_callback; however, because perform() is still running, I cannot invoke close().

Any help would be appreciated.

Code:

# Imports
import pycurl # Used for doing cURL request
import base64 # Used to encode username and API Key
import json # Used to break down the json objects

# Settings to access stream and API
userName = 'twitter_username' # My username
password = 'twitter_password' # My API Key
apiURL = 'http://stream.twitter.com/1/statuses/sample.json' # the twitter api
tweets = [] # An array of Tweets

# Methods to do with the tweets array
def how_many_tweets():
    print 'Collected: ',len(tweets)
    return len(tweets)

class Tweet:
    def __init__(self):
        self.raw = ''
        self.id = ''
        self.content = ''

    def decode_json(self):
        return True

    def set_id(self):
        return True

    def set_content(self):
        return True

    def set_raw(self, data):
        self.raw = data

# Class to print out the stream as it comes from the API
class Stream:
    def __init__(self):
        self.tweetBeingRead =''

    def body_callback(self, buf):
        # This gets whole Tweets, and adds them to an array called tweets
        if(buf.startswith('{"in_reply_to_status_id_str"')): # This is the start of a tweet
            # Added Tweet to Global Array Tweets
            print 'Added:' # Printing output to console
            print self.tweetBeingRead # Printing output to console
            theTweetBeingProcessed = Tweet() # Create a new Tweet Object
            theTweetBeingProcessed.set_raw(self.tweetBeingRead) # Set its raw value to tweetBeingRead
            tweets.append(theTweetBeingProcessed) # Add it to the global array of tweets
            # Start processing a new tweet
            self.tweetBeingRead = buf # Start a new tweet from scratch
        else:
            self.tweetBeingRead = self.tweetBeingRead+buf
        if(how_many_tweets()>10):
            try:
                curling.close() # This is where the problem lies. I want to close the stream
            except Exception as CurlError:
                print ' Tried closing stream: ',CurlError

# Used to initiate the cURLing of the Data Sift streams
datastream = Stream()
curling = pycurl.Curl()
curling.setopt(curling.URL, apiURL)
curling.setopt(curling.HTTPHEADER, ['Authorization: Basic '+base64.b64encode(userName+":"+password)])
curling.setopt(curling.WRITEFUNCTION, datastream.body_callback)
curling.perform() # This is where the cURLing starts
print 'I cant reach here.'
curling.close() # This never gets called. :(
jonhurlock asked Oct 12 '22 00:10



1 Answer

You can abort the transfer from the write callback by returning a number that differs from the number of bytes passed in to it. (By default, returning None is treated the same as returning the number of bytes passed in.)

When you abort it, the entire transfer is considered done and your perform() call hands control back to your code.

The transfer will then report an error, because it was aborted.
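
As a rough illustration of the approach described above, here is a minimal sketch written for Python 3 rather than the Python 2 of the question. The names LimitedCollector and fetch_limited are hypothetical, and the sketch assumes the stream delivers one JSON object per line, which the question does not confirm. It returns -1 from the write callback once enough objects have arrived, and treats the resulting pycurl.error with code E_WRITE_ERROR as the expected outcome of the abort.

```python
import json


class LimitedCollector:
    """Hypothetical helper: collects JSON objects and aborts after `limit`."""

    def __init__(self, limit=10):
        self.limit = limit
        self.items = []
        self._buffer = b""

    def body_callback(self, chunk):
        # On Python 3, PycURL hands the write callback a bytes object.
        self._buffer += chunk
        # Assumption: objects are newline-delimited; adjust for other framing.
        while b"\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\n", 1)
            line = line.strip()
            if not line:
                continue  # skip blank lines between objects
            self.items.append(json.loads(line.decode("utf-8")))
            if len(self.items) >= self.limit:
                # -1 can never equal len(chunk), so the transfer is aborted.
                return -1
        # None is treated as "all bytes were handled"; the transfer continues.
        return None


def fetch_limited(url, limit=10):
    # Imported here so the collector can be exercised without PycURL installed.
    import pycurl

    collector = LimitedCollector(limit)
    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, url)
    curl.setopt(pycurl.WRITEFUNCTION, collector.body_callback)
    try:
        curl.perform()
    except pycurl.error as exc:
        # An aborted write callback surfaces as E_WRITE_ERROR; re-raise the rest.
        if exc.args[0] != pycurl.E_WRITE_ERROR:
            raise
    finally:
        curl.close()  # safe here, because perform() is no longer running
    return collector.items
```

Calling fetch_limited(apiURL, 10) would then hand back the first ten decoded objects. Authentication is left out of the sketch and would need to be set up as in the question.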

Daniel Stenberg answered Oct 27 '22 11:10
