 

EventMachine and Twitter streaming API

I am running an EventMachine process that uses the Twitter streaming API. I always run into an issue when the stream does not receive content frequently.

Here is the minimal version of the script:

require 'rubygems'
require 'eventmachine'
require 'em-http'
require 'json'

usage = "#{$0} <user> <password> <track>"
abort usage unless user = ARGV.shift
abort usage unless password = ARGV.shift
abort usage unless keywords = ARGV.shift

def startIt(user, password, keywords)
  EventMachine.run do
    http = EventMachine::HttpRequest.new("https://stream.twitter.com/1/statuses/filter.json", { :port => 443 }).post(
      :head      => { 'Authorization' => [user, password] },
      :body      => { "track" => keywords },
      :keepalive => true,
      :timeout   => -1)

    # Accumulate raw chunks and pull out one complete line (one JSON object) at a time
    buffer = ""
    http.stream do |chunk|
      buffer += chunk
      while line = buffer.slice!(/.+\r?\n/)
        # Skip very short lines, such as the blank keep-alive lines
        if line.length > 5
          tweet = JSON.parse(line)
          puts Time.new.to_s + " #{tweet['user']['screen_name']}: #{tweet['text']}"
        end
      end
    end

    http.errback {
      puts Time.new.to_s + " Error: "
      puts http.error
    }
  end
rescue => error
  puts "error rescue " + error.to_s
end

while true
    startIt user,password,keywords
end
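The chunk-to-line loop inside `http.stream` is the one part of the script that can be exercised without a network connection. Below is a minimal stdlib-only sketch of that logic; `TweetLineBuffer` and `feed` are hypothetical names, not part of em-http. It anchors the match at the start of the buffer (`\A`), so a bare `"\n"` keep-alive is consumed instead of being left behind, which the `/.+\r?\n/` pattern in the script would not do.

```ruby
require 'json'

# Hypothetical helper (not part of em-http): reassembles streamed chunks
# into complete lines, like the buffer/slice! loop in the script above.
class TweetLineBuffer
  def initialize
    @buffer = "".dup # mutable string that accumulates raw chunk data
  end

  # Appends a chunk and returns the parsed JSON objects for every line it
  # completes; a trailing partial line stays buffered for the next chunk.
  def feed(chunk)
    @buffer << chunk
    tweets = []
    while (line = @buffer.slice!(/\A[^\n]*\n/))
      line = line.strip
      # Blank lines are keep-alives: drop them instead of parsing them.
      next if line.empty?
      tweets << JSON.parse(line)
    end
    tweets
  end
end

# Sample chunks (invented): the first tweet is split across two chunks,
# and a blank "\r\n" keep-alive sits between the two tweets.
buf = TweetLineBuffer.new
chunks = [
  "{\"user\":{\"screen_name\":\"alice\"},\"text\":\"hel",
  "lo\"}\r\n\r\n{\"user\":{\"screen_name\":\"bob\"},",
  "\"text\":\"hi\"}\r\n"
]
chunks.each do |c|
  buf.feed(c).each do |tweet|
    puts "#{tweet['user']['screen_name']}: #{tweet['text']}"
  end
end
```

With these sample chunks it prints `alice: hello` and then `bob: hi`; the first tweet is only parsed once its closing newline has arrived.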

If I search for a frequently used keyword like "iphone", everything works well. If I search for a less frequently used keyword, my stream keeps getting closed very rapidly, around 20 seconds after the last message. Note that http.error is always empty, so it's very hard to understand why the stream is closed. On the other hand, a nearly identical PHP version is not closed, so the problem probably lies with EventMachine or em-http, but I don't understand which one.

asked Jan 22 '12 at 15:01 by tomsoft




1 Answer

You should add settings that prevent your connection from timing out. Try this:

http = EventMachine::HttpRequest.new(
  "https://stream.twitter.com/1/statuses/filter.json",
  :connection_timeout => 0,
  # 0 disables the inactivity timeout, so the connection stays open between tweets
  :inactivity_timeout => 0
).post(
  :head => { 'Authorization' => [user, password] },
  :body => { 'track' => keywords }
)
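Why an idle timeout hurts rare keywords but not busy ones can be illustrated with a toy model. This is a hypothetical sketch, not em-http's implementation: `idle_close_time` and its arguments are made up for illustration, and the arrival times (tweets plus keep-alive lines, in seconds) are invented sample values. A timeout of 0 is treated as "disabled", mirroring the `:inactivity_timeout => 0` setting above.

```ruby
# Hypothetical model of an inactivity timeout. Returns the time (in seconds)
# at which a client that closes after `timeout` idle seconds would drop the
# stream during the window [0, stream_end], or nil if it never would.
def idle_close_time(arrivals, timeout, stream_end)
  return nil if timeout <= 0 # 0 means the idle timeout is disabled

  last = 0
  (arrivals + [stream_end]).each do |t|
    # Any gap longer than the timeout closes the connection at last + timeout
    return last + timeout if t - last > timeout
    last = t
  end
  nil
end

busy  = (1..60).to_a # a popular keyword: data every second
quiet = [5, 35, 65]  # one tweet at 5 s, then only sparse keep-alive lines

p idle_close_time(busy, 10, 60)
p idle_close_time(quiet, 10, 90)
p idle_close_time(quiet, 0, 90)
```

With a 10-second idle limit the busy stream is never closed (`nil`), while the quiet one is dropped at 15 s, ten seconds after its only tweet; with the timeout disabled it stays open.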

Good luck, Christian

answered Oct 30 '22 at 22:10 by Chris