I am running an EventMachine process that uses the Twitter streaming API. I always have an issue when the content of the stream is infrequent.
Here is the minimal version of the script:
require 'rubygems'
require 'eventmachine'
require 'em-http'
require 'json'

usage = "#{$0} <user> <password> <track>"
abort usage unless user = ARGV.shift
abort usage unless password = ARGV.shift
abort usage unless keywords = ARGV.shift

def startIt(user, password, keywords)
  EventMachine.run do
    http = EventMachine::HttpRequest.new("https://stream.twitter.com/1/statuses/filter.json", {:port => 443}).post(
      :head => { 'Authorization' => [user, password] },
      :body => { "track" => keywords },
      :keepalive => true,
      :timeout => -1)

    buffer = ""
    http.stream do |chunk|
      buffer += chunk
      while line = buffer.slice!(/.+\r?\n/)
        if line.length > 5
          tweet = JSON.parse(line)
          puts Time.new.to_s + "#{tweet['user']['screen_name']}: #{tweet['text']}"
        end
      end
    end

    http.errback {
      puts Time.new.to_s + "Error: "
      puts http.error
    }
  end
rescue => error
  puts "error rescue " + error.to_s
end

while true
  startIt user, password, keywords
end
If I search for a keyword like "iphone", everything works well. If I search for a less frequently used keyword, my stream is closed very rapidly, around 20 seconds after the last message. Note that http.error is always empty, so it's very hard to understand why the stream is closed. On the other hand, a nearly identical PHP version is not closed, so the issue is probably with EventMachine or em-http, but I don't understand which one.
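The buffering in the http.stream block above is what turns arbitrary network chunks into whole tweets. The sketch below restates that idea in plain Ruby, without EventMachine, so it can be tried offline. StreamLineBuffer is a hypothetical name, and the sketch assumes keep-alive messages arrive as blank lines, which is what the original line.length > 5 check appears to filter out.

```ruby
require 'json'

# Hypothetical helper: collects raw chunks from a streaming HTTP body and
# yields each complete line, skipping blank keep-alive lines.
class StreamLineBuffer
  def initialize
    @buffer = +""
  end

  # Appends a chunk, then yields every complete line (line ending removed).
  # A trailing partial line stays buffered until a later chunk finishes it.
  def feed(chunk)
    @buffer << chunk
    while (idx = @buffer.index("\n"))
      line = @buffer.slice!(0..idx).chomp
      yield line unless line.strip.empty?
    end
  end
end

texts = []
buf = StreamLineBuffer.new
# A tweet split across two chunks, followed by a blank "\r\n" keep-alive.
buf.feed("{\"text\":\"hel") { |line| texts << JSON.parse(line)['text'] }
buf.feed("lo\"}\r\n\r\n{\"text\":\"bye\"}\r\n") { |line| texts << JSON.parse(line)['text'] }
p texts # ["hello", "bye"]
```

Because a line is only yielded once its newline has arrived, a tweet split across two chunks is parsed exactly once, after the second chunk completes it.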
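You should add settings that prevent your connection from timing out. By default em-http-request closes a connection that has received no data for longer than its inactivity timeout, which is exactly what happens to a stream tracking a rare keyword; setting :inactivity_timeout to 0 disables that timer. Try this:
http = EventMachine::HttpRequest.new(
  "https://stream.twitter.com/1/statuses/filter.json",
  :connect_timeout => 0,    # em-http-request names this option :connect_timeout
  :inactivity_timeout => 0  # 0 disables the idle timer, so quiet streams stay open
).post(
  :head => { 'Authorization' => [user, password] },
  :body => { 'track' => keywords }
)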
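Why a rare keyword matters can be modelled without a network. The function below is an illustrative model, not em-http-request's implementation: given the times (in seconds) at which data arrived, it returns when an inactivity timer would close the connection, or nil when the timeout is 0. The 10-second timeout in the example is an assumption chosen for illustration, not a documented default.

```ruby
# Illustrative model only (not em-http-request's implementation):
# returns the time at which an inactivity timer of `timeout` seconds would
# close a connection opened at `opened_at`, given the times at which data
# arrived. Returns nil when timeout is 0, i.e. the timer is disabled.
def idle_close_time(arrivals, timeout, opened_at: 0.0)
  return nil if timeout.zero?

  last = opened_at
  arrivals.sort.each do |t|
    # A gap longer than the timeout means the timer fired before this data arrived.
    return last + timeout if t - last > timeout
    last = t
  end
  # No long gap while data was flowing: the timer fires after the final message.
  last + timeout
end

busy  = (0..60).step(2).map(&:to_f) # a message every 2 seconds for a minute
quiet = [5.0, 40.0]                 # two messages 35 seconds apart

p idle_close_time(busy, 10)  # 70.0: closes only after traffic stops
p idle_close_time(quiet, 10) # 15.0: closes during the first long gap
p idle_close_time(quiet, 0)  # nil: a timeout of 0 never fires
```

A busy keyword keeps resetting the timer, which is why "iphone" seems to work, while a quiet keyword hits the first long gap and the connection is dropped with no HTTP error to report.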
Good luck, Christian
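The while true loop in the question calls startIt again as soon as it returns, with no pause between attempts. If you keep a reconnect loop, a capped exponential delay is a common way to avoid hammering the endpoint. This is a minimal sketch: reconnect_delay and its base and max values are hypothetical choices, not Twitter's documented reconnect policy.

```ruby
# Hypothetical helper: seconds to wait before reconnect attempt number
# `attempt` (0 for the first retry). The delay doubles each time and is
# capped at `max` so it never grows without bound.
def reconnect_delay(attempt, base: 1.0, max: 60.0)
  [base * (2**attempt), max].min
end

# Sketch of how it could wrap the question's loop (not run here):
#   attempt = 0
#   loop do
#     startIt(user, password, keywords)
#     sleep reconnect_delay(attempt)
#     attempt += 1
#   end

delays = (0..7).map { |n| reconnect_delay(n) }
p delays # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

In a real loop you would reset attempt to 0 once tweets start flowing again, so a single drop after hours of streaming does not inherit a long delay.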