I've been trying to consume the Twitter Streaming API using Python Requests.
There's a simple example in the documentation:
    import requests
    import json

    r = requests.post('https://stream.twitter.com/1/statuses/filter.json',
                      data={'track': 'requests'}, auth=('username', 'password'))

    for line in r.iter_lines():
        if line:  # filter out keep-alive new lines
            print(json.loads(line))
When I execute this, the call to requests.post() never returns. I've experimented and proved that it is definitely connecting to Twitter and receiving data from the API. However, instead of returning a response object, it just sits there consuming as much data as Twitter sends. Judging by the code above, I would expect requests.post() to return a response object with an open connection to Twitter, down which I could continue to receive realtime results.
(To prove it was receiving data, I connected to Twitter using the same credentials in another shell, whereupon Twitter closed the first connection and the call returned the response object. The r.content attribute contained all the backed-up data received while the connection was open.)
The documentation makes no mention of any other steps required to cause requests.post() to return before consuming all the supplied data. Other people seem to be using similar code without encountering this problem, e.g. here.
I'm using:
When stream=True is set on the request, iterating over the response (for example with iter_content() or iter_lines()) avoids reading the whole body into memory at once, which matters for large responses. Note that the chunk_size parameter can be either an integer or None.
HTTP Streaming is a push-style data transfer technique that allows a web server to continuously send data to a client over a single HTTP connection that remains open indefinitely.
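To make the idea concrete, here is a minimal sketch of HTTP streaming that uses only the Python standard library, so it runs without network access or Twitter credentials. The handler name StreamHandler, the helper functions, and the three fake events are all made up for illustration. The server sends no Content-Length header, so the client keeps reading lines until the server closes the socket, which is roughly how a streaming endpoint behaves.

```python
import http.client
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class StreamHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: pushes three JSON lines, then closes the connection."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()  # no Content-Length: the body ends when the socket closes
        for i in range(3):
            self.wfile.write(json.dumps({"event": i}).encode("utf-8") + b"\r\n")
            self.wfile.flush()
            time.sleep(0.05)  # simulate events arriving over time

    def log_message(self, *args):
        pass  # keep the demo quiet


def read_stream(host, port):
    """Read the response line by line as data arrives, skipping blank lines."""
    events = []
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/")
        resp = conn.getresponse()
        for line in resp:
            line = line.strip()
            if line:
                events.append(json.loads(line.decode("utf-8")))
    finally:
        conn.close()
    return events


def demo():
    """Start the toy server on a free local port and consume its stream."""
    server = HTTPServer(("127.0.0.1", 0), StreamHandler)
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    try:
        return read_stream("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

A real streaming endpoint never closes the connection on its own, which is why a client that waits for the complete body before returning appears to hang.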
You need to switch off prefetching, which I think is a parameter that changed defaults:
    r = requests.post('https://stream.twitter.com/1/statuses/filter.json',
                      data={'track': 'requests'}, auth=('username', 'password'),
                      prefetch=False)

    for line in r.iter_lines():
        if line:  # filter out keep-alive new lines
            print(json.loads(line))
Note that as of requests 1.x the parameter has been renamed, and you now use stream=True:
    r = requests.post('https://stream.twitter.com/1/statuses/filter.json',
                      data={'track': 'requests'}, auth=('username', 'password'),
                      stream=True)

    for line in r.iter_lines():
        if line:  # filter out keep-alive new lines
            print(json.loads(line))
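If you want to reuse the loop, one option is to wrap it in a generator. This is only a sketch: iter_json_lines is a name I made up, and it assumes each non-empty line is a complete UTF-8 JSON document. It takes any iterable of lines, so you can try it on sample data without connecting to Twitter.

```python
import json


def iter_json_lines(lines):
    """Yield one decoded JSON object per non-empty line (hypothetical helper)."""
    for line in lines:
        if not line:  # keep-alive newlines show up as empty lines
            continue
        if isinstance(line, bytes):
            line = line.decode("utf-8")  # assumption: the stream is UTF-8
        yield json.loads(line)


# Hypothetical usage with a streamed response (not executed here):
#   r = requests.post(url, data={'track': 'requests'},
#                     auth=('username', 'password'), stream=True)
#   for tweet in iter_json_lines(r.iter_lines()):
#       print(tweet)

# Offline check with made-up sample data:
sample = [b'{"text": "hello"}', b'', b'{"text": "world"}']
for obj in iter_json_lines(sample):
    print(obj)
```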
Ah, I found the answer by reading the code. At some point, a prefetch parameter was added to the post method (and other methods, I assume).
I just needed to add a prefetch=False kwarg to requests.post().
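If the same script has to run against both old and new versions of requests, something like the following could pick the right keyword argument. Treat it as a sketch: streaming_kwargs is a hypothetical helper, and it assumes that any 0.x release you target accepts prefetch and that every release from 1.0 on uses stream, as described above.

```python
def streaming_kwargs(requests_version):
    """Return the kwarg that disables eager body download (hypothetical helper).

    Assumption: 0.x releases take prefetch=False, 1.x and later take stream=True.
    """
    major = int(requests_version.split(".")[0])
    if major >= 1:
        return {"stream": True}
    return {"prefetch": False}


# Hypothetical usage (not executed here):
#   import requests
#   kwargs = streaming_kwargs(requests.__version__)
#   r = requests.post(url, data={'track': 'requests'},
#                     auth=('username', 'password'), **kwargs)

print(streaming_kwargs("0.14.2"))
print(streaming_kwargs("2.31.0"))
```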