Official encoding used by Twitter Streaming API? Is it UTF-8?

What is the official encoding for Twitter's streaming API? My best guess is UTF-8 based on what I've seen, but I would like to avoid making assumptions.

The only part of the Twitter site I've seen where they even hint at what they use as their official encoding is here:

Twitter does not want to penalize a user for the fact we use UTF-8 or for the fact that the API client in question used the longer representation

https://dev.twitter.com/docs/counting-characters

Does anyone have a more "official" answer? I'm writing a state-machine tokenizer for the streaming API which makes certain assumptions. The last thing I want is to encounter something like UTF-16.

Thanks! :D

IHeartDuckies asked Nov 25 '11 23:11


1 Answer

One indicator is that JSON, which Twitter uses for virtually everything, defaults to UTF-8: RFC 4627 requires JSON text to be encoded in Unicode and names UTF-8 as the default, although it also permits UTF-16 and UTF-32. Twitter should also send an HTTP Content-Type header whose charset parameter denotes the encoding (but I haven't confirmed this). If you're using XML instead, the XML declaration at the top of the document states the encoding explicitly, and it is UTF-8.
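If you would rather verify than assume, the two checks above can be sketched in Python. This is only an illustration, not anything Twitter documents: the function names are made up for this example, the Content-Type value is whatever your HTTP client reports, and the fallback uses the null-byte heuristic from RFC 4627 section 3 (a JSON object or array starts with two ASCII characters, so the positions of zero bytes in the first four octets reveal UTF-16 or UTF-32).

```python
import json
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset parameter of a Content-Type header value.

    Falls back to `default` when the header carries no charset.
    (Hypothetical helper; the header value comes from your HTTP client.)
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return (msg.get_content_charset() or default).lower()


def sniff_json_encoding(raw):
    """Guess the encoding of a JSON text from its first four bytes.

    Implements the null-byte pattern table from RFC 4627 section 3.
    Inputs shorter than four bytes are assumed to be UTF-8.
    """
    head = raw[:4]
    if len(head) < 4:
        return "utf-8"
    nulls = tuple(byte == 0 for byte in head)
    patterns = {
        (True, True, True, False): "utf-32-be",
        (False, True, True, True): "utf-32-le",
        (True, False, True, False): "utf-16-be",
        (False, True, False, True): "utf-16-le",
    }
    return patterns.get(nulls, "utf-8")


def decode_message(raw, content_type=None):
    """Decode one raw JSON message, failing loudly on invalid bytes.

    Prefers the charset from the Content-Type header when one is given,
    otherwise sniffs the byte pattern. Strict decoding raises
    UnicodeDecodeError instead of silently replacing bad bytes.
    """
    if content_type:
        encoding = charset_from_content_type(content_type)
    else:
        encoding = sniff_json_encoding(raw)
    return json.loads(raw.decode(encoding, errors="strict"))
```

A tokenizer that assumes UTF-8 could call `sniff_json_encoding` on the first message of a connection and abort if the result is anything else, so that an unexpected UTF-16 stream fails immediately rather than producing garbage tokens.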

deceze answered Oct 14 '22 23:10