
SMS messages non ASCII characters encoding

I have a Nokia N900 phone, and when sending an SMS, the widget displays the number of characters left in the message (and the number of actual short messages needed to send the whole message).

I live in France, where I noticed the following odd thing when writing messages with non-ASCII characters:

  • some non-ASCII characters count as a single character, e.g. "é", "è", "à", "ù"
  • the first occurrence of certain other non-ASCII characters, such as "ç", "ê", "ô", uses up a fixed 90 characters on top of the 1 for the character itself
  • a second "ç", "ê", etc. only uses up 1 additional character.

So I'm wondering how the messages are encoded, because I can't see the above scheme matching the traditional encodings I know (iso-8859-1, UTF-8, UTF-16...).

gurney alex asked Aug 18 '11 07:08


2 Answers

https://en.wikipedia.org/wiki/SMS#Message_size

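Depending on the encoding, a single SMS can carry 160, 140, or 70 characters (7-bit, 8-bit, or 16-bit encoding, respectively). The 7-bit GSM default alphabet is not plain ASCII: it includes a few accented letters such as "é", "è", "à" and "ù", which is why those count as one character each. If the message contains any character outside that alphabet, such as "ç", "ê" or "ô", the entire message has to be encoded in UTF-16, which drops the limit from 160 to 70 characters, hence the 90-character "consumption" you experienced.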
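To make the 160 vs. 70 switch concrete, here is a minimal Python sketch. It assumes the phone uses only the GSM 7-bit default alphabet and its basic extension table (GSM 03.38) and falls back to 16-bit encoding otherwise; the function name `sms_units` is made up for illustration, and real handsets may behave differently (for example by transliterating characters or using national language tables).

```python
# GSM 7-bit default alphabet (GSM 03.38), without the ESC control code.
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each of these is sent as ESC + code, so it costs 2 units.
GSM_EXTENDED = set("^{}\\[~]|€\f")


def sms_units(text):
    """Return (encoding, units_used, capacity_of_one_message).

    Hypothetical helper: models a single, non-concatenated SMS only.
    """
    if all(ch in GSM_BASIC or ch in GSM_EXTENDED for ch in text):
        used = sum(2 if ch in GSM_EXTENDED else 1 for ch in text)
        return "GSM-7", used, 160
    # Any other character forces the whole message into 16-bit units.
    used = len(text.encode("utf-16-be")) // 2
    return "UCS-2", used, 70


for sample in ("é", "ç", "çç"):
    encoding, used, capacity = sms_units(sample)
    print(sample, encoding, capacity - used, "left")
```

With "é" the counter shows 159 characters left; with "ç" it shows 69. The difference of 90 is the fixed cost described in the question, and a second "ç" only lowers the counter by one more.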

timdream answered Sep 28 '22 23:09


@Vicky and @timdream are right, except that I believe the phone technically falls back to UCS-2 rather than UTF-16. UCS-2 has a fixed size of 16 bits per character, whereas UTF-16 uses a variable width of two or four bytes per character, depending on the character being encoded. This Wikipedia article explains this in detail. UCS-2 limits the message to at most 70 characters (140 bytes). Although the Unicode Consortium's description of UCS-2 is a bit confusing, a handful of sites around the web dealing with SMS confirm that Wikipedia is right.
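The size arithmetic can be checked with a short Python sketch. It assumes a 140-byte payload for a single SMS; the names `PAYLOAD_BYTES`, `capacity` and `utf16_width` are illustrative only.

```python
PAYLOAD_BYTES = 140  # assumed user-data size of one non-concatenated SMS


def capacity(bits_per_char):
    """Number of fixed-width characters that fit into one payload."""
    return PAYLOAD_BYTES * 8 // bits_per_char


def utf16_width(ch):
    """Bytes needed for one character in UTF-16 (no byte-order mark)."""
    return len(ch.encode("utf-16-be"))


print(capacity(7), capacity(8), capacity(16))  # 7-bit, 8-bit, 16-bit limits
print(utf16_width("ç"))                        # fits in one 16-bit unit
print(utf16_width("\U0001F600"))               # needs a surrogate pair
```

The first line prints 160, 140 and 70, matching the limits quoted above. "ç" takes 2 bytes, while a character outside the Basic Multilingual Plane takes 4 bytes in UTF-16, which is exactly the case that fixed-width UCS-2 cannot represent.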

hotshot309 answered Sep 29 '22 00:09