
'Reliable' SMS Unicode & GSM Encoding in PHP

(Updated a little)

I'm not very experienced with internationalization in PHP, it must be said, and a fair amount of searching didn't turn up the answers I was looking for.

I need a reliable way, in PHP, to convert only the 'relevant' text to Unicode when sending an SMS message (this is only temporary, while the service is rewritten in C#). At the moment, all messages are sent as plain text.

I could simply encode everything as Unicode (UCS-2) instead of using the standard GSM character set, but then every message would be limited to 70 characters per SMS instead of 160.
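Both limits follow from the 140-octet payload of a single SMS: GSM 03.38 packs each character into 7 bits, while UCS-2 uses 16 bits per character. When a long message is concatenated, a 6-octet user data header leaves 134 octets per part, which gives the smaller per-part limits.

```latex
\frac{140 \times 8\ \text{bits}}{7\ \text{bits/char}} = 160\ \text{chars}
\qquad
\frac{140 \times 8\ \text{bits}}{16\ \text{bits/char}} = 70\ \text{chars}

\left\lfloor \frac{134 \times 8\ \text{bits}}{7\ \text{bits/char}} \right\rfloor = 153\ \text{chars}
\qquad
\frac{134 \times 8\ \text{bits}}{16\ \text{bits/char}} = 67\ \text{chars}
```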

So, I guess my real question is: what is the most reliable way to detect whether a message needs to be Unicode-encoded, so that I only do it when absolutely necessary (e.g. for characters from non-Latin scripts)?
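One common way to decide is to try mapping every character to the GSM 03.38 default alphabet or its extension table; if any character has no mapping, the message needs UCS-2. Below is a minimal sketch, assuming the input is valid UTF-8 and that national language shift tables are not in use. The names `utf8ToGsmSeptets`, `isGsm0338`, `GSM_BASIC` and `GSM_EXTENSION` are hypothetical, not part of any library.

```php
<?php
// GSM 03.38 default alphabet in code order: the character at position N
// (counting characters, not bytes) has GSM code N. Position 0x1B is the
// escape to the extension table and is dropped from the lookup below.
const GSM_BASIC =
    "@£\$¥èéùìòÇ\nØø\rÅå" .
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1BÆæßÉ" .
    " !\"#¤%&'()*+,-./" .
    "0123456789:;<=>?" .
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§" .
    "¿abcdefghijklmnopqrstuvwxyzäöñüà";

// Extension table: sent as ESC (0x1B) followed by the code, i.e. 2 septets.
const GSM_EXTENSION = [
    "\f" => 0x0A, '^' => 0x14, '{' => 0x28, '}' => 0x29, '\\' => 0x2F,
    '[' => 0x3C, '~' => 0x3D, ']' => 0x3E, '|' => 0x40, '€' => 0x65,
];

// Returns the GSM codes for $text, or null if any character is not
// representable in GSM 03.38 (or the input is not valid UTF-8).
function utf8ToGsmSeptets(string $text): ?array
{
    static $basic = null;
    if ($basic === null) {
        $basic = array_flip(preg_split('//u', GSM_BASIC, -1, PREG_SPLIT_NO_EMPTY));
        unset($basic["\x1B"]); // ESC itself is not a printable character
    }
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return null; // invalid UTF-8
    }
    $septets = [];
    foreach ($chars as $ch) {
        if (isset($basic[$ch])) {
            $septets[] = $basic[$ch];
        } elseif (array_key_exists($ch, GSM_EXTENSION)) {
            $septets[] = 0x1B;
            $septets[] = GSM_EXTENSION[$ch];
        } else {
            return null; // no GSM mapping: the message needs UCS-2
        }
    }
    return $septets;
}

function isGsm0338(string $text): bool
{
    return utf8ToGsmSeptets($text) !== null;
}
```

With this check, `isGsm0338("Let's test öäü éàè")` is true and any string containing the Hebrew letters is false, which matches the split described in the question. Some gateways also accept lowercase `ç` for code 0x09; whether yours does is something to confirm with the provider.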

Added Info:

Okay, so I've spent the morning working on this, and I'm no further along than when I started (no doubt because of my complete lack of competence when it comes to charset conversion). So here's the revised scenario:

I receive SMS text messages from an external source, which delivers them as plain text with the non-ASCII characters written as backslash-escaped Unicode sequences. E.g. the 'displayed' text:

Let's test öäü éàè אין תמיכה בעברית

is delivered as:

Let's test \u00f6\u00e4\u00fc \u00e9\u00e0\u00e8 \u05d0\u05d9\u05df \u05ea\u05de\u05d9\u05db\u05d4 \u05d1\u05e2\u05d1\u05e8\u05d9\u05ea
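Those `\uXXXX` sequences are JSON-style escapes of UTF-16 code units, so a first step is to turn them back into UTF-8. Below is a minimal sketch, assuming the escapes always have exactly four hex digits and that characters outside the Basic Multilingual Plane (e.g. emoji) arrive as surrogate pairs. `decodeUnicodeEscapes` and `codePointToUtf8` are hypothetical names. If the text is known to contain no unescaped double quotes, backslashes or control characters, `json_decode('"' . $s . '"')` does the same job in one call.

```php
<?php
// Encode one Unicode code point as UTF-8 bytes (no mbstring needed).
function codePointToUtf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Replace \uXXXX escapes (including surrogate pairs) with UTF-8 text.
function decodeUnicodeEscapes(string $s): string
{
    return preg_replace_callback(
        // Alternative 1: high + low surrogate pair; alternative 2: any single escape.
        '/\\\\u(d[89ab][0-9a-f]{2})\\\\u(d[c-f][0-9a-f]{2})|\\\\u([0-9a-f]{4})/i',
        function (array $m): string {
            if (isset($m[3])) { // single escape matched
                $cp = hexdec($m[3]);
                // A lone surrogate cannot be encoded; use U+FFFD instead.
                return ($cp >= 0xD800 && $cp <= 0xDFFF) ? "\u{FFFD}" : codePointToUtf8($cp);
            }
            $hi = hexdec($m[1]);
            $lo = hexdec($m[2]);
            return codePointToUtf8(0x10000 + (($hi - 0xD800) << 10) + ($lo - 0xDC00));
        },
        $s
    );
}
```

Applied to the escaped sample above, this should give back the displayed text, which can then be checked against GSM 03.38.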

Now, I can forward messages to my SMS provider as plain text, GSM 03.38 or Unicode. Sending the text above as plain text loses many characters (my provider replaces them with spaces), so I need to choose the encoding based on the content. What I want to do is the following:

  1. If all of the text is within the GSM 03.38 character set, send it as GSM 03.38. (All but the Hebrew characters above fall into this category, although they still have to be converted first.)

  2. Otherwise, convert it to Unicode and split it over multiple messages where needed (a single Unicode SMS holds 70 characters, not 160).
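Both branches end with the same question: how many messages does the text need? Below is a minimal sketch, assuming the usual limits of 160 septets (153 per concatenated part) for GSM 7-bit and 70 UTF-16 code units (67 per part) for UCS-2. `$gsm` is assumed to come from a separate check of the text against GSM 03.38. `splitSms` and `GSM_EXTENSION_CHARS` are hypothetical names. Extension characters cost two septets and are never split across parts.

```php
<?php
// GSM 03.38 extension-table characters: each costs two septets (ESC + code).
const GSM_EXTENSION_CHARS = ["\f", '^', '{', '}', '\\', '[', '~', ']', '|', '€'];

/**
 * Split a UTF-8 message into SMS-sized parts.
 * $gsm: true if the whole text is known to fit GSM 03.38.
 */
function splitSms(string $text, bool $gsm): array
{
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Text is not valid UTF-8');
    }
    // Cost of one character in the unit the chosen encoding counts.
    $cost = function (string $ch) use ($gsm): int {
        if ($gsm) {
            return in_array($ch, GSM_EXTENSION_CHARS, true) ? 2 : 1;
        }
        // A 4-byte UTF-8 sequence lies outside the BMP and needs a
        // UTF-16 surrogate pair (two code units).
        return strlen($ch) === 4 ? 2 : 1;
    };
    [$single, $perPart] = $gsm ? [160, 153] : [70, 67];

    if (array_sum(array_map($cost, $chars)) <= $single) {
        return [$text]; // fits in one message, no concatenation header needed
    }

    $parts = [];
    $current = '';
    $used = 0;
    foreach ($chars as $ch) {
        $c = $cost($ch);
        if ($used + $c > $perPart) { // start a new part rather than split $ch
            $parts[] = $current;
            $current = '';
            $used = 0;
        }
        $current .= $ch;
        $used += $c;
    }
    $parts[] = $current;
    return $parts;
}
```

For example, 161 plain Latin letters become two GSM parts (153 + 8), while 71 Hebrew letters become two UCS-2 parts (67 + 4). Whether the provider adds the concatenation header itself or expects pre-split parts is an assumption to check in its documentation.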

As I said above, I'm stumped on doing this in PHP (C# was much easier thanks to its built-in conversion functions), but it's quite likely I'm just missing something obvious. I couldn't find any ready-made classes for 7-bit GSM encoding in PHP either, and my own attempts to convert the string and send it on got nowhere.
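If the provider expects packed 7-bit data (an assumption: many HTTP gateways accept the text itself and pack it for you), the packing step is short. Below is a minimal sketch of GSM 03.38 septet packing, least significant bit first. `packSeptets` is a hypothetical name, and its input is an array of GSM codes (0-127), not UTF-8 text.

```php
<?php
/**
 * Pack GSM 03.38 septets (values 0-127) into octets, LSB first.
 */
function packSeptets(array $septets): string
{
    $out = '';
    $acc = 0;  // bit accumulator
    $bits = 0; // number of pending bits in $acc
    foreach ($septets as $s) {
        if (!is_int($s) || $s < 0 || $s > 0x7F) {
            throw new InvalidArgumentException('Not a septet: ' . var_export($s, true));
        }
        $acc |= $s << $bits; // append 7 new bits above the pending ones
        $bits += 7;
        while ($bits >= 8) { // emit every complete octet
            $out .= chr($acc & 0xFF);
            $acc >>= 8;
            $bits -= 8;
        }
    }
    if ($bits > 0) {
        $out .= chr($acc & 0xFF); // leftover bits, zero-padded
    }
    return $out;
}

// For these lowercase letters the GSM codes coincide with ASCII.
$septets = array_map('ord', str_split('hello'));
echo strtoupper(bin2hex(packSeptets($septets))), "\n"; // E8329BFD06
```

Eight septets pack into exactly seven octets, which is why 160 characters fit into 140 bytes.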

Any help would be greatly appreciated.