How to UTF-8 encode a character/string

I am using a Twitter API library to post a status to Twitter. Twitter requires that the post be UTF-8 encoded. The library contains a function that URL-encodes a standard string. It works perfectly for special characters such as !@#$%^&*(), but it produces the wrong encoding for accented characters (and other non-ASCII characters).

For example, 'é' gets converted to '%E9' rather than '%C3%A9' (it pretty much just converts each character to its hexadecimal value). Is there a built-in function that takes something like 'é' and returns something like '%C3%A9'?

edit: I am fairly new to UTF-8 in case what I am requesting makes no sense.

edit: if I have a

string foo = "bar é";

I would like to convert it to

"bar %C3%A9"

Thanks

tom asked Feb 22 '11 19:02


1 Answer

If you have a wide character string, you can encode it in UTF-8 with the standard wcstombs() function. If the string is in some other encoding (e.g. Latin-1), you will have to decode it to a wide string first.
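As a rough sketch of the wcstombs() approach: the helper name wide_to_multibyte below is made up for illustration, and the output is only UTF-8 if the caller has first selected a UTF-8 locale (for example with std::setlocale(LC_ALL, "en_US.UTF-8"), assuming that locale is installed on the system).

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: converts a wide string to the multibyte encoding of
// the CURRENT C locale. The result is UTF-8 only if a UTF-8 locale is active.
// Returns false if some character cannot be represented in that encoding.
bool wide_to_multibyte(const std::wstring& wide, std::string& out) {
    // Each wide character needs at most MB_CUR_MAX bytes; +1 for the terminator.
    std::string buf(wide.size() * MB_CUR_MAX + 1, '\0');
    std::size_t n = std::wcstombs(&buf[0], wide.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        return false;  // conversion failed in the current locale
    }
    buf.resize(n);     // n excludes the terminating null byte
    out = buf;
    return true;
}
```

In the default "C" locale this only handles plain ASCII reliably, which is why the locale has to be set before calling it.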

Edit: wcstombs() depends on your locale settings, though, and it looks like you can't select a UTF-8 locale on Windows. (You don't say what OS you're using.) WideCharToMultiByte() might be more useful on Windows, as you can specify the encoding (for example, CP_UTF8) in the call.
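If the locale dependence is a problem, the conversion can also be done by hand. The following is a minimal sketch, not what the Twitter library does: it assumes the input bytes are Latin-1 (which the '%E9' output suggests), and the function names latin1_to_utf8 and percent_encode are invented for this example.

```cpp
#include <string>

// Treat every input byte as a Latin-1 (ISO-8859-1) code point and emit UTF-8.
// Code points below 0x80 are copied; 0x80..0xFF become two-byte sequences.
std::string latin1_to_utf8(const std::string& latin1) {
    std::string out;
    out.reserve(latin1.size());
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation
        }
    }
    return out;
}

// Percent-encode every byte except the RFC 3986 "unreserved" characters
// (letters, digits, '-', '.', '_', '~').
std::string percent_encode(const std::string& bytes) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : bytes) {
        bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') ||
                          c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back('%');
            out.push_back(hex[c >> 4]);
            out.push_back(hex[c & 0x0F]);
        }
    }
    return out;
}
```

With these, percent_encode(latin1_to_utf8("\xE9")) yields "%C3%A9". Note that this sketch also escapes the space, so "bar é" comes out as "bar%20%C3%A9" rather than "bar %C3%A9".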

Martin Stone answered Nov 05 '22 07:11