Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

string decode utf-8

Tags:

How can I decode an utf-8 string with android? I tried with this commands but output is the same of input:

URLDecoder.decode("hello&//à", "UTF-8");  new String("hello&//à", "UTF-8");  EntityUtils.toString("hello&//à", "utf-8"); 
like image 860
magemello Avatar asked May 09 '11 22:05

magemello


People also ask

How do I decode a UTF-8 string in Python?

To decode a string encoded in UTF-8 format, we can use the decode() method specified on strings. This method accepts two arguments, encoding and error . encoding accepts the encoding of the string to be decoded, and error decides how to handle errors that arise during decoding.

How do I decode a string?

So first, write the very first character of the encoded string and remove it from the encoded string then start adding the first character of the encoded string first to the left and then to the right of the decoded string and do this task repeatedly till the encoded string becomes empty.

What is a UTF-8 string?

UTF-8 is an encoding system for Unicode. It can translate any Unicode character to a matching unique binary string, and can also translate the binary string back to a Unicode character. This is the meaning of “UTF”, or “Unicode Transformation Format.”

Is Java a UTF-8 string?

String objects in Java are encoded in UTF-16. Java Platform is required to support other character encodings or charsets such as US-ASCII, ISO-8859-1, and UTF-8. Errors may occur when converting between differently coded character data. There are two general types of encoding errors.


2 Answers

A string needs no encoding. It is simply a sequence of Unicode characters.

You need to encode when you want to turn a String into a sequence of bytes. The charset the you choose (UTF-8, cp1255, etc.) determines the Character->Byte mapping. Note that a character is not necessarily translated into a single byte. In most charsets, most Unicode characters are translated to at least two bytes.

Encoding of a String is carried out by:

String s1 = "some text"; byte[] bytes = s1.getBytes("UTF-8"); // Charset to encode into 

You need to decode when you have а sequence of bytes and you want to turn them into a String. When yоu dо that you need to specify, again, the charset with which the bytеs were originally encoded (otherwise you'll end up with garblеd tеxt).

Decoding:

String s2 = new String(bytes, "UTF-8"); // Charset with which bytes were encoded  

If you want to understand this better, a great text is "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"

like image 136
Itay Maman Avatar answered Sep 23 '22 15:09

Itay Maman


the core functions are getBytes(String charset) and new String(byte[] data). you can use these functions to do UTF-8 decoding.

UTF-8 decoding actually is a string to string conversion, the intermediate buffer is a byte array. since the target is an UTF-8 string, so the only parameter for new String() is the byte array, which calling is equal to new String(bytes, "UTF-8")

Then the key is the parameter for input encoded string to get internal byte array, which you should know beforehand. If you don't, guess the most possible one, "ISO-8859-1" is a good guess for English user.

The decoding sentence should be

String decoded = new String(encoded.getBytes("ISO-8859-1")); 
like image 30
Zephyr Avatar answered Sep 19 '22 15:09

Zephyr