
Java: How to create unicode from string "\u00C3" etc

I have a file containing hand-typed strings such as \u00C3. I want to create the Unicode character that each of these escapes represents in Java. I tried but could not find out how. Help.

Edit: When I read the text file, the String will contain "\u00C3" not as a Unicode character but as the ASCII chars '\' 'u' '0' '0' 'C' '3'. I would like to form the Unicode character from that ASCII string.

Ravi asked Feb 14 '11 21:02
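
To make the distinction in the edit concrete, here is a minimal sketch. The class and method names are made up for illustration, and decode assumes its input is exactly one well-formed escape. It shows that an escape typed in Java source is already a single character by the time the program runs, while the same text read from a file is six separate chars that have to be parsed.

```java
final class EscapeVersusCharacter {

    // In source code the compiler translates the escape before the program
    // runs, so this string holds a single character.
    static String fromSourceLiteral() {
        return "\u00C3";
    }

    // What a line read from the file holds: a backslash, the letter u and
    // four hex digits, i.e. six separate chars.
    static String asReadFromFile() {
        return "\\u00C3";
    }

    // Skips the two-char prefix and parses the rest as a hexadecimal code unit.
    // Assumes the input is exactly one well-formed escape (hypothetical helper).
    static char decode(String escape) {
        return (char) Integer.parseInt(escape.substring(2), 16);
    }

    public static void main(String[] args) {
        System.out.println(fromSourceLiteral().length());   // 1
        System.out.println(asReadFromFile().length());      // 6
        System.out.println(decode(asReadFromFile()) == fromSourceLiteral().charAt(0)); // true
    }
}
```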




2 Answers

I picked this up somewhere on the web:

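String unescape(String s) {
    int i = 0, len = s.length();
    StringBuilder sb = new StringBuilder(len);
    while (i < len) {
        char c = s.charAt(i++);
        if (c == '\\' && i < len) {
            c = s.charAt(i++);
            if (c == 'u') {
                // A Unicode escape needs exactly four hex digits after the 'u'.
                if (i + 4 > len) {
                    throw new IllegalArgumentException("Truncated Unicode escape at index " + (i - 2));
                }
                int code = 0;
                for (int j = 0; j < 4; j++) {
                    // Character.digit returns -1 for anything that is not a hex digit.
                    int digit = Character.digit(s.charAt(i + j), 16);
                    if (digit < 0) {
                        throw new IllegalArgumentException("Bad Unicode escape at index " + (i - 2));
                    }
                    code = (code << 4) | digit;
                }
                c = (char) code;
                i += 4;
            } // add other cases here as desired...
        } // fall through: \ escapes itself, quotes any character but u
        sb.append(c);
    }
    return sb.toString();
}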
Ted Hopp answered Nov 01 '22 20:11


Dang, I was a bit slow. Here's my solution:

package ravi;

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Pattern;

public class Ravi {

    // A backslash, the letter u, then exactly four hex digits.
    private static final Pattern UCODE_PATTERN = Pattern.compile("\\\\u[0-9a-fA-F]{4}");

    public static void main(String[] args) throws Exception {
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader br = new BufferedReader(new FileReader("ravi.txt"))) {
            String line;
            while ((line = br.readLine()) != null) {
                // matches() requires the whole line to be a single escape.
                if (!UCODE_PATTERN.matcher(line).matches()) {
                    System.err.println("Bad input: " + line);
                } else {
                    String hex = line.substring(2, 6);
                    int number = Integer.parseInt(hex, 16);
                    System.out.println(hex + " -> " + ((char) number));
                }
            }
        }
    }

}
Carl Smotricz answered Nov 01 '22 20:11
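
The pattern in the answer above only accepts lines that consist of one escape and nothing else. As a rough sketch of how the same regex idea might be stretched to escapes embedded in longer text, the following uses Matcher.find with appendReplacement. RegexUnescape and its unescape method are hypothetical names, not taken from either answer, and text that does not match the pattern is passed through unchanged rather than rejected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexUnescape {

    // A backslash, the letter u, then four hex digits captured as group 1.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces every well-formed escape in s with the char it names.
    static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer(s.length());
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a decoded '$' or backslash from being
            // interpreted as replacement syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The input holds the six-char escape; the expected value is the real character.
        System.out.println(unescape("caf\\u00E9").equals("caf\u00E9")); // true
    }
}
```

Unlike the character-by-character loop in the first answer, this sketch does not treat a doubled backslash as an escaped backslash, so it only suits input where every backslash-u sequence is meant to be decoded.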