Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java String Unicode Value

How can I get the unicode value of a string in java?

For example if the string is "Hi" I need something like \uXXXX\uXXXX

like image 625
user489041 Avatar asked Apr 20 '11 16:04

user489041


People also ask

How do you find the Unicode value of a string?

If you have Java 5, use char c = ...; String s = String. format ("\\u%04x", (int)c); If your source isn't a Unicode character ( char ) but a String, you must use charAt(index) to get the Unicode character at position index .

What is Unicode in Java string?

Unicode is an international standard of character encoding which has the capability of representing a majority of written languages all over the globe. Unicode uses hexadecimal to represent a character. Unicode is a 16-bit character encoding system. The lowest value is \u0000 and the highest value is \uFFFF.

What is Unicode value in Java?

Unicode is a computing industry standard designed to consistently and uniquely encode characters used in written languages throughout the world. The Unicode standard uses hexadecimal to express a character. For example, the value 0x0041 represents the Latin character A.

Can Java strings handle Unicode character strings?

Internally in Java all strings are kept in Unicode. Since not all text received from users or the outside world is in unicode, your application may have to convert from non-unicode to unicode.


2 Answers

Some unicode characters span two Java chars. Quote from http://docs.oracle.com/javase/tutorial/i18n/text/unicode.html :

The characters with values that are outside of the 16-bit range, and within the range from 0x10000 to 0x10FFFF, are called supplementary characters and are defined as a pair of char values.

correct way to escape non-ascii:

private static String escapeNonAscii(String str) {

  StringBuilder retStr = new StringBuilder();
  for(int i=0; i<str.length(); i++) {
    int cp = Character.codePointAt(str, i);
    int charCount = Character.charCount(cp);
    if (charCount > 1) {
      i += charCount - 1; // 2.
      if (i >= str.length()) {
        throw new IllegalArgumentException("truncated unexpectedly");
      }
    }

    if (cp < 128) {
      retStr.appendCodePoint(cp);
    } else {
      retStr.append(String.format("\\u%x", cp));
    }
  }
  return retStr.toString();
}
like image 167
Raghu A Avatar answered Sep 22 '22 06:09

Raghu A


This method converts an arbitrary String to an ASCII-safe representation to be used in Java source code (or properties files, for example):

public String escapeUnicode(String input) {
  StringBuilder b = new StringBuilder(input.length());
  Formatter f = new Formatter(b);
  for (char c : input.toCharArray()) {
    if (c < 128) {
      b.append(c);
    } else {
      f.format("\\u%04x", (int) c);
    }
  }
  return b.toString();
}
like image 28
Joachim Sauer Avatar answered Sep 23 '22 06:09

Joachim Sauer