I'm searching for a library (Apache / BSD / EPL licensed) to convert native text to ASCII using \u for characters not available in ASCII (basically what java.util.Properties does).
I had a look and there don't seem to be any readily available libraries. I found:
Is anyone aware of a library under the above stated licenses?
You can do this with a CharsetEncoder. First read the native text into a String using its correct source encoding. Then use a US-ASCII encoder to detect which characters cannot be represented in ASCII and have to be translated into Unicode escapes.
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import org.junit.Test;

public class EncodeToEscapes {

    @Test
    public void testEncoding() {
        final String src = "Hallo äöü"; // this has to be read with the right encoding
        final CharsetEncoder asciiEncoder = Charset.forName("US-ASCII").newEncoder();
        final StringBuilder result = new StringBuilder();
        for (final char character : src.toCharArray()) {
            if (asciiEncoder.canEncode(character)) {
                // plain ASCII: copy unchanged
                result.append(character);
            } else {
                // not ASCII: write backslash, 'u' and four upper-case hex digits;
                // OR-ing with 0x10000 and dropping the leading '1' keeps the zero padding
                result.append("\\u");
                result.append(Integer.toHexString(0x10000 | character).substring(1).toUpperCase());
            }
        }
        System.out.println(result);
    }
}
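If you would rather not depend on JUnit, the same idea can be wrapped in a small utility method. This is only a minimal sketch: the class name AsciiEscaper and the method name toAsciiEscapes are made up for illustration and do not come from any library. It treats every char below 0x80 as ASCII instead of asking a CharsetEncoder, and it does not escape backslashes that are already present in the input.

```java
final class AsciiEscaper {

    private AsciiEscaper() {
    }

    // Hypothetical helper (not a library API): replaces every non-ASCII
    // char with a backslash, 'u' and four upper-case hex digits.
    static String toAsciiEscapes(String src) {
        StringBuilder result = new StringBuilder(src.length());
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c < 0x80) {
                // 7-bit ASCII: copy unchanged
                result.append(c);
            } else {
                // The cast to int is needed because %X does not accept a char argument
                result.append(String.format("\\u%04X", (int) c));
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // The input literal uses escapes so the source file itself stays ASCII
        System.out.println(toAsciiEscapes("Hallo \u00E4\u00F6\u00FC"));
    }
}
```

Characters outside the Basic Multilingual Plane are stored as two chars (a surrogate pair) in a Java String, so this sketch emits two escape sequences for each of them.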
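Additionally, org.apache.commons:commons-lang contains StringEscapeUtils, whose escapeJava() and unescapeJava() methods escape and unescape native strings. Note that escapeJava() follows Java string-literal rules, so it also escapes quotes, backslashes and control characters, not only non-ASCII characters.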
Try this piece of code from Apache commons-lang:
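// Escape: every non-ASCII character becomes a Unicode escape sequence
StringEscapeUtils.escapeJava("ایران زیبای من");
// Unescape: the backslashes are doubled so that the literal really contains the
// escape sequences; with single backslashes the compiler would decode them first
StringEscapeUtils.unescapeJava("\\u0627\\u06CC\\u0631\\u0627\\u0646 \\u0632\\u06CC\\u0628\\u0627\\u06CC \\u0645\\u0646");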
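If adding commons-lang just for the reverse direction is not an option, the unescaping can also be sketched with the JDK alone. The class name AsciiUnescaper and the method name fromAsciiEscapes below are hypothetical, chosen only for this example. The sketch decodes nothing but the backslash-u form with exactly four hex digits; other Java escapes such as the newline escape are left untouched, which unescapeJava() would handle as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AsciiUnescaper {

    // Regex for: a literal backslash, the letter 'u', then exactly four hex digits
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    private AsciiUnescaper() {
    }

    // Hypothetical helper (not a library API): turns each matched escape
    // sequence back into the char it encodes and copies everything else.
    static String fromAsciiEscapes(String escaped) {
        Matcher m = ESCAPE.matcher(escaped);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against '$' and backslash in the decoded char
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslashes put real escape sequences into the string
        System.out.println(fromAsciiEscapes("Hallo \\u00E4\\u00F6\\u00FC"));
    }
}
```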