I have a bunch of characters that look something like this:
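&#1050;&#1086;&#1084;&#1091;&#1085;&#1080;&#1082;&#1072;&#1094;&#1080;&#1086;&#1085;&#1085;&#1072; &#1082;&#1072;&#1073;&#1077;&#1083;&#1085;&#1072; &#1089;&#1080;&#1089;&#1090;&#1077;&#1084;&#1072;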
and sometimes I have a mix like this:
G&#233;n&#233;ralit&#233;s
The first translates into :
Комуникационна кабелна система
and the second to:
Généralités
I can see this by placing them into the body of an HTML page and opening it in a browser.
But how can I make Java output the "real" characters? What is the above encoding called?
I have tried a couple of things, most recently this (which did not work):
import java.nio.charset.*;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.List;

List<String> lst = new ArrayList<String>();
lst.add("&#1050;");
lst.add("&#1086;");
for (String s : lst) {
    Charset utf8charset = Charset.forName("UTF-8");
    Charset iso88591charset = Charset.forName("ISO-8859-1");
    ByteBuffer inputBuffer = ByteBuffer.wrap(s.getBytes());
    // decode UTF-8
    CharBuffer data = utf8charset.decode(inputBuffer);
    // encode ISO-8859-1
    ByteBuffer outputBuffer = iso88591charset.encode(data);
    byte[] outputData = outputBuffer.array();
    System.out.println(new String(outputData));
}
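These are HTML character entities rather than a byte encoding, so converting between charsets will not help. You can use commons-lang's StringEscapeUtils to unescape this sort of thing. In Groovy: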
@Grab( 'commons-lang:commons-lang:2.6' )
import org.apache.commons.lang.StringEscapeUtils as SEU
def str = 'G&#233;n&#233;ralit&#233;s'
println SEU.unescapeHtml( str )
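Since the question asks about Java, here is a minimal sketch that decodes only numeric character references (decimal such as &#1050; and hexadecimal such as &#x41A;) using nothing but the standard library. The class name NumericEntityDecoder and the method decode are made up for this illustration and are not part of commons-lang. Named entities such as &eacute; are deliberately left untouched, so for those a library is still the simpler route.

```java
import java.io.PrintStream;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of any library.
class NumericEntityDecoder {
    // Matches decimal (&#1050;) and hexadecimal (&#x41A;) numeric character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    static String decode(String input) {
        Matcher m = NUMERIC_REF.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(); // fall back to the raw text
            try {
                int codePoint = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
                if (Character.isValidCodePoint(codePoint)) {
                    replacement = new String(Character.toChars(codePoint));
                }
            } catch (NumberFormatException e) {
                // Number too large to be a code point: keep the raw text.
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Print as UTF-8 so the result does not depend on the platform default charset.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(decode("&#1050;&#1086;&#1084;"));
        utf8Out.println(decode("G&#233;n&#233;ralit&#233;s"));
    }
}
```

With commons-lang 2.6 on the classpath, the Java equivalent of the Groovy call above would be org.apache.commons.lang.StringEscapeUtils.unescapeHtml(str), which also handles named entities. If the console still shows question marks, the terminal's own encoding is the likely culprit rather than the decoding step.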