I am parsing XML using DocumentBuilder.
The XML has this as its first line:
<?xml version="1.0" encoding="GBK"?>
I want to get the encoding of the XML and use it. How can I get "GBK"?
Basically, I will be creating one more XML document in which I want encoding="GBK" to be retained.
Currently it gets lost and is set to the default, UTF-8.
There are many XML files with different encodings, and I need to read the encoding of each source file.
XML documents must be encoded in a supported code page. XML documents generated in or parsed from national data items must be encoded in Unicode UTF-16 in big-endian format, CCSID 1200.
If no encoding declaration exists in a document's XML declaration, that XML document is required to use either UTF-8 or UTF-16 encoding.
UTF-8 (Unicode Transformation Format, 8-bit encoding form) is designed for ease of use with existing ASCII-based systems and enables use of all the characters in the Unicode standard.
UTF is also expanded as UCS Transformation Format, where UCS means Universal Character Set. The number 8 or 16 is the size in bits of one code unit, not of a whole character: UTF-8 encodes a character in 1 to 4 bytes, and UTF-16 in 2 or 4 bytes. For documents without encoding information, a parser assumes UTF-8 unless a byte order mark indicates UTF-16.
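To make the byte counts concrete, here is a small sketch that encodes a few sample strings and reports how many bytes each takes. The class name `UtfWidthDemo` and its helper methods are made up for illustration; only the standard `String.getBytes(Charset)` call is used, and `UTF_16BE` is chosen so that no byte order mark is counted.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: counts encoded bytes for a string in UTF-8 and UTF-16.
class UtfWidthDemo {

    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of bytes the string occupies when encoded as UTF-16 (big-endian, no BOM). */
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // One character each: ASCII letter, accented Latin letter,
        // CJK ideograph, and an emoji outside the Basic Multilingual Plane.
        String[] samples = { "A", "\u00e9", "\u4e2d", "\uD83D\uDE00" };
        for (String s : samples) {
            System.out.println("U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase()
                    + ": UTF-8 = " + utf8Length(s) + " bytes, UTF-16 = " + utf16Length(s) + " bytes");
        }
    }
}
```

The four samples cover the 1, 2, 3 and 4 byte cases of UTF-8, while UTF-16 needs 2 bytes for the first three and 4 bytes (a surrogate pair) for the last.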
One way to do this works like this:
import java.io.FileReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

final XMLStreamReader xmlStreamReader = XMLInputFactory.newInstance().createXMLStreamReader(new FileReader(testFile));
// Encoding of the input source; running on MS Windows, fileEncoding is "CP1251"
String fileEncoding = xmlStreamReader.getEncoding();
// Encoding named in the XML declaration; the XML declares UTF-8, so encodingFromXMLDeclaration is "UTF-8"
String encodingFromXMLDeclaration = xmlStreamReader.getCharacterEncodingScheme();
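Since the question uses DocumentBuilder, another possible route is the DOM Level 3 method `Document.getXmlEncoding()`, which returns the encoding named in the XML declaration, combined with `OutputKeys.ENCODING` on a `Transformer` when writing the new document. The following is only a sketch under some assumptions: the class name `XmlEncodingDemo` and its helper methods are invented for illustration, the input is handed to the parser as bytes (not a `Reader`) so the parser can detect the encoding itself, and ISO-8859-1 stands in for GBK in the sample data because GBK may not be available in every Java runtime.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper class: reads the declared encoding and reuses it on output.
class XmlEncodingDemo {

    /** Parses raw XML bytes; the parser reads the encoding from the XML declaration. */
    static Document parse(byte[] xmlBytes) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
    }

    /** Returns the encoding named in the XML declaration (may be null if there is none). */
    static String declaredEncoding(byte[] xmlBytes) throws Exception {
        return parse(xmlBytes).getXmlEncoding();
    }

    /** Serializes the document again, keeping the source encoding (assumed fallback: UTF-8). */
    static byte[] copyWithSameEncoding(byte[] xmlBytes) throws Exception {
        Document doc = parse(xmlBytes);
        String encoding = doc.getXmlEncoding();
        if (encoding == null) {
            encoding = "UTF-8"; // no declaration in the source
        }
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Writing to an OutputStream lets the transformer encode the bytes itself.
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Sample input; ISO-8859-1 is used here in place of GBK.
        byte[] source = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root>\u00e9</root>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("Declared in source: " + declaredEncoding(source));
        System.out.println("Declared in copy:   " + declaredEncoding(copyWithSameEncoding(source)));
    }
}
```

For a file on disk, `DocumentBuilder.parse(File)` can be used in the same way. The important part is to avoid wrapping the file in a `FileReader`, because that decodes the bytes with the platform default charset before the parser sees the declaration.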