Getting the encoding of an XML file?

Tags: java, xml

I am parsing XML using DocumentBuilder.

The first line of the XML file is:

<?xml version="1.0" encoding="GBK"?>

I want to read the encoding of the XML file and use it. How can I get "GBK"?

I will be creating another XML file in which encoding="GBK" should be retained.

Currently the encoding is lost and the output defaults to UTF-8.

There are many XML files with different encodings, so I need to read the encoding from each source file.

user1228785 asked Feb 24 '12 10:02


1 Answer

One way to do this is with an XMLStreamReader:

final XMLStreamReader xmlStreamReader = XMLInputFactory.newInstance().createXMLStreamReader( new FileReader( testFile ) );

// getEncoding() reports the encoding of the input source; running on MS Windows, fileEncoding is "CP1251"
String fileEncoding = xmlStreamReader.getEncoding();

// getCharacterEncodingScheme() returns the encoding from the XML declaration; this file declares UTF-8, so the value is "UTF-8"
String encodingFromXMLDeclaration = xmlStreamReader.getCharacterEncodingScheme();
Matthias Heinrich answered Oct 14 '22 01:10
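
Since the question parses with DocumentBuilder, the same information may also be available from the DOM itself. The following is a minimal sketch, not part of the original answer: it assumes a DOM Level 3 parser (the JDK default), reads the declared encoding with Document.getXmlEncoding(), and passes it to a Transformer through OutputKeys.ENCODING so the copy keeps it. The class and method names (XmlEncodingSketch, parse, declaredEncoding, serialize) are hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper class; the names are illustrative, not from the original post.
final class XmlEncodingSketch {

    // Parse from a byte stream (not a Reader) so the parser decodes the bytes
    // according to the XML declaration instead of the platform default.
    static Document parse(InputStream in) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
    }

    // Encoding from the XML declaration; falls back to the detected input
    // encoding, then to UTF-8, when the declaration does not name one.
    static String declaredEncoding(Document doc) {
        String encoding = doc.getXmlEncoding();
        if (encoding == null) {
            encoding = doc.getInputEncoding();
        }
        return encoding != null ? encoding : "UTF-8";
    }

    // Serialize the document, asking the Transformer to use the given encoding
    // for both the output bytes and the generated XML declaration.
    static byte[] serialize(Document doc, String encoding) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the source file from the question, encoded as GBK.
        byte[] source = "<?xml version=\"1.0\" encoding=\"GBK\"?><root>text</root>"
                .getBytes("GBK");

        Document doc = parse(new ByteArrayInputStream(source));
        String encoding = declaredEncoding(doc);
        byte[] copy = serialize(doc, encoding);

        System.out.println(encoding);
        System.out.println(new String(copy, encoding));
    }
}
```

For a real file, a FileInputStream would take the place of the ByteArrayInputStream. Passing a FileReader instead would decode the bytes with the platform default charset before the parser sees them, which is likely why getEncoding() in the answer above reports "CP1251".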