Why can't my DOM parser read UTF-8?

Tags:

java

dom

parsing

I have a problem: my DOM parser can't load the file when the XML contains UTF-8 characters. I know I have to tell it to read UTF-8, but I don't know how to do that in my code. Here it is:

File xmlFile = new File(fileName);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile);
doc.getDocumentElement().normalize();

I know there is a setEncoding() method, but I don't know where to put it in my code...

ivanz asked May 06 '13 13:05
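On the question of where setEncoding() goes: it is a method of org.xml.sax.InputSource, not of DocumentBuilder, so it belongs on an InputSource that wraps the file's raw byte stream, which is then handed to DocumentBuilder.parse(InputSource). A minimal sketch, assuming the input really is UTF-8; the class name EncodingHintSketch and the sample text are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EncodingHintSketch {
    // Parses raw bytes, telling the parser which encoding they use.
    static Document parse(InputStream in, String encoding) throws Exception {
        InputSource source = new InputSource(in); // byte stream, not a Reader
        source.setEncoding(encoding);             // this is where setEncoding() goes
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(source);
        doc.getDocumentElement().normalize();
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample text, written with Unicode escapes and encoded as UTF-8 bytes.
        byte[] xml = "<name>\u010Ca\u010Dak</name>".getBytes(StandardCharsets.UTF_8);
        Document doc = parse(new ByteArrayInputStream(xml), "UTF-8");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

For the file in the question this would be called as `try (InputStream in = new FileInputStream(fileName)) { Document doc = EncodingHintSketch.parse(in, "UTF-8"); }`. Note that an XML file with no encoding declaration and no byte order mark is already read as UTF-8 by default, so if parsing still fails the file is probably saved in a different encoding.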



2 Answers

Try this; it worked for me:

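        import java.io.*;
        import javax.xml.parsers.*;
        import org.w3c.dom.Document;
        import org.xml.sax.InputSource;

        // try-with-resources closes the reader (and the underlying stream) when done
        try (Reader reader = new InputStreamReader(new FileInputStream(completeFileName), "UTF-8")) {
            InputSource is = new InputSource(reader);
            is.setEncoding("UTF-8"); // ignored for a character stream; the Reader already decodes UTF-8

            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(is);
        }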
Rajesh Mbm answered Nov 19 '22 19:11
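The approach above works because a Reader hands the parser characters that are already decoded, so the encoding named in the file's XML declaration is not used to decode anything. A small self-contained sketch of that difference, assuming a hypothetical document whose bytes are UTF-8 but whose declaration wrongly claims ISO-8859-1 (class and field names are made up for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ReaderVsBytes {
    // Hypothetical sample: UTF-8 bytes whose declaration claims ISO-8859-1.
    static final String TEXT = "\u010Ca\u010Dak";
    static final byte[] XML =
            ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>" + TEXT + "</name>")
                    .getBytes(StandardCharsets.UTF_8);

    // Byte stream: the parser decodes using the encoding declared in the document.
    static String viaBytes() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML));
        return doc.getDocumentElement().getTextContent();
    }

    // Reader: the bytes are decoded as UTF-8 before the parser sees them,
    // so the declaration is not used for decoding.
    static String viaReader() throws Exception {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(XML), StandardCharsets.UTF_8)) {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(reader));
            return doc.getDocumentElement().getTextContent();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(viaReader().equals(TEXT)); // true
        System.out.println(viaBytes().equals(TEXT));  // false: UTF-8 bytes decoded as ISO-8859-1
    }
}
```

The flip side is that the Reader, not the file, now decides the encoding, so this only gives correct text when the file really is UTF-8.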



Try using a Reader and passing the encoding as a parameter:

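// documentBuilder: an existing DocumentBuilder, e.g. DocumentBuilderFactory.newInstance().newDocumentBuilder()
try (InputStream inputStream = new FileInputStream(fileName)) {
    Document doc = documentBuilder.parse(new InputSource(new InputStreamReader(inputStream, "UTF-8")));
}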
Eugene answered Nov 19 '22 18:11
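One caveat with `new InputStreamReader(inputStream, "UTF-8")`: by default it replaces malformed bytes with U+FFFD instead of failing, so a file that is not actually UTF-8 parses without an error but with corrupted text. A sketch of a variant that swaps in java.nio.file.Files.newBufferedReader, whose reader reports malformed input as an IOException instead; the class and method names are hypothetical:

```java
import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class StrictUtf8Parse {
    // Files.newBufferedReader decodes strictly: invalid UTF-8 makes reading fail
    // rather than being silently replaced with U+FFFD.
    static Document parseUtf8(Path file) throws Exception {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(reader));
            doc.getDocumentElement().normalize();
            return doc;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample file written as UTF-8.
        Path tmp = Files.createTempFile("sample", ".xml");
        Files.write(tmp, "<name>\u010Ca\u010Dak</name>".getBytes(StandardCharsets.UTF_8));
        System.out.println(parseUtf8(tmp).getDocumentElement().getTextContent());
        Files.delete(tmp);
    }
}
```

With this variant, a file saved in a legacy encoding fails loudly at parse time, which makes the real cause of the problem easier to spot.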
