 

MalformedByteSequenceException: Invalid byte 2 of 2-byte UTF-8 sequence

I have an XML file that contains Arabic characters. When I try to parse the file, it throws MalformedByteSequenceException: Invalid byte 2 of 2-byte UTF-8 sequence. I use POI DOM to parse the document.

The log is:

2012-03-19 11:30:00,433 [ERROR] (com.infomindz.remitglobe.bll.remittance.BlackListBean) - Error
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 2-byte UTF-8 sequence.
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
    at com.infomindz.remitglobe.bll.remittance.BlackListBean.updateGeneralBlackListDetail(Unknown Source)
    at com.infomindz.remitglobe.bll.remittance.schedulers.BlackListUpdateScheduler.executeInternal(Unknown Source)
    at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)

The exception occurs only on a Windows machine, not on Linux. How can I resolve this issue? Any suggestions are appreciated.
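The question does not say how the file was produced, but one plausible cause (an assumption, not confirmed here) is that it was written using the platform default charset, which on Windows has usually been a legacy code page, while the parser reads it as UTF-8. The sketch reproduces that class of error. It uses ISO-8859-1 as a stand-in for a Windows code page because every JVM is required to support it, and the Latin letter Ä (byte 0xC4, which UTF-8 treats as the lead byte of a 2-byte sequence) as a stand-in for Arabic text. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

public class ReproduceMalformedUtf8 {

    // The same XML text every time; the prolog claims UTF-8 whatever the real byte encoding is.
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>\u00C4gypten</name>";

    // Encodes XML with the given charset and parses the bytes with a JAXP DocumentBuilder.
    // Returns null on success, or a description of the exception on failure.
    static String tryParse(Charset charset) {
        byte[] bytes = XML.getBytes(charset);
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(bytes));
            return null;
        } catch (Exception e) {
            // Older JDKs surface MalformedByteSequenceException (an IOException) as in the question;
            // newer JDKs may wrap the same message in a SAXParseException.
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 bytes:      " + tryParse(StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 bytes: " + tryParse(StandardCharsets.ISO_8859_1));
    }
}
```

The UTF-8 run parses cleanly; the ISO-8859-1 run fails because 0xC4 is followed by `g`, which is not a UTF-8 continuation byte.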

Muneeswaran Balasubramanian asked Mar 29 '12 07:03


4 Answers

I resolved the problem by creating the XML file in UTF-8 format:

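// Needs java.io.FileOutputStream and java.io.OutputStreamWriter; the charset is named explicitly.
OutputStreamWriter bufferedWriter = new OutputStreamWriter(
        new FileOutputStream(filePath + System.getProperty("file.separator") + fileName), "UTF8");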

After creating the file with the code above, the encoding problem was resolved. Thanks to everyone for the effort.
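The same fix can be sketched end to end: write the file with an explicit UTF-8 writer, then parse it back with a DocumentBuilder. This swaps in `Files.newBufferedWriter(path, StandardCharsets.UTF_8)` for the answer's `OutputStreamWriter(..., "UTF8")`; both produce UTF-8 bytes. The class name, temp-file prefix and Arabic sample are illustrative only, and the text is written without XML escaping, so real data would need escaping (or a `javax.xml.transform.Transformer`).

```java
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class WriteUtf8Xml {

    // Writes a tiny XML document whose bytes really are UTF-8, matching the declared encoding.
    // NOTE: text is not XML-escaped; this sketch assumes it contains no '<' or '&'.
    static void write(Path file, String text) throws Exception {
        try (Writer writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            writer.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            writer.write("<name>" + text + "</name>\n");
        }
    }

    // Parses the file with a JAXP DocumentBuilder and returns the root element's text.
    static String readBack(Path file) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file.toFile());
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("blacklist", ".xml");
        String arabic = "\u0645\u062D\u0645\u062F"; // an Arabic name, used only as sample data
        write(file, arabic);
        System.out.println("Round trip OK: " + arabic.equals(readBack(file)));
        Files.delete(file);
    }
}
```

Because the charset is named in code, the result no longer depends on whether the JVM runs on Windows or Linux.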

Muneeswaran Balasubramanian answered Oct 20 '22 13:10


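You can add the JVM parameter -Dfile.encoding=UTF-8 to your JVM. This changes the default charset used by code that reads or writes text without naming a charset; on Windows that default has typically been a legacy code page rather than UTF-8.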
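The flag only affects code that omits a charset, such as `FileWriter` or an `OutputStreamWriter` built without one. A minimal sketch to check which default the JVM picked up and to confirm that such a writer follows it; the class and method names are hypothetical. On JDK 18 and later (JEP 400) the default is UTF-8 unless overridden, so the Windows/Linux difference in the question mainly concerns older JDKs.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.util.Arrays;

public class DefaultCharsetCheck {

    // Encodes text the way legacy code that omits the charset does:
    // OutputStreamWriter(OutputStream) falls back to the JVM default charset.
    static byte[] encodeWithDefault(String text) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes)) {
            writer.write(text);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Both values reflect -Dfile.encoding when it is passed on the command line.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        String sample = "\u0645\u062D\u0645\u062F"; // Arabic sample text
        byte[] viaWriter = encodeWithDefault(sample);
        // The writer's bytes match an explicit encode with the default charset.
        System.out.println("Matches default: "
                + Arrays.equals(viaWriter, sample.getBytes(Charset.defaultCharset())));
    }
}
```

Run it as `java -Dfile.encoding=UTF-8 DefaultCharsetCheck` and without the flag to compare. Naming the charset explicitly in code, as in the first answer, avoids depending on this setting at all.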

Hsin answered Oct 20 '22 14:10


All we can tell from the message is that the file is not properly encoded in UTF-8. To work out why, you will need to trace the history of how the file was created. It may (or may not) be helpful to study the file contents at the binary level to see what the actual encoding is. For example, it may be useful to know whether the whole file is in the wrong encoding, or whether it just contains a couple of stray characters in the wrong encoding.
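To study the file at the byte level, a small Java sketch can report the offset of the first byte that is not valid UTF-8 and dump the bytes around it. It uses a `CharsetDecoder` with `CodingErrorAction.REPORT`; the class and method names are hypothetical. Running it on the failing file shows whether the bad bytes start at the first non-ASCII character or are isolated strays, which is the distinction drawn above.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FindBadUtf8 {

    // Returns the offset of the first byte that does not form valid UTF-8, or -1 if all bytes are valid.
    static int firstMalformedOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(4096);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                return in.position(); // the decoder stops at the start of the bad sequence
            }
            if (result.isOverflow()) {
                out.clear(); // discard decoded chars; only the byte position matters
                continue;
            }
            return -1; // underflow with endOfInput=true: every byte decoded cleanly
        }
    }

    // Formats up to 8 bytes on each side of offset as hex, marking the offending byte with [..].
    static String hexAround(byte[] data, int offset) {
        StringBuilder sb = new StringBuilder();
        for (int i = Math.max(0, offset - 8); i < Math.min(data.length, offset + 8); i++) {
            sb.append(String.format(i == offset ? "[%02X] " : "%02X ", data[i] & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int offset = firstMalformedOffset(data);
        if (offset < 0) {
            System.out.println("Valid UTF-8");
        } else {
            System.out.println("First invalid UTF-8 at byte " + offset + ": " + hexAround(data, offset));
        }
    }
}
```

Usage: `java FindBadUtf8 path/to/file.xml` (the path is a placeholder).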

Michael Kay answered Oct 20 '22 13:10


A simple solution:

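// Needs java.io.* and org.xml.sax.InputSource; saxParser (a javax.xml.parsers.SAXParser)
// and handler (an org.xml.sax.helpers.DefaultHandler) are assumed to be created elsewhere.
File file = new File("c:\\file-utf.xml");
InputStream inputStream = new FileInputStream(file);
Reader reader = new InputStreamReader(inputStream, "UTF-8");

InputSource is = new InputSource(reader);
// is.setEncoding("UTF-8"); -> This line causes an error: Content is not allowed in prolog

saxParser.parse(is, handler);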

Ref: http://www.mkyong.com/java/sax-error-malformedbytesequenceexception-invalid-byte-1-of-1-byte-utf-8-sequence/
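A self-contained version of that snippet follows, with a `DefaultHandler` that simply collects character data standing in for the answer's undefined `saxParser` and `handler` (the class and method names are hypothetical). One caveat worth knowing: `InputStreamReader` does not throw on malformed input; it substitutes U+FFFD, so a file that is not really UTF-8 parses without error but with corrupted text. The `main` method shows both cases.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxWithReader {

    // Decodes the byte stream as UTF-8 ourselves and hands the parser a character stream,
    // so the parser never sees raw bytes. Returns all character data in the document.
    static String parseText(InputStream in) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            saxParser.parse(new InputSource(reader), handler);
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] utf8 = "<a>\u00C4</a>".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "<a>\u00C4</a>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseText(new ByteArrayInputStream(utf8)));
        // No exception here: the reader silently replaces the bad byte with U+FFFD.
        System.out.println(parseText(new ByteArrayInputStream(latin1)).indexOf('\uFFFD') >= 0);
    }
}
```

In other words, this approach avoids the exception but does not repair a file that was written in another encoding; fixing how the file is written, as in the first answer, addresses the cause.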

Raaam answered Oct 20 '22 13:10