I have an XML file that contains Arabic characters. When I try to parse the file, it throws MalformedByteSequenceException: Invalid byte 2 of 2-byte UTF-8 sequence. I use POI DOM to parse the document.
The log is:
2012-03-19 11:30:00,433 [ERROR] (com.infomindz.remitglobe.bll.remittance.BlackListBean) - Error
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 2-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at com.infomindz.remitglobe.bll.remittance.BlackListBean.updateGeneralBlackListDetail(Unknown Source)
at com.infomindz.remitglobe.bll.remittance.schedulers.BlackListUpdateScheduler.executeInternal(Unknown Source)
at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
The exception occurs only on a Windows machine, not on a Linux machine. How can I resolve this issue? Any suggestions would be appreciated.
I resolved the problem by creating the XML file with UTF-8 encoding:
OutputStreamWriter bufferedWriter = new OutputStreamWriter(new FileOutputStream(filePath +
System.getProperty("file.separator") + fileName), "UTF8");
After creating the file with the code above, the encoding problem was resolved. Thanks to everyone who put in the effort here.
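To illustrate why writing the file with an explicit encoding helps, here is a minimal round-trip sketch. It assumes Java 7 or later, and the class name Utf8XmlRoundTrip, the element name "name", and the temporary file are hypothetical and not taken from the original code. It writes a small XML file as UTF-8 and parses it back with the JAXP DOM parser. The text is assumed to contain no characters that need XML escaping.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class Utf8XmlRoundTrip {

    /** Writes text into a one-element XML file as UTF-8, then parses it back. */
    public static String writeAndParse(File file, String text) throws Exception {
        // Name the charset explicitly instead of relying on the platform default.
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8)) {
            writer.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            // Assumption: text needs no escaping (no '<', '&', and so on).
            writer.write("<name>" + text + "</name>");
        }
        // The parser reads the bytes and honours the declared UTF-8 encoding.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(file);
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("blacklist", ".xml"); // hypothetical name
        file.deleteOnExit();
        String arabic = "\u0645\u062D\u0645\u062F"; // Arabic sample text
        System.out.println(arabic.equals(writeAndParse(file, arabic)));
    }
}
```

If the writer were created without a charset, the bytes on disk would depend on the platform default, which is a plausible reason for the difference between the Windows and Linux machines.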
You can add the JVM parameter -Dfile.encoding=utf-8 when starting your JVM.
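To see whether this parameter would change anything on a given machine, a small check like the following may help. It is only a sketch, and the class name DefaultCharsetDemo is hypothetical. It prints the JVM's default charset and compares the bytes produced by the default encoding with explicit UTF-8 bytes for a sample Arabic string.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DefaultCharsetDemo {

    /** True if encoding with the platform default yields the same bytes as UTF-8. */
    public static boolean defaultMatchesUtf8(String text) {
        byte[] viaDefault = text.getBytes();                    // platform default charset
        byte[] viaUtf8 = text.getBytes(StandardCharsets.UTF_8); // explicit UTF-8
        return Arrays.equals(viaDefault, viaUtf8);
    }

    public static void main(String[] args) {
        String arabic = "\u0645\u062D\u0645\u062F"; // Arabic sample text
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Default matches UTF-8: " + defaultMatchesUtf8(arabic));
    }
}
```

Running it once as "java DefaultCharsetDemo" and once as "java -Dfile.encoding=UTF-8 DefaultCharsetDemo" shows the effect of the flag. On JDK 18 and later the default is already UTF-8 unless overridden. Passing the charset explicitly in code is usually more robust than relying on the flag.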
All we can tell from the message is that the file is not properly encoded in UTF-8. To work out why, you will need to trace the history of how the file was created. It may (or may not) be helpful to study the file contents at the binary level to see what the actual encoding is. For example, it may be useful to know whether the whole file is in the wrong encoding, or whether it just contains a couple of stray characters in the wrong encoding.
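One way to check the file at the byte level is to run it through a strict UTF-8 decoder that reports malformed input instead of silently replacing it. The following is a minimal sketch assuming Java 7 or later. The class name Utf8Check is hypothetical, and the file path is taken from the first command-line argument.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    /** Returns true if the bytes decode as well-formed UTF-8. */
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(bytes) ? "valid UTF-8" : "not valid UTF-8");
    }
}
```

If the check fails, the file was most likely written in a different encoding (for example a Windows code page), and the fix belongs in the code that produces the file.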
A quite simple solution:
File file = new File("c:\\file-utf.xml");
InputStream inputStream = new FileInputStream(file);
// Decode the bytes as UTF-8 ourselves and hand the parser a character stream.
Reader reader = new InputStreamReader(inputStream, "UTF-8");
InputSource is = new InputSource(reader);
// is.setEncoding("UTF-8"); -> This line causes error! Content is not allowed in prolog
saxParser.parse(is, handler);
Ref: http://www.mkyong.com/java/sax-error-malformedbytesequenceexception-invalid-byte-1-of-1-byte-utf-8-sequence/
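The snippet above leaves out the parser and handler setup. A self-contained sketch of the same idea might look like the following. The class name SaxUtf8Reader and the handler that simply collects character data are assumptions made for illustration, and the input comes from an in-memory byte array rather than a file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxUtf8Reader {

    /** Parses the stream as UTF-8 XML and returns all character data. */
    public static String parseText(InputStream in) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        // Decode the bytes explicitly; the parser then receives characters, not bytes.
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            saxParser.parse(new InputSource(reader), handler);
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><n>\u0645\u062D\u0645\u062F</n>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(parseText(in).length());
    }
}
```

Note that this only helps when the file really is UTF-8. If the bytes are in another encoding, the reader will not throw but will substitute replacement characters, so the data will still be wrong.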