
Convert File with known encoding to UTF-8

I need to read a text file into a String and then pass its contents as the input parameter (of type InputStream) to IFile.create (Eclipse). I have been looking for an example of how to do that but still cannot figure it out. I need your help!

Just for testing, I tried to convert the original text file to UTF-8 with this code:

FileInputStream fis = new FileInputStream(FilePath);
InputStreamReader isr = new InputStreamReader(fis);

Reader in = new BufferedReader(isr);
StringBuffer buffer = new StringBuffer();

int ch;
while ((ch = in.read()) > -1) {
    buffer.append((char)ch);
}
in.close();


FileOutputStream fos = new FileOutputStream(FilePath+".test.txt");
Writer out = new OutputStreamWriter(fos, "UTF8");
out.write(buffer.toString());
out.close();

But even though the resulting *.test.txt file is UTF-8 encoded, the characters inside it are corrupted.

JackBauer asked Jan 21 '23 22:01


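1 Answer

You need to specify the encoding of the InputStreamReader using the Charset parameter. Without it, the reader decodes the file with the platform default charset, which may not match the file's actual encoding.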

                                    // ↓ whatever the input's encoding is
Charset inputCharset = Charset.forName("ISO-8859-1");
InputStreamReader isr = new InputStreamReader(fis, inputCharset);

This also works:

InputStreamReader isr = new InputStreamReader(fis, "ISO-8859-1");
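
Putting the pieces together, a minimal sketch of the whole conversion might look like the following. The class name Utf8Converter and its method names are made up for illustration, and the caller is assumed to already know the input file's encoding. The second helper shows one way to turn decoded text into an InputStream of UTF-8 bytes, which is the parameter type the question says IFile.create expects.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of any library.
class Utf8Converter {

    // Decodes sourcePath using inputCharset and re-encodes the text as UTF-8 into targetPath.
    static void convert(String sourcePath, Charset inputCharset, String targetPath) throws IOException {
        try (Reader in = new InputStreamReader(new FileInputStream(sourcePath), inputCharset);
             Writer out = new OutputStreamWriter(new FileOutputStream(targetPath), StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int n;
            // Copy decoded characters in chunks until the reader is exhausted.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }

    // Wraps already-decoded text as an in-memory stream of UTF-8 bytes.
    static InputStream utf8Stream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }
}
```

For a Latin-1 file this could be called as Utf8Converter.convert(FilePath, StandardCharsets.ISO_8859_1, FilePath + ".test.txt"). Both streams are opened in a try-with-resources block so they are closed even if the copy fails.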

See also:

  • InputStreamReader(InputStream in, Charset cs)
  • Charset.forName(String charsetName)
  • Java: How to determine the correct charset encoding of a stream
  • How to reliably guess the encoding between MacRoman, CP1252, Latin1, UTF-8, and ASCII
  • GuessEncoding - only works for UTF-8, UTF-16LE, UTF-16BE, and UTF-32 ☹
  • ICU Charset Detector
  • cpdetector, free java codepage detection
  • JCharDet (Java port of Mozilla charset detector); ironically, that page does not render the apostrophe in "Mozilla's" correctly

SO search where I found all these links: https://stackoverflow.com/search?q=java+detect+encoding
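
If no detection library is at hand, a rough standard-library check is to decode the bytes strictly with a candidate charset and see whether that fails. This is only a sketch: the class and method names are made up, and a clean decode does not prove the guess is right, since single-byte charsets such as ISO-8859-1 accept any byte sequence. It is mostly useful for ruling out UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper class for illustration; not part of any library.
class CharsetCheck {

    // Returns true if data decodes under candidate without malformed or unmappable input.
    static boolean decodesCleanly(byte[] data, Charset candidate) {
        try {
            candidate.newDecoder()
                     // REPORT makes the decoder throw instead of silently substituting U+FFFD.
                     .onMalformedInput(CodingErrorAction.REPORT)
                     .onUnmappableCharacter(CodingErrorAction.REPORT)
                     .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

A file whose bytes fail decodesCleanly(bytes, StandardCharsets.UTF_8) is not valid UTF-8, so some other encoding has to be passed to the InputStreamReader.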


You can get the default charset, which comes from the system the JVM is running on, at runtime via Charset.defaultCharset().
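
As a small illustration of why the question's code depends on the machine it runs on, the sketch below prints the default charset and the encoding that an InputStreamReader picks when no charset is given. The class and method names are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical demo class for illustration; not part of any library.
class DefaultCharsetDemo {

    // Returns the encoding name used by a reader created without an explicit charset.
    static String readerEncoding() throws IOException {
        try (InputStreamReader reader = new InputStreamReader(new ByteArrayInputStream(new byte[0]))) {
            // getEncoding() may return a historical alias such as "UTF8" rather than "UTF-8".
            return reader.getEncoding();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Default charset: " + Charset.defaultCharset().name());
        System.out.println("Reader without charset uses: " + readerEncoding());
    }
}
```

The printed values vary between machines and JVM versions, which is why passing the charset explicitly is the safer choice.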

Matt Ball answered Jan 30 '23 20:01