
Java: How can I get the encoding from an InputStream?

Tags: java, io, encoding

I want to get the encoding from a stream.

1st method: use InputStreamReader.

But it always returns the OS default encoding.

    // No charset argument is given, so the reader uses the platform default.
    InputStreamReader reader = new InputStreamReader(new FileInputStream("aa.rar"));
    System.out.println(reader.getEncoding());

Output: GBK

2nd method: use UniversalDetector.

But it always returns null.

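    import java.io.FileInputStream;
    import org.mozilla.universalchardet.UniversalDetector;

    FileInputStream input = new FileInputStream("aa.rar");

    // (1) Create the detector; null means no CharsetListener is registered.
    UniversalDetector detector = new UniversalDetector(null);
    byte[] buf = new byte[4096];

    // (2) Feed data until the stream ends or the detector has made up its mind.
    int nread;
    while ((nread = input.read(buf)) > 0 && !detector.isDone()) {
        detector.handleData(buf, 0, nread);
    }
    input.close();

    // (3) Tell the detector that no more data will arrive.
    detector.dataEnd();

    // (4) Read the result; it is null when no charset could be determined.
    String encoding = detector.getDetectedCharset();

    if (encoding != null) {
        System.out.println("Detected encoding = " + encoding);
    } else {
        System.out.println("No encoding detected.");
    }

    // (5) Reset the detector so it can be reused.
    detector.reset();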

Output: null

How can I get the right encoding? :(

Asked by youzhi.zhang on Nov 29 '11 at 03:11



1 Answer

Let's summarize the situation:

  • InputStream delivers bytes
  • Readers deliver chars, decoded from those bytes using some encoding
  • new InputStreamReader(inputStream) uses the platform default encoding (GBK on your system)
  • new InputStreamReader(inputStream, "UTF-8") uses the given encoding (here UTF-8)

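So one needs to know the encoding before reading: getEncoding() only reports the encoding the reader was told to use, it does not inspect the data. Running a charset-detecting class first, as you did, is the right approach.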
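To illustrate the points above, here is a minimal sketch that decodes the same two bytes with two different charsets. The class name ReaderEncodingDemo and the helper decode are made up for this example, and the sample bytes stand in for real file content. The reader never guesses; it applies whatever charset it is handed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReaderEncodingDemo {

    // Hypothetical helper: decodes the bytes with an explicitly chosen charset.
    static String decode(byte[] bytes, Charset charset) throws IOException {
        try (InputStreamReader reader =
                 new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // U+00E9 (e with acute accent) encoded as UTF-8 is the two bytes 0xC3 0xA9.
        byte[] bytes = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // Correct charset: one char comes back.
        System.out.println(decode(bytes, StandardCharsets.UTF_8).length());
        // Wrong charset: each byte becomes its own char, so two chars come back.
        System.out.println(decode(bytes, StandardCharsets.ISO_8859_1).length());

        // Without a charset argument the reader falls back to the platform default,
        // and getEncoding() simply reports that choice.
        try (InputStreamReader reader =
                 new InputStreamReader(new ByteArrayInputStream(bytes))) {
            System.out.println(reader.getEncoding());
            System.out.println(Charset.defaultCharset());
        }
    }
}
```

The name printed by getEncoding() may be a historical alias rather than the canonical charset name, so the last two lines can differ in spelling while referring to the same charset.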

According to http://code.google.com/p/juniversalchardet/, it should handle UTF-8 and UTF-16. You might use the editor jEdit to verify the encoding and see whether there is some problem.
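If you want to cross-check a detector result without an editor, a rough sketch using only the JDK could look like the following. This is not how juniversalchardet works internally; it is a simpler substitute that looks for a byte order mark and then tests whether the bytes decode without errors. The class name EncodingCheck and both helper methods are made up for this example. Keep in mind that a binary file such as a .rar archive is not text at all, so it is plausible for a detector to report nothing for it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    // Returns the charset implied by a leading byte order mark, or null if there is none.
    // Only UTF-8 and UTF-16 marks are checked; UTF-32 is ignored in this sketch.
    static Charset charsetFromBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return null;
    }

    // Returns true if the bytes decode in the given charset without any error.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] validUtf8 = {(byte) 0xC3, (byte) 0xA9};  // U+00E9 in UTF-8
        byte[] invalidUtf8 = {(byte) 0xC3, (byte) 0x28}; // lead byte without continuation

        System.out.println(charsetFromBom(validUtf8));
        System.out.println(decodesCleanly(validUtf8, StandardCharsets.UTF_8));
        System.out.println(decodesCleanly(invalidUtf8, StandardCharsets.UTF_8));
    }
}
```

A clean decode is only a necessary condition: single-byte charsets such as ISO-8859-1 accept every byte sequence, so this check can rule an encoding out but cannot prove it is the right one.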

Answered by Joop Eggen on Sep 20 '22 at 22:09
