In a Java program, I spawn a new Process via ProcessBuilder:
args[0] = directory.getAbsolutePath() + File.separator + program;
ProcessBuilder pb = new ProcessBuilder(args);
pb.directory(directory);
final Process process = pb.start();
Then I read the process's standard output in a new Thread:
new Thread() {
    @Override
    public void run() {
        // try-with-resources closes the reader once the process output ends
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}.start();
However, when the process outputs non-ASCII characters (such as 'é'), the line contains the character '\uFFFD' instead.
What is the encoding of the InputStream returned by getInputStream (my platform is Windows in Europe)?
How can I change things so that line contains the expected data (i.e. '\u00E9' for 'é')?
Edit: I tried new InputStreamReader(..., "UTF-8"): é still becomes \uFFFD.
An InputStream is a binary stream, so there is no encoding. When you create the Reader, you need to know what character encoding to use, and that would depend on what the program you called produces (Java will not convert it in any way).
If you do not specify anything for InputStreamReader, it will use the platform default encoding, which may not be appropriate. There is another constructor that allows you to specify the encoding.
If you know what encoding to use (and you really have to know):
new InputStreamReader(process.getInputStream(), "UTF-8") // for example
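To illustrate why the charset has to match what the program produces, here is a minimal sketch that decodes the same bytes twice. The class name DecodeDemo, the helper readLines, and the sample byte 0xE9 (how ISO-8859-1 encodes 'é') are assumptions made for illustration; the bytes your process really writes depend on that program and its code page.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class DecodeDemo {
    // Hypothetical helper: reads all lines from a byte stream using an explicit charset.
    public static List<String> readLines(InputStream in, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample data: 'é' encoded as ISO-8859-1, followed by a newline.
        byte[] raw = {(byte) 0xE9, '\n'};

        String asUtf8 = readLines(new ByteArrayInputStream(raw), StandardCharsets.UTF_8).get(0);
        String asLatin1 = readLines(new ByteArrayInputStream(raw), StandardCharsets.ISO_8859_1).get(0);

        // 0xE9 alone is not valid UTF-8, so the decoder substitutes U+FFFD.
        System.out.printf("decoded as UTF-8:      U+%04X%n", (int) asUtf8.charAt(0));
        // With the matching charset the byte decodes to U+00E9.
        System.out.printf("decoded as ISO-8859-1: U+%04X%n", (int) asLatin1.charAt(0));
    }
}
```

The same helper could be called with process.getInputStream() and whichever charset the child program actually uses.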
Interestingly enough, when running on Windows:
ProcessBuilder pb = new ProcessBuilder("cmd", "/c dir");
Process process = pb.start();
Then the CP437 code page works quite well:
new InputStreamReader(process.getInputStream(), "CP437");
As I understand it, operating system streams are byte streams; there are no characters at that level. The one-argument InputStreamReader constructor uses the JVM default character set (java.nio.charset.Charset#defaultCharset()); you can use another constructor to specify a character set explicitly.
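A small sketch to check which charset a one-argument InputStreamReader picks up on a given machine. The class name DefaultCharsetDemo and the helper readerCharset are hypothetical, and the "native.encoding" system property is, as far as I know, only set on JDK 17 and later (it prints null on older versions).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

class DefaultCharsetDemo {
    // Hypothetical helper: the charset used by a one-argument InputStreamReader.
    public static Charset readerCharset() throws IOException {
        try (InputStreamReader reader =
                 new InputStreamReader(new ByteArrayInputStream(new byte[0]))) {
            // getEncoding() may return a historical name such as "Cp1252" or "UTF8";
            // Charset.forName normally resolves these aliases.
            return Charset.forName(reader.getEncoding());
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("JVM default charset:    " + Charset.defaultCharset());
        System.out.println("InputStreamReader uses: " + readerCharset());
        // Assumption: only available on JDK 17+.
        System.out.println("native.encoding:        " + System.getProperty("native.encoding"));
    }
}
```

Note that the JVM default is not necessarily what the child process writes: on Windows, console programs often use an OEM code page that differs from the default charset.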
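According to http://www.fileformat.info/info/unicode/char/e9/index.htm the Unicode code point for 'é' is '\u00E9'. '\uFFFD' is the Unicode replacement character, which a decoder substitutes for bytes that are not valid in the chosen charset. If the string itself contains '\uFFFD', the charset passed to the reader does not match what the process writes, so the problem is on the reading side.
Be careful when checking the result, though: the Windows console does not support Unicode by default. To test your code, open a file and write the stream there, and do not forget to set the file encoding to UTF-8.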
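Writing to a file with an explicit encoding could look like the following sketch. The class name Utf8FileDump, the helper writeUtf8, and the temporary file name are made up for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class Utf8FileDump {
    // Hypothetical helper: writes each line to the target file encoded as UTF-8.
    public static void writeUtf8(Path target, List<String> lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("process-output", ".txt");
        writeUtf8(target, Arrays.asList("\u00E9"));

        // Dump the raw bytes: a correctly decoded 'é' is stored as C3 A9 in UTF-8.
        for (byte b : Files.readAllBytes(target)) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```

If the file shows EF BF BD instead of C3 A9, the string already contained '\uFFFD' before it was written.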