The difference between InputStream and InputStreamReader when reading multi-byte characters

People also ask

What is the difference between InputStream and InputStreamReader?

An InputStream is typically connected to some data source, such as a file, network connection, or pipe. This is explained in more detail in the Java IO Overview text. An InputStreamReader wraps an InputStream and decodes the byte stream into characters as you read it.

What is the difference between InputStream and BufferedInputStream?

DataInputStream is a kind of InputStream that reads data directly as primitive data types. BufferedInputStream is a kind of InputStream that wraps another stream and keeps an internal buffer, so that fewer reads hit the underlying data source.

What is the difference between InputStream and Reader in java?

A Reader is character based: it is used to read characters (writing characters is the job of a Writer). An InputStream such as FileInputStream is byte based: it reads bytes, which makes it the right choice for binary files. FileReader is a character-based Reader that reads characters from a file.

What is InputStreamReader used for?

An InputStreamReader is a bridge from byte streams to character streams: it reads bytes and decodes them into characters using a specified charset. The charset may be specified by name or given explicitly (as a Charset or CharsetDecoder), or the platform's default charset may be accepted.

An InputStream reads raw octet (8-bit) data. In Java, the byte type plays roughly the role that the char type plays in C, where it can represent either character data or binary data (note that a Java byte is always signed). Java's char type, a 16-bit UTF-16 code unit, has more in common with C's wchar_t type.

An InputStreamReader then transforms data from some encoding into UTF-16. If "a你们" is encoded as UTF-8 on disk, it is stored as the byte sequence 61 E4 BD A0 E4 BB AC. When you pass the InputStream to an InputStreamReader with the UTF-8 encoding, it is read as the char sequence 0061 4F60 4EEC.
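The byte and char sequences above can be reproduced with a small sketch. It assumes the seven UTF-8 bytes are held in memory (a ByteArrayInputStream stands in for the file on disk), and the class and method names are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // UTF-8 encoding of "a你们": 61 E4 BD A0 E4 BB AC
    static final byte[] UTF8_BYTES = {
        0x61,
        (byte) 0xE4, (byte) 0xBD, (byte) 0xA0,
        (byte) 0xE4, (byte) 0xBB, (byte) 0xAC
    };

    // Reads the raw stream one byte at a time and formats each byte as hex.
    static String readAsBytes(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (InputStream in = new ByteArrayInputStream(data)) {
            int b;
            while ((b = in.read()) != -1) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(String.format("%02X", b));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Wraps the same stream in a UTF-8 InputStreamReader and formats
    // each decoded char (UTF-16 code unit) as hex.
    static String readAsChars(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(String.format("%04X", c));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(readAsBytes(UTF8_BYTES)); // 61 E4 BD A0 E4 BB AC
        System.out.println(readAsChars(UTF8_BYTES)); // 0061 4F60 4EEC
    }
}
```

Seven calls to InputStream.read() return seven bytes, while three calls to Reader.read() return three chars: the reader consumes three bytes for each of the two Chinese characters.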

The character encoding API in Java contains the algorithms to perform this transformation. Oracle publishes a list of the encodings supported by its JRE. The ICU project is a good place to start if you want to understand the internals of how this works in practice.

As Alexander Pogrebnyak points out, you should almost always provide the encoding explicitly. Byte-to-char methods that do not specify an encoding rely on the JRE default, which depends on the operating system and user settings.

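You have to give the reader a hint by providing the character set that your file is written in. For example:

import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

Reader reader =
   new InputStreamReader(
       new FileInputStream( "/path/to/file" ),
       StandardCharsets.UTF_8 // most likely the encoding of the file
   );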

Without a hint it will use your platform default encoding, which in many cases is not what you want.
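To see what a wrong guess looks like, here is a minimal sketch that decodes the same UTF-8 bytes twice, once as UTF-8 and once as ISO-8859-1. ISO-8859-1 is only a stand-in for "a default that does not match the file", and the class and method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    // Decodes the given bytes through an InputStreamReader using the given charset.
    static String decode(byte[] data, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "a你们" written with Unicode escapes so the source file encoding does not matter.
        byte[] utf8 = "a\u4F60\u4EEC".getBytes(StandardCharsets.UTF_8);

        // What a charset-less InputStreamReader would use on this machine.
        System.out.println("platform default: " + Charset.defaultCharset());

        // Correct charset: 3 chars. Wrong charset: every byte becomes its own char.
        System.out.println("as UTF-8:      " + decode(utf8, StandardCharsets.UTF_8).length() + " chars");
        System.out.println("as ISO-8859-1: " + decode(utf8, StandardCharsets.ISO_8859_1).length() + " chars");
    }
}
```

With the matching charset the seven bytes decode to three chars. With ISO-8859-1 each byte is mapped to a separate char, so the result has seven chars and the two Chinese characters come out garbled. No exception is thrown in either case, which is why a mismatched default tends to go unnoticed.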

This link has a nice explanation of encodings: http://www.joelonsoftware.com/articles/Unicode.html

Tags: java, io, character-encoding
