I understand that byte streams deal with bytes and character streams deal with characters... if I use a byte stream to read in characters, could this limit me to the sorts of characters I might read? For instance, bytes are read in as 8 bit bytes, characters are read in as 16 bit characters... does this mean that more characters can be represented using character streams rather than byte streams?
The last thing I'm confused about is how a byte stream writes out to a file for reading. If I was receiving bytes from a network socket, I would wrap them in an InputStreamReader, so that I would get the character transformation logic the character stream provides. If I read from a file using a FileInputStream and write out using a FileOutputStream, why is this file readable when I open it with a text editor? How is the FileOutputStream treating the bytes?
The key concept here is character encoding: each human readable character is somehow encoded into one or more bytes. There are plenty of character encodings. The most popular ones are:
These encodings are readable even when you open a file in a hex editor. However, there are many character encodings that do not have this feature, namely UTF-16 and UTF-32.
Now back to your question: an InputStream only gives you a stream of bytes. If those bytes represent characters encoded in ASCII or UTF-8, most of the time you are fine. But if they represent something more sophisticated like UTF-16, you absolutely need a Reader. Of course, the Reader has to know which character encoding the underlying InputStream uses. A common beginner mistake is to create a Reader without specifying the character encoding explicitly, in which case it will often fall back to the system default.
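To make the point about telling the Reader its encoding concrete, here is a minimal sketch. The class name ReaderCharsetDemo and the helper readAll are made up for illustration, and an in-memory ByteArrayInputStream stands in for a real file or socket. It decodes the same UTF-16 bytes twice, once with the right charset and once with the wrong one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReaderCharsetDemo {
    // Hypothetical helper: reads every character from the stream,
    // decoding the bytes with the charset passed in explicitly.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Six characters, four of them non-ASCII, written with Unicode escapes.
        String text = "Za\u017C\u00F3\u0142\u0107";
        // UTF-16BE uses two bytes per char here and writes no byte order mark.
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16BE);

        String right = readAll(new ByteArrayInputStream(utf16), StandardCharsets.UTF_16BE);
        String wrong = readAll(new ByteArrayInputStream(utf16), StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(text)); // true
        System.out.println(wrong.length());     // 12: every byte became its own char
    }
}
```

With the wrong charset nothing fails loudly: you simply get twelve garbage characters instead of six real ones, which is why passing the charset explicitly is worth the extra argument.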
The other direction (with Writers) is similar. If you simply cast your chars to bytes, most of the time you will be fine. But if your text contains less common national letters, your output will be malformed or truncated. So you create a Writer that converts each given character to a series of one or more bytes. Once again, you are obliged to provide the character encoding.
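A rough sketch of the difference between casting and using a Writer, assuming UTF-8 as the target encoding. The class name WriterCharsetDemo and both helper methods are hypothetical, and a ByteArrayOutputStream stands in for a FileOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WriterCharsetDemo {
    // Naive approach: the cast keeps only the low 8 bits of each 16-bit char.
    static byte[] castToBytes(String text) {
        byte[] out = new byte[text.length()];
        for (int i = 0; i < text.length(); i++) {
            out[i] = (byte) text.charAt(i);
        }
        return out;
    }

    // Writer approach: the OutputStreamWriter encodes each character
    // into one or more bytes according to the given charset.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Four characters: U+0142, U+00F3, 'd', U+017A.
        String text = "\u0142\u00F3d\u017A";

        byte[] cast = castToBytes(text);
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);

        // The cast turned U+0142 into 0x42 and U+017A into 0x7A,
        // so the original text cannot be recovered from those bytes.
        String lossy = new String(cast, StandardCharsets.ISO_8859_1);
        System.out.println(lossy.equals(text)); // false

        // The Writer output decodes back to the original text.
        String roundTrip = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(roundTrip.equals(text)); // true
        System.out.println(utf8.length);            // 7: three chars took two bytes each
    }
}
```

For plain ASCII text both methods would produce identical bytes, which is why the cast appears to work until the first national letter shows up.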
Important rules:
InputStream when dealing with binary data (multimedia, ZIP and PDF files, etc.)
Reader when reading text (txt, HTML, XML...)

A char is a 16 bit string that represents a UTF-16 code unit, which for most text is one Unicode character.
A byte is an 8 bit string that represents a 2's complement number.
The important thing here is that they are both bit strings. Technically speaking, a char is simply 2 bytes. Nothing more, nothing less aside from some minor semantics with how Java treats the two. As far as the computer (or Input/OutputStreams) are concerned, the only difference is the number of bits they hold.
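This also explains why a file copied with a FileInputStream and a FileOutputStream still opens fine in a text editor: the streams never interpret the bytes, they just pass them along. A small sketch, using in-memory streams in place of the file streams (the class name ByteCopyDemo and the copy helper are made up for the example):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteCopyDemo {
    // Copies the input one byte at a time without decoding anything,
    // the same way a FileInputStream to FileOutputStream loop would.
    static byte[] copy(byte[] source) {
        ByteArrayInputStream in = new ByteArrayInputStream(source);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Byte.SIZE);      // 8
        System.out.println(Character.SIZE); // 16

        // Text that was already encoded by whoever produced the "file".
        byte[] original = "caf\u00E9".getBytes(StandardCharsets.UTF_8);
        byte[] copied = copy(original);

        // The copy is byte-for-byte identical, so the encoded text survives.
        System.out.println(Arrays.equals(original, copied)); // true
    }
}
```

The encoding only matters at the point where bytes are turned into chars or back. A pure byte copy never reaches that point, so whatever encoding the original file had is preserved.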