Why can a byte in Java I/O represent a character?

Question

And I see the characters are only ASCII. Then it's not dynamic, right?

Is there any explanation for this?

What is the difference between byte streams and character streams?

cHao · Accepted Answer

Bytes are not characters. Alone, they can't even represent characters.

Computingwise, a "character" is a pairing of a numeric code (or sequence of codes) with an encoding or character set that defines how the codes map to real-world characters (or to whitespace, or to control codes).

Only once paired with an encoding can bytes represent characters. With some encodings (like ASCII or ISO-8859-1), one byte can represent one character...and many encodings are even ASCII-compatible (meaning that the character codes from 0 to 127 align with ASCII's definition for them)...but without the original mapping, you don't know what you have.
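To make the one-byte-versus-multibyte point concrete, here is a minimal Java sketch. The class and method names are illustrative and not from the answer. It encodes the same characters with `String.getBytes(Charset)` under three standard charsets and prints the resulting bytes in hex. `A` comes out as `41` in all three, which is what ASCII compatibility means in practice, while U+00E9 (e with an acute accent) is one byte in ISO-8859-1 but two in UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingWidths {
    // Formats bytes as lowercase hex pairs separated by spaces, e.g. "c3 a9".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF)); // mask so 0xE9 prints as e9, not a negative value
        }
        return sb.toString();
    }

    // Encodes text with the given charset and returns the bytes as hex.
    public static String encodeHex(String text, Charset cs) {
        return hex(text.getBytes(cs));
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8
        };
        for (Charset cs : charsets) {
            // "A" is 0x41 in every ASCII-compatible encoding.
            System.out.println(cs + " 'A' -> " + encodeHex("A", cs));
        }
        // U+00E9: one byte in ISO-8859-1, two bytes in UTF-8.
        System.out.println("ISO-8859-1 U+00E9 -> " + encodeHex("\u00e9", StandardCharsets.ISO_8859_1)); // e9
        System.out.println("UTF-8 U+00E9 -> " + encodeHex("\u00e9", StandardCharsets.UTF_8));           // c3 a9
    }
}
```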

Without an encoding, bytes are just 8-bit integers.
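In Java specifically, `byte` is a signed 8-bit integer, so the same bit pattern reads as a negative number until it is masked. This small sketch (the class name `RawBytes` is illustrative) shows that a numeric value is all Java knows about a byte; `Byte.toUnsignedInt(b)` (Java 8+) does the same masking as the helper below.

```java
public class RawBytes {
    // Java's byte is signed (-128..127). Masking with 0xFF yields the
    // unsigned value 0..255; Byte.toUnsignedInt(b) is equivalent.
    public static int unsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE9;            // bit pattern 1110 1001
        System.out.println(b);           // -23 (signed reading)
        System.out.println(unsigned(b)); // 233 (unsigned reading)
        // Either way it is just a number: nothing here says which
        // character, if any, this byte was meant to encode.
    }
}
```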

You can interpret them any way you like by forcing an encoding onto them. That is exactly what you're doing when you cast a byte to char, call new String(myBytes), and so on, or even open a file containing the bytes in a text editor. (In that case, it's the editor applying the encoding.) In doing so, you might even get something that makes sense. But without knowing the original encoding, you can't know for sure what those bytes were intended to represent.
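In Java, the one-argument `new String(myBytes)` uses the platform default charset (UTF-8 by default since Java 18 under JEP 400; earlier releases typically derived it from the OS locale), so it quietly forces an encoding onto the bytes. The sketch below, with illustrative names and two arbitrarily chosen bytes `0xC3 0xA9`, passes the charset explicitly instead and shows the same two bytes decoding to one character under UTF-8 and to two characters under ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ForcedDecoding {
    // Decodes bytes with an explicitly chosen charset. The bytes carry no
    // record of which charset produced them; the caller has to supply it.
    public static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        byte[] myBytes = { (byte) 0xC3, (byte) 0xA9 };

        // The charset that new String(myBytes) would use on this JVM.
        System.out.println("default charset: " + Charset.defaultCharset());

        String asUtf8 = decode(myBytes, StandardCharsets.UTF_8);        // U+00E9, one char
        String asLatin1 = decode(myBytes, StandardCharsets.ISO_8859_1); // U+00C3 U+00A9, two chars
        System.out.println(asUtf8.length());   // 1
        System.out.println(asLatin1.length()); // 2
    }
}
```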

It might not even be text.

For example, consider the byte sequence 0x48 0x65 0x6c 0x6c 0x6f 0x2e. It can be interpreted as:

Hello. in ASCII and compatible 8-bit encodings;
dinner in some 8-bit encoding I made up just to prove this point;
䡥汬漮 in big-endian UTF-16*;
a steel-blue pixel followed by a greyish-yellowish one, in RGB;
load r101, [0x6c6c6f2e] in some unknown processor's assembly language;

or any of a million other things. Those six bytes alone can't tell you which interpretation is correct.
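Three of the readings listed above can be reproduced in a few lines of Java. This is a minimal sketch with illustrative names; the RGB reading simply assumes three bytes per pixel in red, green, blue order, and the assembly reading is left out because that processor is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SixBytes {
    // The byte sequence from the answer: 0x48 0x65 0x6c 0x6c 0x6f 0x2e.
    public static final byte[] DATA = { 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x2e };

    // Reads consecutive byte triples as 24-bit RGB pixels {r, g, b}.
    public static int[][] asRgb(byte[] data) {
        int[][] pixels = new int[data.length / 3][];
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = new int[] {
                data[3 * i] & 0xFF, data[3 * i + 1] & 0xFF, data[3 * i + 2] & 0xFF
            };
        }
        return pixels;
    }

    public static void main(String[] args) {
        // Reading 1: ASCII text.
        System.out.println(new String(DATA, StandardCharsets.US_ASCII)); // Hello.

        // Reading 2: big-endian UTF-16, two bytes per code unit.
        String utf16 = new String(DATA, StandardCharsets.UTF_16BE);
        for (char c : utf16.toCharArray()) {
            System.out.printf("U+%04X ", (int) c); // U+4865 U+6C6C U+6F2E
        }
        System.out.println();

        // Reading 3: two RGB pixels.
        for (int[] px : asRgb(DATA)) {
            System.out.println(Arrays.toString(px)); // [72, 101, 108] then [108, 111, 46]
        }
    }
}
```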

With text, at least, that's what encodings are for.

But if you want the interpretation to be right, you need to use the same encoding to decode those bytes as was used to generate them. That's why it's so important to know how your text was encoded.
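A quick way to see why the encodings must match is a round trip: encode a string, then decode the bytes with either the same charset or a different one. The sketch below uses illustrative names and picks UTF-8 and ISO-8859-1 only as an example pair; the mismatched decode produces the familiar garbled text, where one accented letter turns into two unrelated characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    // Encodes text with one charset, then decodes the bytes with another.
    public static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an accented final e

        String same = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mismatched = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(same.equals(original));       // true
        System.out.println(mismatched.equals(original)); // false: the accented e became U+00C3 U+00A9
        System.out.println(mismatched.length());         // 5 instead of 4
    }
}
```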

The difference between a byte stream and a character stream is that the character stream attempts to work with characters rather than bytes. (It actually works with UTF-16 code units. But since we know the encoding, that's good enough for most purposes.) If it's wrapped around a byte stream, the character stream uses an encoding to convert the bytes read from the underlying byte stream to chars (or chars written to the stream to bytes).
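The split between the two kinds of stream shows up directly in `java.io.InputStream` and `java.io.Reader`. In this minimal sketch (class and method names are illustrative, and UTF-8 is an assumed encoding for the sample data), an `InputStreamReader` wraps a byte stream and decodes it; counting what each `read()` returns gives bytes on one side and UTF-16 code units on the other. U+1F600, a character outside the Basic Multilingual Plane, comes back from the `Reader` as two `char`s (a surrogate pair), which is the "UTF-16 code units" caveat in practice.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamsDemo {
    // Byte stream: each read() returns one byte as 0..255, or -1 at the end.
    // Closing a ByteArrayInputStream has no effect, so it is not closed here.
    public static int countBytes(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    // Character stream: the InputStreamReader decodes the wrapped byte stream
    // as UTF-8, and each read() returns one UTF-16 code unit, or -1 at the end.
    public static int countChars(byte[] data) {
        int n = 0;
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            while (reader.read() != -1) {
                n++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return n;
    }

    public static void main(String[] args) {
        // U+00E9 (2 bytes in UTF-8) followed by U+1F600 (4 bytes in UTF-8).
        String text = "\u00e9" + new String(Character.toChars(0x1F600));
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println(countBytes(utf8));                      // 6 bytes
        System.out.println(countChars(utf8));                      // 3 chars (U+1F600 is a surrogate pair)
        System.out.println(text.codePointCount(0, text.length())); // 2 characters
    }
}
```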

* Note: I don't know whether "䡥汬漮" is profanity or even makes any sense...but neither does a computer unless you program it to read Chinese.

Tags: java, io, byte

Keenan Gebze
