 

Why does char take 2 bytes when it could be stored in one byte?

Tags:

c#

Can anybody tell me why, in C#, a char takes two bytes even though it could be stored in one byte? Don't you think it is a waste of memory? If not, how is the extra byte used? In simple words, please make clear to me what the use of the extra 8 bits is.

animesh asked Jul 21 '11 19:07

People also ask

Why do chars take 2 bytes?

Every char is made up of 2 bytes because Java internally uses UTF-16. For instance, if a String contains a word in the English language, the leading 8 bits of every char will all be 0, since an ASCII character can be represented using a single byte.
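The same holds in C#, which this question is about: a char is a 16-bit UTF-16 code unit. A minimal sketch (the class name is just for illustration):

```csharp
using System;

class CharSizeDemo
{
    static void Main()
    {
        // char is a 16-bit UTF-16 code unit in C#, so it occupies 2 bytes.
        Console.WriteLine(sizeof(char));   // 2

        // For an ASCII character, the high-order byte is all zeros.
        char a = 'A';                      // U+0041
        Console.WriteLine((int)a);         // 65
        Console.WriteLine((int)a >> 8);    // 0 (the "extra" 8 bits are unused here)
    }
}
```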

Why is a char one byte?

The (binary) representation of a char (in the standard character set) can fit into 1 byte. At the time of C's original development, the most commonly available standards were ASCII and EBCDIC, which needed 7-bit and 8-bit encodings respectively, so 1 byte was sufficient to represent the whole character set.
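To see this from C#, a small sketch (Encoding.ASCII produces exactly one byte per ASCII character; the names here are illustrative):

```csharp
using System;
using System.Text;

class AsciiDemo
{
    static void Main()
    {
        // ASCII needs only 7 bits, so one byte per character is enough.
        byte[] bytes = Encoding.ASCII.GetBytes("Hello");
        Console.WriteLine(bytes.Length);   // 5 -- one byte per character
        Console.WriteLine(bytes[0]);       // 72, the ASCII code of 'H'
    }
}
```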

Why char uses 2 byte in Java and what is \u0000?

Why does char use 2 bytes in Java, and what is \u0000? Because Java uses the Unicode system, not the ASCII code system, and \u0000 is the lowest value in the Unicode range. Unicode is a universal international character encoding standard capable of representing most of the world's written languages.
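In C# the picture is the same; a quick sketch (the class name is illustrative):

```csharp
using System;

class NullCharDemo
{
    static void Main()
    {
        // '\u0000' is the lowest char value; it is also the default value
        // of char in C#.
        char c = '\u0000';
        Console.WriteLine(c == default(char));   // True
        Console.WriteLine((int)char.MinValue);   // 0     (U+0000)
        Console.WriteLine((int)char.MaxValue);   // 65535 (U+FFFF)
    }
}
```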


1 Answer

although it can be stored in one byte

What makes you think that?

It only takes one byte to represent every character in the English language, but other languages use other characters. Consider the number of different writing systems (Latin, Chinese, Arabic, Cyrillic...) and the number of symbols in each of them (not only letters and digits, but also punctuation marks and other special symbols): there are tens of thousands of different symbols in use in the world! So one byte is never going to be enough to represent them all, which is why the Unicode standard was created.

Unicode has several representations (UTF-8, UTF-16, UTF-32...). .NET strings use UTF-16, which takes two bytes per char (strictly speaking, per UTF-16 code unit rather than per character). Of course, two bytes is still not enough to represent all the different symbols in the world; surrogate pairs (two code units, i.e. four bytes) are used to represent characters above U+FFFF.
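A short C# sketch of that last point (the class name is illustrative; char.ConvertFromUtf32 builds the surrogate pair for a code point above U+FFFF):

```csharp
using System;

class SurrogateDemo
{
    static void Main()
    {
        // U+1F600 (a grinning-face emoji) lies above U+FFFF, so UTF-16
        // encodes it as a surrogate pair: two chars, i.e. four bytes.
        string s = char.ConvertFromUtf32(0x1F600);
        Console.WriteLine(s.Length);                          // 2 UTF-16 code units
        Console.WriteLine(char.IsHighSurrogate(s[0]));        // True
        Console.WriteLine(char.IsLowSurrogate(s[1]));         // True
        Console.WriteLine(char.IsSurrogatePair(s[0], s[1]));  // True
    }
}
```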

Thomas Levesque answered Oct 04 '22 10:10