The basic question is, how does notepad (or other basic text editors) store data. I ran into this because I was trying to compare file size of different compression techniques, and realized something isn't quite right.
To elaborate..
If I save a text file with the following contents:
a
The file is 1 byte. This one happens to be 97, or 0x61.
I create a text file with the following contents:
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
Which is all the characters from 0-255, or 0x00 to 0xFF. This file is 256 bytes. 1 byte for each character. This makes sense to me.
Then I append the following character to the end of the above string.
†
A character not contained in the above string. All 8 bit characters were already used. This character is 8224, or 0x2020. A 2 bytes character.
And yet, the file size has only changed from 256 to 257 bytes. In fact, the above character saved by itself only shows 1 byte.
What am I missing?
Edit: Please note that in the second text block, many of the characters do not show up on here.
In ANSI
encoding (This 8-bit Microsoft-specific encoding), you save each character in one byte (8-bit).
ANSI
also called Windows-1252
, or Windows Latin-1
You should have a look at ANSI
table in ANSI Character Codes Chart or Windows-1252
So for †
character, its code is 134
, byte 0x86
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With