
How do text editors store data above 1 byte?

The basic question is: how does Notepad (or any other basic text editor) store data? I ran into this while trying to compare the file sizes produced by different compression techniques, and realized something isn't quite right.

To elaborate:

If I save a text file with the following contents:

a

The file is 1 byte. This one happens to be 97, or 0x61.

I create a text file with the following contents:

 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ

These are all the characters from 0 to 255, or 0x00 to 0xFF. The file is 256 bytes, 1 byte for each character. This makes sense to me.

Then I append the following character to the end of the above string:

†

This character is not contained in the above string, since all the 8-bit characters were already used. Its code is 8224, or 0x2020, so it should be a 2-byte character.

And yet the file size has only changed from 256 to 257 bytes. In fact, saving that character by itself produces a file of just 1 byte.

What am I missing?

Edit: Please note that many of the characters in the second text block do not show up here.

asked Mar 12 '23 07:03 by qoou


1 Answer

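In the ANSI encoding (an 8-bit, Microsoft-specific encoding), each character is saved in one byte (8 bits). Notepad is not storing the Unicode code point 0x2020 here; it stores the byte that this code page assigns to the character.

On Windows, "ANSI" means the system's default code page. On Western-language systems that is Windows-1252, also called Windows Latin-1. It differs from ISO-8859-1 in the range 0x80 to 0x9F, where it places printable characters such as the dagger instead of control codes.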

Have a look at the ANSI table in an ANSI character codes chart or in a Windows-1252 reference.

So for the character †, the code is 134, which is stored as the single byte 0x86.
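The byte counts can be checked outside of Notepad. Below is a minimal Python sketch, assuming Python's `cp1252` codec matches the "ANSI" encoding Notepad used on the asker's machine. The helper name `encoded_size` is made up for this illustration.

```python
# U+2020 DAGGER, the character from the question (code point 8224).
dagger = "\u2020"


def encoded_size(text: str, encoding: str) -> int:
    """Return the number of bytes `text` occupies when saved in `encoding`."""
    return len(text.encode(encoding))


# Compare the single-byte code page with two Unicode encodings.
for enc in ("cp1252", "utf-8", "utf-16-le"):
    data = dagger.encode(enc)
    print(enc, data.hex(), encoded_size(dagger, enc))
```

In Windows-1252 the character is the single byte 86. Saved as UTF-8 it takes three bytes (e280a0), and as UTF-16 little-endian without a byte order mark it takes two (2020). So the size of a text file depends on the encoding chosen when saving, not on the Unicode code point alone.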

answered Mar 15 '23 03:03 by Siyavash Hamdi