
File size in UTF-8 encoding?

Tags:

utf-8

I have created a file with UTF-8 encoding, but I don't understand the rules for the size it takes up on disk. Here is what I observed:

  1. First I created the file with a single Hindi letter 'क', and the file size on Windows 7 was 8 bytes.

  2. Then with two letters, 'कक', the file size was 11 bytes.

  3. Then with three letters, 'ककक', the file size was 14 bytes.

Can someone please explain to me why it shows these sizes?

Rana asked Apr 24 '14 08:04



1 Answer

The first three bytes are used for the BOM (Byte Order Mark) EF BB BF.

Then, the bytes E0 A4 95 encode the letter क.

Then the bytes 0D 0A encode a carriage return followed by a line feed (CR LF), the Windows line ending.

Total: 8 bytes. For each letter क you add, you need three more bytes.
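If you want to check the arithmetic yourself, here is a minimal Python sketch. It assumes, as described above, that the editor wrote a BOM at the start and a CR LF at the end of the file. The helper name `utf8_file_bytes` is made up for illustration and is not part of any library.

```python
import codecs

def utf8_file_bytes(text, bom=True, newline="\r\n"):
    """Hypothetical helper: bytes a UTF-8 file would hold for `text`.

    Assumes an optional 3-byte BOM in front and a trailing line ending.
    """
    prefix = codecs.BOM_UTF8 if bom else b""
    return prefix + (text + newline).encode("utf-8")

for text in ("क", "कक", "ककक"):
    data = utf8_file_bytes(text)
    # Print the total size and the bytes in hex.
    print(len(data), " ".join(f"{b:02X}" for b in data))

# 8 EF BB BF E0 A4 95 0D 0A
# 11 EF BB BF E0 A4 95 E0 A4 95 0D 0A
# 14 EF BB BF E0 A4 95 E0 A4 95 E0 A4 95 0D 0A
```

Calling it with `bom=False, newline=""` leaves only the 3 bytes per letter, which is the size you would get from an editor that writes neither a BOM nor a trailing line ending.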

Tim Pietzcker answered Oct 18 '22 00:10
