
Why is a base64-encoded string larger than the original file?

Tags:

base64

My original PDF file is around 24MB, but when I encode it to a Base64 string, the string is around 31MB. I'm wondering why that is.

That would be easy to understand for an image file, since it may lose some compression, but why does it also happen with PDFs and other file formats?

leonsPAPA asked Nov 19 '15 18:11



1 Answer

> just wondering why

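Because Base64 carries fewer meaningful bits per byte than raw binary data (6 instead of 8). Every 3 bytes of input become 4 bytes of output, so the encoded string is about 4/3 the size of the original, which matches your file growing from roughly 24MB to roughly 31MB. This is specifically so the data can survive various textual transformations that binary data would not.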
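To put a number on that, here is a minimal Python sketch of the size arithmetic. The helper name `base64_length` is made up for illustration, and it assumes standard padded Base64 with no line breaks inserted (some MIME-style encoders add newlines, which makes the output slightly larger still). The 24 MiB figure is just a stand-in for the file in the question.

```python
import base64

def base64_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input.

    Every group of up to 3 input bytes becomes exactly 4 output characters.
    """
    return 4 * ((n_bytes + 2) // 3)

# Sanity-check the formula against the standard library for a few sizes.
for n in (0, 1, 2, 3, 4, 1000):
    assert len(base64.b64encode(bytes(n))) == base64_length(n)

original_size = 24 * 1024 * 1024            # assumed: exactly 24 MiB
encoded_size = base64_length(original_size)
print(encoded_size / original_size)          # ratio of about 4/3
```
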

Wikipedia's page has a good diagram showing this:

[Image: Base64 encoding diagram from Wikipedia]

As a text table (sadly the GitHub-flavored markdown used by SO doesn't support tables with varying numbers of columns):

+−−−−−−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+
|   Text content  |               M               |               a               |               n               |
+−−−−−−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+
|     ASCII       |           77 (0x4d)           |           97 (0x61)           |          110 (0x6e)           |
|  Bit pattern    | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 |
|     Index       |           19          |           22          |           5           |           46          |
| Base64−encoded  |           T           |           W           |           F           |           u           |
+−−−−−−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−+

Note how each Base64 character carries only six bits of the input, and so the three bytes of "Man" end up being four bytes long ("TWFu").
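The table can be reproduced by hand in a few lines of Python. This is only a sketch of the regrouping step for one complete 3-byte group: the function names `six_bit_indices` and `encode_group` are hypothetical, and padding for inputs that are not a multiple of 3 bytes is deliberately left out.

```python
import base64

# The standard Base64 alphabet: index 0 is 'A', index 63 is '/'.
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")

def six_bit_indices(three_bytes: bytes) -> list[int]:
    """Split exactly 3 bytes (24 bits) into four 6-bit values."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch only handles full 3-byte groups")
    value = int.from_bytes(three_bytes, "big")      # one 24-bit integer
    return [(value >> shift) & 0b111111 for shift in (18, 12, 6, 0)]

def encode_group(three_bytes: bytes) -> str:
    """Map the four 6-bit values to their Base64 characters."""
    return "".join(ALPHABET[i] for i in six_bit_indices(three_bytes))

print(six_bit_indices(b"Man"))   # [19, 22, 5, 46]
print(encode_group(b"Man"))      # TWFu

# The standard library agrees with the hand-rolled version.
assert encode_group(b"Man") == base64.b64encode(b"Man").decode("ascii")
```
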

> It is easy to understand for an image file since it may lose some compression

Just to be clear, Base64 encoding is lossless. When you decode it, you get byte-for-byte what you started with.
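A quick way to convince yourself is a round trip in Python. This sketch uses random bytes from `os.urandom` as a stand-in for the PDF contents (an assumption; any bytes would do) and checks that decoding returns exactly the original data, while the encoded form is the larger one.

```python
import base64
import os

original = os.urandom(3000)            # stand-in for the file's bytes
encoded = base64.b64encode(original)   # ASCII bytes, 4 chars per 3 input bytes
decoded = base64.b64decode(encoded)

# Lossless: the decoded bytes are identical to the input.
assert decoded == original

print(len(original), len(encoded))     # 3000 4000
```
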

T.J. Crowder answered Oct 15 '22 05:10