My original PDF file size is around 24MB; however, when I encode it to a Base64 string, the string size is around 31MB. I'm wondering why that is.
It is easy to understand for an image file, since it may lose some compression, but why does it also happen to PDF and other file formats?
Each Base64 digit represents exactly 6 bits of data. So, three 8-bit bytes of the input string/binary file (3×8 bits = 24 bits) can be represented by four 6-bit Base64 digits (4×6 = 24 bits). This means that the Base64 version of a string or file will be at least 133% the size of its source (a ~33% increase).
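The 3-bytes-in, 4-characters-out rule can be turned into a small size calculation. The following is a minimal sketch in Python; the helper name `base64_encoded_length` is made up for illustration, and it assumes standard padded Base64 with no line breaks.

```python
import base64


def base64_encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input (no line breaks)."""
    # Every started group of 3 input bytes produces 4 output characters.
    return 4 * ((n_bytes + 2) // 3)


# Cross-check the formula against the standard library for a few sizes.
for n in (0, 1, 2, 3, 4, 1000):
    assert len(base64.b64encode(bytes(n))) == base64_encoded_length(n)

print(base64_encoded_length(1000))              # 1336
print(base64_encoded_length(24 * 1024 * 1024))  # 33554432, i.e. 32 MiB
```

By this estimate, a file of roughly 24MB comes out at roughly 32MB, which is in line with the sizes reported in the question.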
File size. Base64-encoding an image is not the most efficient way to encode data when it comes to file size, because the process always increases the size by about 33%. For example, a binary file that is 1000 bytes in size will be 1336 bytes after Base64 encoding.
Base64 encodes 3 bytes of binary data as 4 characters. So to get the size of the original data, you just have to multiply the stringLength (minus the header) by 3/4, then subtract one byte for each trailing `=` padding character.
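Here is one way the reverse calculation might look in Python. It is a sketch only: the function name `decoded_size` is hypothetical, and it assumes that "the header" means a data-URI prefix such as `data:application/pdf;base64,`.

```python
import base64


def decoded_size(b64_text: str) -> int:
    """Estimate the decoded byte count of a Base64 string without decoding it."""
    # Assumption: the "header" is a data-URI prefix ending at the first comma.
    if b64_text.startswith("data:") and "," in b64_text:
        b64_text = b64_text.split(",", 1)[1]
    # Ignore whitespace and line breaks, which carry no data.
    b64_text = "".join(b64_text.split())
    # Each trailing "=" means one byte fewer in the final 3-byte group.
    padding = len(b64_text) - len(b64_text.rstrip("="))
    return len(b64_text) * 3 // 4 - padding


sample = base64.b64encode(bytes(1000)).decode("ascii")
print(decoded_size(sample))                                # 1000
print(decoded_size("data:text/plain;base64,TWE="))         # 2
```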
This encoding is designed to make binary data survive transport through transport layers that are not 8-bit clean, such as mail bodies. Base64-encoded data takes about 33% more space than the original data.
Know file size with a base64 string. Very roughly, the final size of Base64-encoded binary data in a MIME message is equal to 1.37 times the original data size + 814 bytes (for headers); the factor is 1.37 rather than 1.33 because MIME inserts a line break after every 76 characters. You can use the ContentLength property of the request to determine the size in bytes, although if you are uploading more than one image, it might be trickier.
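The difference between the ~1.33 and ~1.37 figures can be checked with a short Python sketch. It uses `base64.encodebytes`, which wraps output at 76 characters with `\n`; replacing those with `\r\n` to mimic mail-style line endings is an assumption made here for illustration, and the header overhead is not modeled.

```python
import base64

data = bytes(57_000)  # 57 input bytes fill exactly one 76-character line

plain = base64.b64encode(data)             # no line breaks
wrapped = base64.encodebytes(data)         # "\n" after every 76 characters
crlf = wrapped.replace(b"\n", b"\r\n")     # assumed CRLF line endings, as in mail

print(len(plain), len(wrapped), len(crlf))  # 76000 77000 78000
print(round(len(plain) / len(data), 3))     # 1.333
print(round(len(crlf) / len(data), 3))      # 1.368
```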
While compression actually reduces the size of data, encoding only defines how data is represented, which brings us to the first issue. Although Base64 is a relatively efficient way of encoding binary data, it will still increase the file size by about 33%. This not only increases your bandwidth bill, but also increases the download time.
Although base64-images are larger, there are a few conditions where base64 is the better choice. Size of base64-images: Base64 uses 64 different characters, and 64 = 2^6, so each 8-bit character carries only 6 bits of data. The ratio of unconverted data to base64 data is therefore 6/8, which means the base64 data is about 4/3 the size of the original. This is not an exact calculation, but a rough estimate. Example:
“Base64 is a group of binary-to-text encoding schemes that represent binary data (more specifically, a sequence of 8-bit bytes) in an ASCII string format by translating the data into a radix-64 representation.” - Wikipedia. So I had to calculate the size of the file represented by the base64 string and apply some business logic.
just wondering why
Because Base64 has fewer meaningful bits per byte than a binary data format (usually 6 instead of 8). This is specifically so it can survive various textual transformations that binary data would not.
Wikipedia's page has a good diagram showing this:
As a text table (sadly the GitHub-flavored markdown used by SO doesn't support tables with varying numbers of columns):
+----------------+-----------------+-----------------+-----------------+
| Text content   |        M        |        a        |        n        |
+----------------+-----------------+-----------------+-----------------+
| ASCII          |    77 (0x4d)    |    97 (0x61)    |   110 (0x6e)    |
| Bit pattern    | 0 1 0 0 1 1 0 1 | 0 1 1 0 0 0 0 1 | 0 1 1 0 1 1 1 0 |
+----------------+------------+----+--------+--------+----+------------+
| Index          |     19     |     22      |      5      |     46     |
| Base64-encoded |     T      |      W      |      F      |     u      |
+----------------+------------+-------------+-------------+------------+
Note how the Base64 is only using the bottom six bits of each byte, and so "Man" ends up being four bytes long.
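To make the regrouping of bits concrete, here is a small Python sketch that encodes one 3-byte group by hand and compares the result with the standard library. The names `ALPHABET` and `encode_triplet` are made up for this example, and it deliberately handles only complete 3-byte groups (no padding).

```python
import base64
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, "+", "/"
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_triplet(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch only handles complete 3-byte groups")
    n = int.from_bytes(three_bytes, "big")  # the 24 bits as one integer
    # Slice the 24 bits into four 6-bit indices, most significant first.
    indices = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)


print(encode_triplet(b"Man"))                 # TWFu
print(base64.b64encode(b"Man").decode())      # TWFu
```

The four indices computed for "Man" are 19, 22, 5 and 46, which is the Index row of the table.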
It is easy to understand for image file since it may lose some compression
Just to be clear, Base64 encoding is lossless. When you decode it, you get byte-for-byte what you started with.
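A quick way to convince yourself of this is a round-trip check. The sketch below uses random bytes as a stand-in for a PDF or image file; the helper name `round_trips` is hypothetical.

```python
import base64
import os


def round_trips(data: bytes) -> bool:
    """Return True if decoding the Base64 encoding gives back the same bytes."""
    return base64.b64decode(base64.b64encode(data)) == data


original = os.urandom(10_000)          # stand-in for arbitrary file contents
encoded = base64.b64encode(original)

print(round_trips(original))           # True
print(len(original), len(encoded))     # 10000 13336
```

The encoded form is larger, but nothing is lost: only the representation changes.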