Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: Ignore 'Incorrect padding' error when base64 decoding

Tags:

python

base64

People also ask

What is a padding error?

If there's a padding error it probably means your string is corrupted; base64-encoded strings should have a multiple of four length. You can try adding the padding character ( = ) yourself to make the string a multiple of four, but it should already have that unless something is wrong.

Does base64 need padding?

Although assumptions about length aren't made, padding isn't needed because if there is something wrong it simply won't work. And this is exactly what the base64 RFC says, In some circumstances, the use of padding ("=") in base-encoded data is not required or used.

What character does base64 use for padding?

2.2. In Base64 encoding, the length of an output-encoded String must be a multiple of three. The encoder adds one or two padding characters (=) at the end of the output as needed in order to meet this requirement.

How do you decode a base64 encoded string in Python?

To decode an image using Python, we simply use the base64. b64decode(s) function. Python mentions the following regarding this function: Decode the Base64 encoded bytes-like object or ASCII string s and return the decoded bytes.


As said in other responses, there are various ways in which base64 data could be corrupted.

However, as Wikipedia says, removing the padding (the '=' characters at the end of base64 encoded data) is "lossless":

From a theoretical point of view, the padding character is not needed, since the number of missing bytes can be calculated from the number of Base64 digits.

So if this is really the only thing "wrong" with your base64 data, the padding can just be added back. I came up with this to be able to parse "data" URLs in WeasyPrint, some of which were base64 without padding:

import base64
import re

def decode_base64(data, altchars=b'+/'):
    """Decode base64, padding being optional.

    :param data: Base64 data as an ASCII byte string
    :returns: The decoded byte string.

    """
    data = re.sub(rb'[^a-zA-Z0-9%s]+' % altchars, b'', data)  # normalize
    missing_padding = len(data) % 4
    if missing_padding:
        data += b'='* (4 - missing_padding)
    return base64.b64decode(data, altchars)

Tests for this function: weasyprint/tests/test_css.py#L68


It seems you just need to add padding to your bytes before decoding. There are many other answers on this question, but I want to point out that (at least in Python 3.x) base64.b64decode will truncate any extra padding, provided there is enough in the first place.

So, something like: b'abc=' works just as well as b'abc==' (as does b'abc=====').

What this means is that you can just add the maximum number of padding characters that you would ever need—which is two (b'==')—and base64 will truncate any unnecessary ones.

This lets you write:

base64.b64decode(s + b'==')

which is simpler than:

base64.b64decode(s + b'=' * (-len(s) % 4))

Just add padding as required. Heed Michael's warning, however.

b64_string += "=" * ((4 - len(b64_string) % 4) % 4) #ugh

Use

string += '=' * (-len(string) % 4)  # restore stripped '='s

Credit goes to a comment somewhere here.

>>> import base64

>>> enc = base64.b64encode('1')

>>> enc
>>> 'MQ=='

>>> base64.b64decode(enc)
>>> '1'

>>> enc = enc.rstrip('=')

>>> enc
>>> 'MQ'

>>> base64.b64decode(enc)
...
TypeError: Incorrect padding

>>> base64.b64decode(enc + '=' * (-len(enc) % 4))
>>> '1'

>>> 

"Incorrect padding" can mean not only "missing padding" but also (believe it or not) "incorrect padding".

If suggested "adding padding" methods don't work, try removing some trailing bytes:

lens = len(strg)
lenx = lens - (lens % 4 if lens % 4 else 4)
try:
    result = base64.decodestring(strg[:lenx])
except etc

Update: Any fiddling around adding padding or removing possibly bad bytes from the end should be done AFTER removing any whitespace, otherwise length calculations will be upset.

It would be a good idea if you showed us a (short) sample of the data that you need to recover. Edit your question and copy/paste the result of print repr(sample).

Update 2: It is possible that the encoding has been done in an url-safe manner. If this is the case, you will be able to see minus and underscore characters in your data, and you should be able to decode it by using base64.b64decode(strg, '-_')

If you can't see minus and underscore characters in your data, but can see plus and slash characters, then you have some other problem, and may need the add-padding or remove-cruft tricks.

If you can see none of minus, underscore, plus and slash in your data, then you need to determine the two alternate characters; they'll be the ones that aren't in [A-Za-z0-9]. Then you'll need to experiment to see which order they need to be used in the 2nd arg of base64.b64decode()

Update 3: If your data is "company confidential":
(a) you should say so up front
(b) we can explore other avenues in understanding the problem, which is highly likely to be related to what characters are used instead of + and / in the encoding alphabet, or by other formatting or extraneous characters.

One such avenue would be to examine what non-"standard" characters are in your data, e.g.

from collections import defaultdict
d = defaultdict(int)
import string
s = set(string.ascii_letters + string.digits)
for c in your_data:
   if c not in s:
      d[c] += 1
print d