Python: Ignore 'Incorrect padding' error when base64 decoding

People also ask

What is a padding error?

A padding error probably means your string is corrupted; a base64-encoded string should have a length that is a multiple of four. You can try adding the padding character (=) yourself until the length is a multiple of four, but the string should already be padded unless something went wrong upstream.

Does base64 need padding?

Not always. A decoder can work out how many bytes are missing from the length of the input, so padding is not strictly needed, and malformed input simply fails to decode. The base64 RFC (RFC 4648) says as much: "In some circumstances, the use of padding ("=") in base-encoded data is not required or used."

What character does base64 use for padding?

In Base64 encoding, the length of the encoded output must be a multiple of four characters. The encoder adds one or two padding characters (=) at the end of the output as needed to meet this requirement.
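As a quick check of the padding rule, here is a minimal sketch using only the standard library. The sample inputs are arbitrary; the point is that 1, 2 and 3 input bytes produce two, one and zero padding characters respectively.

```python
import base64

# Encode inputs of 1, 2 and 3 bytes and inspect the padding that results.
samples = (b"a", b"ab", b"abc")
encoded = [base64.b64encode(raw) for raw in samples]

for raw, enc in zip(samples, encoded):
    # Every encoded length is a multiple of four.
    print(raw, enc, len(enc))
```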

How do you decode a base64 encoded string in Python?

To decode Base64 data (an encoded image, for example) in Python, use the base64.b64decode(s) function. The Python documentation describes it as follows: Decode the Base64 encoded bytes-like object or ASCII string s and return the decoded bytes.
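A minimal usage sketch, assuming a correctly padded input. The sample string is made up for illustration.

```python
import base64

encoded = "aGVsbG8="  # an ASCII str works as well as a bytes object
decoded = base64.b64decode(encoded)
print(decoded)  # b'hello'
```

The result is always a bytes object, so call .decode() on it if the payload is text.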

As said in other responses, there are various ways in which base64 data could be corrupted.

However, as Wikipedia says, removing the padding (the '=' characters at the end of base64 encoded data) is "lossless":

From a theoretical point of view, the padding character is not needed, since the number of missing bytes can be calculated from the number of Base64 digits.
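To illustrate that calculation, here is a small sketch. The helper name decoded_length is hypothetical and not part of the base64 module. Each Base64 digit carries 6 bits, so the byte count follows directly from the number of digits.

```python
def decoded_length(num_digits: int) -> int:
    """Number of bytes encoded by `num_digits` unpadded Base64 digits."""
    if num_digits % 4 == 1:
        # 6 bits are not enough for a byte, so this length can never be valid.
        raise ValueError("a lone trailing digit cannot encode a whole byte")
    return num_digits * 6 // 8

for n in (2, 3, 4, 6):
    print(n, "digits ->", decoded_length(n), "bytes")
```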

So if this is really the only thing "wrong" with your base64 data, the padding can just be added back. I came up with this to be able to parse "data" URLs in WeasyPrint, some of which were base64 without padding:

import base64
import re

def decode_base64(data, altchars=b'+/'):
    """Decode base64, padding being optional.

    :param data: Base64 data as an ASCII byte string
    :returns: The decoded byte string.

    """
    data = re.sub(rb'[^a-zA-Z0-9%s]+' % altchars, b'', data)  # normalize
    missing_padding = len(data) % 4
    if missing_padding:
        data += b'=' * (4 - missing_padding)
    return base64.b64decode(data, altchars)

Tests for this function: weasyprint/tests/test_css.py#L68

It seems you just need to add padding to your bytes before decoding. There are many other answers on this question, but I want to point out that (at least in Python 3.x) base64.b64decode will truncate any extra padding, provided there is enough in the first place.

So, something like: b'abc=' works just as well as b'abc==' (as does b'abc=====').

What this means is that you can just add the maximum number of padding characters that you would ever need—which is two (b'==')—and base64 will truncate any unnecessary ones.

This lets you write:

base64.b64decode(s + b'==')

which is simpler than:

base64.b64decode(s + b'=' * (-len(s) % 4))
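For reference, here is a small sketch of what the -len(s) % 4 expression evaluates to. The helper name pad is made up for this example.

```python
import base64

def pad(s: bytes) -> bytes:
    """Append exactly the '=' characters needed to reach a multiple of four."""
    return s + b"=" * (-len(s) % 4)

# Lengths 2, 3 and 4 need two, one and zero padding characters.
for s in (b"MQ", b"YWI", b"YWJj"):
    print(s, "->", pad(s), "->", base64.b64decode(pad(s)))
```

Note that an input whose length is one more than a multiple of four cannot be repaired by padding at all, because a single leftover digit does not hold a full byte.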

Just add padding as required. Heed Michael's warning, however.

b64_string += "=" * ((4 - len(b64_string) % 4) % 4) #ugh

Use

string += '=' * (-len(string) % 4)  # restore stripped '='s

Credit goes to a comment somewhere here.

>>> import base64
>>> enc = base64.b64encode(b'1')
>>> enc
b'MQ=='
>>> base64.b64decode(enc)
b'1'
>>> enc = enc.rstrip(b'=')
>>> enc
b'MQ'
>>> base64.b64decode(enc)
Traceback (most recent call last):
  ...
binascii.Error: Incorrect padding
>>> base64.b64decode(enc + b'=' * (-len(enc) % 4))
b'1'

"Incorrect padding" can mean not only "missing padding" but also (believe it or not) "incorrect padding".

If suggested "adding padding" methods don't work, try removing some trailing bytes:

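import base64
import binascii

lens = len(strg)
# Drop the trailing partial group, or the whole last group of four if the
# length is already a multiple of four.
lenx = lens - (lens % 4 if lens % 4 else 4)
try:
    result = base64.b64decode(strg[:lenx])
except binascii.Error:
    result = None  # still undecodable; the damage is not only at the end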

Update: Any fiddling around adding padding or removing possibly bad bytes from the end should be done AFTER removing any whitespace, otherwise length calculations will be upset.
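A minimal sketch of that ordering, assuming the input is a str that may contain spaces or line breaks. The function name decode_ignoring_whitespace is hypothetical.

```python
import base64

def decode_ignoring_whitespace(text: str) -> bytes:
    """Strip all whitespace, restore any missing '=' padding, then decode."""
    compact = "".join(text.split())       # removes spaces, tabs and newlines
    compact += "=" * (-len(compact) % 4)  # length is now measured correctly
    return base64.b64decode(compact)

print(decode_ignoring_whitespace("aGVs\nbG8"))
```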

It would be a good idea if you showed us a (short) sample of the data that you need to recover. Edit your question and copy/paste the result of print(repr(sample)).

Update 2: It is possible that the encoding has been done in a URL-safe manner. If this is the case, you will be able to see minus and underscore characters in your data, and you should be able to decode it by using base64.b64decode(strg, '-_').
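As a sketch of the URL-safe case, the standard library also provides base64.urlsafe_b64decode, which uses the '-' and '_' alphabet. The wrapper name decode_urlsafe and the sample data are made up here. URL-safe data often has its padding stripped, so the wrapper restores it first.

```python
import base64

def decode_urlsafe(data: bytes) -> bytes:
    """Decode URL-safe Base64 ('-' and '_'), tolerating stripped padding."""
    return base64.urlsafe_b64decode(data + b"=" * (-len(data) % 4))

# b"-_8" is the URL-safe, unpadded spelling of the standard b"+/8=".
print(decode_urlsafe(b"-_8"))
```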

If you can't see minus and underscore characters in your data, but can see plus and slash characters, then you have some other problem, and may need the add-padding or remove-cruft tricks.

If you can see none of minus, underscore, plus and slash in your data, then you need to determine the two alternate characters; they'll be the ones that aren't in [A-Za-z0-9]. Then you'll need to experiment to see which order they need to be used in the 2nd arg of base64.b64decode()
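One way to run that experiment is to decode the data once per ordering and compare the results by eye. This is only a sketch: the function name candidate_decodings and the sample alphabet ('.' and '!') are hypothetical.

```python
import base64
from itertools import permutations

def candidate_decodings(data: bytes, extras: bytes) -> list:
    """Decode `data` once per ordering of the two non-alphanumeric characters."""
    padded = data + b"=" * (-len(data) % 4)
    results = []
    for pair in permutations(extras, 2):
        altchars = bytes(pair)  # first b".!", then b"!."
        results.append((altchars, base64.b64decode(padded, altchars=altchars)))
    return results

# Hypothetical sample that uses '.' and '!' in place of '+' and '/'.
for altchars, decoded in candidate_decodings(b".!8", b".!"):
    print(altchars, decoded)
```

Both orderings decode without error, so only knowledge of the expected content tells you which one is right.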

Update 3: If your data is "company confidential":
(a) you should say so up front
(b) we can explore other avenues in understanding the problem, which is highly likely to be related to which characters are used instead of + and / in the encoding alphabet, or to other formatting or extraneous characters.

One such avenue would be to examine what non-"standard" characters are in your data, e.g.

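import string
from collections import defaultdict

# Count every character outside [A-Za-z0-9]. This assumes your_data is a str;
# for bytes, decode it first, e.g. your_data.decode('ascii').
d = defaultdict(int)
s = set(string.ascii_letters + string.digits)
for c in your_data:
    if c not in s:
        d[c] += 1
print(d)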
