
Base64 Incorrect padding error using Python

Tags: python, hex, base64

I am trying to decode about 200 Base64 values into hex, and I am getting the following error. It decodes 60 of them and then stops.

ABHvPdSaxrhjAWA=
0011ef3dd49ac6b8630160
ABHPdSaxrhjAWA=
Traceback (most recent call last):
  File "tt.py", line 36, in <module>
    csvlines[0] = csvlines[0].decode("base64").encode("hex")
  File "C:\Python27\lib\encodings\base64_codec.py", line 43, in base64_decode
    output = base64.decodestring(input)
  File "C:\Python27\lib\base64.py", line 325, in decodestring
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding

Some of the original Base64 source values from the CSV:

ABHPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdS4xriiAVQ=
ABDPdSqxrizAU4=
ABDPdSrxrjPAUo=
James asked Nov 21 '16 20:11



1 Answer

You have at least one string in your CSV file that is either not a Base64 string, is a corrupted (damaged) Base64 string, or is a string that is missing the required = padding. Your example value, ABHPdSaxrhjAWA=, is short one = or is missing another data character.

Base64 strings, properly padded, have a length that is a multiple of 4, so you can easily re-add the padding:

value = csvlines[0]
if len(value) % 4:
    # not a multiple of 4, add padding:
    value += '=' * (4 - len(value) % 4) 
csvlines[0] = value.decode("base64").encode("hex")

If the value then still fails to decode, then your input was corrupted or not valid Base64 to begin with.
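The snippet above relies on the Python 2 "base64" and "hex" string codecs, which are not available on str in Python 3. As a minimal sketch of the same padding fix for Python 3, the following uses the standard base64 module; the helper name base64_to_hex and the choice to return None on failure are assumptions made for illustration, not part of the original code:

```python
import base64
import binascii


def base64_to_hex(value):
    """Decode a Base64 string to hex, re-adding missing '=' padding first.

    Hypothetical helper: returns None when the value still cannot be decoded.
    """
    value = value.strip()
    remainder = len(value) % 4
    if remainder:
        # not a multiple of 4, add padding
        value += "=" * (4 - remainder)
    try:
        # validate=True rejects characters outside the Base64 alphabet
        return base64.b64decode(value, validate=True).hex()
    except binascii.Error:
        # still undecodable: corrupted input or not Base64 at all
        return None


print(base64_to_hex("ABHvPdSaxrhjAWA="))  # 0011ef3dd49ac6b8630160
print(base64_to_hex("ABHPdSaxrhjAWA="))   # 0011cf7526b1ae18c058
```

Returning None instead of raising lets a loop over all CSV rows continue past bad values; whether that is preferable to stopping at the first failure depends on how the data is used.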

For the example error, ABHPdSaxrhjAWA=, the above adds one = to make it decodable:

>>> value = 'ABHPdSaxrhjAWA='
>>> if len(value) % 4:
...     # not a multiple of 4, add padding:
...     value += '=' * (4 - len(value) % 4)
...
>>> value
'ABHPdSaxrhjAWA=='
>>> value.decode('base64')
'\x00\x11\xcfu&\xb1\xae\x18\xc0X'
>>> value.decode('base64').encode('hex')
'0011cf7526b1ae18c058'

I need to emphasise that your data may simply be corrupted. Your console output includes one value that worked, and one that failed. The one that worked is one character longer, and that's the only difference:

ABHvPdSaxrhjAWA=
ABHPdSaxrhjAWA=

Note the v in the 4th place; this is missing from the second example. This could indicate that something happened to your CSV data that caused that character to be dropped from the second example. Adding padding makes the second value decodable again, but if a character was dropped, the decoded result would be wrong. We can't tell you whether missing padding or a dropped character is the cause here.
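Because a repaired value may decode to the wrong bytes, it can help to keep track of which rows needed padding so they can be checked against the original data. A rough sketch in Python 3, assuming the values are already in a list; the function name decode_with_report and the variable names are hypothetical:

```python
import base64


def decode_with_report(values):
    """Decode each value to hex and collect the ones that needed padding."""
    hex_values = []
    repaired = []
    for value in values:
        remainder = len(value) % 4
        if remainder:
            # remember the original so it can be inspected later
            repaired.append(value)
            value += "=" * (4 - remainder)
        hex_values.append(base64.b64decode(value).hex())
    return hex_values, repaired


rows = ["ABHvPdSaxrhjAWA=", "ABHPdSaxrhjAWA="]
hex_values, suspicious = decode_with_report(rows)
for original in suspicious:
    print("needed padding, verify against the source:", original)
```

For the two values from the console output, only the second is reported. It decodes to 10 bytes while the first decodes to 11, and if all records are expected to have the same size (an assumption about this data), that kind of length difference is another hint that a character was lost.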

Martijn Pieters answered Oct 02 '22 13:10