
Why is my python format %s taking no space?

Tags:

python

I'm working with PyCrypto, and I seem to be successfully decrypting my data. However, the string I receive seems to behave strangely:

...
plaintext = cipher.decrypt(encrypted)
print 'plaintext length is %u' % len(plaintext)
print 'plaintext: %s' % plaintext
print 'plaintext is "%s"' % plaintext

The plaintext has the string I expect ("POEorOPE"), but the output seems odd:

plaintext length is 16
plaintext: POEorOPE
plaintext is ""OEorOPE

Why does the string in the third print statement seem to take up zero space, and therefore have its first character overwritten by what I thought would be the closing quote? Is there something else going on here with what I now have stored in plaintext?

Edit:

Thanks for the comments, I see what's going on. (Though why I have backspace characters in my string I don't know.)

print repr(plaintext)

'POEorOPE\x08\x08\x08\x08\x08\x08\x08\x08'

Ryan Olson asked Mar 23 '13 03:03


2 Answers

Turns out these backspace characters are padding bytes added by Perl's Crypt::CBC module during encryption. In this particular case, each padding byte has the value 0x08, indicating that there are 8 bytes of padding that should be removed. PyCrypto does not add or strip padding itself, so the padding has to be removed by hand after decryption. I can strip the padding bytes like this:

# plaintext is a byte string (Python 2 str); its last byte holds the padding length.
num_bytes_padding = ord(plaintext[-1])
# Drop that many trailing bytes, then decode what is left to unicode.
plaintext = plaintext[:-num_bytes_padding].decode('utf-8')
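
A slightly more defensive version of the same idea, as a sketch only. It assumes the padding is the PKCS#5/PKCS#7 style described above (N trailing bytes, each with value N) and an 8-byte block size; the block size and the helper names `pad` and `unpad` are my own assumptions for illustration, not part of PyCrypto or Crypt::CBC. It is written to run on both Python 2 and Python 3.

```python
def pad(data, block_size=8):
    """Append n bytes of value n so the result is a multiple of block_size."""
    n = block_size - len(data) % block_size  # always between 1 and block_size
    return data + bytes(bytearray([n]) * n)


def unpad(data):
    """Remove the padding added by pad(), validating it first."""
    if not data:
        raise ValueError('cannot unpad empty data')
    n = bytearray(data)[-1]  # value of the last byte, on Python 2 and 3
    if n < 1 or n > len(data):
        raise ValueError('invalid padding length: %d' % n)
    if bytearray(data[-n:]) != bytearray([n]) * n:
        raise ValueError('padding bytes are not all equal')
    return data[:-n]


padded = pad(b'POEorOPE')   # 8 data bytes get a full block of 8 padding bytes
print(repr(padded))
print(repr(unpad(padded)))
```

Checking the padding before stripping it means a wrong key or corrupted ciphertext is more likely to raise an error than to silently return a truncated string.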

Ryan Olson answered Sep 28 '22 05:09


Some (very old) software used a nifty trick to emulate bold text by overstriking a character: print the character, send a backspace, then print the same character again. The doubled impression would produce a darker glyph.

Your string should be 8 characters long, but len reports 16. That is because 8 extra characters, each "\x08", follow the text (probably left in place by the decryption process).

"\x08" is indeed the backspace control character. Printing it moves the cursor back one column without erasing anything, so whatever is printed next overwrites the text in front of it. When nothing follows them, the backspaces are invisible:

>>> u = u'POEorOPE\x08\x08\x08\x08\x08\x08\x08\x08'
>>> print u.encode()
POEorOPE
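
To see why the third print in the question comes out as it does, here is a rough simulation of how a simple terminal might handle backspaces on one line. It is only a sketch: `render_on_terminal` is a hypothetical helper, and real terminals handle many more control characters than this.

```python
def render_on_terminal(text):
    """Approximate what a terminal shows for one line containing backspaces."""
    cells = []   # characters currently visible on the line
    cursor = 0   # column where the next character will be written
    for ch in text:
        if ch == '\x08':
            # Backspace moves the cursor left but does not erase anything.
            cursor = max(cursor - 1, 0)
        elif cursor < len(cells):
            cells[cursor] = ch   # overwrite a character already on screen
            cursor += 1
        else:
            cells.append(ch)     # write at the end of the line
            cursor += 1
    return ''.join(cells)


padded = 'POEorOPE' + '\x08' * 8
print(render_on_terminal('plaintext: %s' % padded))
print(render_on_terminal('plaintext is "%s"' % padded))
```

In the second case the eight backspaces move the cursor back to the "P", so the closing quote lands on top of it, which matches the `""OEorOPE` output in the question.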

kdeebee answered Sep 28 '22 04:09