Last edit: I've figured out what the problem was (see my own answer below) but I cannot mark the question as answered, it would seem. If someone can answer the questions I have in my answer below, namely, is this a bug in Cython or is this Cython's intended behavior, I will mark that answer as accepted, because that would be the most useful lesson to gain from this, IMHO.
Firstly, I have to start by saying that I have been trying to figure this out for three days, and I am just banging my head against the wall. As best as I can tell from the documentation, I am doing things correctly. Obviously, I can't be doing things correctly, though, because if I were, I wouldn't have a problem (right?).
In any event, I am working on a binding for mcrypt to Python. It should work with both Python 2 and Python 3 (though it's untested on Python 2). It's available on my site, linked because it is way too large to include in the post, and given that I don't know what I am doing wrong, I cannot even isolate what might be the problem code. The script that shows the problem is also on my site. The script just feeds in 100 blocks consisting of nothing but the letter "a" (sized to whatever block size the encryption algorithm/mode uses), and of course it should get a block of "a" back from each round trip. But it does not (always). Here is the output from a single run of it:
Wed Dec 15 10:35:44 EST 2010
test.py:5: McryptSecurityWarning: get_key() is not recommended
return ''.join(['{:02x}'.format(x) for x in o.get_key()])
key: b'\x01ez\xd5\xa9\xf9\x1f)\xa0G\xd2\xf2Z\xfc{\x7fn\x02?,\x08\x1c\xc8\x03\x061X\xb5\xc9\x99\xd0\xca'
key: b'\x01ez\xd5\xa9\xf9\x1f)\xa0G\xd2\xf2Z\xfc{\x7fn\x02?,\x08\x1c\xc8\x03\x061X\xb5\xc9\x99\xd0\xca'
16
self test result: 0
enc parameters: {'salt': '6162636465666768', 'mode': 'cbc', 'algorithm': 'rijndael-128', 'iv': '61626364616263646162636461626364'}
dec parameters: {'salt': '6162636465666768', 'mode': 'cbc', 'algorithm': 'rijndael-128', 'iv': '61626364616263646162636461626364'}
enc key: 01657ad5a9f91f29a047d2f25afc7b7f6e023f2c081cc803063158b5c999d0ca
dec key: 01657ad5a9f91f29a047d2f25afc7b7f6e023f2c081cc803063158b5c999d0ca
Stats: 88 / 100 good packets (88.0%)
#5: b'aaaaaaaaaaaaaaaa' != b'\xa6\xb8\xf9\td\x8db\xf6\x00Y"ST\xc6\x9b\xe7'
#6: b'aaaaaaaaaaaaaaaa' != b'aaaaaaa1\xb3@\x8d\xff\xf9\xafpy'
#13: b'aaaaaaaaaaaaaaaa' != b'\xb9\xc8\xaf\x1f\xb8\x8c\x0b_\x15s\x9d\xecN,*w'
#14: b'aaaaaaaaaaaaaaaa' != b'aaaaaaaaaaaaa\xeb?\x13'
#49: b'aaaaaaaaaaaaaaaa' != b'_C\xf2\x15\xd5k\xe1XKIF5k\x82\xa4\xec'
#50: b'aaaaaaaaaaaaaaaa' != b'aaaaaaaaaaa+\xdf>\x01\xee'
#74: b'aaaaaaaaaaaaaaaa' != b'\x1c\xdf0\x05\xc7\x0b\xe9\x93H\xc5B\xd7\xcfj+\x03'
#75: b'aaaaaaaaaaaaaaaa' != b'aaaaaaaaaaaaw+\xed\x0f'
#79: b'aaaaaaaaaaaaaaaa' != b"\xf2\x89\x1ct\xe1\xeeBWo\xb4-\xb9\x085'\xef"
#80: b'aaaaaaaaaaaaaaaa' != b'aaaaaaaaaaa\xcc\x01n\xf0<'
#91: b'aaaaaaaaaaaaaaaa' != b'g\x02\x08\xbf\xa5\xd7\x90\xc1\x84D\xf3\x9d$a)\x06'
#92: b'aaaaaaaaaaaaaaaa' != b'aaaaaaaaaaaaaaa\x01'
The weird part is that it is exactly the same for a given (algorithm, mode) pair. I can change the algorithm and it will result in different round-trips, but always the same for every run when I don't change the algorithm. I'm absolutely stumped. Also, it's always two blocks in a row that are corrupt as you can see in the output above: blocks 5 and 6, 13 and 14, etc. So, there is a pattern but I am, for whatever reason, unable to figure out what that pattern is pointing to precisely.
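For reference, failures that arrive in pairs are what CBC decryption produces when a single ciphertext block gets damaged: the damaged block decrypts to complete garbage, and the following block is wrong only at the byte positions where the damage occurred, because each plaintext block is XORed with the previous ciphertext block. The sketch below illustrates this with a toy Feistel cipher built on hashlib. The cipher, key, and all function names are made up for the demonstration; this is not rijndael-128 and not anything from mcrypt.

```python
import hashlib

BLOCK = 16  # block size in bytes, same as rijndael-128


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _round(half, key, i):
    # Round function of the toy Feistel network (hypothetical, not secure).
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]


def toy_encrypt_block(block, key):
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _round(right, key, i))
    return left + right


def toy_decrypt_block(block, key):
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(left, key, i)), left
    return left + right


def cbc_encrypt(blocks, key, iv):
    out, prev = [], iv
    for p in blocks:
        prev = toy_encrypt_block(_xor(p, prev), key)
        out.append(prev)
    return out


def cbc_decrypt(blocks, key, iv):
    out, prev = [], iv
    for c in blocks:
        out.append(_xor(toy_decrypt_block(c, key), prev))
        prev = c
    return out


key = b"demo key"
iv = b"abcdabcdabcdabcd"
plaintext = [b"a" * BLOCK for _ in range(10)]

ciphertext = cbc_encrypt(plaintext, key, iv)

# Damage ciphertext block 5 from byte 7 onward (inverting the bytes
# guarantees that the block really changes).
ciphertext[5] = ciphertext[5][:7] + bytes(b ^ 0xFF for b in ciphertext[5][7:])

recovered = cbc_decrypt(ciphertext, key, iv)
bad = [i for i, (p, r) in enumerate(zip(plaintext, recovered)) if p != r]

print(bad)            # [5, 6]
print(recovered[6])   # starts with b'aaaaaaa', damaged from byte 7 onward
```

Block 6 in the sketch comes back with an intact prefix and a damaged tail, the same shape as packets #6, #14, #50 and so on in the output above. That would point at the ciphertext being altered partway through a block somewhere between encryption and decryption, though the sketch alone cannot say where.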
I realize that I am probably asking a lot here: I can't isolate a small snip of code, and familiarity with both mcrypt and Python is probably required. Alas, after three days of hitting my head on this, I need to step away from the problem for a little bit, so I am posting this here in the hopes that maybe while I am taking a break from this problem either (a) someone will see where I introduced a bug, (b) I will be able to see my bug when I get back to the problem later, or (c) someone or I can find the problem which maybe isn't a bug in my code but a bug in the binding process or the library itself.
One thing I haven't done is attempt to use another version of the mcrypt library. I'm doing my work with Cython 0.13, Python 3.1, and mcrypt 2.5.8, all as distributed by Ubuntu in Ubuntu 10.10 (except Cython, which I got from PyPI). But I manage systems with PHP applications that are functioning just fine and using mcrypt on Ubuntu 10.10 without data corruption, so I have no reason to believe that it is the build of mcrypt, so that just leaves… well, something wrong on my part somewhere, I think.
In any case, I thank anyone profusely who can help. I'm starting to feel like I'm going crazy because I've been working on this problem pretty much non-stop for days and I get the feeling that the solution is probably right in front of me, but I cannot see it.
Edit: Someone pointed out that I should be using memcpy instead of strncpy. I did that, but now, the test script shows that every block is incorrect. Color me even more confused than previously... here's the new output on pastebin.
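The memcpy suggestion is sound in itself: strncpy treats its source as a C string, so it stops copying at the first NUL byte and fills the rest of the destination with NULs, while ciphertext is binary data that can contain a NUL at any position. The sketch below imitates the two C functions in pure Python to show the difference. The helper names strncpy_like and memcpy_like and the sample bytes are hypothetical, chosen only for illustration.

```python
def strncpy_like(src: bytes, n: int) -> bytes:
    """Imitate C strncpy: copy up to the first NUL, then pad with NULs to n bytes."""
    head = src[:n]
    nul = head.find(b"\x00")
    if nul != -1:
        head = head[:nul]
    return head + b"\x00" * (n - len(head))


def memcpy_like(src: bytes, n: int) -> bytes:
    """Imitate C memcpy: copy exactly n bytes, whatever their values."""
    return src[:n]


# Made-up 8-byte "ciphertext" with a NUL byte at offset 3.
sample = b"\x9e\x12\x41\x00\xc8\x77\x03\xaa"

print(strncpy_like(sample, 8))  # b'\x9e\x12A\x00\x00\x00\x00\x00'
print(memcpy_like(sample, 8))   # b'\x9e\x12A\x00\xc8w\x03\xaa'
```

With strncpy, everything after the first NUL is silently replaced by zeros, so only blocks whose ciphertext happens to contain a NUL are affected. That would also explain why a given (algorithm, mode) pair fails on the same blocks every run.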
Edit 2: I have come back to the computer and have been looking at it again, and I'm just adding print statements everywhere to find where things could be going wrong. The following code in the raw_encrypt.step(input) function:
cdef char* buffer = <char*>malloc(in_len)
print in_bin[:in_len]
memcpy(buffer, <const_void *>in_bin, in_len)
print "Before/after encryption"
print buffer[:in_len]
success = cmc.mcrypt_generic(self._mcStream, <void*>buffer, in_len)
print buffer[:in_len]
The first print statement shows the expected thing: the plaintext that is passed in. However, the second one shows something completely different, when it should be identical. It seems that there is something going on with Cython that I don't completely understand.
Oy, I hate to do this (answer my own question), but I found the answer: It is a quirk of Cython which I am going to have to look into (I don't know if it is an intended quirk, or if it is a bug).
The problem comes with the memcpy line. I cast the second parameter to <const_void*>, which matches the Cython definition in the pxd file, but apparently that makes Cython compile the code differently than using <char*>, the latter forcing Cython to pass a pointer to the actual bytes instead of (I guess?) a pointer to the Python object/variable itself.
So, instead of this:
cdef char* buffer = <char*>malloc(in_len)
memcpy(buffer, <const_void *>in_bin, in_len)
success = cmc.mcrypt_generic(self._mcStream, <void*>buffer, in_len)
It needs to be this:
cdef char* buffer = <char*>malloc(in_len)
memcpy(buffer, <char *>in_bin, in_len)
success = cmc.mcrypt_generic(self._mcStream, <void*>buffer, in_len)
What a strange quirk. I would honestly expect any cast to point to the same location, but it seems that the cast can affect behavior as well.
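This behavior appears to be consistent with Cython's documented casting rules rather than a bug: casting a Python object to a void pointer yields the address of the object itself, whereas casting a bytes object to char* converts it to a pointer to its character data. Assuming in_bin is a Python bytes object, the memcpy with the void-pointer cast would then copy the object's header (reference count, type pointer, and so on) instead of the payload. The sketch below reproduces the same distinction from plain Python using ctypes. It is CPython-specific (it relies on id() returning the object's address), and the variable names are only illustrative.

```python
import ctypes

data = b"a" * 16  # stands in for in_bin; assumed to be a Python bytes object

# In CPython, id() is the address of the object itself. This is the kind of
# pointer a void-pointer cast of a Python object would give.
obj_addr = id(data)

# c_char_p(bytes) points at the bytes object's internal character buffer.
# This is the kind of pointer a char* conversion would give.
buf_addr = ctypes.cast(ctypes.c_char_p(data), ctypes.c_void_p).value

# Read 16 bytes from each address, as memcpy would.
from_object = ctypes.string_at(obj_addr, len(data))
from_buffer = ctypes.string_at(buf_addr, len(data))

print("object address:", hex(obj_addr))
print("buffer address:", hex(buf_addr))
print("copied from object pointer:", from_object)
print("copied from buffer pointer:", from_buffer)
```

The two addresses differ, and only the buffer pointer yields the sixteen "a" bytes. The bytes read from the object pointer are header fields, which would match the unexpected output of the second print statement in the question.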