
How to calculate CRC32 with Python to match online results?

Tags:

python

crc32

I'm trying to calculate the CRC32 checksum of some strings using Python, but the results do not match the values I get from online sources. Here is what I'm doing on my PC:

>>> import binascii
>>> binascii.crc32('hello-world')
-1311505829

Another approach,

>>> import zlib
>>> zlib.crc32('hello-world')
-1311505829

The fact that the above results are identical tells me that I'm calling the function correctly. But, if I go to the following online sources,

  • http://www.lammertbies.nl/comm/info/crc-calculation.html
  • http://crc32-checksum.waraxe.us/
  • http://www.md5calc.com/ (select CRC32B from drop-down)

For the string "hello-world" they all give the same value = b1d4025b

Does anyone know what I need to do, to get matching results?

As I was typing this question it occurred to me that I might need to convert my Python result to hex,

>>> hex(zlib.crc32('hello-world'))
'-0x4e2bfda5'

Unfortunately, that hasn't helped either. :(

asked May 07 '15 04:05 by chronodekar




2 Answers

In Python 2 (unlike Python 3), zlib.crc32 and binascii.crc32 return the CRC as a signed 32-bit integer.

Those sites display the same CRC as an unsigned 32-bit integer.

The values are the same otherwise, as you can see from this:

>>> 0x100000000 - 0xb1d4025b == 0x4e2bfda5
True
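To make the "same value otherwise" point concrete, here is a small Python 3 sketch that reinterprets the same four bytes as signed and then as unsigned, using the standard struct module. The variable names are only for illustration.

```python
import struct

signed_crc = -1311505829  # the value Python 2 printed in the question

# Pack as a signed 32-bit integer, then read the same 4 bytes back as unsigned.
raw = struct.pack('<i', signed_crc)
unsigned_crc = struct.unpack('<I', raw)[0]

print(unsigned_crc)                 # 2983461467
print(format(unsigned_crc, '08x'))  # b1d4025b
```

Nothing is recomputed here; only the interpretation of the 32 bits changes.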

One quick way to convert from 32-bit signed to 32-bit unsigned is:*

>>> -1311505829 % (1<<32)
2983461467

Or, in hex:

>>> hex(-1311505829 % (1<<32))
'0xb1d4025b'

& 0xFFFFFFFF or % 0x100000000 or & (2**32-1) or % (2**32) and so on are all equivalent ways to do the same bit-twiddling; it just comes down to which one you find most readable.


* The modulo form only works in languages that do floored integer division, like Python (-3 // 2 == -2, so -3 % 2 == 1); in languages that do truncated integer division, like Java (-3 / 2 == -1, so -3 % 2 == -1), you'll still end up with a negative number. Older C standards (before C99) left the sign of the result implementation-defined, so all bets are off there, but in C you'd just cast the value to an unsigned 32-bit type instead.
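As a quick check that the masking and modulo spellings listed above really agree in Python, here is a minimal sketch, assuming Python 3. It also shows that applying the mask to an already unsigned value changes nothing, so the same line is safe on both Python versions.

```python
signed_crc = -1311505829  # Python 2 result from the question

candidates = [
    signed_crc & 0xFFFFFFFF,
    signed_crc % 0x100000000,
    signed_crc & (2**32 - 1),
    signed_crc % (2**32),
]

# All four expressions yield the same unsigned value.
assert len(set(candidates)) == 1
unsigned_crc = candidates[0]
print(hex(unsigned_crc))  # 0xb1d4025b

# Masking an already unsigned value (what Python 3 returns) is a no-op.
assert unsigned_crc & 0xFFFFFFFF == unsigned_crc
```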

answered Sep 21 '22 21:09 by abarnert


The zlib.crc32 documentation suggests using the following approach "to generate the same numeric value across all Python versions and platforms".

import zlib
hex(zlib.crc32(b'hello-world') & 0xffffffff)

The result is 0xb1d4025b as expected.
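If the goal is a string that can be compared directly with what the online calculators show (8 hex digits, no 0x prefix), a small wrapper along these lines may help. This is only a sketch for Python 3: crc32_hex is a hypothetical helper name, and encoding text as UTF-8 is an assumption, since the sites do not say which encoding they use.

```python
import zlib

def crc32_hex(data):
    """Return the CRC-32 of data as 8 lowercase hex digits (hypothetical helper)."""
    if isinstance(data, str):
        # Assumption: text is encoded as UTF-8 before checksumming.
        data = data.encode('utf-8')
    # The mask keeps the result unsigned on every Python version.
    return format(zlib.crc32(data) & 0xFFFFFFFF, '08x')

print(crc32_hex(b'hello-world'))  # b1d4025b
```

Using format(..., '08x') instead of hex() keeps leading zeros, which hex() would drop for small CRC values.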

answered Sep 21 '22 21:09 by Alexey