I want to compute a checksum of all of the values of a column in aggregate.
In other words, I want to do some equivalent of
md5(group_concat(some_column))
The problem with this approach is:
(In case you're wondering, you can at least ensure that the values are concatenated in a consistent order: believe it or not, group_concat() accepts an order by clause within it, e.g. group_concat(some_column order by some_column).)
MySQL offers the nonstandard bitwise aggregate functions BIT_AND(), BIT_OR() and BIT_XOR() which I presume would be useful for this problem. The column is numeric in this case but I would be interested to know if there was a way to do it with string columns.
For this particular application, the checksum does not have to be cryptographically secure.
It seems like you might as well use crc32 instead of md5 if you don't care about cryptographic strength. I think this:
select sum(crc32(some_column)) from some_table;
would work on strings. It might be inefficient, as MySQL may create a temporary table (especially if you add an order by).
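To make the idea concrete, here is a rough Python model of what sum(crc32(some_column)) computes. It is only a sketch: it assumes MySQL's CRC32() matches the standard CRC-32 exposed by zlib.crc32 and that non-string values are hashed via their string form. The function name sum_crc32 and the sample rows are made up for illustration.

```python
import zlib

def sum_crc32(values):
    """Model SUM(CRC32(col)): add the CRC32 of each value's string form."""
    total = 0
    for v in values:
        # Assumption: values are hashed as their UTF-8 string representation.
        total += zlib.crc32(str(v).encode("utf-8"))
    return total

# Hypothetical sample data.
rows = ["alice", "bob", "carol"]
shuffled = ["carol", "alice", "bob"]   # same rows, different order
changed = ["alice", "bob", "carox"]    # one value altered

print(sum_crc32(rows) == sum_crc32(shuffled))  # addition ignores row order
print(sum_crc32(rows) == sum_crc32(changed))   # the altered value changes the sum
```

Because addition is commutative, no ordering of the rows is needed for the result to be stable.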
The following query is used in Percona's MySQL table checksumming tool. It's a little tough to understand, but essentially it takes the CRC32 of the column (or of several columns concatenated) for every row, then XORs them all together using the BIT_XOR group function. If one CRC hash is different, the result of XORing everything will also be different. This happens in fixed memory, so you can checksum arbitrarily large tables.
SELECT CONV(BIT_XOR(CAST(CRC32(some_column) AS UNSIGNED)), 10, 16) FROM some_table;
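As a sketch of what that query does, the following Python models the same arithmetic: one CRC32 per row, folded into a single accumulator with XOR, then printed in hexadecimal as CONV(..., 10, 16) would. It assumes MySQL's CRC32() agrees with zlib.crc32 and that values are hashed as strings. The name xor_crc32_hex is hypothetical.

```python
import zlib

def xor_crc32_hex(values):
    """Model CONV(BIT_XOR(CRC32(col)), 10, 16) over an iterable of rows."""
    acc = 0  # the only state kept, regardless of how many rows are read
    for v in values:
        # Assumption: values are hashed as their UTF-8 string representation.
        acc ^= zlib.crc32(str(v).encode("utf-8"))
    return format(acc, "X")  # uppercase hex, no leading zeros

# Row order does not matter, because XOR is commutative and associative.
print(xor_crc32_hex([1, 2, 3]) == xor_crc32_hex([3, 1, 2]))

# A generator works too: rows are consumed one at a time in constant memory.
print(xor_crc32_hex(i for i in range(100000)))
```

The single integer accumulator is what makes the "fixed memory" claim work: nothing is concatenated or buffered.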
One thing to keep in mind, though, is that this does not prevent possible collisions, and CRC32 is a pretty weak function by today's standards. A nicer hashing function would be something like FNV_64. With it, two hashes that complement each other when XORed together would be very unlikely.
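One kind of collision is easy to show and does not depend on the hash function at all: identical values hash identically, so an even number of copies cancels out under XOR. The Python below is a small illustration of that, again assuming zlib.crc32 as a stand-in for MySQL's CRC32(). The name xor_checksum and the sample values are hypothetical.

```python
import zlib

def xor_checksum(values):
    """XOR together the CRC32 of each value's string form."""
    acc = 0
    for v in values:
        acc ^= zlib.crc32(str(v).encode("utf-8"))
    return acc

# Hypothetical sample data.
base = [10, 20, 30]
with_duplicate_pair = base + [99, 99]  # two extra, identical rows

# x ^ x == 0, so the pair of 99s contributes nothing to the checksum.
print(xor_checksum([99, 99]))
print(xor_checksum(base) == xor_checksum(with_duplicate_pair))
```

If that matters for your data, one option is to hash something unique per row along with the value, such as several columns concatenated.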