In my code, I need to hash files using a variety of algorithms, including CRC32. Since I'm also using other cryptographic hash functions in the Digest
family, I thought it would be nice to maintain a consistent interface for them all.
For the record, I did find digest-crc
, a gem which does exactly what I want. The thing is, Zlib
is part of the standard library and has a working implementation of CRC32 that I'd like to reuse. Also, it is written in C so it should offer superior performance in relation to digest-crc
, which is a pure-ruby implementation.
Implementing Digest::CRC32
actually looked pretty straightforward at first:
%w(digest zlib).each { |f| require f }
class Digest::CRC32 < Digest::Class
include Digest::Instance
def update(str)
@crc32 = Zlib.crc32(str, @crc32)
end
def initialize; reset; end
def reset; @crc32 = 0; end
def finish; @crc32.to_s; end
end
Everything looks right:
crc32 = File.open('Rakefile') { |f| Zlib.crc32 f.read }
digest = Digest::CRC32.file('Rakefile').digest!.to_i
crc32 == digest
=> true
Unfortunately, not everything works:
Digest::CRC32.file('Rakefile').hexdigest!
=> "313635393830353832"
# What I actually expected was:
Digest::CRC32.file('Rakefile').digest!.to_i.to_s(16)
=> "9e4a9a6"
hexdigest
basically returns Digest.hexencode(digest)
, which works with the value of the digest at the byte level. I'm not sure how that function works, so I was wondering if it is possible to achieve this with just the integer returned from Zlib.crc32
.
Digest is expecting digest to return the raw bytes that make up the checksum, i.e. in the case of a crc32 the 4 bytes that makeup that 32bit integer. However you are instead returning a string that contains the base 10 representation of that integer.
You want something like
[@crc32].pack('V')
to turn that integer into the bytes that represent that. Do go and read up on pack and its various format specifiers - there are lots of ways of packing an integer depending on whether the bytes should be presented in native endian-ness, big-endian, little-endian etc so you should figure out which one matches your needs
Sorry this doesn't really answer your question but it might help..
Firstly, when reading in a file, make sure you pass the "rb" parameter. I can see you're not on windows but if by chance your code does end up getting ran on a windows machine your code won't work the same, especially when reading ruby files in. Example:
crc32 = File.open('test.rb') { |f| Zlib.crc32 f.read }
#=> 189072290
digest = Digest::CRC32.file('test.rb').digest!.to_i
#=> 314435800
crc32 == digest
#=> false
crc32 = File.open('test.rb', "rb") { |f| Zlib.crc32 f.read }
#=> 314435800
digest = Digest::CRC32.file('test.rb').digest!.to_i
#=> 314435800
crc32 == digest
#=> true
The above will work across all platforms and all rubies.. that I know of.. But that's not what you asked..
I'm pretty sure the hexdigest and digest methods in your above example are working as they should though..
dig_file = Digest::CRC32.file('test.rb')
test1 = dig_file.hexdigest
#=> "333134343335383030"
test2 = dig_file.digest
#=> "314435800"
def hexdigest_to_digest(h)
h.unpack('a2'*(h.size/2)).collect {|i| i.hex.chr }.join
end
test3 = hexdigest_to_digest(test1)
#=> "314435800"
So I'm guessing either the .to_i.to_s(16) is throwing off your expected result or your expected result may possibly be wrong? Not sure, but all the best
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With