
Digest::CRC32 with Zlib

Tags: ruby, digest

In my code, I need to hash files using a variety of algorithms, including CRC32. Since I'm also using cryptographic hash functions from the Digest family, I thought it would be nice to maintain a consistent interface for them all.

For the record, I did find digest-crc, a gem which does exactly what I want. The thing is, Zlib is part of the standard library and has a working implementation of CRC32 that I'd like to reuse. Also, it is written in C, so it should outperform digest-crc, which is a pure-Ruby implementation.

Implementing Digest::CRC32 actually looked pretty straightforward at first:

%w(digest zlib).each { |f| require f }

class Digest::CRC32 < Digest::Class
  include Digest::Instance

  def update(str)
    @crc32 = Zlib.crc32(str, @crc32)
  end

  def initialize; reset; end
  def reset; @crc32 = 0; end
  def finish; @crc32.to_s; end
end

Everything looks right:

crc32 = File.open('Rakefile') { |f| Zlib.crc32 f.read }
digest = Digest::CRC32.file('Rakefile').digest!.to_i
crc32 == digest
=> true

Unfortunately, not everything works:

Digest::CRC32.file('Rakefile').hexdigest!
=> "313635393830353832"

# What I actually expected was:
Digest::CRC32.file('Rakefile').digest!.to_i.to_s(16)
=> "9e4a9a6"

hexdigest basically returns Digest.hexencode(digest), which works with the value of the digest at the byte level. I'm not sure how that function works, so I was wondering if it is possible to achieve this with just the integer returned from Zlib.crc32.

Matheus Moreira asked Dec 21 '11 18:12


2 Answers

Digest expects digest to return the raw bytes that make up the checksum, i.e. in the case of a CRC32, the 4 bytes that make up that 32-bit integer. However, you are instead returning a string that contains the base 10 representation of that integer.
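To make the difference concrete, here is a small sketch using the values from the question. It assumes only the standard `Digest.hexencode`; the variable names are made up for illustration.

```ruby
require 'digest'

# The question's finish returns the decimal text of the checksum.
decimal_text = '165980582'

# hexencode works byte by byte, so it encodes the ASCII codes of the digits:
# '1' is 0x31, '6' is 0x36, '5' is 0x35, and so on.
Digest.hexencode(decimal_text)   # => "313635393830353832"

# The same checksum as four raw bytes (0x09E4A9A6, most significant byte first).
raw_bytes = "\x09\xE4\xA9\xA6".b
Digest.hexencode(raw_bytes)      # => "09e4a9a6"
```

The second result is the expected "9e4a9a6", zero-padded to eight hex digits.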

You want something like

[@crc32].pack('V')

to turn that integer into the bytes that represent it. Do go and read up on pack and its various format specifiers: there are lots of ways of packing an integer depending on whether the bytes should be in native, big-endian, or little-endian order ('V' is 32-bit little-endian, 'N' is 32-bit big-endian), so you should figure out which one matches your needs.
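As a rough sketch of how the pieces could fit together, here is the class from the question with only `finish` changed. It assumes big-endian order ('N') is what's wanted, so that hexdigest reads the same as the integer printed in hexadecimal; that choice is an assumption, not something the question settles. Returning `self` from `update` and `reset` is also an addition here, to allow chaining.

```ruby
require 'digest'
require 'zlib'

module Digest
  class CRC32 < Digest::Class
    def initialize
      reset
    end

    # Start over with an empty checksum.
    def reset
      @crc32 = 0
      self
    end

    # Fold more data into the running checksum.
    def update(str)
      @crc32 = Zlib.crc32(str, @crc32)
      self
    end
    alias_method :<<, :update

    # Return the checksum as 4 raw bytes, most significant byte first.
    # Assumption: big-endian ('N'); use 'V' for little-endian instead.
    def finish
      [@crc32].pack('N')
    end
  end
end

# "123456789" is the standard CRC-32 check input.
Digest::CRC32.hexdigest('123456789')                      # => "cbf43926"
Digest::CRC32.new.update('1234').update('56789').hexdigest # => "cbf43926"
```

With 'V' instead of 'N', the same input would yield the bytes in reverse order, so the hex string would no longer match `Zlib.crc32(...).to_s(16)`.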

Frederick Cheung answered Sep 18 '22 09:09


Sorry, this doesn't really answer your question, but it might help.

Firstly, when reading in a file, make sure you pass the "rb" mode. I can see you're not on Windows, but if your code ever ends up being run on a Windows machine, it won't behave the same, especially when reading Ruby files. Example:

crc32 = File.open('test.rb') { |f| Zlib.crc32 f.read }
#=> 189072290
digest = Digest::CRC32.file('test.rb').digest!.to_i
#=> 314435800
crc32 == digest
#=> false

crc32 = File.open('test.rb', "rb") { |f| Zlib.crc32 f.read }
#=> 314435800
digest = Digest::CRC32.file('test.rb').digest!.to_i
#=> 314435800
crc32 == digest
#=> true

The above will work across all platforms and all rubies that I know of. But that's not what you asked.
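For larger files, the binary-mode read can also be done in chunks, since `Zlib.crc32` accepts a running checksum as its second argument. This is only a sketch; `crc32_of_file` and its default chunk size are hypothetical names and values chosen for illustration.

```ruby
require 'zlib'

# Hypothetical helper: CRC32 of a file, read in binary mode chunk by chunk.
def crc32_of_file(path, chunk_size = 64 * 1024)
  crc = 0
  File.open(path, 'rb') do |f|
    # read(n) returns nil at end of file, which ends the loop.
    while (chunk = f.read(chunk_size))
      crc = Zlib.crc32(chunk, crc)
    end
  end
  crc
end
```

Opening with 'rb' avoids newline translation on Windows, so the result should match what the Digest-based class computes for the same file.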

I'm pretty sure the hexdigest and digest methods in your example above are working as they should, though.

dig_file = Digest::CRC32.file('test.rb')

test1 = dig_file.hexdigest
#=> "333134343335383030"

test2 = dig_file.digest
#=> "314435800"

def hexdigest_to_digest(h)
  h.unpack('a2'*(h.size/2)).collect {|i| i.hex.chr }.join
end

test3 = hexdigest_to_digest(test1)
#=> "314435800"

So I'm guessing either the .to_i.to_s(16) is throwing off your expected result or your expected result may possibly be wrong? Not sure, but all the best
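As an aside, the hexdigest_to_digest helper above can probably be written more compactly with the 'H*' format of pack, which decodes a hex string into bytes. A minimal sketch, reusing the value from the example:

```ruby
hex = '333134343335383030'

# 'H*' reads the whole string as hex digits, high nibble first.
decoded = [hex].pack('H*')   # => "314435800"

# unpack1('H*') goes the other way and recovers the hex string.
decoded.unpack1('H*')        # => "333134343335383030"
```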

2potatocakes answered Sep 18 '22 09:09