SHA1 encoding in Haskell

I have a list of file paths and want to compute the SHA-1 hash of each file, collecting the hashes in a list again. It should be as general as possible, so the files could be text files as well as binary files. My questions are:

  1. What packages should be used and why?
  2. How consistent is the approach? That is, could different programs that compute SHA-1 (e.g. sha1sum) give different results for the same file?
asked Feb 29 '12 16:02 by beyeran


1 Answer

The cryptohash package is probably the simplest to use. Just read your input into a lazy¹ ByteString and apply the hashlazy function to get a strict ByteString containing the raw 20-byte SHA-1 digest. Here's a small sample program you can use to compare the output with that of sha1sum.

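import Crypto.Hash.SHA1 (hashlazy)
import qualified Data.ByteString as Strict
import qualified Data.ByteString.Lazy as Lazy
import System.Process (rawSystem)
import Text.Printf (printf)

-- SHA-1 digest (20 raw bytes) of a file, read as bytes rather than text.
hashFile :: FilePath -> IO Strict.ByteString
hashFile = fmap hashlazy . Lazy.readFile

-- Render the digest as lowercase hex, the same format sha1sum prints.
toHex :: Strict.ByteString -> String
toHex bytes = Strict.unpack bytes >>= printf "%02x"

-- Print our hash, then run sha1sum on the same file for comparison.
test :: FilePath -> IO ()
test path = do
  hashFile path >>= putStrLn . toHex
  -- rawSystem passes the path as a single argument without a shell,
  -- so file names containing spaces are handled correctly.
  _ <- rawSystem "sha1sum" [path]
  return ()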

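Since this reads raw bytes rather than characters, there are no text-encoding or newline-conversion issues. And because SHA-1 is a standardized function, it should always give the same result as sha1sum or any other correct implementation: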

> test "/usr/share/dict/words"
d6e483cb67d6de3b8cfe8f4952eb55453bb99116
d6e483cb67d6de3b8cfe8f4952eb55453bb99116  /usr/share/dict/words
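The question asks for the hashes of a whole list of files. A minimal sketch, assuming the same cryptohash package as above, is to map the per-file function over the list; hashFiles is a hypothetical helper name. One detail matters when the list is long: `fmap hashlazy` returns an unevaluated thunk, so with plain `mapM` every file could be opened before any of them is read. Forcing the digest with `$!` makes each file be read to EOF (at which point the lazily opened handle is closed) before the next one is opened.

```haskell
import Crypto.Hash.SHA1 (hashlazy)
import qualified Data.ByteString as Strict
import qualified Data.ByteString.Lazy as Lazy
import Text.Printf (printf)

-- Hash one file. `$!` forces the digest before returning, so the whole file
-- is consumed (and its lazily opened handle closed at EOF) right away.
hashFile :: FilePath -> IO Strict.ByteString
hashFile path = do
  contents <- Lazy.readFile path
  return $! hashlazy contents

-- Render a digest as lowercase hex, the format sha1sum prints.
toHex :: Strict.ByteString -> String
toHex bytes = Strict.unpack bytes >>= printf "%02x"

-- Hypothetical helper: hash every file, keeping results in input order.
hashFiles :: [FilePath] -> IO [Strict.ByteString]
hashFiles = mapM hashFile
```

In GHCi, `hashFiles ["a.txt", "b.bin"] >>= mapM_ (putStrLn . toHex)` prints one hex digest per file, in the same order as the input list.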

This also works for any of the other hashes supported by the cryptohash package. Just change the import to, e.g., Crypto.Hash.SHA256 to use a different hash (and compare against sha256sum instead of sha1sum).
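One way to check the consistency question directly is a known-answer test: the SHA standard (FIPS 180) publishes the digest of the three ASCII bytes "abc", and every correct implementation must reproduce it. The sketch below assumes cryptohash's Crypto.Hash.SHA1 and Crypto.Hash.SHA256 modules, both of which export a strict `hash` function; the names abc, sha1Abc and sha256Abc are hypothetical.

```haskell
import qualified Crypto.Hash.SHA1 as SHA1
import qualified Crypto.Hash.SHA256 as SHA256
import qualified Data.ByteString as Strict
import qualified Data.ByteString.Char8 as Char8
import Text.Printf (printf)

-- Render a digest as lowercase hex.
toHex :: Strict.ByteString -> String
toHex bytes = Strict.unpack bytes >>= printf "%02x"

-- Known-answer test input from the SHA standard: the ASCII bytes "abc".
abc :: Strict.ByteString
abc = Char8.pack "abc"

-- Switching algorithms only means switching the module the hash comes from.
sha1Abc, sha256Abc :: String
sha1Abc = toHex (SHA1.hash abc)
sha256Abc = toHex (SHA256.hash abc)
```

The same values should come out of `printf abc | sha1sum` and `printf abc | sha256sum` in a shell (`printf` is used rather than `echo` so no trailing newline is hashed).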

¹ Using a lazy ByteString lets hashlazy consume the file chunk by chunk instead of loading it into memory all at once, which is important when working with large files.
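If you prefer to avoid lazy IO altogether, the same bounded memory use can be had by reading strict chunks yourself and feeding them to an incremental hash context. This is a sketch assuming cryptohash's incremental API in Crypto.Hash.SHA1 (`init`, `update`, `finalize`); hashFileChunked and the 64 KiB chunk size are hypothetical choices, not part of the answer above.

```haskell
import qualified Crypto.Hash.SHA1 as SHA1
import qualified Data.ByteString as Strict
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Hypothetical helper: read the file in fixed-size strict chunks and feed
-- them to an incremental SHA-1 context. withBinaryFile closes the handle
-- as soon as the digest has been computed.
hashFileChunked :: FilePath -> IO Strict.ByteString
hashFileChunked path = withBinaryFile path ReadMode (\h -> go h SHA1.init)
  where
    chunkSize :: Int
    chunkSize = 64 * 1024 -- arbitrary; any positive size gives the same digest

    go h ctx = do
      chunk <- Strict.hGetSome h chunkSize
      if Strict.null chunk
        then return $! SHA1.finalize ctx -- an empty read means EOF
        else go h $! SHA1.update ctx chunk -- force the context so old chunks are not retained
```

The explicit `$!` matters: without it, the recursion would build a chain of unevaluated `update` calls that keeps every chunk alive until the end, defeating the point of chunking.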

answered Oct 24 '22 08:10 by hammar