Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently turn a ByteString into a hex representation

I needed to be able to give the hex representation of a SHA512 hash. Maybe I just didn't look hard enough, but I could find any functions on Hackage to do it. So I wrote an implementation using unfoldrN. It's definitely fast enough for my purposes, but I'm wondering if anyone knows of a faster approach.

I've put my implementation up on Github as a gist: https://gist.github.com/2356925 . The file also includes a simple implementation based on Numeric.showHex, a QuickCheck test and a criterion benchmark. My current results of the simple version versus the unfoldrN version are:

benchmarking simple
mean: 4.677296 ms, lb 4.656011 ms, ub 4.696684 ms, ci 0.950
std dev: 104.2791 us, lb 87.77023 us, ub 128.1627 us, ci 0.950
found 5 outliers among 100 samples (5.0%)
  4 (4.0%) low mild
variance introduced by outliers: 15.195%
variance is moderately inflated by outliers

benchmarking unfoldrN_MS1
mean: 370.0101 us, lb 365.9819 us, ub 373.8619 us, ci 0.950
std dev: 20.17016 us, lb 16.92772 us, ub 24.08982 us, ci 0.950
found 14 outliers among 100 samples (14.0%)
  7 (7.0%) low mild
  7 (7.0%) high mild
variance introduced by outliers: 52.467%
variance is severely inflated by outliers

Anyone want to take a stab at improving it?

like image 521
Michael Snoyman Avatar asked Apr 11 '12 04:04

Michael Snoyman


People also ask

How do you convert bytes to hexadecimal?

To convert byte array to a hex value, we loop through each byte in the array and use String 's format() . We use %02X to print two places ( 02 ) of Hexadecimal ( X ) value and store it in the string st . This is a relatively slower process for large byte array conversion.

What is hex encoded string?

Hex encoding is performed by converting the 8 bit data to 2 hex characters. The hex characters are then stored as the two byte string representation of the characters. Often, some kind of separator is used to make the encoded data easier for human reading.

How do you convert a hex string to a byte array?

To obtain a string in hexadecimal format from this array, we simply need to call the ToString method on the BitConverter class. As input we need to pass our byte array and, as output, we get the hexadecimal string representing it. string hexString = BitConverter. ToString(byteArray);


2 Answers

Going lower-level,

import Data.ByteString.Internal
import Foreign.Ptr
import Foreign.Storable
import qualified Data.ByteString as B
import Data.ByteString.Unsafe
import Data.Bits
import Data.Word

maxLen :: Int
maxLen = maxBound `quot` 2

hexDig :: Word8 -> Word8
hexDig d
    | d < 10    = d + 48
    | otherwise = d + 87

toHex :: ByteString -> ByteString
toHex bs
    | len > maxLen = error "too long to convert"
    | otherwise    = unsafeCreate nl (go 0)
      where
        len = B.length bs
        nl  = 2*len
        go i p
            | i == len  = return ()
            | otherwise = case unsafeIndex bs i of
                            w -> do poke p (hexDig $ w `shiftR` 4)
                                    poke (p `plusPtr` 1) (hexDig $ w .&. 0xF)
                                    go (i+1) (p `plusPtr` 2)

I could reduce the conversion time on my box by another factor of about 3.5. Making sample a bit longer (25000), I got

benchmarking simple
mean: 13.76532 ms, lb 13.64184 ms, ub 13.88680 ms, ci 0.950
std dev: 633.2413 us, lb 582.6342 us, ub 687.9701 us, ci 0.950
variance introduced by outliers: 44.438%
variance is moderately inflated by outliers

benchmarking unfoldrN_MS1
mean: 430.5705 us, lb 424.9206 us, ub 438.5689 us, ci 0.950
std dev: 33.85429 us, lb 26.25623 us, ub 45.74915 us, ci 0.950
found 4 outliers among 100 samples (4.0%)
  3 (3.0%) high mild
  1 (1.0%) high severe
variance introduced by outliers: 69.726%
variance is severely inflated by outliers

benchmarking LowHex
mean: 123.6000 us, lb 123.0551 us, ub 124.7084 us, ci 0.950
std dev: 3.837497 us, lb 1.869370 us, ub 6.470112 us, ci 0.950
found 6 outliers among 100 samples (6.0%)
  4 (4.0%) high mild
  2 (2.0%) high severe
variance introduced by outliers: 25.818%
variance is moderately inflated by outliers

For the original 500 long sample, it was

benchmarking simple
mean: 2.603306 ms, lb 2.583054 ms, ub 2.629212 ms, ci 0.950
std dev: 116.5341 us, lb 81.61409 us, ub 191.3293 us, ci 0.950
found 7 outliers among 100 samples (7.0%)
  2 (2.0%) low severe
  3 (3.0%) low mild
  1 (1.0%) high severe
variance introduced by outliers: 42.490%
variance is moderately inflated by outliers

benchmarking unfoldrN_MS1
mean: 83.19349 us, lb 82.88474 us, ub 83.58283 us, ci 0.950
std dev: 1.771460 us, lb 1.486104 us, ub 2.174729 us, ci 0.950
found 14 outliers among 100 samples (14.0%)
  12 (12.0%) high mild
  2 (2.0%) high severe
variance introduced by outliers: 14.225%
variance is moderately inflated by outliers

benchmarking LowHex
mean: 24.50564 us, lb 24.41683 us, ub 24.61241 us, ci 0.950
std dev: 497.1908 ns, lb 415.6366 ns, ub 609.7594 ns, ci 0.950
found 5 outliers among 100 samples (5.0%)
  5 (5.0%) high mild
variance introduced by outliers: 13.256%
variance is moderately inflated by outliers
like image 113
Daniel Fischer Avatar answered Oct 29 '22 17:10

Daniel Fischer


The function you are looking for is Data.ByteString.Builder.byteStringHex (or its twin function for Lazy ByteStrings), which is provided by the new bytestring builder. I extended your benchmarks and get the following results on my machine:

benchmarking size 5000/simple
mean: 2.469847 ms, lb 2.440422 ms, ub 2.522850 ms, ci 0.950
std dev: 196.5903 us, lb 116.8811 us, ub 318.4720 us, ci 0.950
found 16 outliers among 100 samples (16.0%)
  3 (3.0%) low severe
  2 (2.0%) low mild
  10 (10.0%) high severe
variance introduced by outliers: 70.721%
variance is severely inflated by outliers

benchmarking size 5000/unfoldrN_MS1
mean: 102.6075 us, lb 101.7695 us, ub 104.0159 us, ci 0.950
std dev: 5.468574 us, lb 3.681120 us, ub 8.080665 us, ci 0.950
found 16 outliers among 100 samples (16.0%)
  6 (6.0%) high mild
  10 (10.0%) high severe
variance introduced by outliers: 51.455%
variance is severely inflated by outliers

benchmarking size 5000/byteStringHexFixed
mean: 5.675204 us, lb 5.636296 us, ub 5.750211 us, ci 0.950
std dev: 264.3726 ns, lb 140.9738 ns, ub 398.8494 ns, ci 0.950
found 5 outliers among 100 samples (5.0%)
  4 (4.0%) high severe
variance introduced by outliers: 44.476%
variance is moderately inflated by outliers

I like this number. Too bad that my patches to the bytestring library are still under review by Duncan Coutts. At the latest the new bytestring builder will be available with the next GHC release.

like image 42
meiersi Avatar answered Oct 29 '22 17:10

meiersi