Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently creating strict ByteStrings

Tags:

haskell

Recently after running benchmarks on my project I've discovered that direct construction of strict bytestrings can be an order of magnitude faster than the one involving a builder.

E.g., an encoder implementation, which uses a builder:

encoder :: Int64 -> Data.ByteString.ByteString
encoder =
  Data.ByteString.Lazy.toStrict .
  Data.ByteString.Builder.toLazyByteString .
  Data.ByteString.Builder.int64BE

performs like 10 times worse than the one, which constructs the bytestring directly, and has several possibilities for further optimization:

encoder :: Int64 -> Data.ByteString.ByteString
encoder =
  unpackIntBySize 8

unpackIntBySize :: (Bits a, Integral a) => Int -> a -> Data.ByteString.ByteString
unpackIntBySize n x =
  Data.ByteString.pack $ map f $ reverse [0..n - 1]
  where
    f s =
      fromIntegral $ shiftR x (8 * s)

So my question is a two-fold:

  1. Why is there no direct conversion from Builder to strict ByteString? It's annoying, because I often have to import Data.ByteString.Lazy just to use its toStrict function, because Data.ByteString.Builder exposes only toLazyByteString.

  2. The mentioned experience however made me wonder, if it's not there for a reason. With the reason being that I'm applying an incorrect pattern of usage altogether. So, is it indeed incorrect and is there a better alternative? BTW, I know about Data.ByteString.Builder.Prim, but I doubt that using it in a case as above, would make much difference.

like image 222
Nikita Volkov Avatar asked Oct 18 '15 08:10

Nikita Volkov


2 Answers

Builder is not a zero-cost abstraction, it is optimized for large lazy strings. From the builder docs:

The current implementation is tuned for an average chunk size between 4kb and 32kb

In your case, builder allocates whole 4k chunk just to produce 8 bytes.

Compare with pack, which calculates necessary buffer size, allocates it and then fills it in a loop. The only source of inefficiency is a list of 8 Word8 allocated upfront. Probably unfoldrN will be even more efficient.

Using builder to construct small strict bytestrings is sometimes convenient, but there are better ways.

like image 164
Yuras Avatar answered Oct 11 '22 11:10

Yuras


Try using toLazyByteStringWith from Data.ByteString.Builder.Extra to tune your ByteString construction. This takes an AllocationStrategy that lets you tune the buffer size and growth rate.

like image 28
Gabriella Gonzalez Avatar answered Oct 11 '22 10:10

Gabriella Gonzalez