Benchmarking shows that the cereal
library takes 100x longer to deserialize a data structure of mine (detailed below) than it takes to read the same data off the drive:
benchmarking Read
mean: 465.7050 us, lb 460.9873 us, ub 471.0938 us, ci 0.950
std dev: 25.79706 us, lb 22.19820 us, ub 30.81870 us, ci 0.950
found 4 outliers among 100 samples (4.0%)
4 (4.0%) high mild
variance introduced by outliers: 53.460%
variance is severely inflated by outliers
benchmarking Read + Decode
collecting 100 samples, 1 iterations each, in estimated 6.356502 s
mean: 68.85135 ms, lb 67.65992 ms, ub 70.05832 ms, ci 0.950
std dev: 6.134430 ms, lb 5.607914 ms, ub 6.755639 ms, ci 0.950
variance introduced by outliers: 74.863%
variance is severely inflated by outliers
This is also supported by profiling typical deserialization usage of this data structure in a program of mine, where 98% of the time is spent deserializing the data and 1% is spent on IO plus the core algorithm:
COST CENTRE MODULE %time %alloc
getWord8 Data.Serialize.Get 30.5 40.4
unGet Data.Serialize.Get 29.5 17.9
getWord64be Data.Serialize.Get 14.0 10.7
getListOf Data.Serialize.Get 10.2 12.8
roll Data.Serialize 8.2 11.5
shiftl_w64 Data.Serialize.Get 3.4 2.9
decode Data.Serialize 2.9 3.1
main Main 1.3 0.6
The data structure I'm deserializing is an IntMap [Triplet Atom], and the definitions of the component types are given below:
type Triplet a = (a, a, a)

data Point = Point {
    _x :: {-# UNPACK #-} !Double,
    _y :: {-# UNPACK #-} !Double,
    _z :: {-# UNPACK #-} !Double }

data Atom = Atom {
    _serial :: {-# UNPACK #-} !Int,
    _r      :: {-# UNPACK #-} !Point,
    _n      :: {-# UNPACK #-} !Word64 }
I'm using the default IntMap, (,,), and [] instances provided by cereal, and the following instances for my custom types:
instance Serialize Point where
    put (Point x y z) = do
        put x
        put y
        put z
    get = Point <$> get <*> get <*> get

instance Serialize Atom where
    put (Atom s r n) = do
        put s
        put r
        put n
    get = Atom <$> get <*> get <*> get
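For reference, here is a self-contained version of the types and instances above, plus a round-trip helper to sanity-check them (the roundTrip name and the deriving clauses are my additions, not part of the original code or of cereal):

```haskell
import Data.Serialize (Serialize (..), encode, decode)
import Data.Word (Word64)

data Point = Point
    { _x :: {-# UNPACK #-} !Double
    , _y :: {-# UNPACK #-} !Double
    , _z :: {-# UNPACK #-} !Double
    } deriving (Eq, Show)

data Atom = Atom
    { _serial :: {-# UNPACK #-} !Int
    , _r      :: {-# UNPACK #-} !Point
    , _n      :: {-# UNPACK #-} !Word64
    } deriving (Eq, Show)

instance Serialize Point where
    put (Point x y z) = put x >> put y >> put z
    get = Point <$> get <*> get <*> get

instance Serialize Atom where
    put (Atom s r n) = put s >> put r >> put n
    get = Atom <$> get <*> get <*> get

-- decode . encode should be the identity (modulo the Right wrapper).
roundTrip :: (Serialize a, Eq a) => a -> Bool
roundTrip x = decode (encode x) == Right x
```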
So my questions are:

1. Is there a way to structure my data differently (i.e. the IntMap/[]) to make the deserialization go faster?
2. Is there a way to structure my custom types differently (i.e. Atom/Point) to make deserialization go faster?
3. Am I stuck with the deserialization speed of cereal within Haskell, or should I store the data structure in C-land to do more rapid deserialization (i.e. use mmap)?

The files I am deserializing are used as sub-indices for a search engine, since the full index cannot fit in memory on the target computer (a consumer-grade desktop). So I store each sub-index on disk and read+decode the sub-indices pointed to by the initial global index residing in memory. Also, I'm not concerned about serialization speed, since searching the index is the bottleneck for the end user, and the current serialization performance of cereal is satisfactory for generating and updating the index.
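For context, the read-then-decode path for one sub-index presumably reduces to something like the following sketch (loadSubIndex is a hypothetical name of mine, not from the question):

```haskell
import qualified Data.ByteString as B
import Data.Serialize (Serialize, decode, encode)

-- Read a serialized sub-index from disk strictly, then decode it with
-- cereal. Per the profile above, the decode step, not the read,
-- dominates the runtime.
loadSubIndex :: Serialize a => FilePath -> IO (Either String a)
loadSubIndex path = decode <$> B.readFile path
```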
Edit: I tried out Don's suggestion of using a space-efficient triplet, and it quadrupled the speed:
benchmarking Read
mean: 468.9671 us, lb 464.2564 us, ub 473.8867 us, ci 0.950
std dev: 24.67863 us, lb 21.71392 us, ub 28.39479 us, ci 0.950
found 2 outliers among 100 samples (2.0%)
2 (2.0%) high mild
variance introduced by outliers: 50.474%
variance is severely inflated by outliers
benchmarking Read + Decode
mean: 15.04670 ms, lb 14.99097 ms, ub 15.10520 ms, ci 0.950
std dev: 292.7815 us, lb 278.8742 us, ub 308.1960 us, ci 0.950
variance introduced by outliers: 12.303%
variance is moderately inflated by outliers
However, it still remains the bottleneck, taking 25x as long as the IO. Also, can anybody explain why Don's suggestion works? Does this mean that switching from a list to something else (like an array) might give an improvement, too?
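My reading of the "space-efficient triplet" change is that it replaces the lazy, polymorphic (a, a, a) with a strict product type, so the decoder builds one constructor with already-forced fields instead of a tuple of three thunks. A sketch of what such a type might look like (T3 is a hypothetical name, not necessarily what Don proposed):

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Serialize (Serialize (..), encode, decode)

-- Hypothetical strict triplet replacing (a, a, a): the strict fields
-- and bang patterns force each component as it is decoded, avoiding
-- the per-element thunks that a lazy tuple would accumulate.
data T3 a = T3 !a !a !a deriving (Eq, Show)

instance Serialize a => Serialize (T3 a) where
    put (T3 a b c) = put a >> put b >> put c
    get = do
        !a <- get
        !b <- get
        !c <- get
        return (T3 a b c)
```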
Edit #2: I just switched to the latest Haskell Platform and reran profiling for cereal. The information is considerably more detailed, and I've provided an hpaste of it.
Ok. To answer this with a summary of the advice: for fast deserialization of data, use cereal (strict bytestring output) or binary (lazy bytestring output).