I've been doing quite a bit of reading on Data.Text, but I haven't been able to find much in the way of when to prefer Strict over Lazy, or vice-versa.
My understanding is that the strict Data.Text is a single contiguous array of characters in memory, whereas Data.Text.Lazy is a list of such contiguous chunks.
My question is: why shouldn't I always use Data.Text.Lazy? It seems the only overhead is the chunk management, and I don't know whether that is noticeable in practice. In exchange, concatenation can be much cheaper once Text values become large.
Thoughts and insights welcome!
Haskell is a lazy language: it does not evaluate an expression until its value is actually needed. This often saves time by avoiding unnecessary computation, but it also raises the risk of space leaks, because unevaluated expressions can pile up in memory. Strictness can be introduced explicitly wherever lazy evaluation is not wanted.
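As a rough illustration of introducing strictness, here is a minimal sketch using only base. The function names (`lazySum`, `strictSum`, `bangSum`) are made up for this example; `foldl'` and the `BangPatterns` extension are the standard tools.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import Data.List (foldl')

-- Lazy left fold: the accumulator is not forced at each step, so a
-- chain of unevaluated (+) thunks can build up on long lists
-- (GHC's optimiser may remove the problem, but that is not guaranteed).
lazySum :: [Int] -> Int
lazySum = foldl (+) 0

-- Strict left fold: the accumulator is forced at every step.
strictSum :: [Int] -> Int
strictSum = foldl' (+) 0

-- The same idea written by hand, with bang patterns on the accumulator.
bangSum :: [Int] -> Int
bangSum = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs

main :: IO ()
main = do
  print (strictSum [1 .. 1000000])
  print (bangSum [1 .. 1000000])
```

All three functions compute the same result; they differ only in when the additions are performed.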
Strictness analysis: optimising compilers like GHC try to reduce the cost of laziness using strictness analysis, which attempts to determine which function arguments are always evaluated by the function, and hence can be evaluated by the caller instead.
I'd say that using Data.Text.Lazy inherits many of the problems of lazy IO. So my suggestion would be to prefer Strict, and if you need to process large pieces of data sequentially, use one of the available streaming libraries. See also What is pipes/conduit trying to solve.
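To make "process sequentially with strict values" concrete without pulling in pipes or conduit, here is a minimal sketch that reads a file one strict chunk at a time with `hGetChunk` from Data.Text.IO. It is not a substitute for a streaming library; `countChars` and the file name `example.txt` are hypothetical, chosen only for the example.

```haskell
module Main where

import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import System.IO (Handle, IOMode (ReadMode), withFile)

-- Count the characters available from a handle, holding only one
-- strict chunk in memory at a time.
countChars :: Handle -> IO Int
countChars h = go 0
  where
    go acc = do
      chunk <- TIO.hGetChunk h      -- returns an empty Text at EOF
      if T.null chunk
        then pure acc
        else go $! acc + T.length chunk  -- force the running total

main :: IO ()
main = do
  let path = "example.txt"          -- hypothetical file name
  TIO.writeFile path (T.replicate 1000 (T.pack "hello\n"))
  n <- withFile path ReadMode countChars
  print n
```

The point of the design is that the handle is opened and closed in one visible place (`withFile`), so there is no question about when the reading actually happens, which is the usual complaint about lazy IO.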
From the docs for Data.Text.Lazy:
A time and space-efficient implementation of Unicode text using lists of packed arrays. This representation is suitable for high performance use and for streaming large quantities of data. It provides a means to manipulate a large body of text without requiring that the entire content be resident in memory.
Some operations, such as concat, append, reverse and cons, have better complexity than their Data.Text equivalents, due to optimisations resulting from the list spine structure. And for other operations lazy Texts are usually within a few percent of strict ones, but with better heap usage. For data larger than available memory, or if you have tight memory constraints, this module will be the only option.
And for the strict Data.Text:
A time and space-efficient implementation of Unicode text using packed Word16 arrays. Suitable for performance critical use, both in terms of large data quantities and high speed.
...
Most of the functions in this module are subject to fusion, meaning that a pipeline of such functions will usually allocate at most one Text value.
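The "lists of packed arrays" representation quoted above can be inspected directly. This is a small sketch, assuming only the text package; `chunkCount` is a helper invented for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.Text as T
import qualified Data.Text.Lazy as TL

-- Number of strict chunks backing a lazy Text.
chunkCount :: TL.Text -> Int
chunkCount = length . TL.toChunks

main :: IO ()
main = do
  let pieces = ["hello, ", "lazy "] :: [T.Text]
      a = TL.fromChunks pieces     -- lazy Text built from strict chunks
      b = TL.fromStrict "world"    -- a single-chunk lazy Text
      c = a <> b                   -- append links the chunk lists
  print (TL.toChunks c)            -- the strict chunks, in order
  print (chunkCount c)
  print (TL.toStrict c)            -- copies everything into one array
```

Appending two lazy values does not copy the chunk contents, which is where the better complexity for append and concat comes from; `toStrict` is the step that pays for a full copy.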
So while Data.Text is sufficient for most purposes, Data.Text.Lazy is specifically for when you have very large amounts of data to process and can't practically hold it all in memory at once. Data.Text is somewhat more efficient in general, but which is better for your application is entirely dependent on your use case. A good rule of thumb is to start with strict, and if you're having memory or speed problems then try using lazy.
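Starting with strict and switching later is cheap because the two modules mirror each other's API. A minimal sketch, with `shoutStrict` and `shoutLazy` as made-up names for the example:

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.Text as T
import qualified Data.Text.Lazy as TL

-- Strict version: the input is one contiguous value.
shoutStrict :: T.Text -> T.Text
shoutStrict = T.toUpper . T.strip

-- Lazy version: same function names, different module qualifier.
shoutLazy :: TL.Text -> TL.Text
shoutLazy = TL.toUpper . TL.strip

main :: IO ()
main = do
  print (shoutStrict "  hello  ")
  print (shoutLazy "  hello  ")
  -- Converting at the boundary shows that both give the same text.
  print (TL.toStrict (shoutLazy "  hello  ") == shoutStrict "  hello  ")
```

In many cases, moving a function from one representation to the other is mostly a matter of changing the qualified import, with `TL.toStrict` and `TL.fromStrict` available where the two have to meet.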