Haskell: Lazy vs. Strict Text values, which one is recommended when?

Tags: text, haskell

I've been doing quite a bit of reading on Data.Text, but I haven't been able to find much in the way of when to prefer Strict over Lazy, or vice-versa.

My understanding is that strict Text (Data.Text) is a single block of contiguous characters in memory, whereas Data.Text.Lazy is a list of chunks of contiguous characters.
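To make the two representations concrete, here is a minimal sketch using only the text package. The names lazyFromPieces, chunkCount and flattened are made up for illustration; the library calls (fromChunks, toChunks, toStrict, fromStrict) are the conversion functions exported by Data.Text.Lazy.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL

-- A lazy Text assembled from three strict chunks (illustrative data).
lazyFromPieces :: TL.Text
lazyFromPieces = TL.fromChunks [T.pack "Hello, ", T.pack "lazy ", T.pack "world"]

-- Number of strict chunks that currently make up a lazy Text.
chunkCount :: TL.Text -> Int
chunkCount = length . TL.toChunks

-- Converting to strict copies every chunk into one contiguous value.
flattened :: T.Text
flattened = TL.toStrict lazyFromPieces
```

Going back with TL.fromStrict wraps the strict value as a single chunk, so the chunk boundaries of the original lazy value are not recovered.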

My question is: why shouldn't I always use Data.Text.Lazy? It seems the only overhead is chunk management, but I don't know whether it is noticeable. In exchange, concatenation can be much cheaper when Text values become large.
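The concatenation trade-off can be sketched as follows. The function names are hypothetical, and the comments describe how append behaves in the text implementation as I understand it, so treat them as an assumption rather than a guarantee for every version.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as B

-- Strict append allocates a fresh array and copies both operands into it,
-- so the cost grows with the combined length.
strictJoin :: T.Text -> T.Text -> T.Text
strictJoin = T.append

-- Lazy append reuses the existing chunks and only walks the chunk list
-- of the left operand; no character data is copied.
lazyJoin :: TL.Text -> TL.Text -> TL.Text
lazyJoin = TL.append

-- For many small pieces, a Builder avoids both repeated copying and
-- a long list of tiny chunks.
joinAll :: [T.Text] -> TL.Text
joinAll = B.toLazyText . foldMap B.fromText
```

If the final result has to be strict anyway, T.concat over a list of pieces is another option, since it copies each piece once.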

Thoughts and insights welcome!


asked Jul 08 '14 21:07

Bjorg

2 Answers

I'd say that using Data.Text.Lazy inherits many of the problems of lazy IO. So my suggestion would be to prefer Strict, and if you need to process large pieces of data sequentially, use one of the available streaming libraries. See also the question "What is pipes/conduit trying to solve".
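As a rough illustration of sequential processing with strict Text, here is a hand-rolled loop that uses only base and text. It is not pipes or conduit, just a stand-in for the kind of loop those libraries package up; foldLines and totalLength are hypothetical names.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import System.IO (IOMode (ReadMode), hIsEOF, withFile)

-- Fold over a file one strict line at a time. The handle is opened and
-- closed by withFile, so all reading happens inside a well-defined scope.
foldLines :: (a -> T.Text -> a) -> a -> FilePath -> IO a
foldLines step z path = withFile path ReadMode (go z)
  where
    go acc h = do
      eof <- hIsEOF h
      if eof
        then pure acc
        else do
          line <- TIO.hGetLine h
          let acc' = step acc line
          -- Force the accumulator so thunks do not pile up across lines.
          acc' `seq` go acc' h

-- Example use: total number of characters, excluding newlines.
totalLength :: FilePath -> IO Int
totalLength = foldLines (\n line -> n + T.length line) 0
```

Only one line is held at a time, and reading is finished before the handle closes, which avoids the "when does the read actually happen" question that lazy IO raises.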


answered Sep 23 '22 21:09

Petr

From the docs:

Data.Text.Lazy

A time and space-efficient implementation of Unicode text using lists of packed arrays. This representation is suitable for high performance use and for streaming large quantities of data. It provides a means to manipulate a large body of text without requiring that the entire content be resident in memory.

Some operations, such as concat, append, reverse and cons, have better complexity than their Data.Text equivalents, due to optimisations resulting from the list spine structure. And for other operations lazy Texts are usually within a few percent of strict ones, but with better heap usage. For data larger than available memory, or if you have tight memory constraints, this module will be the only option.
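A small sketch of what "without requiring that the entire content be resident in memory" means in practice: a lazy Text can even be infinite, as long as only a finite part is demanded. The names endless and prefix are made up for this example.

```haskell
import Data.Int (Int64)
import qualified Data.Text.Lazy as TL

-- An infinite lazy Text: "abcabcabc..." repeated forever.
endless :: TL.Text
endless = TL.cycle (TL.pack "abc")

-- Only the chunks needed for the first n characters are ever forced.
prefix :: Int64 -> TL.Text
prefix n = TL.take n endless
```

Note that lengths and indices in Data.Text.Lazy are Int64 rather than Int. The same value could not be represented as a strict Text at all, since a strict value must be fully materialised.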

Data.Text

A time and space-efficient implementation of Unicode text using packed Word16 arrays. Suitable for performance critical use, both in terms of large data quantities and high speed.

...

Most of the functions in this module are subject to fusion, meaning that a pipeline of such functions will usually allocate at most one Text value.
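The kind of pipeline the docs have in mind looks roughly like this. The function shout is a made-up example; whether the intermediate Text values are actually fused away depends on the text version and GHC optimisation settings, so this is only a sketch of the shape, not a performance claim.

```haskell
import qualified Data.Text as T

-- Three traversals written as a pipeline: drop commas, upper-case the
-- rest, then replace spaces with underscores.
shout :: T.Text -> T.Text
shout = T.map spaceToUnderscore . T.toUpper . T.filter (/= ',')
  where
    spaceToUnderscore c = if c == ' ' then '_' else c
```

Writing the steps as separate small functions keeps the code readable, and leaves it to the library and compiler to decide how many intermediate values are allocated.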

So while Data.Text is sufficient for most purposes, Data.Text.Lazy is specifically for when you have very large amounts of data to process and can't practically hold it all in memory at once. Data.Text is somewhat more efficient in general, but which is better for your application is entirely dependent on your use case. A good rule of thumb is to start with strict, and if you're having memory or speed problems then try using lazy.
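One way to follow the "start with strict" rule without closing the door on lazy input is to write the core logic against strict Text and add a thin chunk-wise adapter later. This is a sketch under that assumption; countVowels and countVowelsLazy are hypothetical names.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL

-- Core logic, written once against strict Text.
countVowels :: T.Text -> Int
countVowels = T.length . T.filter (`elem` "aeiouAEIOU")

-- Lazy adapter: run the strict function on each chunk and sum the results,
-- so the lazy value is never flattened into one large strict Text.
countVowelsLazy :: TL.Text -> Int
countVowelsLazy = TL.foldlChunks (\n chunk -> n + countVowels chunk) 0
```

This chunk-wise approach only works for computations that do not care where the chunk boundaries fall; anything that matches across a boundary (for example searching for a multi-character substring) needs the lazy API directly or a conversion with TL.toStrict.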

answered Sep 22 '22 21:09

bheklilr
