 

Haskell: should I use Data.Text.Lazy.Builder to construct my Text values?

Tags:

text

haskell

I'm working on a large application that constructs a lot of Data.Text values on the fly. I've been building all my Text values using (<>) and Data.Text.concat.

I only recently learned of the existence of the Builder type. The Beginning Haskell book has this to say about it:

Every time two elements are concatenated, a new Text value has to be created, and this comes with some overhead to allocate memory, to copy data, and also to keep track of the value and release it when it's no longer needed... Both the text and bytestring packages provide a Builder data type that can be used to efficiently generate large text values. [pg 240]

However, the book doesn't give any indication of exactly what is meant by "large text values."

So, I'm wondering whether or not I should refactor my code to use Builder. Maybe you can help me make that decision. Specifically, I have these questions:

1) Are there any guidelines or "best practices" regarding when one should choose Builder over concatenation? Or, how do I know that a given Text value is "large" enough that it merits using Builder?

2) Is using Builder a "no brainer," or would it be worthwhile doing some profiling to confirm its benefits before undertaking a large-scale refactoring?

Thanks!

the-konapie asked Mar 17 '15 15:03



1 Answer

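Data.Text.append (which is what (<>) does for Text) is an O(n+m) operation, where n and m are the lengths of the two strings being joined. This is because a new buffer of size n + m must be allocated and both inputs copied into it. Data.Text.concat does the same for a whole list in one pass: it allocates a single buffer for the total length and copies each piece once. The cost that adds up is repeated appending. If you grow a value one piece at a time, the accumulated result is copied again on every step, so building a value from k pieces this way does work roughly proportional to k² rather than to the final length.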
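To make the difference concrete, here is a minimal sketch contrasting a left fold over (<>) with a single Data.Text.concat. The function names appendOneByOne and concatOnce are made up for illustration. Both produce the same Text; the comments describe where the copying happens.

```haskell
import Data.List (foldl')
import qualified Data.Text as T

-- Appending strict Texts one at a time: each (<>) allocates a fresh buffer
-- and copies both operands, so the growing accumulator is copied again on
-- every step (roughly quadratic in the number of chunks).
appendOneByOne :: [T.Text] -> T.Text
appendOneByOne = foldl' (<>) T.empty

-- T.concat sums the chunk lengths once, allocates one buffer,
-- and copies every chunk into it exactly once (linear in the total length).
concatOnce :: [T.Text] -> T.Text
concatOnce = T.concat

main :: IO ()
main = do
  let chunks = replicate 1000 (T.pack "abc")
  print (appendOneByOne chunks == concatOnce chunks) -- True
  print (T.length (concatOnce chunks))               -- 3000
```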

Builder is specifically optimized for the mappend (<>) operation, which is a cheap O(1) operation: a Builder is represented internally as a function, so appending two Builders is just function composition, which GHC optimizes very well. With Builder you are essentially building up the instructions for producing the final string, and delaying the actual copying until you convert the Builder to Text with toLazyText. That function yields a lazy Text; apply Data.Text.Lazy.toStrict if you need a strict one.
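Here is a minimal sketch of the Builder API from the text package. It assumes a hypothetical greeting message as the thing being assembled. fromText, fromString, singleton and decimal (from Data.Text.Lazy.Builder.Int) each produce a Builder, (<>) composes them, and toLazyText runs the result.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as B
import Data.Text.Lazy.Builder.Int (decimal)

-- Hypothetical example: assemble a message from several pieces.
-- Each (<>) on Builder only composes instructions; nothing is copied yet.
greeting :: T.Text -> Int -> B.Builder
greeting name n =
  B.fromString "Hello, " <> B.fromText name <> B.singleton '!'
    <> B.fromString " You have " <> decimal n <> B.fromString " new messages."

-- toLazyText runs the accumulated instructions and fills buffer chunks;
-- toStrict then copies those chunks into a single strict Text.
render :: B.Builder -> T.Text
render = TL.toStrict . B.toLazyText

main :: IO ()
main = TIO.putStrLn (render (greeting (T.pack "Ada") 3))
-- prints: Hello, Ada! You have 3 new messages.
```

If the result is only going to be written out, you can often skip toStrict and hand the lazy Text straight to Data.Text.Lazy.IO.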

To answer your questions: choose Builder if you have profiled your application and found that calls to Text.concat (or (<>)) dominate the run time. That depends on your needs and your application. There is no general rule for when Builder pays off, but for joining a few short Text values there is probably no need.
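Before committing to a refactoring, a quick measurement can tell you whether the difference even matters at your sizes. Proper profiling (compiling with -prof and running with +RTS -p) or a benchmarking library such as criterion will give more reliable numbers. The crude timing sketch below uses only base and text. The names viaAppend and viaBuilder are hypothetical, the chunk count is arbitrary, and you should compile with -O before reading anything into the timings.

```haskell
import Control.Exception (evaluate)
import Data.List (foldl')
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as B
import System.CPUTime (getCPUTime)
import Text.Printf (printf)

-- Naive approach: left fold with strict Text append.
viaAppend :: [T.Text] -> T.Text
viaAppend = foldl' (<>) T.empty

-- Builder approach: compose cheaply, materialize once at the end.
viaBuilder :: [T.Text] -> T.Text
viaBuilder = TL.toStrict . B.toLazyText . foldMap B.fromText

-- Run an action and report the CPU time it took (getCPUTime is in picoseconds).
timeIt :: String -> IO Int -> IO ()
timeIt label act = do
  start <- getCPUTime
  n <- act
  end <- getCPUTime
  let secs = fromIntegral (end - start) / 1e12 :: Double
  printf "%-8s length=%d  %.3fs\n" label n secs

main :: IO ()
main = do
  let chunks = replicate 5000 (T.pack "some chunk ")
  -- Force the input up front so building it is not charged to the first timing.
  _ <- evaluate (sum (map T.length chunks))
  -- Taking the length forces the whole strict Text inside the timed region.
  timeIt "append" (evaluate (T.length (viaAppend chunks)))
  timeIt "builder" (evaluate (T.length (viaBuilder chunks)))
```

If the two timings are indistinguishable for the sizes your application actually produces, the refactoring is probably not worth it.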

Profiling would definitely be worthwhile if switching to Builder would involve "undertaking a large-scale refactoring". That said, Haskell's type checker usually makes this kind of refactoring less painful than in many other languages, because every call site that still expects a Text is reported at compile time. So it might not be such a difficult undertaking after all.
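One way to keep such a refactoring incremental is sketched below with a hypothetical CSV renderer (csvRowB, csvB and csv are made-up names). The inner helpers move to Builder, and a single Text-returning function stays at the module boundary. Existing callers keep compiling, while the compiler points out any other place whose types no longer line up.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as B

-- Hypothetical inner helpers, rewritten to return Builder instead of Text.
csvRowB :: [T.Text] -> B.Builder
csvRowB [] = mempty
csvRowB (c:cs) = B.fromText c <> foldMap (\x -> B.singleton ',' <> B.fromText x) cs

-- Each row is terminated by a newline; still no copying happens here.
csvB :: [[T.Text]] -> B.Builder
csvB = foldMap (\row -> csvRowB row <> B.singleton '\n')

-- Boundary function keeps the original Text-returning signature,
-- so code that already calls csv does not need to change.
csv :: [[T.Text]] -> T.Text
csv = TL.toStrict . B.toLazyText . csvB

main :: IO ()
main = TIO.putStr (csv [map T.pack ["a", "b"], map T.pack ["c", "d"]])
-- prints:
-- a,b
-- c,d
```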

cdk answered Sep 22 '22 23:09
