
Need an efficient way to turn (Seq Data.Text) to Data.Text

Tags: haskell

I have a program (a SAX parser using Text.XML.Expat.SAX) that builds up very big CDATA nodes using repeated appends of Data.Text content, using Data.Sequence.(|>) like so:

existingText |> newTextChunk

This builds up a very big piece of data of type Seq Text.

After I've built up the data, I need to convert the Seq Text -> Text. But this solution I tried was super-slow:

Data.Foldable.foldr1 Data.Text.append seqText

Is there a faster way to turn a Sequence of Text into a plain Text datum?

Another way to ask this might be: what's the most efficient way to merge a list of Text into one Text, i.e. [Text] -> Text?

dan asked Mar 24 '23 13:03


1 Answer

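Each call to append allocates a new array and copies both of its arguments into it, so folding append over the sequence re-copies the text accumulated so far at every step, and the total work grows roughly quadratically with the number of chunks. As one of the comments said, you might want to try concat, which works out the total length first and copies each chunk only once. For a sequence you could try: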

import Data.Foldable (toList)
import Data.Sequence (Seq)
import Data.Text (Text)
import qualified Data.Text as T

-- Flatten the sequence to a list, then join all chunks in one pass.
concatSeq :: Seq Text -> Text
concatSeq = T.concat . toList

This should be faster than doing a fold with append, but I haven't verified it. You could try to whip up a small test case using criterion (which is an amazing library).
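A criterion test case along those lines might look like the minimal sketch below. It assumes the criterion, text, and containers packages are installed. The names foldAppend and sampleChunks, the "chunk" payload, and the count of 2000 are made up for illustration and are not from the question.

```haskell
import Criterion.Main (bench, bgroup, defaultMain, env, nf)
import Data.Foldable (toList)
import Data.Sequence (Seq)
import qualified Data.Sequence as S
import Data.Text (Text)
import qualified Data.Text as T

-- The approach from the question: a right fold of append.
-- Note that foldr1 fails on an empty sequence.
foldAppend :: Seq Text -> Text
foldAppend = foldr1 T.append

-- The approach from this answer: flatten to a list, then concat once.
concatSeq :: Seq Text -> Text
concatSeq = T.concat . toList

-- Hypothetical test input: n copies of a short chunk.
sampleChunks :: Int -> Seq Text
sampleChunks n = S.fromList (replicate n (T.pack "chunk"))

main :: IO ()
main = defaultMain
  [ -- env builds and fully evaluates the input before timing starts
    env (pure (sampleChunks 2000)) $ \chunks ->
      bgroup "Seq Text -> Text"
        [ bench "foldr1 append"   $ nf foldAppend chunks
        , bench "concat . toList" $ nf concatSeq chunks
        ]
  ]
```

Varying the argument to sampleChunks should show how each version scales. Real CDATA chunks are probably larger and less uniform than this input, so treat the numbers as a rough guide only.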

dnaq answered Mar 26 '23 04:03