Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Data.Text vs. Rope

Tags:

haskell

I've been looking into ropes as an alternative to Data.Text, and I like what I see so much that I am now forced to ask this question.... Is there any case where Data.Text would be the better choice?

Here are the points that lead me to this (correct me if I am wrong on any of these)-

  1. A single leaf node rope internally is (almost) the same thing as a Data.Text object. The overhead in a single node rope vs. text is minor, just a single bit flag to differentiate a branch or leaf. If you really want Data.Text, just use an unsplit rope.

  2. Complexity is universally equal or better in ropes- insert/delete (log(N) vs N), get by index (log(N)/N depending on depth of tree vs N).

  3. I've read that the success of ropes proved to be a mixed bag in c, because performance was harmed by thread safety code. However these concerns shouldn't matter in immutable Haskell. In fact, it would seem to me that because of this, Haskell and ropes are ideal for each other.

Again, like in my previous similar questions, I am more interested in the abstract qualities of the structures, not the current situation (library usage, how hardened the code is, etc). If you rewrote the Haskell libraries tomorrow, would you substitute Data.Rope for Data.Text?

like image 453
jamshidh Avatar asked Dec 20 '13 22:12

jamshidh


People also ask

What is a rope data structure used for?

A Rope data structure is a tree data structure which is used to store or manipulate large strings in a more efficient manner. It allows for operations like insertion, deletion, search and random access to be executed faster and much more efficiently in comparison to a traditional String.

What is rope data structure?

Description. A rope is a binary tree where each leaf (end node) holds a string and a length (also known as a "weight"), and each node further up the tree holds the sum of the lengths of all the leaves in its left subtree.

What is cord tree?

A cord tree is a binary tree of strings. A node in this tree can be a leaf node or an internal node. An internal node has two children, a left child and a right child. It also has a length of all the children under it. A leaf nodes have a value and a length.


1 Answers

The "finger tree of packed arrays" seems like a good representation choice, although I would worry about constant overhead. Some effort with aggressive stream fussion and some other optimizations for short strings might fix this, but Data.Rope lacks these features. Right now Data.Rope is not really a Data.Text replacmenet.

  1. It mainly implements strings of bytes not strings of charachters--it replaces byteString not text. Unicode support is important.
  2. Despite Edward's amazing-ness, Rope is not nearly as mature. It hasn't had any contributions in a year, and is rarely used.
  3. Data.Text has a huge engineering effort behind it, is highly optimized, and is well known and well documented
like image 152
Philip JF Avatar answered Oct 11 '22 11:10

Philip JF