Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What persistent data structures does Raku/Rakudo include?

Raku provides many types that are immutable and thus cannot be modified after they are created. Until I started looking into this area recently, my understanding was that these Types were not persistent data structures – that is, unlike the core types in Clojure or Haskell, my belief was that Raku's immutable types did not take advantage of structural sharing to allow for inexpensive copies. I thought that statement my List $new = (|$old-list, 42); literally copied the values in $old-list, without the data-sharing features of persistent data structures.

That description of my understanding is in the past tense, however, due to the following code:

my Array $a = do {
    $_ = [rand xx 10_000_000];
    say "Initialized an Array in $((now - ENTER now).round: .001) seconds"; $_}
my List $l = do {
    $_ = |(rand xx 10_000_000);
    say "Initialized the List in $((now - ENTER now).round: .001) seconds"; $_}
do { $a.push: rand;  
     say "Pushed the element to the Array in $((now - ENTER now).round: .000001) seconds" }
do { my $nl = (|$l, rand); 
     say "Appended an element to the List in $((now - ENTER now).round: .000001) seconds" }
do { my @na = |$l; 
     say "Copied List \$l into a new Array in $((now - ENTER now).round: .001) seconds" }

which produced this output in one run:

Initialized an Array in 5.938 seconds
Initialized the List in 5.639 seconds
Pushed the element to the Array in 0.000109 seconds
Appended an element to the List in 0.000109 seconds
Copied List $l into a new Array in 11.495 seconds

That is, creating a new List with the old values + one more is just as fast as pushing to a mutable Array, and dramatically faster than copying the List into a new Array – exactly the performance characteristics that you'd expect to see from a persistent List (copying to an Array is still slow because it can't take advantage of structural sharing without breaking the immutability of the List). The fast copying of $l into $nl is not due to either List being lazy; neither are.

All of the above leads me to believe that Lists in Rakudo actually are persistent data structures, with all the performance benefits that implies. That leaves me with several questions:

  • Am I right about Lists being persistent data structures?
  • Are all other immutable Types also persistent data structures? Or are any?
  • Is any of this part of Raku, or just an implementation choice Rakudo has made?
  • Are any of these performance characteristics documented/guaranteed anywhere?

I have to say, I am both extremely impressed and more than a bit baffled to discover evidence that at least some of Raku(do)'s types are persistent. It's the sort of feature that other languages list as a key selling point or that leads to the creation of libraries with 30k+ stars on GitHub. Have we really had it in Raku without even mentioning it?

like image 944
codesections Avatar asked Apr 10 '21 03:04

codesections


People also ask

What is persistent data structure with example?

A data structure is said to be persistent if it is capable to maintaining its previous updates as separate versions and each version can be accessed and updated accordingly. It makes the data structure immutable and thread safe. For example, String class object in Java is immutable.

What is persistent and Ephermal data structure?

∎ An ephemeral data structure is one for. which only one version is available at a time: after an update operation, the structure as it existed before the update is lost. ∎ A persistent structure is one where. multiple versions are simultaneously accessible: after an update, both old and new versions can be used.


Video Answer


1 Answers

I remember implementing these semantics, and I certainly don't recall thinking about them giving rise to a persistent data structure at the time - although it does seems fair to attach that label to the result!

I don't think you'll find anywhere that explicitly spells out this exact behavior, however the most natural implementation of things that are required by the language quite naturally leads to it. Taking the ingredients:

  • The infix:<,> operator is the List constructor in Raku
  • When a List is created, it is non-committal with regards to laziness and flattening (these arise from how we use the List, which we don't - in general - know at the point of its construction)
  • When we write (|$x, 1), the prefix:<|> operator constructs a Slip, which is a kind of List that should melt into its surrounding List. Thus what infix:<,> sees is a Slip and an Int.
  • Making the Slip melt into the result List immediately would mean making a commitment about eagerness, which List construction alone should not do. Thus the Slip and everything after it is placed into the lazily evaluated ("non-reified") portion of the List.

This last of these is what gives rise to the observed persistent data structure style behavior.

I expect it would be possible to have a implementation that inspects the Slip and chooses to eagerly copy things that are known not to be lazy, and still be in compliance with the specification test suite. That would change the time complexity of your example. If you want to be defensive against that, then:

do { my $nl = (|$l.lazy, rand); 
     say "Appended an element to the List in $((now - ENTER now).round: .000001) seconds" }

Should be sufficient to force the issue even if the implementation changed.

Of other cases that immediately come to mind that are related to persistent data structures or at least tail sharing:

  • The MoarVM implementation of strings, which is behind str and thus Str, implements string concatenation by creating a new string that refers to the two that are being concatenated instead of copying the data in the two strings (and does similar tricks for substr and repetition). This is strictly an optimization, not a language requirement, and in some delicate cases (the last grapheme of one string and the first grapheme of the next will form a single grapheme in the resulting string), it gives up and takes the copying path.
  • Outside of the core, modules like Concurrent::Stack, Concurrent::Queue, and Concurrent::Trie use tail sharing as a technique to implement relatively efficient lock-free data structures.
like image 145
Jonathan Worthington Avatar answered Oct 22 '22 22:10

Jonathan Worthington