Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Getting the last element of a lazy Seq in Raku

I want to get the last element of a lazy but finite Seq in Raku, e.g.:

my $s = lazy gather for ^10 { take $_ };

The following don't work:

say $s[* - 1];
say $s.tail;

These ones work but don't seem too idiomatic:

say (for $s<> { $_ }).tail;
say (for $s<> { $_ })[* - 1];

What is the most idiomatic way of doing this while keeping the original Seq lazy?

like image 812
Nikola Benes Avatar asked Dec 04 '21 17:12

Nikola Benes


People also ask

Can We have infinite lists in raku?

These are called sequences, which are of type Seq. As it so happens, loops return Seq s. So, it is fine to have infinite lists in Raku, just so long as you never ask them for all their elements.

Can Raku be called on a consumed seq?

From version 6.d, .raku (again, .perl before version 2019.11) can be called on consumed Seq. If a program assumes a Seq can only iterate once, but then is later changed to call one of these operations during the loop, that assumption will fail.

What is the first element of a list in raku?

The first element of a list is at index number zero: Variables in Raku whose names bear the @ sigil are expected to contain some sort of list-like object. Of course, other variables may also contain these objects, but @ -sigiled variables always do, and are expected to act the part.

What is lazy-seq and how to use it?

lazy-seq is just one of many possible ways to create lazy sequences. And there are several other ways to do it in clojure. If a sequence is not lazy, it often holds onto it’s head, which consumes a lot of heap space. If it is lazy, it is computed, then discarded as it is not used for further computations.


Video Answer


2 Answers

What you're asking about ("get[ing] the last element of a lazy but finite Seq … while keeping the original Seq lazy") isn't possible. I don't mean that it's not possible with Raku – I mean that, in principle, it's not possible for any language that defines "laziness" the way Raku does with, for example, the is-lazy method.

If particular, when a Seq is lazy in Raku, that "means that [the Seq's] values are computed on demand and stored for later use." Additionally, one of the defining features of a lazy iterable is that it cannot know its own length while remaining lazy – that's why calling .elems on a lazy iterable throws an error:

my $s = lazy gather for ^10 { take $_ };
say $s.is-lazy; # OUTPUT: «True»
$s.elems;       # THROWS: «Cannot .elems a lazy list onto a Seq»

Now, at this point, you might reasonably be thinking "well, maybe Raku doesn't know how long $s is, but I can tell that it has exactly 10 elements in it." And you're not wrong – with that code, $s is indeed guaranteed to have 10 elements. This means that, if you want to get the tenth (last) element of $s, you can do so with $s[9]. And accessing $s's tenth element like that won't change the fact that $s.is-lazy.

But, importantly, you can only do so because you know something "extra" about $s, and that extra info undoes a good chunk of the reason you might want a list to be lazy in practice.

To see what I mean, consider a very similar Seq

my $s2 = lazy gather for ^10 { last if rand > .95; take $_ };
say $s2.is-lazy; # OUTPUT: «True»

Now, $s2probably has 10 elements, but it might not – the only way to know is to iterate through it and find out. In turn, this means $s2[9] does not jump to the tenth element the way $s[9] did; it iterates through $s2 just like you'd need to. And, as a result, if you run $s2[9], then $s2 will no longer be lazy (i.e., $s2.is-lazy will return False).

And this is, in effect, what you did in the code in your question:

my $s = lazy gather for ^10 { take $_ };
say $s.is-lazy;              # OUTPUT: «True»
say (for $s<> { $_ }).tail;  # OUTPUT: «9»
say $s.is-lazy;              # OUTPUT: «False»

Because Raku cannot ever know that it has reached the tail of a lazy Seq, the only way it could tell you the .tail is to fully iterate $s. And that necessarily means that $s is no longer lazy.

Two complications

It's worth mentioning two adjacent topics that aren't actually related but that are close enough that they trip some people up.

First, nothing I've said about lazy iterables not knowing their length precludes some non-lazy iterables from knowing their length. Indeed, a decent number of Raku types do both the Iterator role and the PredictiveIterator role – and the main point of a PredictiveIterator is that it does know how many elements it can produce without needing to produce/iterate them. But PredictiveIterators cannot be lazy.

The second potentially confusing topic is closely related to the first: while no PredictiveIterator can be lazy (that is, none will ever have an .is-lazy method that returns True), some PredictiveIterators have behavior that is very similar to laziness – and, in fact, may even be colloquially referred to as "lazy".

I can't do a great job explaining this distinction because, quite honestly, I don't fully understand it myself. But I can give you an example: the .lines method on an IO::Handle. It's certainly the case that reading the lines of a huge file behaves a lot like it's dealing with a lazy iterable. most obviously, you can process each line without ever having the whole file in memory. And the docs even say that "lines are read lazily" with the .lines method.

On the other hand:

my $l = 'some-file-with-100_000-lines.txt'.IO.lines;
say $l.is-lazy;                         # OUTPUT: «False»
say $l.iterator ~~ PredictiveIterator;  # OUTPUT: «True»
say $l.elems;                           # OUTPUT: «100000»

So I'm not quite sure whether it's fair to say that $l "is a lazy iterable", but if it is, it's "lazy" in a different way than $s was.

I realize that was a lot, but I hope it is helpful. If you have a more specific use case in mind for laziness (I bet it wasn't gathering the numbers from zero to nine!), I'd be happy to address that more specifically. And if anyone else can fill in some of the details with .lines and other lazy-not-lazy PredictiveIterators, I'd really appreciate it!

like image 138
codesections Avatar answered Oct 25 '22 02:10

codesections


Drop the lazy

Lazy sequences in Raku are designed to work well as is. You don't need to emphasize they're lazy by adding an explicit lazy.

If you add an explicit lazy, Raku interprets that as a request to block operations such as .tail because they will almost certainly immediately render laziness moot, and, if called on an infinite sequence, or even just a sufficiently large one, hang or OOM the program.

So, either drop the lazy, or don't invoke operations like .tail that will be blocked if you do.

Expanded version of my original answer

As noted by @ugexe, the idiomatic solution is to drop the lazy.

Quoting my answer to the SO About Laziness:

if a gather is asked if it's lazy, it returns False.

Aiui, something like the following applies:

  • Some lazy sequence producers may be actually or effectively infinite. If so, calling .tail etc on them will hang the calling program. Conversely, other lazy sequences perform fine when all their values are consumed in one go. How should Raku distinguish between these two scenarios?

  • A decision was made in 2015 to let value producing datatypes emphasize or deemphasize their laziness via their response to an .is-lazy call.

  • Returning True signals that a sequence is not only lazy but wants to be known to be lazy by consuming code that calls .is-lazy. (Not so much end-user code but instead built in consuming features such as @ sigilled variables handling an assignment trying to determine whether or not to assign eagerly.) Built in consuming features take a True as a signal they ought block calls like .tail. If a dev knows this is overly conservative, they can add an eager (or remove an unneeded lazy).

  • Conversely, a datatype, or even a particular object instance, may return False to signal that it does not want to be considered lazy. This may be because the actual behaviour of a particular datatype or instance is eager, but it might instead be that it is lazy technically, but doesn't want a consumer to block operations such as .tail because it knows they will not be harmful, or at least prefers to have that be the default presumption. If a dev knows better (because, say, it hangs the program), or at least does not want to block potentially problematic operations, they can add a lazy (or remove an unneeded eager).

I think this approach works well, but it doc and error messages mentioning "lazy" may not have caught up with the shift made in 2015. So:

  • If you've been confused by some doc about laziness, please search for doc issues with "lazy" in them, or "laziness", and add comments to existing issues, or file a new doc issue (perhaps linking to this SO answer).

  • If you've been confused by a Rakudo error message mentioning laziness, please search for Rakudo issues with "lazy" in them, and tagged [LTA] (which means "Less Than Awesome"), and add comments, or file a new Rakudo issue (with an [LTA] tag, and perhaps a link to this SO answer).

Further discussion

the docs ... say “If you want to force lazy evaluation use the lazy subroutine or method. Binding to a scalar or sigilless container will also force laziness.”

Yes. Aiui this is correct.

[which] sounds like it implies “my $x := lazy gather { ... } is the same as my $x := gather { ... }”.

No.

An explicit lazy statement prefix or method adds emphasis to laziness, and Raku interprets that to mean it ought block operations like .tail in case they hang the program.

In contrast, binding to a variable alters neither emphasis nor deemphasis of laziness, merely relaying onward whatever the bound producer datatype/instance has chosen to convey via .is-lazy.

not only in connection with gather but elsewhere as well

Yes. It's about the result of .is-lazy:

my $x =      (1, { .say; $_ + 1 } ... 1000);
my $y = lazy (1, { .say; $_ + 1 } ... 1000);

both act lazily ... but $x.tail is possible while $y.tail is not.

Yes.

An explicit lazy statement prefix or method forces the answer to .is-lazy to be True. This signals to a consumer that cares about the dangers of laziness that it should become cautious (eg rejecting .tail etc.).

(Conversely, an eager statement prefix or method can be used to force the answer to .is-lazy to be False, making timid consumers accept .tail etc calls.)

I take from this that there are two kinds of laziness in Raku, and one has to be careful to see which one is being used where.

It's two kinds of what I'll call consumption guidance:

  • Don't-tail-me If an object returns True from an .is-lazy call then it is treated as if it might be infinite. Thus operations like .tail are blocked.

  • You-can-tail-me If an object returns False from an .is-lazy call then operations like .tail are accepted.

It's not so much that there's a need to be careful about which of these two kinds is in play, but if one wants to call operations like tail, then one may need to enable that by inserting an eager or removing a lazy, and one must take responsibility for the consequences:

  • If the program hangs due to use of .tail, well, DIHWIDT.

  • If you suddenly consume all of a lazy sequence and haven't cached it, well, maybe you should cache it.

  • Etc.

What I would say is that the error messages and/or doc may well need to be improved.

like image 30
raiph Avatar answered Oct 25 '22 01:10

raiph