
What's the benefit of assigning non-scalars to scalars?

Tags: raku, sigils

I sometimes see code that (to me) uses the wrong sigil in front of the variable:

my $arr  = [1, 2, 3, 4, 5];      # an array
my $lst  = (1, 2, 3, 4, 5);      # a list
my $hash = {a => '1', b => '2'}; # a hash
my $func = -> $foo { say $foo }; # a callable

And it all just works exactly as expected:

say $arr[0];    # 1
say $lst[1];    # 2
say $hash<a>;   # 1
say $hash{'b'}; # 2
$func('hello'); # hello

Q1: What are the benefits of using the scalar container for this, rather than just using the 'correct' one?

I know that Perl only lets collections store scalars, so things like multi-dimensional arrays have to be built from array references, with [...] and {...} being array and hash reference literals respectively.

To expand on and clarify what I mean here: in Perl there are basically two ways to define things, by value and by reference:

# "values"
my @arr = (1, 2, 3, 4);
my %hash = (1 => 2, 3 => 4); 

# which are accessed like this:
my $result1 = $arr[0];
my $result2 = $hash{1};

# references (note how the brackets changed)
my $aref = [1, 2, 3, 4];
my $href = {1 => 2, 3 => 4};

# or making a reference to existing collections
my $aref2 = \@arr;
my $href2 = \%hash;

# which are accessed like this:
my $result3 = $aref->[0];
my $result4 = $href->{1};

The reasoning behind this madness is that Perl collections only really accept scalars, and references are just that. Using references is essentially a way to enable multidimensional arrays.

TL;DR, the distinction makes sense in Perl because they serve two distinctly different purposes.

Q2: Are we dealing with Perl 5-like reference literals again, or is something else at play?

Electric Coffee asked Jun 01 '20 20:06


2 Answers

Some great answers already! For further reading on this general topic, may I suggest "Day 2 – Perl 6: Sigils, Variables, and Containers"? It helped me understand some related topics, such as scalars as containers and the decont op <>. I think its examples give a bit more rationale for how $ and @/% interact to manage the subtleties of efficiently packing and unpacking data structures as intended.

p6steve answered Oct 22 '22 04:10


TL;DR For computers, and humans, and therefore Raku too, a non-scalar (plural thing) is also a scalar (singular thing). (Whereas the converse may not be true.) For example, an Array is both a plural thing (an array of elements) and a single thing, an Array. When you wish to syntactically and statically emphasize a datum's most generic singular nature, use $.

Here's an opening example based on @sid_com++'s comment:

my @a = ( 1, 2 ), 42, { :a, :b }
for @a -> $b {say $b}            # (1 2)␤42␤{a => True, b => True}␤ 
for @a -> @b {say @b}            # (1 2)␤Type check failed ...

The first loop binds values to $b. It is "fault tolerant" because it accepts any value. The second loop binds to @b. Any value that doesn't do the Positional role leads to a type check failure.

My Raku equivalent of your Perl code

Here's a Raku translation of your Perl code:

my @arr = (1, 2, 3, 4);
my %hash = (1 => 2, 3 => 4); 

my $result1 = @arr[0];                          # <-- Invariant sigil
my $result2 = %hash{1};                         # <-- Invariant sigil

my $aref = [1, 2, 3, 4];
my $href = {1 => 2, 3 => 4};

my $aref2 = @arr;                               # <-- Drop `\`
my $href2 = %hash;                              # <-- Drop `\`

my $result3 = $aref[0];                         # <-- Drop `->`
my $result4 = $href{1};                         # <-- Drop `->`

The code is a little shorter. Idiomatic code would probably be a good bit shorter still, dropping:

  • The ref variables. A variable @foo is a reference. A [...] in term (noun) position is an Array reference literal. There's little or no need to use scalar variables to explicitly store references.

  • The parens in the first couple of lines.

  • Semicolons after most closing braces that are the last code on a line.

Raku's sigils are invariant. Here are two tables providing an at-a-glance comparison of Perl's sigil variation vs Raku's sigil invariance.

Why bother with sigils?

All the sigil variations directly correspond to embedding "type" info into an identifier's name that's visible to humans, the language, and the compiler:

  • foo (no sigil) Tells Raku features that choose between a singular and a plural way of operating on data to decide based on the run-time type of the data.

  • $foo Tells Raku to pick singular behavior. A value might be, say, a List containing many values, but its singular nature is being emphasized instead.

  • &foo Type checks that a bound or assigned value does the Callable role.

  • @foo Tells Raku to pick Iterable behavior. Also type checks that bound values do the Positional role. A List or Array can be bound, but trying to bind to 42 or a Hash will yield a type error.

  • %foo Tells Raku to pick Iterable behavior. Also type checks that bound values do the Associative role. A Pair or Bag can be bound, but trying to bind to 42 or a List will yield a type error.
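
As a rough illustration of the type checks in the list above, here is a minimal sketch. The variable names are made up for this example, and the wording of the error may differ between Rakudo versions, so the code only checks that an exception was raised:

```raku
# Binding (:=) checks the role implied by the sigil.
my @pos   := (1, 2, 3);      # OK: a List does Positional
my %assoc := (a => 1);       # OK: a Pair does Associative
say @pos.WHAT;               # (List)
say %assoc.WHAT;             # (Pair)

# A $ variable accepts any value, so no role check applies here.
my $n = 42;

# Binding that Int to an @ variable fails the Positional check.
try { my @bad := $n };
say $!.defined;              # True
```

Note that binding keeps the original object (a List, a Pair) rather than copying it into an Array or Hash, which is why .WHAT reports the bound type.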

I'll next consider your question for each sigil alternative.

Slashing out sigils

Repeating your examples, but this time "slashing out" sigils:

my \arr  = [1, 2, 3, 4, 5];      # an array
my \lst  = (1, 2, 3, 4, 5);      # a list
my \hash = {a => '1', b => '2'}; # a hash
my \func = -> \foo { say foo };  # a callable

These almost just work exactly as expected:

say arr[0];     # 1
say lst[1];     # 2
say hash<a>;    # 1
say hash{'b'};  # 2
func.('hello'); # hello

See the "$foo instead of &foo?" section below for why it's func.(...) and not just func(...). This last sigilless case is of little consequence because in Raku one normally writes:

sub func (\foo) { say foo }
func('hello'); # hello

Identifiers with their sigils slashed out are in SSA (static single assignment) form. That is to say, they are permanently bound, once, at compile-time, to their data. Value type data is immutable. A reference is also immutable (although the data it refers to can change), so, for example, if it's an array, it will remain the same array.

(See Is there a purpose or benefit in prohibiting sigilless variables from rebinding? for further discussion.)
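
A small sketch of the "same array, changeable contents" point. The names are invented for this example, and .WHICH is used here only as a convenient way to compare object identity:

```raku
my \arr = [1, 2, 3];
my $id-before = arr.WHICH;       # identity of the bound Array

arr.push(4);                     # changing the Array's contents is fine
arr[0] = 10;

say arr;                         # [10 2 3 4]
say arr.WHICH eq $id-before;     # True: still the same Array
```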

$foo instead of @foo?

Raku supports:

  • Lazy lists. (This can be very useful.)

  • A boolean .is-lazy method that indicates whether list assignment (@foo = ...) should treat an assigned object as lazy or eager. Importantly, a lazy list is allowed to return False. (This too can be very useful.)

  • Infinite lazy lists. (Yet another thing that can be very useful.)

The above three features are individually useful, and can be useful together. But while it is appropriate that Raku doesn't try to police these features other than the way it does, one needs to follow rules to avoid problems. And the simplest way to do that is to use the right sigil when it matters, as explained next.

Let's say infinite is an infinite lazy list that returns False for .is-lazy:

my $foo = infinite;
say $foo[10];        # displays 11th element
my @foo = infinite;

The first two lines work fine. The third hangs, trying to copy an infinite number of elements into @foo.
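
For anyone who wants to try this, here is one possible way to build such a list. This is only an assumed stand-in for `infinite`, not necessarily what the example above had in mind: a gather block produces its values on demand, yet the Seq it returns does not report itself as lazy.

```raku
# Hypothetical definition of `infinite`, for illustration only.
my \infinite = gather { my $n = 1; loop { take $n++ } };

say infinite.is-lazy;    # False
my $foo = infinite;      # stores the Seq itself; nothing is copied
say $foo[10];            # 11

# my @foo = infinite;    # left commented out: it would never finish
```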


Is it one thing or many things? Of course, if it's a list, it's both:

my $list = <a b c> ;
my @list = <a b c> ;
my \list = <a b c> ;
.say for $list ;      # (a b c)␤   <-- Treat as one thing
.say for @list ;      # a␤b␤c␤    <-- Treat as plural thing
.say for  list ;      # a␤b␤c␤    <-- Go by bound value, not sigil

The choice of sigil in the above just indicates what view you want language constructs and readers to take by default. You can reverse yourself if you wish:

.say for @$list ;     # a␤b␤c␤
.say for $@list ;     # [a b c]␤
.say for $(list)      # (a b c)␤

Assignment is different:

my ($numbers, $letters) = (1, 2, 3), ('a', 'b', 'c');
say $numbers;                                            # (1 2 3)
say $letters;                                            # (a b c)
my (@numbers, @letters) = (1, 2, 3), ('a', 'b', 'c');
say @numbers;                                            # [(1 2 3) (a b c)]
say @letters;                                            # []

Assignment to an @ variable "slurps" all remaining arguments. (Binding with := and metaops like Z= invoke scalar semantics, i.e. don't slurp.)

We see another difference here; assigning to a $ variable is going to keep a List a List, but assigning to an @ variable "slurps" its values into whatever container the @ variable is bound to (by default, an Array).
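
A minimal sketch of that difference, using made-up variable names. The $ variable holds the List itself, whose literal elements cannot be assigned to, while the @ variable holds a fresh Array with assignable elements:

```raku
my $item = (1, 2, 3);        # the List is stored as-is
my @copy = (1, 2, 3);        # the values are copied into a new Array

say $item.WHAT;              # (List)
say @copy.WHAT;              # (Array)

@copy[0] = 10;               # Array elements are assignable containers
say @copy;                   # [10 2 3]

try { $item[0] = 10 };       # the List's literal elements are immutable
say $!.defined;              # True
say $item;                   # (1 2 3)
```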


A tiny thing is string interpolation:

my $list := 1, 2;
my @list := 1, 2;
say "\$list = $list; \@list = @list"; # $list = 1 2; @list = @list
say "@list @list[] @list[1]";         # @list 1 2 2

$foo instead of %foo?

Again, is it one thing or many things? If it's a hash, it's both.

my $hash = { :a, :b }
my %hash =   :a, :b ;
my \hash = { :a, :b }
.say for $hash ;      # {a => True, b => True}␤   <-- By sorted keys
.say for %hash ;      # b => True␤a => True␤       <-- Random order
.say for  hash ;      # a => True␤b => True␤       <-- Random order

Assignment and string interpolation are also different in a manner analogous to @.

$foo instead of &foo?

This section is just for completeness. It only shows one reason to use $. And I've just made it up for this answer -- I don't recall seeing anyone using it.

As with the other sigil alternatives, the primary difference would be whether you do or don't want to emphasize the Callable nature of a callable.

As the setup, note that a sub declaration in Raku declares a corresponding constant identifier with an & sigil:

sub foo (--> Int) { 42 }
say foo;                     # 42
say &foo.signature;          # ( --> Int)
&foo = 99;                   # Cannot modify an immutable Sub...

Which means that if you declare a mutable routine variable with the & sigil you can call it without the sigil:

my &bar = { 99 }
say bar;                     # 99
&bar = { 100 }
say bar;                     # 100

If you wanted to declare a mutable routine variable and not allow it to be easily called without a sigil you could declare it with $ instead:

my Callable $baz = { 101 }
say baz;                     # Undeclared routine: baz
say $baz();                  # 101   <-- Need both sigil and parens

Btw, this is why you get:

my \func = -> \foo { say foo }
func('hello');  # Variable '&func' is not declared

Reference literals

Q2: Are we dealing with Perl 5-like reference literals again, or is something else at play?

Despite your examples, knowing Perl (at least I did last century), and pondering what you've written, I'm still unclear what you're asking.

A wide range of programming languages adopt [...] in term (noun) position as a reference to a literal array. There are other common conventions for other data structure literals. This is what Raku does.

Thus one can write:

my $structure =
[ 0, [ 99, [ ( 1, 2, 3), { key => [ 4, 5, | < a b >, c => 42 ] } ], ], ] ;

say $structure[1][1][1]<key>[4]<c> ; # 42

Is that the sort of thing you're talking about?

Dereference literals

postcircumfix:< [ ] > is declared as a pile of multi subs that (are supposed to) apply a Positional consistent indexing protocol on their left argument.

  • All built in types that do the Positional role work.

  • User defined types that do the Positional role should work because the role defines typed interface stubs that must be implemented by types that do the role.

  • But duck typing is also OK; provided a type implements the basics of the interface, postcircumfix:< [ ] > should work.

The same story applies for postcircumfix:< { } > and postcircumfix:« < > », but the relevant role/protocol is Associative consistent indexing.

And a similar story applies for postcircumfix:< ( ) > and Callable.
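
As a sketch of the duck typing case for [ ], consider a hypothetical class (Squares is invented for this example) that does not declare the Positional role but does supply the AT-POS method that plain integer indexing calls:

```raku
# Hypothetical class: no `does Positional`, just an AT-POS method.
class Squares {
    method AT-POS($index) { $index * $index }
}

my $squares = Squares.new;
say $squares[4];                 # 16
say $squares ~~ Positional;      # False
```

A type like this could not be bound to an @ variable, since that check looks for the role itself; it only works with indexing through a $ or sigilless variable.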

raiph answered Oct 22 '22 04:10