
Is there a convenient way to replicate R's concept of 'named vectors' in Raku, possibly using Mixins?

Recent questions on StackOverflow pertaining to Mixins in Raku have piqued my interest as to whether Mixins can be applied to replicate features present in other programming languages.

For example, in the R programming language, elements of a vector can be given a name (i.e. an attribute), which is very convenient for data analysis. For an excellent example see "How to Name the Values in Your Vectors in R" by Andrie de Vries and Joris Meys, who illustrate this feature using R's built-in islands dataset. Below is a more prosaic example (code run in the R REPL):

> #R-code
> x <- 1:4
> names(x) <- LETTERS[1:4]
> str(x)
 Named int [1:4] 1 2 3 4
 - attr(*, "names")= chr [1:4] "A" "B" "C" "D"
> x
A B C D 
1 2 3 4 
> x[1]
A 
1 
> sum(x)
[1] 10

Below I try to replicate R's 'named-vectors' using the same islands dataset used by de Vries and Meys. While the script below runs and (generally, see #3 below) produces the desired/expected output, I'm left with three main questions, at bottom:

#Raku-script below;

put "Read in data.";

my $islands_A = <11506,5500,16988,2968,16,184,23,280,84,73,25,43,21,82,3745,840,13,30,30,89,40,33,49,14,42,227,16,36,29,15,306,44,58,43,9390,32,13,29,6795,16,15,183,14,26,19,13,12,82>.split(","); #Area

my $islands_N = <<"Africa" "Antarctica" "Asia" "Australia" "Axel Heiberg" "Baffin" "Banks" "Borneo" "Britain" "Celebes" "Celon" "Cuba" "Devon" "Ellesmere" "Europe" "Greenland" "Hainan" "Hispaniola" "Hokkaido" "Honshu" "Iceland" "Ireland" "Java" "Kyushu" "Luzon" "Madagascar" "Melville" "Mindanao" "Moluccas" "New Britain" "New Guinea" "New Zealand (N)" "New Zealand (S)" "Newfoundland" "North America" "Novaya Zemlya" "Prince of Wales" "Sakhalin" "South America" "Southampton" "Spitsbergen" "Sumatra" "Taiwan" "Tasmania" "Tierra del Fuego" "Timor" "Vancouver" "Victoria">>; #Name

"----".say;

put "Count elements (Area): ", $islands_A.elems; #OUTPUT 48
put "Count elements (Name): ", $islands_N.elems; #OUTPUT 48

"----".say;

put "Create 'named vector' array (and output):\n";
my @islands;
my $i=0;
for (1..$islands_A.elems) { 
    @islands[$i] := $islands_A[$i] but $islands_N[$i].Str;
    $i++;
};

say "All islands (returns Area): ",     @islands;             #OUTPUT: returns 48 areas (above)
say "All islands (returns Name): ",     @islands>>.Str;       #OUTPUT: returns 48 names (above)
say "Islands--slice (returns Area): ",  @islands[0..3];       #OUTPUT: (11506 5500 16988 2968)
say "Islands--slice (returns Name): ",  @islands[0..3]>>.Str; #OUTPUT: (Africa Antarctica Asia Australia)
say "Islands--first (returns Area): ",  @islands[0];          #OUTPUT: 11506
say "Islands--first (returns Name): ",  @islands[0]>>.Str;    #OUTPUT: (Africa)

put "Islands--first (returns Name): ",  @islands[0];          #OUTPUT: Africa
put "Islands--first (returns Name): ",  @islands[0]>>.Str;    #OUTPUT: Africa

  1. Is there a simpler way to write the Mixin loop ...$islands_A[$i] but $islands_N[$i].Str;? Can the loop be obviated entirely?

  2. Can a named-vector or nvec wrapper be written around put that will return (name)\n(value) in the same manner that R does, even for single elements? Might Raku's Pair type be useful here?

  3. Related to #2 above, calling put on the single element @islands[0] returns the name Africa, not the Area value 11506 (note that this doesn't happen with the call to say). Is there any simple code that can ensure that put always returns the (numeric) value, or always returns the (Mixin) name, for slices of any length?

jubilatious1 asked Apr 03 '21 00:04



3 Answers

  1. Is there a simpler way? Yes, using the zip metaoperator Z combined with infix but:

    my @islands = $islands_A[] Z[but] $islands_N[];
    
  2. Why don't you modify the array to change the format?

  3. put calls .Str on the value it gets; say calls .gist.

If you want put to output some specific text, make sure that the .Str method outputs that text.

I don't think you actually want put to output that format though. I think you want say to output that format. That is because say is for humans to understand, and you want it nicer for humans.
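To make the difference concrete, here is a minimal sketch of a class that overrides both methods. The Island class and its attribute names are hypothetical, made up for illustration; they are not part of the question's code.

```raku
# Hypothetical class for illustration: one island with a name and an area.
class Island {
    has Str $.name;
    has Int $.area;

    # put (like print) stringifies its arguments with .Str
    method Str  { $!name }

    # say uses .gist, so an R-like two-line display can live here
    method gist { "$!name\n$!area" }
}

my $africa = Island.new(name => 'Africa', area => 11506);

put $africa;   # prints: Africa
say $africa;   # prints Africa, then 11506 on the next line
```

Keeping the human-oriented layout in .gist leaves .Str free for a plain, machine-friendly string.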


When you have a question of “Can Raku do X?”, the answer is invariably yes; it's just a matter of how much work it would be, and whether you would still call it Raku at that point.

The question you really want to ask is how easy it is to do X.


I went ahead and implemented something like what the article you linked describes.

Note that this was just a quick implementation that I created right before bed. So think of this as a first rough draft.

If I were actually going to do this for real, I would probably throw this away and start over after spending days learning enough R to figure out what it is actually doing.

class NamedVec does Positional does Associative {
  has @.names is List;
  has @.nums is List handles <sum>;
  has %!kv is Map;

  class Partial {
    has $.name;
    has $.num;
  }

  submethod TWEAK {
    %!kv := %!kv.new: @!names Z=> @!nums;
  }

  method from-pairlist ( +@pairs ) {
    my @names;
    my @nums;
    for @pairs -> (:$key, :$value) {
      push @names, $key;
      push @nums, $value;
    }
    self.new: :@names, :@nums
  }

  method from-list ( +@list ){
    my @names;
    my @nums;
    for @list -> (:$name, :$num) {
      push @names, $name;
      push @nums, $num;
    }
    self.new: :@names, :@nums
  }

  method gist () {
    my @widths = @!names».chars Zmax @!nums».chars;
    sub infix:<fmt> ( $str, $width is copy ){
      $width -= $str.chars;
      my $l = $width div 2;
      my $r = $width - $l;
      (' ' x $l) ~ $str ~ (' ' x $r)
    }
    (@!names Zfmt @widths) ~ "\n" ~ (@!nums Zfmt @widths)
  }

  method R-str () {
    chomp qq :to/END/
    Named num [1:@!nums.elems()] @!nums[]
     - attr(*, "names")= chr [1:@!names.elems()] @!names.map(*.raku)
    END
  }

  method of () {}
  method AT-POS ( $i ){
    Partial.new: name => @!names[$i], num => @!nums[$i]
  }
  method AT-KEY ( $name ){
    Partial.new: :$name, num => %!kv{$name}
  }
}

multi sub postcircumfix:<{ }> (NamedVec:D $v, Str:D $name){
  $v.from-list: callsame
}
multi sub postcircumfix:<{ }> (NamedVec:D $v, List \l){
  $v.from-list: callsame
}
 

my $islands_A = <11506,5500,16988,2968,16,184,23,280,84,73,25,43,21,82,3745,840,13,30,30,89,40,33,49,14,42,227,16,36,29,15,306,44,58,43,9390,32,13,29,6795,16,15,183,14,26,19,13,12,82>.split(","); #Area
my $islands_N = <<"Africa" "Antarctica" "Asia" "Australia" "Axel Heiberg" "Baffin" "Banks" "Borneo" "Britain" "Celebes" "Celon" "Cuba" "Devon" "Ellesmere" "Europe" "Greenland" "Hainan" "Hispaniola" "Hokkaido" "Honshu" "Iceland" "Ireland" "Java" "Kyushu" "Luzon" "Madagascar" "Melville" "Mindanao" "Moluccas" "New Britain" "New Guinea" "New Zealand (N)" "New Zealand (S)" "Newfoundland" "North America" "Novaya Zemlya" "Prince of Wales" "Sakhalin" "South America" "Southampton" "Spitsbergen" "Sumatra" "Taiwan" "Tasmania" "Tierra del Fuego" "Timor" "Vancouver" "Victoria">>; 

# either will work
#my $islands = NamedVec.from-pairlist( $islands_N[] Z=> $islands_A[] );
my $islands = NamedVec.new( names => $islands_N, nums => $islands_A );

put $islands.R-str;

say $islands<Asia Africa Antarctica>;

say $islands.sum;

Brad Gilbert answered Nov 11 '22 16:11


A named vector essentially combines a vector with a map from names to integer positions and allows you to address elements by name. Naming a vector alters the behavior of the vector, not that of its elements. So in Raku we need to define a role for an array:

role Named does Associative {
    has $.names;
    has %!index;

    submethod TWEAK {
        my $i = 0;
        %!index = map { $_ => $i++ }, $!names.list;
    }

    method AT-KEY($key) {
        with %!index{$key} { return-rw self.AT-POS($_) }
        else { self.default }
    }

    method EXISTS-KEY($key) {
        %!index{$key}:exists;
    }

    method gist() {
        join "\n", $!names.join("\t"), map(*.gist, self).join("\t");
    }
}

multi sub postcircumfix:<[ ]>(Named:D \list, \index, Bool() :$named!) {
    my \slice = list[index];
    $named ?? slice but Named(list.names[index]) !! slice;
}

multi sub postcircumfix:<{ }>(Named:D \list, \names, Bool() :$named!) {
    my \slice = list{names};
    $named ?? slice but Named(names) !! slice;
}

Mixing in this role gives you most of the functionality of an R named vector:

my $named = [1, 2, 3] but Named<first second last>;
say $named;                 # OUTPUT: «first␉second␉last␤1␉2␉3␤»
say $named[0, 1]:named;     # OUTPUT: «first␉second␤1␉2␤»
say $named<last> = Inf;     # OUTPUT: «Inf␤»
say $named<end>:exists;     # OUTPUT: «False␤»
say $named<last end>:named; # OUTPUT: «last␉end␤Inf␉(Any)␤»

As this is just a proof of concept, the Named role doesn't handle the naming of non-existing elements well. It also doesn't support modifying a slice of names. It probably does support creating a pun that can be mixed into more than one list.

Note that this implementation relies on the undocumented fact that the subscript operators are multis. If you want to put the role and operators in a separate file, you probably want to apply the is export trait to the operators.
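
The underlying idea, a vector plus a map from names to integer positions, can also be sketched without any role or custom operators. The variable names below are illustrative only, and this is a simplification, not a replacement for the Named role:

```raku
my @values = 1, 2, 3;
my @names  = <first second last>;

# antipairs yields value => index pairs, i.e. name => position
my %index = @names.antipairs;

say @values[%index<second>];   # 2

# Assigning through the looked-up position changes the underlying vector
@values[%index<last>] = Inf;
say @values;                   # [1 2 Inf]

say %index<end>:exists;        # False
```

The role does the same lookup inside AT-KEY, so that the name can be used directly as a subscript.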

dumarchie answered Nov 11 '22 14:11


It might not be the best way of doing it (or what you're specifically looking for), but as soon as I saw this particular problem statement, the first thing that came to mind was Raku's allomorphs, which are types with two related values that are accessible separately depending on context.

my $areas = (11506,5500,16988,2968,16,184,23,280,84,73,25,43,21,82,3745,840,13,30,30,89,40,33,49,14,42,227,16,36,29,15,306,44,58,43,9390,32,13,29,6795,16,15,183,14,26,19,13,12,82);
my $names = <<"Africa" "Antarctica" "Asia" "Australia" "Axel Heiberg" "Baffin" "Banks" "Borneo" "Britain" "Celebes" "Celon" "Cuba" "Devon" "Ellesmere" "Europe" "Greenland" "Hainan" "Hispaniola" "Hokkaido" "Honshu" "Iceland" "Ireland" "Java" "Kyushu" "Luzon" "Madagascar" "Melville" "Mindanao" "Moluccas" "New Britain" "New Guinea" "New Zealand (N)" "New Zealand (S)" "Newfoundland" "North America" "Novaya Zemlya" "Prince of Wales" "Sakhalin" "South America" "Southampton" "Spitsbergen" "Sumatra" "Taiwan" "Tasmania" "Tierra del Fuego" "Timor" "Vancouver" "Victoria">>; # << >> keeps quoted multi-word names together

my @islands;

for (0..^$areas) -> \i {
    @islands[i] := IntStr.new($areas[i], $names[i]);   
}

say "Areas: ",       @islands>>.Int;
say "Names: ",       @islands>>.Str;
say "Areas slice: ", (@islands>>.Int)[0..3];
say "Names slice: ", (@islands>>.Str)[0..3];
say "Areas first: ", (@islands>>.Int)[0];
say "Names first: ", (@islands>>.Str)[0];
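
Since an IntStr is both an Int and a Str, numeric routines can be applied to such values directly, which is roughly what sum(x) does on the R side. A small sketch with two of the islands (the variable name @pair is only for illustration):

```raku
# Two allomorphs: the Int part holds the area, the Str part holds the name
my @pair = IntStr.new(11506, 'Africa'), IntStr.new(5500, 'Antarctica');

say @pair.sum;                 # 17006
say @pair>>.Str.join(', ');    # Africa, Antarctica
```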

Luis F. Uceta answered Nov 11 '22 15:11