Split on Split overlap [ RAKU ]

Tags: split, raku

I stumbled upon some "strange" behaviour while parsing a multiline string with tab-separated values on each line. I wanted to get all the values by calling split twice in a row:

use v6.d;   # 2020.01 release

my $s = "L1:C1\tL1:C2\tL1:C3\nL2:C1\tL2:C2\tL2:C3\nL3:C1\tL3:C2\tL3:C3";

say $s.split(/\n/).split(/\t/).raku;

and the corresponding printout is as follows:

("L1:C1", "L1:C2", "L1:C3 L2:C1", "L2:C2", "L2:C3 L3:C1", "L3:C2", "L3:C3").Seq

The "strange" behaviour is in the 3rd and 5th members of the resulting sequence. The "expected" last string of one line and the first string of the following line seem to overlap in a single element.

I expected something like:

("L1:C1", "L1:C2", "L1:C3", "L2:C1", "L2:C2", "L2:C3", "L3:C1", "L3:C2", "L3:C3").Seq

Can anybody give a detailed explanation of the inner workings behind this behaviour?

Just to clarify things, I know that the correct code is:

$s.split(/\n/)>>.split(/\t/).flat.raku

but my question is about the inner workings of the "wrong" code. How did Raku arrive at that result?

asked Feb 13 '20 09:02 by jakar


2 Answers

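You are splitting the result of the first split, which is a list (more precisely a Seq). The split method coerces whatever it is called on to a string and then splits that string. A list stringifies (via its Str method) to its elements separated by single spaces. Because the first split has already removed the newlines, the last field of one line and the first field of the next are joined only by that space, not by a tab. That is why some of the resulting fields contain two L:C pairs with a space in between.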
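The sketch below makes the hidden step visible by calling .Str on the result of the first split explicitly. It assumes a Rakudo that supports v6.d, and the variable names $lines and $joined are only for illustration. Splitting the joined string on tabs should reproduce the sequence from the question.

```raku
use v6.d;

my $s = "L1:C1\tL1:C2\tL1:C3\nL2:C1\tL2:C2\tL2:C3\nL3:C1\tL3:C2\tL3:C3";

# First split: a Seq of three lines; the newlines are consumed as separators
my $lines = $s.split(/\n/);

# Calling split on that Seq first turns it into a Str,
# and a Seq stringifies by joining its elements with single spaces
my $joined = $lines.Str;
say $joined.raku;
# "L1:C1\tL1:C2\tL1:C3 L2:C1\tL2:C2\tL2:C3 L3:C1\tL3:C2\tL3:C3"

# Splitting on tabs now cannot separate "L1:C3" from "L2:C1",
# because only a space sits between them
say $joined.split(/\t/).raku;
```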

This will get you the result you want:

say "L1:C1\tL1:C2\tL1:C3\nL2:C1\tL2:C2\tL2:C3\nL3:C1\tL3:C2\tL3:C3"
    .split("\n")
    .map( *.split( "\t" ).Slip )

This works because map calls split on each line separately, and .Slip turns each line's list of fields into a Slip, which is flattened into the outer sequence instead of being nested in it as a sublist.
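To show what .Slip contributes, here is a small comparison of the same map with and without it. The names @nested and @flat are illustrative. The element counts in the comments follow from the sample string having three lines and nine fields.

```raku
use v6.d;

my $s = "L1:C1\tL1:C2\tL1:C3\nL2:C1\tL2:C2\tL2:C3\nL3:C1\tL3:C2\tL3:C3";

# Without .Slip, map yields one inner Seq per line: a list of lists
my @nested = $s.split("\n").map(*.split("\t"));
say @nested.elems;    # 3

# With .Slip, each line's fields are spliced into the outer sequence
my @flat = $s.split("\n").map(*.split("\t").Slip);
say @flat.elems;      # 9
say @flat.raku;
```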

answered Nov 12 '22 09:11 by jjmerelo


If you would like your split to give you the individual pieces as one list, rather than a list of lists, you can use the split method's variant that takes a list of delimiters to split by:

say "L1:C1,L1:C2;L1:C3\nL2:C1-L2:C2|L2:C3^L3:C1".split([",", ";", "\n", "|", "^"]).raku;
# output: ("L1:C1", "L1:C2", "L1:C3", "L2:C1-L2:C2", "L2:C3", "L3:C1").Seq
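The following minimal sketch applies this to the string from the question. Passing both "\n" and "\t" as delimiters splits everything in one pass, so no intermediate list ever gets stringified. The regex with a character class at the end is an alternative way to write the same split and is not part of the answer above.

```raku
use v6.d;

my $s = "L1:C1\tL1:C2\tL1:C3\nL2:C1\tL2:C2\tL2:C3\nL3:C1\tL3:C2\tL3:C3";

# One split with both separators: no list is ever coerced to a Str
my @fields = $s.split(["\n", "\t"]);
say @fields.raku;

# A single regex matching either character gives the same pieces
my @same = $s.split(/<[\t\n]>/);
say @same eqv @fields;   # True
```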

Passing the :k or :v adverb to the split method call keeps the separators in the result list as separate entries. With :k, each such entry is the index of the matched separator in the separator list. With :v, it is the separator itself.
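Here is a rough illustration of the two adverbs on a made-up string, "a,b;c" (the string and variable names are hypothetical). Going by the description above, the separators, or their indexes in the list [",", ";"], should appear as separate entries between the fields.

```raku
use v6.d;

my $str = "a,b;c";

# :v keeps each matched separator as its own entry
my @with-values = $str.split([",", ";"], :v);
say @with-values.raku;

# :k keeps the index of the matched separator in [",", ";"] instead
my @with-keys = $str.split([",", ";"], :k);
say @with-keys.raku;
```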

answered Nov 12 '22 10:11 by timotimo