I stumbled upon a "strange" behaviour when I tried to parse a multiline string of tab-separated values, intending to extract every value, using two splits in a row:
use v6.d; # 2020.01 release
my $s = "L1:C1\tL1:C2\tL1:C3\nL2:C1\tL2:C2\tL2:C3\nL3:C1\tL3:C2\tL3:C3";
say $s.split(/\n/).split(/\t/).raku;
and the corresponding printout is as follows:
("L1:C1", "L1:C2", "L1:C3 L2:C1", "L2:C2", "L2:C3 L3:C1", "L3:C2", "L3:C3").Seq
The "strange" behaviour is in the 3rd and 5th members of the resulting sequence: the "expected" last string of one line appears to have been merged with the first string of the following line.
My expectation was something like:
("L1:C1", "L1:C2", "L1:C3", "L2:C1", "L2:C2", "L2:C3", "L3:C1", "L3:C2", "L3:C3").Seq
Can anybody give a detailed explanation of the inner workings behind this behaviour?
Just to clarify things, I know that the correct code is:
$s.split(/\n/)>>.split(/\t/).flat.raku
but my question is about the inner workings of the "wrong" code. How did Raku arrive at that result?
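You are splitting the result of the first split, which is a list (more precisely, a Seq), not a string. The split method coerces whatever it is called on to a string and then splits that string, and a list stringifies (via its Str method) to its elements separated by single spaces. That is why the 3rd and 5th fields each contain two L/C pairs with a space in between: the last field of one line and the first field of the next were never separated by a tab, only by the space inserted during stringification.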
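A small sketch of the explanation above, assuming a Rakudo release that supports `v6.d`. It keeps the lines in an Array (so they can be inspected more than once), prints the space-joined string that the second `split` really operates on, and compares splitting the list with splitting that string directly. The names `@lines` and `$joined` are only illustrative.

```raku
use v6.d;

my $s = "L1:C1\tL1:C2\tL1:C3\nL2:C1\tL2:C2\tL2:C3\nL3:C1\tL3:C2\tL3:C3";

# Store the lines in an Array so they can be inspected more than once.
my @lines = $s.split(/\n/);

# This is the string the second .split actually works on:
# the three lines joined with single spaces (no tab between them).
my $joined = @lines.Str;
say $joined.raku;

# Splitting the list is therefore the same as splitting $joined.
say @lines.split(/\t/).raku;
say $joined.split(/\t/).raku;
```

Both of the last two lines print the same sequence as in the question, which shows that the merge happens during stringification, not during splitting.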
This will get you the result you want:
say "L1:C1\tL1:C2\tL1:C3\nL2:C1\tL2:C2\tL2:C3\nL3:C1\tL3:C2\tL3:C3"
.split("\n")
.map( *.split( "\t" ).Slip )
This works because map splits each line separately, and .Slip turns each line's list of fields into a Slip, which is flattened into the outer sequence instead of being kept as a nested list.
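To make the role of `.Slip` visible, here is a sketch (assuming `v6.d`) that runs the same `map` twice: once returning a plain `List` per line and once returning a `Slip`. The array names are hypothetical and chosen only for illustration.

```raku
use v6.d;

my $s = "L1:C1\tL1:C2\tL1:C3\nL2:C1\tL2:C2\tL2:C3\nL3:C1\tL3:C2\tL3:C3";

# Without .Slip, each line's fields stay grouped: one List per line.
my @nested = $s.split("\n").map(*.split("\t").List);
say @nested.elems;    # 3

# With .Slip, each line's fields are flattened into the outer sequence.
my @flat = $s.split("\n").map(*.split("\t").Slip);
say @flat.elems;      # 9
say @flat.raku;
```

The nested version is what you would want for a table (rows of columns); the Slip version gives the single flat list asked for in the question.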
If you would like your split to give you the individual pieces as one list, rather than a list of lists, you can use the split method's variant that takes a list of delimiters to split by:
say "L1:C1,L1:C2;L1:C3\nL2:C1-L2:C2|L2:C3^L3:C1".split([",", ";", "\n", "|", "^"]).raku;
# output: ("L1:C1", "L1:C2", "L1:C3", "L2:C1-L2:C2", "L2:C3", "L3:C1").Seq
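Applied to the string from the question, a sketch of the list-of-delimiters variant (assuming `v6.d`) looks like this. Because newline and tab are handled in a single split, no intermediate list is created, so nothing gets stringified and merged. The regex character class is shown as an alternative way to express the same two delimiters.

```raku
use v6.d;

my $s = "L1:C1\tL1:C2\tL1:C3\nL2:C1\tL2:C2\tL2:C3\nL3:C1\tL3:C2\tL3:C3";

# One split with both delimiters: newline and tab are treated alike.
my @fields = $s.split(["\n", "\t"]);
say @fields.raku;

# A regex with a character class gives the same fields.
my @same = $s.split(/<[\t\n]>/);
say @fields eqv @same;
```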
Passing the :k or :v adverb to the split method call keeps the separators in the result list as separate entries: with :k, each such entry is the index (in the list of delimiters) of the separator that matched; with :v, it is the matched separator itself.
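A short sketch of the two adverbs, assuming `v6.d` and a made-up input string `"a,b;c"` with two delimiters. With `:v` the separators `","` and `";"` appear between the fields; with `:k` their positions in the delimiter list (`0` and `1`) appear instead.

```raku
use v6.d;

my $str = "a,b;c";

# :v keeps each matched separator itself as its own entry.
my @with-values = $str.split([",", ";"], :v);
say @with-values.raku;

# :k keeps the index (into the delimiter list) of each matched separator.
my @with-keys = $str.split([",", ";"], :k);
say @with-keys.raku;
```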