
Why does multiple use of `<( )>` token within `comb` not behave as expected?

I want to extract the row key (here 28_2820201112122420516_000000), the column name (here bcp_startSoc), and the value (here 64.0) from $str, where $str is a row from HBase:

# `match` is OK
my $str = '28_2820201112122420516_000000 column=d:bcp_startSoc, timestamp=1605155065124, value=64.0';
my $match = $str.match(/^ ([\d+]+ % '_') \s 'column=d:' (\w+) ',' \s timestamp '=' \d+ ',' \s 'value=' (<-[=]>+) $/);
my @match-result = $match».Str.Slip;
say @match-result;   # Output: [28_2820201112122420516_000000 bcp_startSoc 64.0]

# `smartmatch` is OK
# $str ~~ /^ ([\d+]+ % '_') \s 'column=d:' (\w+) ',' \s timestamp '=' \d+ ',' \s 'value=' (<-[=]>+) $/
# say $/».Str.Array; # Output: [28_2820201112122420516_000000 bcp_startSoc 64.0]

# `comb` is NOT OK
# A <( token indicates the start of the match's overall capture, while the corresponding )> token indicates its endpoint. 
# The <( is similar to \K in other languages: it discards anything matched before it.
my @comb-result = $str.comb(/<( [\d+]+ % '_' )> \s 'column=d:' <(\w+)> ',' \s timestamp '=' \d+ ',' \s 'value=' <(<-[=]>+)>/);
say @comb-result;    # Expect: [28_2820201112122420516_000000 bcp_startSoc 64.0], but got [64.0]

I want comb to skip some parts of the match and return only the parts I want, so I used multiple <( and )> tokens here, but I only get the last one as the result.

Is it possible to use comb to get the same result as the match method?

chenyf asked Nov 19 '20 09:11


2 Answers

TL;DR Multiple <(...)>s don't mean multiple captures. Even if they did, .comb reduces each match to a single string in the list of strings it returns. If you really want to use .comb, one way is to go back to your original regex but also store the desired data using additional code inside the regex.

Multiple <(...)>s don't mean multiple captures

The default start point for the overall match of a regex is the start of the regex. The default end point is the end.

Writing <( resets the start point for the overall match to the position where you insert it. Every <( that is reached while the regex is being processed resets the start point again. Likewise, every )> that is reached resets the end point. When processing of the regex finishes, whatever start and end settings are then in effect are used to construct the final overall match.

Given that your code just unconditionally resets each point three times, the last start and end resets "win".
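A minimal sketch of that "last reset wins" behaviour, using a made-up input string ('a1b2') and a hypothetical variable name ($m) rather than the HBase row from the question:

```raku
# Two <( ... )> pairs in one regex. Each token only moves a boundary of the
# single overall match, so the later pair overrides the earlier one.
my $m = 'a1b2'.match(/ <( a )> 1 <( b )> 2 /);

say $m.Str;   # b
say $m.from;  # 2
say $m.to;    # 3
```

The whole pattern still has to match 'a1b2', but only the span between the last <( and the last )> is reported.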

.comb reduces each match to a single string

foo.comb(/.../) is equivalent to foo.match(:g, /.../)>>.Str.

That means you only get one string for each match against the regex.
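As a rough illustration, here is a small sketch with a made-up string and regex ($s, $rx and the array names are hypothetical). Even though the regex has two positional captures, both calls produce one string per overall match:

```raku
my $s  = 'k1=v1 k2=v2';
my $rx = / (\w+) '=' (\w+) /;

# .comb returns the text of each overall match; the captures are not returned.
my @via-comb  = $s.comb($rx);

# The long form: all matches, each stringified.
my @via-match = $s.match($rx, :g)».Str;

say @via-comb;   # [k1=v1 k2=v2]
say @via-match;  # [k1=v1 k2=v2]
```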

One possible solution is to use the approach @ohmycloudy shows in their answer.

But that comes with the caveats raised by myself and @jubilatious1 in comments on their answer.

Add { @comb-result .push: |$/».Str } to the regex

You can work around .comb's normal functioning. I'm not saying it's a good thing to do. Nor am I saying it's not. You asked, I'm answering, and that's it. :)

Start with your original regex that worked with your other solutions.

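Then add { @comb-result .push: |$/».Str } to the end of the regex to store the captures of each match. Declare @comb-result before calling .comb, and don't assign .comb's return value to it, because that assignment would overwrite what the code block pushed. Now you will get the result you want.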
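Put together, the workaround might look like the sketch below. The regex is the one from the question's working match example, spread over several lines. The @whole-matches array is a hypothetical name introduced here only to force .comb to run and to hold its normal return value.

```raku
my $str = '28_2820201112122420516_000000 column=d:bcp_startSoc, timestamp=1605155065124, value=64.0';

# Declared up front; the code block inside the regex pushes into it.
my @comb-result;

# The trailing { ... } block runs once the rest of the regex has matched and
# stores the stringified positional captures of that match.
my @whole-matches = $str.comb(/
    ^ ([\d+]+ % '_') \s 'column=d:' (\w+) ','
    \s timestamp '=' \d+ ','
    \s 'value=' (<-[=]>+) $
    { @comb-result.push: |$/».Str }
/);

say @comb-result;   # [28_2820201112122420516_000000 bcp_startSoc 64.0]
```

Note that @whole-matches still holds what .comb normally returns, a single string covering the entire row, which is why the captures have to be collected as a side effect.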

raiph answered Oct 19 '22 11:10


$str.comb(/ ^ [\d+]+ % '_' | <?after d\:> \w+ | <?after value\=> .* /)
ohmycloudy answered Oct 19 '22 12:10