trans method hangs forever when given a single `^` or `$` regex

Tags:

raku

When the trans method is given a zero-width regex such as /^/, it hangs and never returns!

for (-9, -6 ... 0, 2 , 4 ... 10).rotor( 2 => -1) {
    .join(',').trans(/^/ => '[', /$/ => ')' ).say;
}

I expected it to print the following:

[-9,-6)
[-6,-3)
[-3,0)
[0,2)
[2,4)
[4,6)
[6,8)
[8,10)

But it just gets stuck and never seems to return. The Raku docs say trans "Replaces one or many characters with one or many characters". It seems that trans must consume at least one character:

> '123abc345'.trans( /<?after 34> 5$/ => '-')
123abc34-
> '123abc345'.trans( /<?after 34> 5/ => '-')
123abc34-
> '123abc345'.trans( /<?after 345> $/ => '-')
123abc345

> '123abc345'.trans( /^ \d+ <( \w+ )> $/ => '-')
123-
chenyf asked Aug 26 '19 15:08


1 Answer

I agree it must consume a character.

From a doc I've written on trans:

If one of the matchers on the left hand side is a null string or regex, and no other matchers match at a given position in the input string then .trans goes into an infinite loop.

Also from that doc:

[this doc] may be a step toward updating the official doc and/or cleaning up the relevant spec tests and/or functionality.

While I found several problems with trans that likely involve bugs, the above is the only thing that I somewhat definitively nailed down in the doc. (I've put my mention of this issue in a strange place. Indeed the doc organization is a bit odd. I think I got tired of working on trans at the time, intended to return to it to make further progress, and forgot about it till you just asked your question.)

Anyhow, I just searched rakudo issues on GH and rt and there's no matching bug in either queue.

I think it warrants a bug report, at least for the scenario in which trans isn't using any regexes at all and just has a null string matcher that causes an infinite loop. (The scenario of a zero length matching regex causing a loop is perhaps a separate bug that ought to be fixed by modifying regex engine code. I'm not sure about that.)

Either way, if you want to file an issue then please link to this SO so that folk also get exposure to my trans doc.
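
In the meantime, one defensive option for the plain-string case is to reject empty string matchers before they ever reach trans. This is only a sketch: guarded-trans is a made-up wrapper name, not anything Raku provides, and it does nothing about zero-length regex matches, which can't be spotted just by looking at the matcher.

```raku
# Hypothetical wrapper (not part of Raku): refuse empty string matchers
# before handing the pairs to the built-in .trans.
sub guarded-trans(Str:D $text, *@changes) {
    for @changes -> $change {
        die "empty string matcher passed to trans"
            if $change.key ~~ Str && $change.key eq '';
    }
    # |@changes passes the pairs on as positional arguments
    $text.trans(|@changes)
}

say guarded-trans('abc', 'a' => 'x', 'c' => 'z');   # xbz
```

Calling guarded-trans('abc', '' => 'x') dies right away instead of reaching the empty-matcher case described above.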

Exploring a variation of your example

Let's start with the regex /^/ -- match the start of a string -- using a non trans construct to confirm it does the right thing:

my $foo = 'x';
say $foo ~~ s/^/end/; # 「」
say $foo;             # endx

So /^/ matches; it's a zero length match/capture; the s/// construct inserts the replacement string. All seems well.
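
Given that, one way to get the output the question asks for without trans is to chain two .subst calls, since each anchor only needs to match once. A minimal sketch: the helper name interval-label is made up here, and the endpoints are spelled out by hand rather than generated with the sequence operator.

```raku
# Hypothetical helper: wrap "a,b" as the half-open interval "[a,b)"
# using zero-width anchors with .subst instead of .trans.
sub interval-label(@pair) {
    @pair.join(',').subst(/^/, '[').subst(/$/, ')')
}

my @points = -9, -6, -3, 0, 2, 4, 6, 8, 10;

# rotor(2 => -1) yields overlapping pairs: (-9,-6), (-6,-3), ...
my @labels = @points.rotor(2 => -1).map(&interval-label);

.say for @labels;   # first line printed is [-9,-6)
```

Plain concatenation ('[' ~ .join(',') ~ ')') would do the same job; the subst version is shown only because it keeps the /^/ and /$/ anchors from the original attempt.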


The trans behavior is much more complex. Here's a heavily "instrumented" example whose matching patterns are close to your example and are also part of a trans:

my $count;   # declared before the subs below, which refer to it

sub start-regex ($which,$/) {
  say "start regex $which, ++count = {++$count}, .pos = {$/.pos}"
}
sub end-regex ($which,$/) {
  say "end regex $which, .pos = {$/.pos}, matched = '$/' \n"
}
sub replace ($which,$/) {
  say "regex $which replaces $/ at .pos = $/.pos()"; $which
}

my $foo = 'x';

say $foo.trans:
  / { start-regex 1, $/ } ^                     { end-regex 1, $/ } /
     => { replace 1, $/ },
  / { start-regex 2, $/ } . <?{ $count > 0 }> $ { end-regex 2, $/ } /
     => { replace 2, $/ }

This displays:

start regex 1, ++count = 1, .pos = 0
end regex 1, .pos = 0, matched = '' 

start regex 2, ++count = 2, .pos = 0
end regex 2, .pos = 1, matched = 'x' 

regex 2 replaces x at .pos = 1
start regex 2, ++count = 3, .pos = 0
end regex 2, .pos = 1, matched = 'x' 

start regex 1, ++count = 4, .pos = 1
start regex 2, ++count = 5, .pos = 1
2

Here's what it seems to do:

  • Call and match the 1st regex. The match is zero length.

  • Call and match the 2nd regex. The match is one character.

  • Decide the 2nd regex is longer so it wins. So call the replacement.

  • Reset the position to zero and call the 2nd regex a second time! I've no idea why. It matches again but the replacement is not called a second time.

  • Finally, with the position now advanced by one, it tries both regexes again and they both fail to match.


If the condition in the 2nd regex is changed from $count > 0 to $count > 2 things go very differently. It enters an infinite loop that starts like this:

start regex 1, ++count = 1, .pos = 0
end regex 1, .pos = 0, matched = '' 

start regex 2, ++count = 2, .pos = 0
start regex 2, ++count = 3, .pos = 1
regex 1 replaces  at .pos = 0
start regex 1, ++count = 4, .pos = 0
end regex 1, .pos = 0, matched = '' 

start regex 1, ++count = 5, .pos = 0
end regex 1, .pos = 0, matched = '' 

regex 1 replaces  at .pos = 0
start regex 1, ++count = 6, .pos = 0

Here's what it seems to do:

  • Call and match the 1st regex. The match is zero length.

  • Call the 2nd regex. The condition fails, so the regex never reaches its end block (no "end regex 2" line is printed).

  • Reset .pos to 1 (!?) and call the 2nd regex a second time! I've no idea why. It fails again.

  • Call the replacement closure corresponding to the 1st regex. Why? I thought the trans logic was to not accept a regex if it was zero length!

  • The position is not advanced and then the 1st regex is matched again. Twice! And no attempt to match the 2nd regex. And then the replacement closure corresponding to the 1st regex replacement is called again.

  • And now we're stuck in a loop, repeating the last bullet point!

Very strange behavior...
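
For contrast, here is a toy model of the behavior I expected: at each position try every matcher, ignore failed and zero-length matches, let the longest match win, and always move forward. To be clear, this is not how Rakudo implements trans. toy-trans is a made-up name, it only handles regex => string pairs, and it is just a sketch of "never accept a zero-length match".

```raku
# Toy model only, NOT Rakudo's trans: longest non-empty match wins,
# and the position always advances, so it cannot loop forever.
sub toy-trans(Str:D $text, *@changes) {
    my $out = '';
    my $pos = 0;
    while $pos < $text.chars {
        my $best;          # longest non-empty Match found at $pos, if any
        my $replacement;
        for @changes -> $change {
            # :pos anchors the match attempt at exactly $pos
            my $m = $text.match($change.key, :pos($pos));
            next unless $m && $m.chars > 0;   # skip failed and zero-length matches
            if !$best.defined || $m.chars > $best.chars {
                $best        = $m;
                $replacement = $change.value;
            }
        }
        if $best.defined {
            $out ~= $replacement;
            $pos  = $best.to;                 # jump past the matched text
        }
        else {
            $out ~= $text.substr($pos, 1);    # no match: copy one character
            $pos++;
        }
    }
    $out
}

say toy-trans('x', /^/ => '[', /x/ => 'y');   # y
```

With only the zero-length matcher, toy-trans('abc', /^/ => '[') returns 'abc' unchanged rather than hanging.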

raiph answered Oct 21 '22 02:10