How can I translate this Perl5/PCRE to Perl 6 regex?

Tags: regex, pcre, raku

Just to get this out of the way, I would use index, substr or similar, as they are the obvious solution for my specific case but I'm making a grammar and so I can only use regex. :(

That being said, advice on translating Perl5/PCRE regex to Perl6 regex is good SO content anyways, because Perl 6 is gaining popularity, and its regex engine is very different.


Here's a regex to only match a string which doesn't contain any of a given list of characters.

^(?:(?!\/).)*$

^            # assert position at start of string
(?:          # begin a noncapturing group
   (?!       # negative lookahead: following regex must not match the string
      \/     # literal forward slash
   )         # end negative lookahead
   .         # any character, once
)*           # the previous noncapturing group, 0..Inf times
$            # assert position at end of string

Obviously, this doesn't work in Perl 6, for a number of reasons.

For the reason stated above, I'd like to use this in Perl 6. Here's what I've tried to translate it to, based on Ctrl-F'ing the Perl 6 regex docs for "non capturing" and "negative lookahead":

[ \/ <!before .*> \/ <!after .*> || .? ]*

And the breakdown (I think?):

[             # begin a noncapturing group, which apparently looks like a charclass in p6
\/            # a literal forward slash
<!before .*>  # negative lookahead for the immediately preceding regex (literal /)
\/            # a literal /
<!after .*>   # negative lookbehind for the immediately preceding regex
|| .?         # force this to be a noncapturing group, not a charclass
]*            # end noncapturing group and allow it to match 0..Inf times

I implement this like my regex not-in { ... } and then use it like /^<not-in>$/. However, it returns Nil for every string, which means it isn't working properly.

I haven't been able to find the equivalent of http://regex101.com for Perl 6, so playing around with it isn't as easy as it would be with Perl 5.

How do I translate this to Perl 6?

asked Jan 27 '16 19:01 by cat



3 Answers

Short answer

Regex for matching only strings that lack forward slashes: /^ <-[ / ]>* $/

/      start of regular expression
^      beginning of string
<-[    open negative character class (without the -, this would be a normal character class)
/      characters that the class will not match
]>     close character class
*      zero or more "copies" of this class
$      end of string
/      end of regular expression

Spaces in Perl 6 regexes are ignored by default.


Full answer

If I understand correctly, you're just trying to match a string that does not contain a forward slash. In that case, just use a negative character class.

A character class containing a and b would be written thus: <[ab]>

A character class containing anything besides a or b would be written thus: <-[ab]>

A character class containing anything besides / would be written thus: <-[ / ]> and the regex for making sure that no character in a string contained a forward slash would be /^ <-[ / ]>* $/.

This code matches when a string lacks a forward slash and doesn't match when it contains a forward slash:

say "Match" if "abc/" ~~ /^ <-[ / ]>* $/; # Doesn't match
say "Match" if "abcd" ~~ /^ <-[ / ]>* $/; # Matches

The preferred way for just checking for the exclusion of one character is to use the index function. However, if you want to exclude more than one character, just use the negative character class with all of the characters you don't want to find in the string.
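
As a minimal sketch of the multi-character case: the negative character class can live in a named regex and be anchored at the call site, the way the question describes. The name not-in is borrowed from the question; the excluded characters and the sample strings are made up for illustration.

```raku
# Hypothetical example: reject strings containing '/', '\' or ':'.
# Whitespace inside the character class is ignored; the backslash must be escaped.
my regex not-in { <-[ / \\ : ]>* }

for 'abcd', 'ab/cd', 'ab:cd' -> $s {
    # so(...) collapses the Match (or Nil) into a plain Bool
    say $s.raku, ' => ', so($s ~~ /^ <not-in> $/);
}
```

Only 'abcd' should be accepted here, since the other two samples each contain one of the excluded characters.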

answered Oct 26 '22 13:10 by Christopher Bottoms


The literal translation of your original regex ^(?:(?!\/).)*$ to the Perl 6 syntax is:

^ [ <!before \/> . ]* $

It's simple enough for a direct translation.

  • Replace (?:...) with [...]
  • Replace (?!...) with <!before...>
  • Assume the x modifier by default

Everything else stays the same in this example.

I've tested it with a simple:

say "Match" if "ab/c" ~~ /^ [ <!before \/> . ]* $/; # doesn't match
say "Match" if "abc"  ~~ /^ [ <!before \/> . ]* $/; # Match
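
A quick way to convince yourself that the lookahead translation and a negative character class accept the same strings is to run both over a few inputs. This is only a sketch; the sample strings are arbitrary.

```raku
# Compare the literal lookahead translation with a negative character class.
for 'abc', 'ab/c', '/', 'a b c' -> $s {
    my $lookahead = so($s ~~ /^ [ <!before \/> . ]* $/);
    my $neg-class = so($s ~~ /^ <-[ / ]>* $/);
    say $s.raku, ': lookahead=', $lookahead, ' neg-class=', $neg-class;
}
```

The two columns should agree on every line: strings with a forward slash are rejected by both patterns, and the rest are accepted by both.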

answered Oct 26 '22 15:10 by Lucas Trzesniewski


Just to get this out of the way

Your question starts with:

Just to get this out of the way, I would use index, substr or similar, as they are the obvious solution for my specific case but I'm making a grammar and so I can only use regex. :(

Being pedantic, you can use index, substr and friends even inside a regex, because you can embed arbitrary code in Perl regexes.


A typical Perl 6 example:

/ (\d**1..3) <?{ $/ < 256 }> / # match an octet

The \d**1..3 bit matches 1 to 3 decimal digits. The (...) parens surrounding that bit capture it (as $0), and inside the code assertion the special variable $/ holds the match so far, which here is just those digits.

The <?{ ... }> bit is a code assertion. If the code returns true the regex continues. If not, it backtracks or fails.
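
To see a code assertion accept and reject input, here is a small sketch that wraps the same idea in a named regex and anchors it at both ends. The name octet and the sample strings are made up for illustration, and it calls $/.Int explicitly rather than relying on numeric coercion.

```raku
# Hypothetical named regex: 1 to 3 digits whose numeric value is at most 255.
my regex octet { \d ** 1..3 <?{ $/.Int <= 255 }> }

for '0', '42', '255', '256', '999' -> $s {
    # The anchors stop the regex from settling for a shorter prefix such as "25".
    say $s, ' => ', so($s ~~ /^ <octet> $/);
}
```

The first three samples should be accepted and the last two rejected: for '256' the assertion fails on the full three digits, and the shorter alternatives reached by backtracking cannot satisfy the end-of-string anchor.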


Using index etc. (in this case I've picked substr-eq) inside a regex is cumbersome and probably insane. But it's doable:

say "a/c" ~~ / a <?{ $/.orig.substr-eq: '/', $/.to }> . c /;
say "abc" ~~ / a <?{ $/.orig.substr-eq: '/', $/.to }> . c /

displays:

「a/c」
Nil

(Calling .orig on a Match object returns the original string that was, or is being, matched against. Calling .to returns the index within that original string that's as far as the match got to, or has gotten to so far; "abc" ~~ / a { say $/.orig, $/.to } bc / displays abc1.)

answered Oct 26 '22 15:10 by raiph