
Why aren't // and m// exactly synonymous?

Tags:

regex

raku

From the examples below, I see that / / and m/ / aren't exactly synonymous, contrary to what I expected. I thought that the only reason to use m/ / instead of / / was that it allows using different delimiters (e.g. m{ }). Why are they different and why would I want to use one versus the other?

I am searching for CSV files in a directory. At first I searched for files ending in csv, thus (all code shown as seen from the Perl 6 REPL):

> my @csv_files = dir( test => / csv $ /  );
["SampleSheet.csv".IO]

but recently a file ending in Csv showed up. So I tried matching case insensitively:

> my @csv_files = dir( test => m:i/ csv $ / );
Use of uninitialized value of type Any in string context.
Methods .^name, .perl, .gist, or .say can be used to stringify it to something meaningful.
  in block <unit> at <unknown file> line 1

I found that I could fix this by putting a block around the matching expression:

> my @csv_files = dir( test => { m:i/ csv $ / } );
["SampleSheet.csv".IO]

However, if I had used a block around the original expression it doesn't match with the bare / /, but it does with m/ /:

> my @csv_files = dir( test => { / csv $ / } );
[]
> my @csv_files = dir( test => { m/ csv $ / } );
["SampleSheet.csv".IO]

Then I found out that if I used the case-insensitive adverb inside / /, it does work:

> my @csv_files = dir( test => /:i csv $ / );
["SampleSheet.csv".IO]

Anyway, / / and m/ / are clearly behaving differently and it's not yet clear to me why.

Christopher Bottoms asked Jul 26 '17 12:07



1 Answer

The difference between /.../ and m/.../

From Regexes#Lexical conventions:

m/abc/;         # a regex that is immediately matched against $_ 
rx/abc/;        # a Regex object 
/abc/;          # a Regex object

In other words, it's /.../ and rx/.../ that are synonyms, not /.../ and m/.../:

  • /.../ and rx/.../ return the specified regex as a Regex object, without matching it against anything for now.
  • m/.../ immediately matches the specified regex against the string stored in the variable $_ (the so-called "topic"), and returns the result as a Match object, or as Nil if there was no match.

Demonstration:

$_ = "Foo 123";

say m/\d+/;        # 「123」
say m/\d+/.^name;  # Match

say /\d+/;         # /\d+/
say /\d+/.^name;   # Regex

Explanations & comments regarding your code

Applying regex modifiers

but recently a file ending in Csv showed up. So I tried matching case insensitively

 my @csv_files = dir( test => m:i/ csv $ / );
 Use of uninitialized value of type Any in string context.
 Methods .^name, .perl, .gist, or .say can be used to stringify it to something meaningful.
   in block <unit> at <unknown file> line 1

That code immediately matches the regex against the topic $_ of the calling scope, which is uninitialized. This involves converting it to a string (which causes the warning Use of uninitialized value of type Any in string context), and returns Nil because there is no match. So you're essentially calling the function as dir( test => Nil ).

To make it work, either use rx or apply the :i adverb inside the regex:

my @csv_files = dir( test => rx:i/ csv $ / );
my @csv_files = dir( test => / :i csv $ / );
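The same distinction can be checked without touching the filesystem. The following is a minimal sketch; the file names and the variable names $matcher and @matched are made up for illustration and are not part of the original question:

```raku
# Hypothetical file names, used only for illustration.
my @names = <SampleSheet.csv Report.Csv notes.txt>;

# rx:i/.../ builds a Regex object; nothing is matched at this point.
my $matcher = rx:i/ csv $ /;
say $matcher.^name;    # Regex

# The regex is only applied once something smart-matches against it,
# which is what grep does here for each name.
my @matched = @names.grep($matcher);
say @matched;          # [SampleSheet.csv Report.Csv]
```

Because $matcher is a Regex object rather than the result of a match, it can be passed around and applied later, which is what dir needs for its test argument.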

Blocks as smart-matchers

I found that I could fix this by putting a block around the matching expression:

> my @csv_files = dir( test => { m:i/ csv $ / } );

That works too. What happens here is:

  • { ... } creates a block that takes a single argument (which is available as $_ inside the block).
  • The m:i/ ... / inside the block matches against $_, and returns a Match.
  • Because the m:i/.../ is the last statement in the block, its Match becomes the return value of the block.
  • The test adverb of the dir function accepts any smart-matcher, which includes not just Regex objects but also Block objects (see the documentation for the smart-match operator ~~).
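The steps above can be sketched with grep, which also accepts a block as its matcher. This is an illustration only; the names @names, &is-csv and @matched are hypothetical:

```raku
# Hypothetical file names, used only for illustration.
my @names = <SampleSheet.csv Report.Csv notes.txt>;

# A block without an explicit signature receives its argument as $_,
# so the m:i/.../ inside it matches against that argument.
my &is-csv = { m:i/ csv $ / };

# Calling the block directly returns the Match produced by m:i/.../.
say &is-csv("Report.Csv");    # 「Csv」

# Used as a matcher, the block is called once per element and its
# return value decides whether the element is kept.
my @matched = @names.grep(&is-csv);
say @matched;                 # [SampleSheet.csv Report.Csv]
```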

Using a Regex as a Bool

However, if I had used a block around the original expression it doesn't match with the bare / /, but it does with m/ /:

> my @csv_files = dir( test => { / csv $ / } );
[]

When a block is used as a smart-matcher, it is first called and then its return value is coerced to a Bool: True means it matched, and False means it didn't.

In this case, your block always returns a Regex object.

Coercing a Regex object to a boolean immediately matches it against the current $_, and returns True if the regex matched and False if it didn't:

say /\d+/.Bool;  # False

$_ = "123";
say /\d+/.Bool;  # True

So in your code, the regex ends up being repeatedly checked against $_, rather than against the filenames:

$_ = "abc";
.say for dir test => { / \d+ / }  # Returns no filenames

$_ = "abc 123";
.say for dir test => { / \d+ / }  # Returns all filenames
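If a block around a bare / / is wanted anyway, one way to avoid this pitfall is to perform the match explicitly inside the block. A minimal sketch, using made-up file names and the hypothetical variables @explicit and @immediate:

```raku
# Hypothetical file names, used only for illustration.
my @names = <SampleSheet.csv Report.Csv notes.txt>;

# Smart-matching $_ against the regex inside the block returns a Match
# (or a false value), not an unmatched Regex object.
my @explicit = @names.grep({ $_ ~~ / csv $ / });
say @explicit;     # [SampleSheet.csv]

# m/.../ does the same thing implicitly, since it matches against $_.
my @immediate = @names.grep({ m/ csv $ / });
say @immediate;    # [SampleSheet.csv]
```

Both forms are case-sensitive here, so Report.Csv is not selected.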

Filtering files by their extension

I am searching for CSV files in a directory. At first I searched for files ending in csv, thus (all code shown as seen from the Perl 6 REPL):

> my @csv_files = dir( test => / csv $ /  );

This doesn't just find files that have the .csv extension, but all files whose names end in the three letters csv, including ones like foobarcsv or foobar.xcsv.
Here are two better ways to write it if you only want CSV files:

my @csv-files = dir test => / ".csv" $ /;
my @csv-files = dir.grep: *.extension eq "csv"

Or the case-insensitive version:

my @csv-files = dir test => / :i ".csv" $ /;
my @csv-files = dir.grep: *.extension.lc eq "csv"
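To see how the two approaches treat the edge cases mentioned above, here is a small sketch that works on plain strings instead of a real directory. The file names and the variables @by-regex and @by-extension are hypothetical; .IO turns each string into an IO::Path so that .extension can be called on it:

```raku
# Hypothetical file names, including the two edge cases from the text.
my @names = <SampleSheet.csv Report.Csv foobarcsv foobar.xcsv>;

# Requires a literal ".csv" at the end of the name, ignoring case.
my @by-regex = @names.grep(/ :i ".csv" $ /);

# Compares the lowercased extension of each path with "csv".
my @by-extension = @names.grep(*.IO.extension.lc eq "csv");

say @by-regex;        # [SampleSheet.csv Report.Csv]
say @by-extension;    # [SampleSheet.csv Report.Csv]
```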
smls answered Nov 06 '22 04:11
