
If regexes are methods, which class do they correspond to?

Tags: oop, raku

Regexes are actually Methods:

say rx/foo/.^mro # ((Regex) (Method) (Routine) (Block) (Code) (Any) (Mu))

In that case, it means that they can act on self and are part of a class. What would that class be? My hunch is that it's the Match class and that they are actually acting on $/ (which they actually are). Any other way of formulating this?

jjmerelo asked Jun 27 '19 18:06


3 Answers

Ultimately, all regexes expect to receive an invocant of type Match or some subclass of Match. In Perl 6, an invocant is simply the first argument, and is not special in any other way.

Those regexes declared with rule, token or regex within a package will be installed as methods on that package. Most typically, they are declared in a grammar, which is nothing more than a class whose default parent is Grammar rather than Any. Grammar is a sub-type of Match.

grammar G {}.^mro.say # ((G) (Grammar) (Match) (Capture) (Cool) (Any) (Mu))

It's thus quite natural to see these as just methods, but with a body written in a different language. In fact, that's precisely what they are.
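
As a minimal sketch of that point (the grammar name Demo and the token name word are made up for illustration), a token declared in a grammar can be looked up through the metamodel like any other method:

```raku
# Hypothetical names: Demo and word are illustrative only.
grammar Demo {
    token word { \w+ }
}

# Look the token up via the metamodel, as one would for any method.
my $word = Demo.^lookup('word');

say $word ~~ Regex;   # True
say $word ~~ Method;  # True
```

The object that comes back is the regex itself, which is also a Method according to the mro shown in the question.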

It's a little harder to see how the anonymous regexes are methods, in so far as they don't get installed in the method table of any type. However, if we were to write:

class C {
    method foo() { 42 }
}
my $m = anon method () { self.foo }
say C.$m()

Then we see that we can resolve symbols on the invocant through self, even though this method is not actually installed on the class C. It's the same with anonymous regexes. The reason this matters is that assertions like <ident>, <.ws>, <?before foo> and friends are actually compiled into method calls.

Thus, anonymous regexes being methods, and thus treating their first argument as an invocant, is what allows the various builtin rules, which are declared on Match, to be resolved.
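
One way to observe this dispatch, sketched here with a hypothetical grammar named CSVish: because <.ws> is compiled into a method call on the invocant, a grammar can replace the builtin ws simply by declaring its own token of that name.

```raku
# Hypothetical grammar; all names are illustrative only.
grammar CSVish {
    token TOP  { <word>+ % <.ws> }   # <.ws> is resolved as a method call on the invocant
    token word { \w+ }
    token ws   { ',' }               # overrides the builtin ws inherited from Match
}

say so CSVish.parse('a,b,c');  # True
say so CSVish.parse('a b c');  # False
```

The second parse fails because the overriding ws only accepts a comma, so the separator no longer matches a space.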

Jonathan Worthington answered Oct 22 '22 06:10


A method does not have to correspond with any class:

my method bar () { say self, '!' }

bar 'Hello World'; # Hello World!


my regex baz { :ignorecase 'hello world' }

'Hello World' ~~ /<baz>/;
'Hello World' ~~ &baz;
&baz.ACCEPTS('Hello World'); # same as previous line

# baz 'Hello World';

By default, methods (and by extension regexes) are has-scoped: they belong to whatever class they are declared inside of.

class Foo {
        method bar () { say self, '!' }
  # has method bar () { say self, '!' }

        regex  baz    { :ignorecase 'hello world' }
  # has regex  baz () { :ignorecase 'hello world' }
}

A regex does require certain things of whatever its invocant is.

Just running it as a subroutine tells you the first one:

my regex baz { :ignorecase 'hello world' }

baz 'Hello World';
No such method '!cursor_start' for invocant of type 'Str'
  in regex baz at <unknown file> line 1
  in block <unit> at <unknown file> line 1

Usually a regex is declared inside of a class declared with grammar.

grammar Foo {
}

say Foo.^mro;
# ((Foo) (Grammar) (Match) (Capture) (Cool) (Any) (Mu))

So the requirements are likely fulfilled by Grammar, Match, or Capture in this case.

It could also be from a role that gets composed with it.

say Foo.^roles.map(*.^name);
# (NQPMatchRole)

There is even more reason to believe that it is Match or Capture:

my regex baz {
    ^
    { say 'baz was called on: ', self.^name }
}
&baz.ACCEPTS(''); # baz was called on: Match

my regex baz ( $s ) {
    :ignorecase
    "$s"
}
baz Match.new(orig => 'Hello World'), 'hello';
# 「Hello」

I see no reason someone couldn't do that themselves in a normal class though.


Note that $/ is just a variable. So saying it is passed to a regex is a misunderstanding of the situation.

my regex baz ( $/ ) {
    :ignorecase
    "$/"
}
'Hello World' ~~ /<baz('hello')>/;
# 「Hello」
#  baz => 「Hello」

It would be more accurate to say that when calling a regex from inside of another one, the current $/ is used as the invocant to the method/regex.
(I'm not entirely sure this is actually what happens.)

So the previous example would then be sort-of like this:

'Hello World' ~~ /{ $/.&baz('hello') }/;

Brad Gilbert answered Oct 22 '22 04:10


This explanation combines what I think Brad++ and Jonathan++ just taught me, with what I thought I already knew, with what I discovered as I dug further.

(My original goal was to directly explain Brad's mysterious No such method '!cursor_start' message. I've failed for now, and have instead just filed a bug report, but here's what else I ended up with.)

Methods

Methods are designed to work naturally in classes. Indeed, a method declaration without a scope declarator assumes has, and a has declaration belongs inside a class:

method bar {} # Useless declaration of a has-scoped method in mainline

But in fact methods also work fine as either:

  • subs (i.e. not behaving as an object oriented method at all); or

  • methods for prototype-based programming (i.e. object orientation, but without classes).

What really makes methods methods is that they are routines with an "invocant". An invocant is a special-status first parameter that:

  • Is implicitly inserted into the method's signature if not explicitly declared. If a method is declared inside a class, then the type constraint is that class, otherwise it's Mu:
        class foo { my method bar {} .signature .say } # (foo: *%_)
                    my method bar {} .signature .say   # (Mu: *%_)
  • Is a required positional. Thus:
        my method bar {}
        bar # Too few positionals passed; expected 1 argument but got 0
  • Is always aliased to self. Thus:
        my method bar { say self }
        bar 42 # 42
  • Is occasionally explicitly declared by specifying it as the first parameter in a signature and following it with a colon (:). Thus:
        my method bar (Int \baz:) { say baz } 
        say &bar.signature; # (Int \baz: *%_)
        bar 42;             # 42
        bar 'string';       # Type check failed in binding to parameter 'baz'
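
A brief sketch of the two non-class uses mentioned above (the method name double is made up for illustration): a lexically scoped method can be called like a sub, or on any object with the .& call syntax.

```raku
# Hypothetical lexical method; it is not installed in any class.
my method double { self * 2 }

say double(21);         # 42  (called as a sub; 21 is bound to the invocant)
say 21.&double;         # 42  (called with method syntax on an Int)
say &double ~~ Method;  # True
```

In both calls the first argument becomes the invocant and is available as self.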

Regexes

Focusing just on the invocant perspective, regexes are methods that take/expect a match object as their invocant.

A regex is typically called in three somewhat different scenarios:

  • By direct use. For example, my regex foo { . }; say 'a' ~~ &foo; # 「a」 (or just say 'a' ~~ / . /; # 「a」, but I'll only cover the essentially identical named example to simplify my explanation). This translates to say &foo.ACCEPTS: 'a'. That call is in turn implemented by code in Rakudo, which calls the regex foo with the invocant Match.'!cursor_init'(...), itself run without :build. The upshot is that foo gets a new Match object as its invocant.

  • By way of the Grammar class's .parse method. The .parse method creates a new instance of the grammar and then calls the top "rule" (rule/token/regex/method) on that new grammar object. Note that a Grammar is a sub-class of Match; so, just as with the first scenario, the rule/regex is being passed an as-yet-empty match object. If the top rule matches, the new grammar/match object will be returned by the call to .parse. (Otherwise it'll return Nil.)

  • By way of one of the above. The top rule in a grammar will typically contain calls to lower level rules/tokens/regexes/methods. Likewise, a free standing rule/regex may contain calls to other rules/regexes. Each such call will involve creating another new grammar/match object that becomes the invocant for the nested call. If the nested call matches, and it's a capturing call, then the new grammar/match object is added to the higher level grammar/match object.
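
The second and third scenarios can be sketched as follows (the grammar KeyValue and its token names are hypothetical): .parse hands a fresh match object to TOP, each nested capturing call adds its own match object to the result, and a failed parse yields Nil.

```raku
# Hypothetical grammar; all names are illustrative only.
grammar KeyValue {
    token TOP   { <key> '=' <value> }   # nested, capturing calls
    token key   { \w+ }
    token value { \d+ }
}

my $m = KeyValue.parse('answer=42');

say $m ~~ Match;        # True: the result is a match object
say ~$m<key>;           # answer
say ~$m<value>;         # 42
say $m<key> ~~ Match;   # True: the nested call produced its own match object

say KeyValue.parse('no equals sign').defined;  # False (Nil)
```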

raiph answered Oct 22 '22 06:10