
Add new statements to Python without customizing the compiler

I'd like to add a new keyword to Python and @EliBendersky's wonderful answer explains how to do this by changing the code and re-distributing the Python compiler.

Is it possible to introduce a new keyword without changing the compiler code? Perhaps introduce it through a library?

Edit:

For example, I'd like to add a shorthand for regex matching by adding a keyword like matches that can be used like:

"You can't take the sky from me" matches '.+sky.+'

I can add new, custom behavior using AST transformations, but the above case will fail on a syntax error.

asked Jan 30 '18 14:01 by noamt


2 Answers

One cannot introduce a new keyword without changing the language

The parser is the program that reads through the code and decides what is well-formed and what is not. This is a rather coarse definition, but the consequence is that the language is defined by what its parser accepts.

The parser relies on the language's formal grammar. The full grammar is given in the Python language reference, and the abstract grammar of the resulting syntax tree is listed in the ast module documentation.

While defining a mere function only introduces a new feature without modifying the language, adding a keyword is tantamount to introducing a new syntax, which in turn changes the language's grammar.

Therefore, adding a new keyword, in the sense of adding new syntax to the language, cannot be done without changing the language's grammar, which requires editing the compilation and execution chain.
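
To see that the rejection happens at parse time, before any library code gets a chance to run, here is a minimal sketch using the standard ast module. The helper name parses is made up for this illustration, and text and pattern are just placeholder identifiers.

```python
import ast


def parses(source):
    """Return True if `source` is syntactically valid Python, else False."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# An existing binary operator is accepted by the grammar...
print(parses("text @ pattern"))        # True
# ...but an unknown infix keyword is rejected before any AST exists.
print(parses("text matches pattern"))  # False
```

Since no tree is ever produced for the second line, an AST transformation has nothing to work on.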

However...

There are some clever ways to introduce a feature that looks like new syntax but in fact only uses existing syntax. For instance, the goto module relies on a little-known property of the language: whitespace around the dot in an attribute reference is ignored.

You can try this by yourself:

>>> l = [1, 2, 3]
>>> l    .append(4)
>>> l
[1, 2, 3, 4]
>>> l.    append(5)
>>> l
[1, 2, 3, 4, 5]

This allows writing the following, which looks like new syntax but really is not:

label .myLabel
goto .myLabel
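
As a quick check that such a line is an ordinary attribute reference, the following sketch parses it with the standard ast module and compares it with the version written without the space. The variable names are only illustrative.

```python
import ast

tree = ast.parse("goto .myLabel")
expr = tree.body[0].value  # the expression inside the single statement

print(type(expr).__name__)       # Attribute
print(expr.value.id, expr.attr)  # goto myLabel

# ast.dump omits line/column information by default, so both spellings
# produce exactly the same dump.
same_tree = ast.dump(tree) == ast.dump(ast.parse("goto.myLabel"))
print(same_tree)                 # True
```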

How the goto module then actually jumps from a goto to the matching label is a different matter: it relies on the interpreter's internals. But that's another problem.


I'd like to add that Python is quite an open-minded language. It provides a fair number of seldom-used operators, for instance @. This operator, introduced in Python 3.5 (PEP 465), was primarily meant for matrix multiplication, and is dispatched to the __matmul__ method. I have to say, I've rarely seen it in code outside numerical libraries. So, why not use it for your purpose?

Let's do it step by step. I propose to define an r class that behaves like a regex.

import re

class r:
    def __init__(self, pattern):
        self.regex = re.compile(pattern)

Now, I want to be able to use the @ operator with this class and a string, with the semantics of matching the string against the pattern. I'll add the __matmul__ method to the class:

class r:
    # __init__ as defined above

    def __matmul__(self, string):
        # re.match only anchors the pattern at the start of the string
        return bool(self.regex.match(string))

Now, I can do the following:

>>> r("hello") @ "hello"
True
>>> r("hello") @ "world"
False

Pretty nice, but not quite there yet. I'll define the __rmatmul__ method as well, so that it simply delegates to __matmul__. In the end, the r class looks like this:

class r:
    def __init__(self, pattern):
        self.regex = re.compile(pattern)

    def __matmul__(self, string):
        return bool(self.regex.match(string))

    def __rmatmul__(self, string):
        return self @ string

Now, the reverse operation works as well:

>>> "hello" @ r("hello")
True
>>> "123456" @ r(r"\d+")
True
>>> "abc def" @ r(r"\S+$")
False

This is very close to what you were attempting, except that I didn't have to introduce a new keyword! Of course, the r identifier must now be treated as reserved and not reused for anything else, just like str or list...
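
Putting it together with the sentence from the question, a self-contained sketch could look like the following. Returning NotImplemented for non-string operands is an addition of this sketch, not something the approach requires; it just lets Python raise its usual TypeError for unsupported operands.

```python
import re


class r:
    """Regex wrapper: `@` tests whether the pattern matches at the start of a string."""

    def __init__(self, pattern):
        self.regex = re.compile(pattern)

    def __matmul__(self, string):
        # Assumption of this sketch: only str operands are supported.
        if not isinstance(string, str):
            return NotImplemented
        return bool(self.regex.match(string))

    # str defines no __matmul__, so `string @ r(...)` ends up here.
    __rmatmul__ = __matmul__


sentence = "You can't take the sky from me"
print(sentence @ r(r".+sky.+"))  # True
print(sentence @ r(r".+xxx.+"))  # False
print(r(r"\d+") @ "123456")      # True
```

Note that re.match only anchors at the beginning of the string; swapping in self.regex.search would give "matches anywhere" semantics instead.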

answered Oct 17 '22 23:10 by Right leg


For your particular "problem" (shortening the way to match a regex), one solution would be to create a subclass of str and overload a binary operator that str does not use (for example minus; there may be a better choice, but unfortunately we cannot use ~ as it is unary).

example:

import re

class MyStr(str):
    def __sub__(self,other):
        return re.match(other,self)

a = MyStr("You can't take the sky from me")
print(a - '.+sky.+')
print(a - '.+xxx.+')

result:

<_sre.SRE_Match object; span=(0, 30), match="You can't take the sky from me">
None

So "subbing" the regex from your string object returns the match object.

The disadvantage is that you now have to wrap your string literals in the new class (it is not possible to define this new operator on str itself).
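
Both limitations can be checked with a short sketch. The helper name try_patch_str is made up for this illustration, and the behaviour shown is that of CPython, where built-in types reject attribute assignment.

```python
import re


def try_patch_str():
    """Return the exception raised when assigning a method onto built-in str."""
    try:
        str.__sub__ = lambda self, other: re.match(other, self)
    except TypeError as exc:
        return exc
    return None


class MyStr(str):
    def __sub__(self, other):
        return re.match(other, self)


error = try_patch_str()
print(type(error).__name__)    # TypeError

# Ordinary str methods return plain str, so the wrapper is easily lost.
shouted = MyStr("sky").upper()
print(type(shouted).__name__)  # str
```

So the wrapping has to be repeated after any string operation whose result should still support the operator.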

answered Oct 18 '22 00:10 by Jean-François Fabre