Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Possible to make custom string literal prefixes in Python?

Let's say I have a custom class derived from str that implements/overrides some methods:

class mystr(str):
    # just an example for a custom method:
    def something(self):
        return "anything"

Now currently I have to manually create instances of mystr by passing it a string in the constructor:

ms1 = mystr("my string")

s = "another string"
ms2 = mystr(s)

This is not too bad, but it lead to the idea that it would be cool to use a custom string prefix similar to b'bytes string' or r'raw string' or u'unicode string'.

Is it somehow possible in Python to create/register such a custom string literal prefix like m so that a literal m'my string' results in a new instance of mystr?
Or are those prefixes hard-coded into the Python interpreter?

like image 379
Byte Commander Avatar asked May 13 '16 07:05

Byte Commander


People also ask

Can string literal be changed?

The behavior is undefined if a program attempts to modify any portion of a string literal. Modifying a string literal frequently results in an access violation because string literals are typically stored in read-only memory.

How many ways can you create string literals in Python?

Python string literals Python strings can be created with single quotes, double quotes, or triple quotes. When we use triple quotes, strings can span several lines without using the escape character. In our example we assign three string literals to a , b , and c variables.

How many ways can you write a literal string in a program?

Char literals For char data types, we can specify literals in 4 ways: Single quote: We can specify literal to a char data type as a single character within the single quote. char ch = 'a';

How do you define a string literal in Python?

String literals in python are surrounded by either single quotation marks, or double quotation marks. 'hello' is the same as "hello".

What does the prefix u do in a Python string?

String prefixes are part of syntax, not data. In other words, they don't create a different type of string, but create a string in a different way. u does nothing in Python 3. It only exists for compatibility with Python 2.

How to change the prefix of a literal?

Literals are literals; the literal itself possesses or lacks a prefix, you can't retroactively change it. @dsfgh: Trying to write such a prefix selecting function is difficult to the point where, given the limited utility of such a function and the inability to cover all cases, the "benefits" are not worth the trouble of writing.

How to create string literals in Python?

Python has different types of literals. A string literal can be created by writing a text (a group of Characters ) surrounded by the single (”), double (“”), or triple quotes. By using triple quotes we can write multi-line strings or display in the desired way. Here geekforgeeks is string literal which is assigned to variable (s).

What does the R prefix mean in Python?

The r prefix simply changes the escaping rules, it doesn’t create a different type. There is no regular expression literal such as you find in Perl, Ruby, or JavaScript. Python has two string types: byte string, and Unicode string. Byte strings are sequences of bytes, and Unicode strings are sequences of Unicode codepoints.


2 Answers

Those prefixes are hardcoded in the interpreter, you can't register more prefixes.


What you could do however, is preprocess your Python files, by using a custom source codec. This is a rather neat hack, one that requires you to register a custom codec, and to understand and apply source code transformations.

Python allows you to specify the encoding of source code with a special comment at the top:

# coding: utf-8

would tell Python that the source code encoded with UTF-8, and will decode the file accordingly before parsing. Python looks up the codec for this in the codecs module registry. And you can register your own codecs.

The pyxl project uses this trick to parse out HTML syntax from Python files to replace them with actual Python syntax to build that HTML, all in a 'decoding' step. See the codec package in that project, where the register module registers a custom codec search function that transforms source code before Python actually parses and compiles it. A custom .pth file is installed into your site-packages directory to load this registration step at Python startup time. Another project that does the same thing to parse out Ruby-style string formatting is interpy.

All you have to do then, is build such a codec too that'll parse a Python source file (tokenizes it, perhaps with the tokenize module) and replaces string literals with your custom prefix with mystr(<string literal>) calls. Any file you want parsed you mark with # coding: yourcustomcodec.

I'll leave that part as an exercise for the reader. Good luck!

Note that the result of this transformation is then compiled into bytecode, which is cached; your transformation only has to run once per source code revision, all other imports of a module using your codec will load the cached bytecode.

like image 149
Martijn Pieters Avatar answered Sep 17 '22 17:09

Martijn Pieters


One could use operator overloading to implicitly convert str into a custom class

class MyString(str):
    def __or__( self, a ):
        return MyString(self + a)

m = MyString('')
print( m, type(m) )
#('', <class 'MyString'>)
print m|'a', type(m|'a')
#('a', <class 'MyString'>)

This avoids the use of parenthesis effectively emulating a string prefix with one extra character ─ which I chose to be | but it could also be & or other binary comparison operator.

like image 20
PhMota Avatar answered Sep 21 '22 17:09

PhMota