Possible to make custom string literal prefixes in Python?

Tags:

python

string

string-literals

prefix

Let's say I have a custom class derived from str that implements/overrides some methods:

class mystr(str):
    # just an example for a custom method:
    def something(self):
        return "anything"

Now currently I have to manually create instances of mystr by passing it a string in the constructor:

ms1 = mystr("my string")

s = "another string"
ms2 = mystr(s)

This is not too bad, but it led to the idea that it would be cool to use a custom string prefix similar to b'bytes string' or r'raw string' or u'unicode string'.

Is it somehow possible in Python to create/register such a custom string literal prefix like m so that a literal m'my string' results in a new instance of mystr?
Or are those prefixes hard-coded into the Python interpreter?

379

asked May 13 '16 07:05

Byte Commander

2 Answers

Those prefixes are hardcoded in the interpreter; you can't register more prefixes.

What you could do, however, is preprocess your Python files by using a custom source codec. This is a rather neat hack, one that requires you to register a custom codec and to understand and apply source code transformations.

Python allows you to specify the encoding of source code with a special comment at the top:

# coding: utf-8

would tell Python that the source code is encoded with UTF-8, and Python will decode the file accordingly before parsing. Python looks up the codec for this in the codecs module registry, and you can register your own codecs.
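As a rough illustration of the registration step, here is a minimal sketch. The codec name toy_source_codec and the transform_source function are made up for this example, and the "transformation" is only a toy text replacement. A codec meant for real source files would probably also need an incremental decoder and a stream reader, which are left out here.

```python
import codecs

CODEC_NAME = "toy_source_codec"  # hypothetical name, not a real codec


def transform_source(text):
    # Toy stand-in for a real source transformation:
    # rewrite the old "<>" operator to "!=".
    return text.replace("<>", "!=")


def decode(data, errors="strict"):
    # Decode the raw bytes as UTF-8 first, then rewrite the resulting text.
    text, consumed = codecs.utf_8_decode(bytes(data), errors, True)
    return transform_source(text), consumed


def search(name):
    # Search functions receive the normalized (lowercased) codec name
    # and return None for names they do not handle.
    if name != CODEC_NAME:
        return None
    utf8 = codecs.lookup("utf-8")
    return codecs.CodecInfo(name=CODEC_NAME, encode=utf8.encode, decode=decode)


codecs.register(search)

print(b"a <> b".decode(CODEC_NAME))
```

Once the search function is registered, any bytes decoded with that codec name pass through transform_source, which is the hook a source codec would use to rewrite code before it is parsed.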

The pyxl project uses this trick to parse out HTML syntax from Python files and replace it with actual Python syntax that builds that HTML, all in a 'decoding' step. See the codec package in that project, where the register module registers a custom codec search function that transforms source code before Python actually parses and compiles it. A custom .pth file is installed into your site-packages directory to load this registration step at Python startup time. Another project that does the same thing to parse out Ruby-style string formatting is interpy.

All you have to do then is build such a codec too, one that parses a Python source file (tokenizing it, perhaps with the tokenize module) and replaces string literals that carry your custom prefix with mystr(<string literal>) calls. Mark any file you want processed with # coding: yourcustomcodec.

I'll leave that part as an exercise for the reader. Good luck!
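As a possible starting point for that exercise, here is a sketch of the rewriting step only, not a full codec. It assumes the prefix is the single letter m with no whitespace before the quote, handles only single-line literals, and ignores combinations such as rm'...'. The function name rewrite_prefixed_strings is made up for this example. Since m is not a known prefix, tokenize reports m'...' as a NAME token directly followed by a STRING token, which is what the code looks for.

```python
import io
import tokenize

PREFIX = "m"       # hypothetical custom prefix
WRAPPER = "mystr"  # name of the str subclass to call instead


def rewrite_prefixed_strings(source):
    """Rewrite m'...' into mystr('...') (single-line literals only)."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    lines = source.splitlines(keepends=True)
    edits = []
    for prev, tok in zip(tokens, tokens[1:]):
        if (prev.type == tokenize.NAME and prev.string == PREFIX
                and tok.type == tokenize.STRING
                and prev.end == tok.start          # no gap between m and the quote
                and tok.start[0] == tok.end[0]):   # literal stays on one line
            edits.append((prev.start[0], prev.start[1], tok.end[1],
                          f"{WRAPPER}({tok.string})"))
    # Apply edits from the end so earlier column offsets stay valid.
    for row, start, end, text in sorted(edits, reverse=True):
        line = lines[row - 1]
        lines[row - 1] = line[:start] + text + line[end:]
    return "".join(lines)


class mystr(str):
    def something(self):
        return "anything"


namespace = {"mystr": mystr}
exec(rewrite_prefixed_strings("ms = m'my string'\n"), namespace)
print(type(namespace["ms"]).__name__, namespace["ms"].something())
```

In a real codec, a function like this would be called from the decode step on the decoded source text.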

Note that the result of this transformation is then compiled into bytecode, which is cached. Your transformation only has to run once per source code revision; all other imports of a module using your codec will load the cached bytecode.

149

answered Sep 17 '22 17:09

Martijn Pieters

One could use operator overloading to implicitly convert str into a custom class:

class MyString(str):
    def __or__(self, a):
        # wrap the concatenation so the result is a MyString, not a plain str
        return MyString(self + a)

m = MyString('')
print(repr(m), type(m))
# '' <class '__main__.MyString'>
print(m | 'a', type(m | 'a'))
# a <class '__main__.MyString'>

This avoids the use of parentheses, effectively emulating a string prefix with one extra character. I chose | but it could also be & or another binary operator.
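One caveat with this approach is operator precedence. The sketch below adds the something method from the question to MyString (an assumption for illustration) and shows where parentheses are still needed.

```python
class MyString(str):
    def something(self):
        return "anything"

    def __or__(self, other):
        return MyString(self + other)


m = MyString("")

# "+" binds tighter than "|", so the concatenation happens first.
joined = m | "a" + "b"
assert type(joined) is MyString and joined == "ab"

# Attribute access binds tighter still: without parentheses, .upper()
# runs on the plain str "a" before "|" wraps the result.
upper = m | "a".upper()
assert type(upper) is MyString and upper == "A"

# Calling a custom method therefore needs parentheses around the expression.
assert (m | "a").something() == "anything"
assert not hasattr("a", "something")
```

So the operator form saves parentheses when the value is only assigned or passed along, but not when a custom method is called on it right away.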

answered Sep 21 '22 17:09

PhMota
