 

hook into the builtin python f-string format machinery

Summary

I really LOVE f-strings. They're bloody awesome syntax.

For a while now I've had an idea for a little library (described below*) to harness them further. A quick example of what I would like it to do:

```
>>> import simpleformatter as sf
>>> def format_camel_case(string):
...     """camel cases a sentence"""
...     return ''.join(s.capitalize() for s in string.split())
...
>>> @sf.formattable(camcase=format_camel_case)
... class MyStr(str): ...
...
>>> f'{MyStr("lime cordial delicious"):camcase}'
'LimeCordialDelicious'
```

It would be immensely useful (both for the purposes of a simplified API and for extending usage to instances of built-in classes) to find a way to hook into the built-in Python formatting machinery, which would allow custom format specifications on built-ins:

```
>>> f'{"lime cordial delicious":camcase}'
'LimeCordialDelicious'
```

In other words, I'd like to override the built-in format function (which is used by the f-string syntax), or alternatively extend the built-in __format__ methods of existing standard library classes, so that I could write stuff like this:

```python
for x, y, z in complicated_generator:
    eat_string(f"x: {x:custom_spec1}, y: {y:custom_spec2}, z: {z:custom_spec3}")
```

I have accomplished this by creating subclasses with their own __format__ methods, but of course this will not work for built-in classes.
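For reference, a minimal sketch of that subclass approach, assuming a hypothetical `camcase` spec and a hypothetical `CamelStr` class: the subclass's `__format__` intercepts its own spec and defers everything else to `str.__format__`, while a plain `str` still rejects the spec.

```python
class CamelStr(str):
    """Hypothetical str subclass that understands an extra 'camcase' spec."""

    def __format__(self, spec):
        if spec == "camcase":
            return "".join(word.capitalize() for word in self.split())
        # Any other spec behaves exactly like a normal str
        return super().__format__(spec)


print(f'{CamelStr("lime cordial delicious"):camcase}')

try:
    f'{"lime cordial delicious":camcase}'  # plain str knows nothing about 'camcase'
    plain_str_failed = False
except ValueError as exc:
    plain_str_failed = True
    print("ValueError:", exc)
```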

I could get close to it using the string.Formatter api:

```python
my_formatter = MyFormatter()  # custom string.Formatter instance
format_str = "x: {x:custom_spec1}, y: {y:custom_spec2}, z: {z:custom_spec3}"
for x, y, z in complicated_generator:
    eat_string(my_formatter.format(format_str, **locals()))
```

I find this a tad clunky, and nowhere near as readable as the f-string API.
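A rough sketch of what `MyFormatter` might look like, assuming it only needs to special-case a couple of hypothetical specs (`camcase`, `upper`). `string.Formatter.format_field` is the method that receives each value together with its spec, so overriding it is enough; unknown specs fall back to the default behaviour.

```python
import string


class MyFormatter(string.Formatter):
    """Hypothetical formatter that understands a few extra format specs."""

    # Hypothetical custom specs; anything else falls through to format()
    custom_specs = {
        "camcase": lambda value: "".join(w.capitalize() for w in str(value).split()),
        "upper": lambda value: str(value).upper(),
    }

    def format_field(self, value, format_spec):
        handler = self.custom_specs.get(format_spec)
        if handler is not None:
            return handler(value)
        # Standard behaviour: delegate to the built-in format()
        return super().format_field(value, format_spec)


my_formatter = MyFormatter()
format_str = "x: {x:camcase}, y: {y:upper}, z: {z:.2f}"
print(my_formatter.format(format_str, x="lime cordial delicious", y="abc", z=3.14159))
```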

Another thing that could be done is overriding builtins.format:

```
>>> import builtins
>>> builtins.format = lambda *args, **kwargs: 'womp womp'
>>> format(1,"foo")
'womp womp'
```

...but this doesn't work for f-strings:

```
>>> f"{1:foo}"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Invalid format specifier
```
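A small experiment showing why neither route works in CPython, as far as I can tell: f-strings format a value through its type's `__format__` rather than by looking up the name `format`, so patching `builtins.format` has no effect on them, and built-in types such as `str` refuse new attributes, so their `__format__` cannot be replaced either. The exact `TypeError` message differs between Python versions.

```python
import builtins

original_format = builtins.format
builtins.format = lambda *args, **kwargs: "womp womp"
try:
    patched = format(1, "foo")  # name lookup goes through builtins -> patched
    via_fstring = f"{1:d}"      # f-string formats via int.__format__ directly
finally:
    builtins.format = original_format  # always restore the real builtin

print(patched, via_fstring)

# Extending the built-in type instead is not allowed either
try:
    str.__format__ = lambda self, spec: "hooked"
    patch_failed = False
except TypeError as exc:
    patch_failed = True
    print("TypeError:", exc)
```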

Details

Currently my API looks something like this (somewhat simplified):

```python
import simpleformatter as sf

@sf.formatter("this_specification")
def this_formatting_function(some_obj):
    return "this formatted someobj!"

@sf.formatter("that_specification")
def that_formatting_function(some_obj):
    return "that formatted someobj!"

@sf.formattable
class SomeClass: ...
```

After which you can write code like this:

```python
some_obj = SomeClass()
f"{some_obj:this_specification}"
f"{some_obj:that_specification}"
```
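Since `simpleformatter` doesn't exist, here is a minimal, hypothetical sketch of how this decorator API could be implemented: `formatter` fills a module-level registry (the name `_FORMATTERS` is an assumption), and `formattable` installs a `__format__` that consults the registry before falling back to the class's original `__format__`.

```python
# Hypothetical global registry: format spec -> formatting function
_FORMATTERS = {}


def formatter(spec):
    """Register the decorated function as the handler for `spec`."""
    def register(func):
        _FORMATTERS[spec] = func
        return func
    return register


def formattable(cls):
    """Class decorator: route registered specs to their handlers."""
    fallback = cls.__format__  # whatever __format__ the class already had

    def __format__(self, spec):
        handler = _FORMATTERS.get(spec)
        if handler is not None:
            return handler(self)
        return fallback(self, spec)

    cls.__format__ = __format__
    return cls


@formatter("this_specification")
def this_formatting_function(some_obj):
    return "this formatted someobj!"


@formatter("that_specification")
def that_formatting_function(some_obj):
    return "that formatted someobj!"


@formattable
class SomeClass:
    ...


some_obj = SomeClass()
print(f"{some_obj:this_specification}")
print(f"{some_obj:that_specification}")
```

The same decorator can't be applied to `int` or `str`: assigning `__format__` on a built-in type raises `TypeError`, which is exactly the limitation described here.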

I would like the API to look more like this:

```python
@sf.formatter("this_specification")
def this_formatting_function(some_obj):
    return "this formatted someobj!"

@sf.formatter("that_specification")
def that_formatting_function(some_obj):
    return "that formatted someobj!"

class SomeClass: ...  # no class decorator needed
```

...and allow use of custom format specs on built-in classes:

```python
x = 1  # built-in type instance
f"{x:this_specification}"
f"{x:that_specification}"
```

But in order to do these things, we have to burrow our way into the built-in format() function. How can I hook into that juicy f-string goodness?

* NOTE: I'll probably never actually get around to implementing this library! But I do think it's a neat idea, and anyone who wants to is welcome to steal it from me :).

asked Apr 27 '19 02:04 by Rick supports Monica





1 Answer

Overview

You can, but only if you write evil code that probably should never end up in production software. So let's get started!

I'm not going to integrate it into your library, but I will show you how to hook into the behavior of f-strings. This is roughly how it'll work:

  1. Write a function that manipulates the bytecode instructions of code objects to replace FORMAT_VALUE instructions with calls to a hook function;
  2. Customize the import mechanism to make sure that the bytecode of every module and package (except standard library modules and site-packages) is modified with that function.

You can get the full source at https://github.com/mivdnber/formathack, but everything is explained below.

Disclaimer

This solution isn't great, because

  1. There's no guarantee at all that this won't break totally unrelated code;
  2. There's no guarantee that the bytecode manipulations described here will continue working in newer Python versions (CPython 3.11, for example, removed the CALL_FUNCTION and ROT_THREE opcodes used below). It definitely won't work in alternative Python implementations that don't compile to CPython-compatible bytecode. PyPy could work in theory, but the solution described here doesn't, because the bytecode package isn't 100% compatible.

However, it is a solution, and bytecode manipulation has been used successfully in popular packages like PonyORM. Just keep in mind that it's hacky, complicated and probably maintenance-heavy.

Part 1: Bytecode manipulation

Python code is not executed directly; it is first compiled to a simpler, intermediate, stack-based language called Python bytecode, which isn't meant to be read by humans (it's what's inside *.pyc files). To get an idea of what that bytecode looks like, you can use the standard library dis module to inspect the bytecode of a simple function:

```python
def invalid_format(x):
    return f"{x:foo}"
```

Calling this function will cause an exception, but we'll "fix" that soon.

```
>>> invalid_format("bar")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in invalid_format
ValueError: Invalid format specifier
```

To inspect the bytecode, fire up a Python console and call dis.dis:

```
>>> import dis
>>> dis.dis(invalid_format)
  2           0 LOAD_FAST                0 (x)
              2 LOAD_CONST               1 ('foo')
              4 FORMAT_VALUE             4 (with format)
              6 RETURN_VALUE
```

I've annotated the output below to explain what's happening:

```
# line 2
# Put the value of function parameter x on the stack
  2           0 LOAD_FAST                0 (x)
              # Put the format spec on the stack as a string
              2 LOAD_CONST               1 ('foo')
              # Pop both values from the stack and perform the actual formatting
              # This puts the formatted string on the stack
              4 FORMAT_VALUE             4 (with format)
              # pop the result from the stack and return it
              6 RETURN_VALUE
```
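The listing above comes from the CPython version this answer was written against, and opcode names are an implementation detail that does change: CPython 3.13, for instance, replaced FORMAT_VALUE with FORMAT_SIMPLE and FORMAT_WITH_SPEC. A small sketch for checking which formatting opcode your own interpreter emits, using `dis.get_instructions` (the `FORMAT_OPCODES` set and the `formatting_opcodes` helper are my own, not official constants):

```python
import dis
import sys


def invalid_format(x):
    return f"{x:foo}"


# Opcodes CPython uses to format an f-string field. FORMAT_VALUE exists up to
# 3.12; 3.13 replaced it with FORMAT_SIMPLE / FORMAT_WITH_SPEC.
FORMAT_OPCODES = {"FORMAT_VALUE", "FORMAT_SIMPLE", "FORMAT_WITH_SPEC"}


def formatting_opcodes(func):
    """Return the names of the f-string formatting instructions in func."""
    return [
        instr.opname
        for instr in dis.get_instructions(func)
        if instr.opname in FORMAT_OPCODES
    ]


print(sys.version_info[:2], formatting_opcodes(invalid_format))
```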

The idea here is to replace the FORMAT_VALUE instruction with a call to a hook function that allows us to implement whatever behavior we want. Let's implement it like this for now:

```python
def formathack_hook__(value, format_spec=None):
    """
    Gets called whenever a value is formatted. Right now it's a silly implementation,
    but it can be expanded with all sorts of nasty hacks.
    """
    return f"{value} formatted with {format_spec}"
```

To replace the instruction, I used the bytecode package, which provides surprisingly nice abstractions for doing horrible things.

```python
import types

from bytecode import Bytecode, Instr  # third-party: pip install bytecode


def formathack_rewrite_bytecode__(code):
    """
    Modifies a code object to override the behavior of the FORMAT_VALUE
    instructions used by f-strings.
    Relies on CPython opcodes that only exist up to 3.10
    (ROT_TWO, ROT_THREE, CALL_FUNCTION).
    """
    decompiled = Bytecode.from_code(code)
    modified_instructions = []
    for instruction in decompiled:
        # Labels and other pseudo-instructions have no 'name' attribute
        name = getattr(instruction, 'name', None)
        if name == 'FORMAT_VALUE':
            # 0x04 means that a format spec is present
            # (the conversion bits 0x03, i.e. !s/!r/!a, are ignored here)
            if instruction.arg & 0x04 == 0x04:
                callback_arg_count = 2
            else:
                callback_arg_count = 1
            modified_instructions.extend([
                # Load in the callback
                Instr("LOAD_GLOBAL", "formathack_hook__"),
                # Shuffle around the top of the stack to put the arguments on top
                # of the function global
                Instr("ROT_THREE" if callback_arg_count == 2 else "ROT_TWO"),
                # Call the callback function instead of executing FORMAT_VALUE
                Instr("CALL_FUNCTION", callback_arg_count)
            ])
        # Kind of nasty: we want to recursively alter the code of functions.
        elif name == 'LOAD_CONST' and isinstance(instruction.arg, types.CodeType):
            modified_instructions.extend([
                Instr("LOAD_CONST", formathack_rewrite_bytecode__(instruction.arg), lineno=instruction.lineno)
            ])
        else:
            modified_instructions.append(instruction)
    modified_bytecode = Bytecode(modified_instructions)
    # For functions, copy over argument definitions
    modified_bytecode.argnames = decompiled.argnames
    modified_bytecode.argcount = decompiled.argcount
    modified_bytecode.name = decompiled.name
    return modified_bytecode.to_code()
```

We can now make the invalid_format function we defined earlier work:

```
>>> invalid_format.__code__ = formathack_rewrite_bytecode__(invalid_format.__code__)
>>> invalid_format("bar")
'bar formatted with foo'
```

Success! Manually cursing code objects with tainted bytecode in itself won't damn our souls to an eternity of suffering though; for that, we should manipulate all code automatically.
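A swapped-in alternative, not what this answer does: instead of patching bytecode, the f-string fields can be rewritten at the syntax-tree level with the standard `ast` module, which does not depend on opcode names. This is a minimal sketch assuming Python 3.8+ and a hook with the same `formathack_hook__(value, format_spec)` signature; it only affects source you parse and compile yourself (`SOURCE` here), `FStringHookTransformer` is a hypothetical name, and the `camcase` spec is made up. Conversions such as `!r` are applied before the hook is called.

```python
import ast

SOURCE = '''
x = "lime cordial delicious"
result = f"{x:camcase} and {3.14159:.2f}"
'''


def formathack_hook__(value, format_spec=None):
    # Same signature as the bytecode version; 'camcase' is a made-up spec
    if format_spec == "camcase":
        return "".join(word.capitalize() for word in str(value).split())
    return format(value, format_spec or "")


# Maps FormattedValue.conversion codes to the builtin they stand for
CONVERSIONS = {ord("s"): "str", ord("r"): "repr", ord("a"): "ascii"}


class FStringHookTransformer(ast.NodeTransformer):
    """Replace each field {value!c:spec} with {hook(c(value), spec)}."""

    def visit_FormattedValue(self, node):
        self.generic_visit(node)  # rewrite nested fields (e.g. inside the spec) first
        value = node.value
        if node.conversion in CONVERSIONS:
            value = ast.Call(
                func=ast.Name(id=CONVERSIONS[node.conversion], ctx=ast.Load()),
                args=[value],
                keywords=[],
            )
        # The spec is itself a JoinedStr expression, or absent (-> None)
        spec = node.format_spec if node.format_spec is not None else ast.Constant(value=None)
        call = ast.Call(
            func=ast.Name(id="formathack_hook__", ctx=ast.Load()),
            args=[value, spec],
            keywords=[],
        )
        # The hook returns a str, so the rewritten field needs no spec or conversion
        return ast.FormattedValue(value=call, conversion=-1, format_spec=None)


tree = ast.fix_missing_locations(FStringHookTransformer().visit(ast.parse(SOURCE)))
namespace = {"formathack_hook__": formathack_hook__}
exec(compile(tree, "<formathack-ast>", "exec"), namespace)
print(namespace["result"])
```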

Part 2: Hooking into the import process

To make the new f-string behavior work everywhere, and not just in manually patched functions, we can customize the Python module import process with a custom module finder and loader using the functionality provided by the standard library importlib module:

```python
import importlib.machinery

# formathack_hook__ and formathack_rewrite_bytecode__ are defined in Part 1


class _FormatHackLoader(importlib.machinery.SourceFileLoader):
    """
    A module loader that modifies the code of the modules it imports to override
    the behavior of f-strings. Nasty stuff.
    """
    @classmethod
    def find_spec(cls, name, path, target=None):
        # Start out with a spec from a default finder
        spec = importlib.machinery.PathFinder.find_spec(
            fullname=name,
            # Only apply to modules and packages in the current directory
            # This prevents standard library modules or site-packages
            # from being patched.
            path=[""],
            target=target
        )
        if spec is None:
            return None

        # Modify the loader in the spec to this loader
        spec.loader = cls(name, spec.origin)
        return spec

    def get_code(self, fullname):
        # This is called by exec_module to get the code of the module
        # to execute it.
        code = super().get_code(fullname)
        # Rewrite the code to modify the f-string formatting opcodes
        rewritten_code = formathack_rewrite_bytecode__(code)
        return rewritten_code

    def exec_module(self, module):
        # We introduce the callback that hooks into the f-string formatting
        # process in every imported module
        module.__dict__["formathack_hook__"] = formathack_hook__
        return super().exec_module(module)
```

To make sure the Python interpreter uses this loader to import all files, we have to add it to sys.meta_path:

```python
import inspect
import sys


def install():
    # If the _FormatHackLoader is not registered as a finder,
    # do it now!
    if sys.meta_path[0] is not _FormatHackLoader:
        sys.meta_path.insert(0, _FormatHackLoader)
        # Tricky part: we want to be able to use our custom f-string behavior
        # in the main module where install was called. That module was loaded
        # with a standard loader though, so that's impossible without additional
        # dirty hacks.
        # Here, we execute the module _again_, this time with _FormatHackLoader
        module_globals = inspect.currentframe().f_back.f_globals
        module_name = module_globals["__name__"]
        module_file = module_globals["__file__"]
        loader = _FormatHackLoader(module_name, module_file)
        # Note: load_module() is deprecated in favour of exec_module()
        loader.load_module(module_name)
        # This is actually pretty important. If we don't exit here, the main module
        # will continue from the formathack.install method, causing it to run twice!
        sys.exit(0)
```
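To see the finder/loader mechanics in isolation, without any bytecode rewriting, here is a minimal sketch under a few assumptions: `InjectingFinder`, `InjectingLoader` and the throwaway `demo_mod` module are hypothetical, the finder is an instance restricted to one temporary directory (similar in spirit to the `path=[""]` restriction above), and it removes itself from `sys.meta_path` afterwards instead of calling `sys.exit`.

```python
import importlib.machinery
import pathlib
import sys
import tempfile


class InjectingLoader(importlib.machinery.SourceFileLoader):
    """Runs each module with an extra global, like _FormatHackLoader.exec_module."""

    def exec_module(self, module):
        module.__dict__["injected_helper"] = lambda: "injected!"
        super().exec_module(module)


class InjectingFinder:
    """Hypothetical meta path finder limited to a single directory."""

    def __init__(self, directory):
        self.directory = directory

    def find_spec(self, name, path=None, target=None):
        spec = importlib.machinery.PathFinder.find_spec(name, [self.directory], target)
        # Only take over plain .py source modules; everything else falls through
        if spec is None or spec.origin is None or not spec.origin.endswith(".py"):
            return None
        spec.loader = InjectingLoader(name, spec.origin)
        return spec


with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "demo_mod.py").write_text("VALUE = injected_helper()\n")
    finder = InjectingFinder(tmp)
    sys.meta_path.insert(0, finder)
    try:
        import demo_mod
        print(demo_mod.VALUE)
    finally:
        # Undo the global side effects so later imports are unaffected
        sys.meta_path.remove(finder)
        sys.modules.pop("demo_mod", None)
```

Using an instance with its own directory, rather than a class with a hard-coded path, keeps the hook easy to remove again, which is handy when experimenting.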

If we put it all together in a formathack module (see https://github.com/mivdnber/formathack for an integrated, working example), we can now use it like this:

```python
# In your main Python module, install formathack ASAP
import formathack
formathack.install()

# From now on, f-string behavior will be overridden!
foo = "foo"
print(f"{foo:bar}")  # -> "foo formatted with bar"
```

So that's that! You can expand on this to make the hook function more intelligent and useful (e.g. by registering functions that handle certain format specifiers).
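As one way to do that, a sketch of a registry-driven hook, assuming the same `formathack_hook__(value, format_spec=None)` signature as above; `FORMAT_HANDLERS`, `register_format_spec` and the `camcase` handler are hypothetical names. Unregistered specs fall back to the built-in `format()`, so ordinary specs like `.2f` keep working.

```python
# Hypothetical registry of custom format specs consulted by the hook
FORMAT_HANDLERS = {}


def register_format_spec(spec):
    """Decorator registering the function as the handler for `spec`."""
    def decorator(func):
        FORMAT_HANDLERS[spec] = func
        return func
    return decorator


def formathack_hook__(value, format_spec=None):
    """Drop-in replacement for the silly hook: custom specs first, then format()."""
    handler = FORMAT_HANDLERS.get(format_spec)
    if handler is not None:
        return handler(value)
    # No custom handler: behave like a normal f-string field
    return format(value, format_spec or "")


@register_format_spec("camcase")
def camel_case(value):
    return "".join(word.capitalize() for word in str(value).split())


print(formathack_hook__("lime cordial delicious", "camcase"))
print(formathack_hook__(3.14159, ".2f"))
print(formathack_hook__(1))
```

The hook deliberately calls `format()` instead of using an f-string: if the module defining it were ever rewritten by the loader, an f-string inside the hook would call the hook again and recurse forever.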

answered Sep 25 '22 09:09 by Michilus
