
How do I achieve sprintf-style formatting for bytes objects in python 3?

I want to do sprintf on python3 but with raw bytes objects, without having to do any manual conversions for the %s to work. So, take a bytes object as a 'template', plus any number of objects of any type and return a rendered bytes object. This is how python 2's sprintf % operator has always worked.

b'test %s %s %s' % (5, b'blah','strblah') # python3 ==> error
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: %b requires bytes, or an object that implements __bytes__, not 'int'

def to_bytes(arg):
    if hasattr(arg,'encode'): return arg.encode()
    if hasattr(arg,'decode'): return arg
    return repr(arg).encode()

def render_bytes_template(btemplate : bytes, *args):
    return btemplate % tuple(map(to_bytes,args))

render_bytes_template(b'this is how we have to write raw strings with unknown-typed arguments? %s %s %s',5,b'blah','strblah')

# output: b'this is how we have to write raw strings with unknown-typed arguments? 5 blah strblah'

But in python 2, it's just built in:

'example that just works %s %s %s' % (5,b'blah',u'strblah')
# output: u'example that just works 5 blah strblah' (a unicode result, because of the u'' argument)

Is there a way to do this in Python 3 with the same performance as Python 2? Please tell me I'm missing something. The fallback is to implement it in Cython (or are there Python 3 libraries that help with this?), but I still don't see why it was removed from the standard library, other than the implicit encoding of the string object. Can't we just add a bytes method like format_any()?

By the way, it's not as simple as this cop-out:

def render_bytes_template(btemplate : bytes, *args):
    return (btemplate.decode() % args).encode()

Not only do I not want to do any unnecessary encode/decoding, but the bytes args are repr'd instead of being injected raw.
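
To make that last point concrete, here is a small check (a sketch for Python 3) of what the decode/encode approach produces for a bytes argument. The name render_via_str is only illustrative:

```python
def render_via_str(btemplate: bytes, *args) -> bytes:
    # Decode the template, format it as text, then encode the result again.
    return (btemplate.decode() % args).encode()

# Formatting a bytes object with a str %s uses its repr,
# so the b'...' wrapper leaks into the rendered output.
result = render_via_str(b'%s %s %s', 5, b'blah', 'strblah')
print(result)
```

The b'blah' argument ends up in the output with its b'...' quoting rather than as the four raw bytes.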

parity3 asked Jul 29 '17 04:07


2 Answers

I want to do sprintf on python3 but with raw bytes objects, without having to do any manual conversions for the %s to work.

For %s to work in a bytes template, every formatting argument must already be bytes (or an object that implements __bytes__).

This changed from Python 2, which allowed even unicode strings to be formatted into a byte string. That behaviour is error-prone as soon as a unicode string containing non-ASCII characters is introduced.

E.g., on Python 2:

In [1]: '%s' % (u'é',)
Out[1]: u'\xe9'

Technically that is correct, but it is not what the developer intended: the byte string is silently promoted to unicode, and no account is taken of any encoding.

In Python 3, on the other hand:

In [2]: '%s' % ('é',)
Out[2]: 'é'

For formatting byte strings, use byte string arguments (Python 3.5+ only):

b'%s %s' % (b'blah', 'strblah'.encode('utf-8'))

Other types, such as integers, also need to be converted to byte strings before being passed to %s. Numeric codes such as %d accept integers directly.
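
As a rough illustration of the conversion codes that bytes %-formatting supports (PEP 461, Python 3.5+), the sketch below formats a few argument types. The variable names are only for illustration:

```python
# Bytes %-formatting accepts a limited set of conversion codes.
number = b'%d' % 5                          # integers work directly with %d
hexed = b'%x' % 255                         # other numeric codes such as %x work too
raw = b'%b' % b'blah'                       # %b (and its alias %s) takes bytes-like objects
text = b'%s' % 'strblah'.encode('utf-8')    # str has to be encoded explicitly
shown = b'%a' % 'strblah'                   # %a inserts the ASCII repr, quotes included

print(number, hexed, raw, text, shown)
```

Note that %a keeps the quotes of the repr, so it is not a drop-in replacement for encoding a str yourself.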

danny answered Oct 19 '22 14:10


Would something like this work for you? Wrap each template in the B class below, a bytes subclass that overloads the % and %= operators so that the arguments are converted to bytes first:

class B(bytes):
    """bytes subclass whose % operator converts each argument to bytes first."""

    @staticmethod
    def to_bytes(arg):
        if hasattr(arg, 'encode'): return arg.encode()  # str: encode (UTF-8 by default)
        if hasattr(arg, 'decode'): return arg           # bytes-like: inject as-is
        return repr(arg).encode()                       # anything else: use its repr

    def __mod__(self, other):
        # Only a tuple carries several arguments. A lone bytes object is
        # iterable too, but it must be treated as a single argument.
        if isinstance(other, tuple):
            return super().__mod__(tuple(map(self.to_bytes, other)))
        return super().__mod__(self.to_bytes(other))

    def __imod__(self, other):
        return self.__mod__(other)

a = B(b'this %s good')
b = B(b'this %s %s good string')
print(a % 'is')
print(b % ('is', 'a'))

a = B(b'this %s good')
a %= 'is'
b = B(b'this %s %s good string')
b %= ('is', 'a')
print(a)
print(b)

This outputs:

b'this is good'
b'this is a good string'
b'this is good'
b'this is a good string'
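
One detail to watch when dispatching on the right-hand operand of %: bytes objects are iterable (they yield integers), so only a tuple should be treated as several arguments. A minimal sketch, where normalize_args is a hypothetical helper and not part of any library:

```python
def normalize_args(other):
    """Return the operand of % as a tuple of arguments (hypothetical helper)."""
    # Only a tuple means "several arguments". bytes and str are iterable
    # as well, but each one is a single argument.
    return other if isinstance(other, tuple) else (other,)

single = normalize_args(b'blah')
several = normalize_args((b'is', 'a'))
print(single, several)
print(list(b'ab'))  # iterating over bytes yields integers
```

Checking for a tuple mirrors how the built-in % operator itself decides between one argument and many.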
mattjegan answered Oct 19 '22 14:10