When I use a triple-quoted multiline string in Python, I tend to use textwrap.dedent to keep the code readable, with good indentation:
some_string = textwrap.dedent("""
First line
Second line
...
""").strip()
However, in Python 3.x, textwrap.dedent doesn't seem to work with byte strings. I encountered this while writing a unit test for a method that returned a long multiline byte string, for example:
# The function to be tested
def some_function():
return b'Lorem ipsum dolor sit amet\n consectetuer adipiscing elit'
# Unit test
import unittest
import textwrap
class SomeTest(unittest.TestCase):
def test_some_function(self):
self.assertEqual(some_function(), textwrap.dedent(b"""
Lorem ipsum dolor sit amet
consectetuer adipiscing elit
""").strip())
if __name__ == '__main__':
unittest.main()
In Python 2.7.10 the above code works fine, but in Python 3.4.3 it fails:
E
======================================================================
ERROR: test_some_function (__main__.SomeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test.py", line 16, in test_some_function
""").strip())
File "/usr/lib64/python3.4/textwrap.py", line 416, in dedent
text = _whitespace_only_re.sub('', text)
TypeError: can't use a string pattern on a bytes-like object
----------------------------------------------------------------------
Ran 1 test in 0.001s
FAILED (errors=1)
So: Is there an alternative to textwrap.dedent that works with byte strings?
wrap(text, width=70, **kwargs): This function wraps the input paragraph such that each line in the paragraph is at most width characters long. The wrap method returns a list of output lines. The returned list is empty if the wrapped output has no content. Default width is taken as 70.
textwrap. dedent(text) Remove any common leading whitespace from every line in text. This can be used to make triple-quoted strings line up with the left edge of the display, while still presenting them in the source code in indented form.
Wrap text around a picture or drawing objectSelect the picture or object. Select Format and then under Arrange, select Wrap Text. Choose the wrapping option that you want to apply.
Answer 2: textwrap
is primarily about the Textwrap
class and functions. dedent
is listed under
# -- Loosely related functionality --------------------
As near as I can tell, the only things that makes it text (unicode str
) specific are the re literals. I prefixed all 6 with b
and voila! (I did not edit anything else, but the function docstring should be adjusted.)
import re
_whitespace_only_re = re.compile(b'^[ \t]+$', re.MULTILINE)
_leading_whitespace_re = re.compile(b'(^[ \t]*)(?:[^ \t\n])', re.MULTILINE)
def dedent_bytes(text):
"""Remove any common leading whitespace from every line in `text`.
This can be used to make triple-quoted strings line up with the left
edge of the display, while still presenting them in the source code
in indented form.
Note that tabs and spaces are both treated as whitespace, but they
are not equal: the lines " hello" and "\\thello" are
considered to have no common leading whitespace. (This behaviour is
new in Python 2.5; older versions of this module incorrectly
expanded tabs before searching for common leading whitespace.)
"""
# Look for the longest leading string of spaces and tabs common to
# all lines.
margin = None
text = _whitespace_only_re.sub(b'', text)
indents = _leading_whitespace_re.findall(text)
for indent in indents:
if margin is None:
margin = indent
# Current line more deeply indented than previous winner:
# no change (previous winner is still on top).
elif indent.startswith(margin):
pass
# Current line consistent with and no deeper than previous winner:
# it's the new winner.
elif margin.startswith(indent):
margin = indent
# Find the largest common whitespace between current line
# and previous winner.
else:
for i, (x, y) in enumerate(zip(margin, indent)):
if x != y:
margin = margin[:i]
break
else:
margin = margin[:len(indent)]
# sanity check (testing/debugging only)
if 0 and margin:
for line in text.split(b"\n"):
assert not line or line.startswith(margin), \
"line = %r, margin = %r" % (line, margin)
if margin:
text = re.sub(rb'(?m)^' + margin, b'', text)
return text
print(dedent_bytes(b"""
Lorem ipsum dolor sit amet
consectetuer adipiscing elit
""")
)
# prints
b'\nLorem ipsum dolor sit amet\n consectetuer adipiscing elit\n'
It seems like dedent
does not support bytestrings, sadly. However, if you want cross-compatible code, I recommend you take advantage of the six
library:
import sys, unittest
from textwrap import dedent
import six
def some_function():
return b'Lorem ipsum dolor sit amet\n consectetuer adipiscing elit'
class SomeTest(unittest.TestCase):
def test_some_function(self):
actual = some_function()
expected = six.b(dedent("""
Lorem ipsum dolor sit amet
consectetuer adipiscing elit
""")).strip()
self.assertEqual(actual, expected)
if __name__ == '__main__':
unittest.main()
This is similar to your bullet point suggestion in the question
I could convert to unicode, use textwrap.dedent, and convert back to bytes. But this is only viable if the byte string conforms to some Unicode encoding.
But you're misunderstanding something about encodings here - if you can write the string literal in your test like that in the first place, and have the file successfully parsed by python (i.e. the correct coding declaration is on the module), then there is no "convert to unicode" step here. The file gets parsed in the encoding specified (or sys.defaultencoding
, if you didn't specify) and then when the string is a python variable it is already decoded.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With