I have many long strings (9000+ characters each) that represent mathematical expressions. I originally generate the expressions using sympy, a Python symbolic algebra package. A truncated example is:
a = 'm[i]**2*(zb[layer]*m[i]**4 - 2*zb[layer]*m[j]**2*m[i]**2 + zb[layer]*m[j]**4 - zt[layer]*m[i]**4 + 2*zt[layer]*m[j]**2*m[i]**2 - zt[layer]*m[j]**4)**(-1)*ab[layer]*sin(m[i]*zb[layer])*sin(m[j]*zb[layer])'
I end up copying the text in the string and then using it as code (i.e. copying the text between ' and ' and pasting it into a function as code):
def foo(args):
    return m[i]**2*(zb[layer]*m[i]**4 - 2*zb[layer]*m[j]**2*m[i]**2 + zb[layer]*m[j]**4 - zt[layer]*m[i]**4 + 2*zt[layer]*m[j]**2*m[i]**2 - zt[layer]*m[j]**4)**(-1)*ab[layer]*sin(m[i]*zb[layer])*sin(m[j]*zb[layer])
The long lines of code become unwieldy and slow down my IDE (Spyder), so I want to put some line breaks in (the code works fine as one long line). I have successfully done this manually by enclosing the expression in brackets and putting in the line breaks myself (i.e. using implicit line continuation as per PEP8):
def foo(args):
    return (m[i]**2*(zb[layer]*m[i]**4 - 2*zb[layer]*m[j]**2*m[i]**2 +
            zb[layer]*m[j]**4 - zt[layer]*m[i]**4 + 2*zt[layer]*m[j]**2*m[i]**2 -
            zt[layer]*m[j]**4)**(-1)*ab[layer]*sin(m[i]*zb[layer])*sin(m[j]*zb[layer]))
I'd like some function or functionality that will put in the line breaks for me. I've tried using the textwrap module, but that splits the line in inappropriate places. For example, in the code below the last line splits in the middle of 'layer', which invalidates my mathematical expression:
>>> import textwrap
>>> a = 'm[i]**2*(zb[layer]*m[i]**4 - 2*zb[layer]*m[j]**2*m[i]**2 + zb[layer]*m[j]**4 - zt[layer]*m[i]**4 + 2*zt[layer]*m[j]**2*m[i]**2 - zt[layer]*m[j]**4)**(-1)*ab[layer]*sin(m[i]*zb[layer])*sin(m[j]*zb[layer])'
>>> print(textwrap.fill(a,width=70))
m[i]**2*(zb[layer]*m[i]**4 - 2*zb[layer]*m[j]**2*m[i]**2 +
zb[layer]*m[j]**4 - zt[layer]*m[i]**4 + 2*zt[layer]*m[j]**2*m[i]**2 -
zt[layer]*m[j]**4)**(-1)*ab[layer]*sin(m[i]*zb[layer])*sin(m[j]*zb[lay
er])
My rules of thumb for manually splitting the string and still having a valid expression when I paste the string as code are: enclose the whole expression in (), and split only after +, -, *, ], or ).
First, just passing break_long_words=False will prevent it from splitting layer in the middle.
But that isn't enough to fix your problem. The output will be valid, but it may exceed 70 columns. In your example, it will:
m[i]**2*(zb[layer]*m[i]**4 - 2*zb[layer]*m[j]**2*m[i]**2 +
zb[layer]*m[j]**4 - zt[layer]*m[i]**4 + 2*zt[layer]*m[j]**2*m[i]**2 -
zt[layer]*m[j]**4)**(-1)*ab[layer]*sin(m[i]*zb[layer])*sin(m[j]*zb[layer])
Fortunately, while textwrap can't do everything in the world, it also makes good sample code. That's why the docs link straight to the source.
What you want is essentially the break_on_hyphens behavior, but breaking on arithmetic operators as well. So, if you just change the regexp to use (-|\+|\*\*|\*) in wordsep_re, that may be all it takes. Or it may take a bit more work, but it should be easy to figure out from there.
Here's an example:
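import re
import textwrap

class AlgebraWrapper(textwrap.TextWrapper):
    # Also treat arithmetic operators and closing brackets as chunk boundaries.
    wordsep_re = re.compile(r'(\s+|(?:-|\+|\*\*|\*|\)|\]))')
w = AlgebraWrapper(break_long_words=False, break_on_hyphens=True)
print(w.fill(a))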
This will give you:
m[i]**2*(zb[layer]*m[i]**4 - 2*zb[layer]*m[j]**2*m[i]**2 + zb[layer]*
m[j]**4 - zt[layer]*m[i]**4 + 2*zt[layer]*m[j]**2*m[i]**2 - zt[layer]*
m[j]**4)**(-1)*ab[layer]*sin(m[i]*zb[layer])*sin(m[j]*zb[layer])
But really, you just got lucky that it didn't need to break on brackets or parens, because as simple as I've written it, it will break before a bracket just as easily as after one, which will be syntactically valid, but very ugly. The same thing is true for operators, but it's far less ugly to break before a * than a ]. So, I'd probably split on just actual operators, and leave it at that:
wordsep_re = re.compile(r'(\s+|(?:-|\+|\*\*|\*))')
If that's not acceptable, then you'll have to come up with the regexp you actually want and drop it in place of wordsep_re.
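As a rough check on the operator-only variant, here is a minimal sketch that wraps the truncated example and confirms nothing was split mid-identifier. The names OperatorWrapper and wrap_expression are made up for illustration; only TextWrapper, wordsep_re, break_long_words and break_on_hyphens come from textwrap itself.

```python
import re
import textwrap


class OperatorWrapper(textwrap.TextWrapper):
    # Chunk boundaries: runs of whitespace, or one of the operators - + ** *
    wordsep_re = re.compile(r'(\s+|(?:-|\+|\*\*|\*))')


def wrap_expression(expr, width=70):
    """Wrap expr at whitespace or arithmetic operators only (hypothetical helper)."""
    wrapper = OperatorWrapper(width=width,
                              break_long_words=False,  # never cut a chunk in half
                              break_on_hyphens=True)   # makes textwrap use wordsep_re
    return wrapper.fill(expr)


a = ('m[i]**2*(zb[layer]*m[i]**4 - 2*zb[layer]*m[j]**2*m[i]**2 + '
     'zb[layer]*m[j]**4 - zt[layer]*m[i]**4 + 2*zt[layer]*m[j]**2*m[i]**2 - '
     'zt[layer]*m[j]**4)**(-1)*ab[layer]*sin(m[i]*zb[layer])*sin(m[j]*zb[layer])')

wrapped = wrap_expression(a)
print(wrapped)

# Sanity check: the sequence of names, numbers and symbols is unchanged,
# so no identifier such as 'layer' was broken across two lines.
same_tokens = re.findall(r'\w+|\S', wrapped) == re.findall(r'\w+|\S', a)
print(same_tokens)
```

Because every chunk between operators is short, each output line should fit within the requested width; a single chunk longer than the width would still overflow, since break_long_words is off.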
An alternative solution is to decorate-wrap-undecorate. For example:
b = re.sub(r'(-|\+|\*\*|\*)', r'\1 ', a)
c = textwrap.fill(b)
d = re.sub(r'(-|\+|\*\*|\*) ', r'\1', c)
Of course this isn't perfect: it won't prefer existing spaces over added spaces, and it will fill to less than 70 columns (because it will be counting those added spaces toward the limit). But if you're just looking for something quick and dirty, it may serve, and if not, it may at least be a starting point to what you actually need.
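Packaged as a function, the decorate-wrap-undecorate idea might look like the sketch below. The name wrap_by_decorating is hypothetical, and passing break_long_words=False and break_on_hyphens=False is my own choice here so that textwrap only ever breaks at whitespace; the three-step snippet above uses the defaults.

```python
import re
import textwrap

OPERATORS = r'(-|\+|\*\*|\*)'


def wrap_by_decorating(expr, width=70):
    """Decorate-wrap-undecorate (hypothetical helper name)."""
    # Decorate: add a space after every operator so textwrap can break there.
    decorated = re.sub(OPERATORS, r'\1 ', expr)
    # Wrap: break only at whitespace, never inside a chunk.
    wrapped = textwrap.fill(decorated, width=width,
                            break_long_words=False,
                            break_on_hyphens=False)
    # Undecorate: remove one space after each operator. Spaces that ended up
    # at a line break were already dropped by textwrap.
    return re.sub(OPERATORS + ' ', r'\1', wrapped)


a = ('m[i]**2*(zb[layer]*m[i]**4 - 2*zb[layer]*m[j]**2*m[i]**2 + '
     'zb[layer]*m[j]**4 - zt[layer]*m[i]**4 + 2*zt[layer]*m[j]**2*m[i]**2 - '
     'zt[layer]*m[j]**4)**(-1)*ab[layer]*sin(m[i]*zb[layer])*sin(m[j]*zb[layer])')

result = wrap_by_decorating(a)
print(result)
```

The lines come out shorter than the requested width, for the reason given above: the width is measured while the extra spaces are still in place.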
Either way, the easiest way to enclose the whole thing in parens is to do that up-front:
if len(a) >= 70:
    a = '({})'.format(a)
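If the end goal is a pasteable function, the parenthesizing and the wrapping can be done in one pass by letting TextWrapper supply the indentation. This is only a sketch under a few assumptions: make_function_source and OperatorWrapper are names I made up, the 79-column width is arbitrary, and the expression is always parenthesized rather than only when it is long. The ast check at the end only confirms that the generated text parses to the same expression; it does not evaluate anything.

```python
import ast
import re
import textwrap


class OperatorWrapper(textwrap.TextWrapper):
    # Break at whitespace or at the operators - + ** * only.
    wordsep_re = re.compile(r'(\s+|(?:-|\+|\*\*|\*))')


def make_function_source(name, expr, width=79):
    """Return source text for 'def name(args): return (expr)' (hypothetical helper)."""
    wrapper = OperatorWrapper(width=width,
                              initial_indent='    return (',
                              subsequent_indent=' ' * 12,  # align under the '('
                              break_long_words=False,
                              break_on_hyphens=True)
    body = wrapper.fill(expr + ')')
    return 'def {}(args):\n{}\n'.format(name, body)


a = ('m[i]**2*(zb[layer]*m[i]**4 - 2*zb[layer]*m[j]**2*m[i]**2 + '
     'zb[layer]*m[j]**4 - zt[layer]*m[i]**4 + 2*zt[layer]*m[j]**2*m[i]**2 - '
     'zt[layer]*m[j]**4)**(-1)*ab[layer]*sin(m[i]*zb[layer])*sin(m[j]*zb[layer])')

source = make_function_source('foo', a)
print(source)

# Syntax-only check: the wrapped return value parses to the same AST
# as the original one-line expression.
returned = ast.parse(source).body[0].body[0].value
unchanged = ast.dump(returned) == ast.dump(ast.parse(a, mode='eval').body)
print(unchanged)
```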