How to put appropriate line breaks in a string representing a mathematical expression that is 9000+ characters?

Question

I have many long strings (9000+ characters each) that represent mathematical expressions. I originally generate the expressions using sympy, a Python symbolic algebra package. A truncated example is:

a = 'm[i]**2*(zb[layer]*m[i]**4 - 2*zb[layer]*m[j]**2*m[i]**2 + zb[layer]*m[j]**4 - zt[layer]*m[i]**4 + 2*zt[layer]*m[j]**2*m[i]**2 - zt[layer]*m[j]**4)**(-1)*ab[layer]*sin(m[i]*zb[layer])*sin(m[j]*zb[layer])'

I end up copying the text in the string and then using it as code (i.e. I copy the text between ' and ' and then paste it into a function as code):

def foo(args):
    return m[i]**2*(zb[layer]*m[i]**4 - 2*zb[layer]*m[j]**2*m[i]**2 + zb[layer]*m[j]**4 - zt[layer]*m[i]**4 + 2*zt[layer]*m[j]**2*m[i]**2 - zt[layer]*m[j]**4)**(-1)*ab[layer]*sin(m[i]*zb[layer])*sin(m[j]*zb[layer])

The long lines of code become unwieldy and slow down my IDE (Spyder), so I want to put some line breaks in (the code works fine as one long line). I have successfully done this manually by enclosing the expression in brackets and putting in the line breaks myself (i.e. using implicit line continuation as per PEP8):

def foo(args):
    return (m[i]**2*(zb[layer]*m[i]**4 - 2*zb[layer]*m[j]**2*m[i]**2 + 
        zb[layer]*m[j]**4 - zt[layer]*m[i]**4 + 2*zt[layer]*m[j]**2*m[i]**2 - 
        zt[layer]*m[j]**4)**(-1)*ab[layer]*sin(m[i]*zb[layer])*sin(m[j]*zb[layer]))

I'd like some function or functionality that will put in the line breaks for me. I've tried using the textwrap module, but that splits the line in inappropriate places. For example, in the code below the last line splits in the middle of 'layer', which invalidates my mathematical expression:

>>> import textwrap
>>> a = 'm[i]**2*(zb[layer]*m[i]**4 - 2*zb[layer]*m[j]**2*m[i]**2 + zb[layer]*m[j]**4 - zt[layer]*m[i]**4 + 2*zt[layer]*m[j]**2*m[i]**2 - zt[layer]*m[j]**4)**(-1)*ab[layer]*sin(m[i]*zb[layer])*sin(m[j]*zb[layer])'
>>> print(textwrap.fill(a,width=70))
m[i]**2*(zb[layer]*m[i]**4 - 2*zb[layer]*m[j]**2*m[i]**2 +
zb[layer]*m[j]**4 - zt[layer]*m[i]**4 + 2*zt[layer]*m[j]**2*m[i]**2 - 
zt[layer]*m[j]**4)**(-1)*ab[layer]*sin(m[i]*zb[layer])*sin(m[j]*zb[lay
er])

My rules of thumb for manually splitting the string and still having a valid expression when I paste the string as code are:

enclose whole expression in ().
split at approximately 70 characters wide after white-space or a +, -, *, ], ).

abarnert · Accepted Answer

First, just passing break_long_words=False will prevent it from splitting layer in the middle.

But that isn't enough to fix your problem. The output will be valid, but it may exceed 70 columns. In your example, it will:

m[i]**2*(zb[layer]*m[i]**4 - 2*zb[layer]*m[j]**2*m[i]**2 +
zb[layer]*m[j]**4 - zt[layer]*m[i]**4 + 2*zt[layer]*m[j]**2*m[i]**2 -
zt[layer]*m[j]**4)**(-1)*ab[layer]*sin(m[i]*zb[layer])*sin(m[j]*zb[layer])
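
The behaviour above can be checked with a short sketch that uses only the standard textwrap.fill call. The string a is the truncated example from the question, written here as adjacent string literals purely for readability; the names wrapped and longest are just for illustration.

```python
import textwrap

# The truncated example expression from the question.
a = ('m[i]**2*(zb[layer]*m[i]**4 - 2*zb[layer]*m[j]**2*m[i]**2 + '
     'zb[layer]*m[j]**4 - zt[layer]*m[i]**4 + 2*zt[layer]*m[j]**2*m[i]**2 - '
     'zt[layer]*m[j]**4)**(-1)*ab[layer]*sin(m[i]*zb[layer])*sin(m[j]*zb[layer])')

# With break_long_words=False, breaks fall only on existing whitespace,
# so no identifier is cut in half.
wrapped = textwrap.fill(a, width=70, break_long_words=False)
print(wrapped)

# The price is that a long run without spaces is left on one over-wide line.
longest = max(len(line) for line in wrapped.splitlines())
print(longest > 70)
```

Because the only changes are spaces turned into newlines, the result stays a valid expression once it is enclosed in parentheses.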

Fortunately, while textwrap can't do everything in the world, it also makes good sample code. That's why the docs link straight to the source.

What you want is essentially the break_on_hyphens behavior, but breaking on arithmetic operators as well. So, if you just change the regexp to use (-|\+|\*\*|\*) in wordsep_re, that may be all it takes. Or it may take a bit more work, but it should be easy to figure out from there.

Here's an example:

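import re
import textwrap

class AlgebraWrapper(textwrap.TextWrapper):
    # Also split on arithmetic operators, ')' and ']' so lines can break there.
    wordsep_re = re.compile(r'(\s+|(?:-|\+|\*\*|\*|\)|\]))')
w = AlgebraWrapper(break_long_words=False, break_on_hyphens=True)
print(w.fill(a))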

This will give you:

m[i]**2*(zb[layer]*m[i]**4 - 2*zb[layer]*m[j]**2*m[i]**2 + zb[layer]*
m[j]**4 - zt[layer]*m[i]**4 + 2*zt[layer]*m[j]**2*m[i]**2 - zt[layer]*
m[j]**4)**(-1)*ab[layer]*sin(m[i]*zb[layer])*sin(m[j]*zb[layer])

But really, you just got lucky that it didn't need to break on brackets or parens, because as simple as I've written it, it will break before a bracket just as easily as after one, which will be syntactically valid, but very ugly. The same thing is true for operators, but it's far less ugly to break before a * than a ]. So, I'd probably split on just actual operators, and leave it at that:

wordsep_re = re.compile(r'(\s+|(?:-|\+|\*\*|\*))')

If that's not acceptable, then you'll have to come up with the regexp you actually want and drop it in place of wordsep_re.
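
For reference, here is a self-contained sketch of the operators-only variant. The class name OperatorWrapper is hypothetical, and a is again the truncated example from the question. It relies on TextWrapper using wordsep_re when break_on_hyphens is true, which is how CPython's textwrap is written.

```python
import re
import textwrap


class OperatorWrapper(textwrap.TextWrapper):
    """Hypothetical subclass: allow breaks at whitespace and around -, +, **, *."""
    # '**' is listed before '*' so the power operator is never split in two.
    wordsep_re = re.compile(r'(\s+|(?:-|\+|\*\*|\*))')


# The truncated example expression from the question.
a = ('m[i]**2*(zb[layer]*m[i]**4 - 2*zb[layer]*m[j]**2*m[i]**2 + '
     'zb[layer]*m[j]**4 - zt[layer]*m[i]**4 + 2*zt[layer]*m[j]**2*m[i]**2 - '
     'zt[layer]*m[j]**4)**(-1)*ab[layer]*sin(m[i]*zb[layer])*sin(m[j]*zb[layer])')

# break_on_hyphens=True selects wordsep_re; break_long_words=False keeps
# identifiers such as 'layer' intact.
w = OperatorWrapper(width=70, break_long_words=False, break_on_hyphens=True)
wrapped = w.fill(a)
print(wrapped)
```

One caveat, which is an assumption about the input rather than something from the question: a float literal in exponent notation such as 1e-05 contains a '-', so this regexp could break inside it. The sketch assumes the expressions contain no such literals.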

An alternative solution is to decorate-wrap-undecorate. For example:

b = re.sub(r'(-|\+|\*\*|\*)', r'\1 ', a)
c = textwrap.fill(b)
d = re.sub(r'(-|\+|\*\*|\*) ', r'\1', c)

Of course this isn't perfect: it won't prefer existing spaces over added spaces, and it will fill to less than 70 columns (because it counts those added spaces toward the limit). But if you're just looking for something quick and dirty, it may serve, and if not, it may at least be a starting point for what you actually need.

Either way, the easiest way to enclose the whole thing in parens is to do that up-front:

if len(a) >= 70:
    a = '({})'.format(a)
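
Putting the pieces together, a minimal sketch of a helper might look like the following. The names wrap_expression and _ExprWrapper are hypothetical, and the ast.parse call is an addition that is not part of the approach above: it only checks that the wrapped text is still syntactically valid Python, not that it computes the same value.

```python
import ast
import re
import textwrap


class _ExprWrapper(textwrap.TextWrapper):
    # Allow breaks at whitespace and around -, +, ** and * ('**' tried first).
    wordsep_re = re.compile(r'(\s+|(?:-|\+|\*\*|\*))')


def wrap_expression(expr, width=70):
    """Parenthesize a long expression, wrap it, and check it still parses."""
    if len(expr) >= width:
        # Parentheses give implicit line continuation across the new breaks.
        expr = '({})'.format(expr)
    wrapper = _ExprWrapper(width=width, break_long_words=False,
                           break_on_hyphens=True)
    wrapped = wrapper.fill(expr)
    # Raises SyntaxError if a break landed somewhere that breaks the syntax.
    ast.parse(wrapped, mode='eval')
    return wrapped


# The truncated example expression from the question.
a = ('m[i]**2*(zb[layer]*m[i]**4 - 2*zb[layer]*m[j]**2*m[i]**2 + '
     'zb[layer]*m[j]**4 - zt[layer]*m[i]**4 + 2*zt[layer]*m[j]**2*m[i]**2 - '
     'zt[layer]*m[j]**4)**(-1)*ab[layer]*sin(m[i]*zb[layer])*sin(m[j]*zb[layer])')

print(wrap_expression(a))
```

Adding the parentheses before wrapping means they are counted toward the line width, so no line grows past the limit afterwards.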
