Add multiplication signs (*) between coefficients

Tags:

I have a program in which a user inputs a function, such as sin(x)+1. I'm using ast to try to determine if the string is 'safe' by whitelisting components as shown in this answer. Now I'd like to parse the string to add multiplication (*) signs between coefficients without them.

For example:

3x-> 3*x
4(x+5) -> 4*(x+5)
sin(3x)(4) -> sin(3x)*(4) (sin is already in globals, otherwise this would be s*i*n*(3x)*(4)

Are there any efficient algorithms to accomplish this? I'd prefer a pythonic solution (i.e. not complex regexes, not because they're pythonic, but just because I don't understand them as well and want a solution I can understand. Simple regexes are ok. )

I'm very open to using sympy (which looks really easy for this sort of thing) under one condition: safety. Apparently sympy uses eval under the hood. I've got pretty good safety with my current (partial) solution. If anyone has a way to make sympy safer with untrusted input, I'd welcome this too.

258

asked Mar 10 '16 23:03

Luke Taylor

1 Answers

A regex is easily the quickest and cleanest way to get the job done in vanilla python, and I'll even explain the regex for you, because regexes are such a powerful tool it's nice to understand.

To accomplish your goal, use the following statement:

import re
# <code goes here, set 'thefunction' variable to be the string you're parsing>
re.sub(r"((?:\d+)|(?:[a-zA-Z]\w*\(\w+\)))((?:[a-zA-Z]\w*)|\()", r"\1*\2", thefunction)

I know it's a bit long and complicated, but a different, simpler solution doesn't make itself immediately obvious without even more hacky stuff than what's gone into the regex here. But, this has been tested against all three of your test cases and works out precisely as you want.

As a brief explanation of what's going on here: The first parameter to re.sub is the regular expression, which matches a certain pattern. The second is the thing we're replacing it with, and the third is the actual string to replace things in. Every time our regex sees a match, it removes it and plugs in the substitution, with some special behind-the-scenes tricks.

A more in-depth analysis of the regex follows:

((?:\d+)|(?:[a-zA-Z]\w*\(\w+\)))((?:[a-zA-Z]\w*)|\() : Matches a number or a function call, followed by a variable or parentheses.
- ((?:\d+)|(?:[a-zA-Z]\w*\(\w+\))) : Group 1. Note: Parentheses delimit a Group, which is sort of a sub-regex. Capturing groups are indexed for future reference; groups can also be repeated with modifiers (described later). This group matches a number or a function call.
  - (?:\d+) : Non-capturing group. Any group with ?: immediately after the opening parenthesis will not assign an index to itself, but still act as a "section" of the pattern. Ex. A(?:bc)+ will match "Abcbcbcbc..." and so on, but you cannot access the "bcbcbcbc" match with an index. However, without this group, writing "Abc+" would match "Abcccccccc..."
    - \d : Matches any numerical digit once. A regex of \d all its own will match, separately, "1", "2", and "3" of "123".
    - + : Matches the previous element one or more times. In this case, the previous element is \d, any number. In the previous example, \d+ on "123" will successfully match "123" as a single element. This is vital to our regex, to make sure that multi-digit numbers are properly registered.
  - | : Pipe character, and in a regex, it effectively says or: "a|b" will match "a" OR "b". In this case, it separates "a number" and "a function call"; match a number OR a function call.
  - (?:[a-zA-Z]\w*\(\w+\)) : Matches a function call. Also a non-capturing group, like (?:\d+).
    - [a-zA-Z] : Matches the first letter of the function call. There is no modifier on this because we only need to ensure the first character is a letter; A123 is technically a valid function name.
    - \w : Matches any alphanumeric character or an underscore. After the first letter is ensured, the following characters could be letters, numbers, or underscores and still be valid as a function name.
    - * : Matches the previous element 0 or more times. While initially seeming unnecessary, the star character effectively makes an element optional. In this case, our modified element is \w, but a function doesn't technically need any more than one character; A() is a valid function name. A would be matched by [a-zA-Z], making \w unnecessary. On the other end of the spectrum, there could be any number of characters following the first letter, which is why we need this modifier.
    - \( : This is important to understand: this is not another group. The backslash here acts much like an escape character would in a normal string. In a regex, any time you preface a special character, such as parentheses, +, or * with a backslash, it uses it like a normal character. \( matches an opening parenthesis, for the actual function call part of the function.
    - \w+ : Matches a number, letter or underscore one or more times. This ensures the function actually has a parameter going into it.
    - \) : Like \(, but matches a closing parenthesis
- ((?:[a-zA-Z]\w*)|\() : Group 2. Matches a variable, or an opening parenthesis.
  - (?:[a-zA-Z]\w*) : Matches a variable. This is the exact same as our function name matcher. However, note that this is in a non-capturing group: this is important, because of the way the OR checks. The OR immediately following this looks at this group as a whole. If this was not grouped, the "last object matched" would be \w*, which would not be sufficient for what we want. It would say: "match one letter followed by more letters OR one letter followed by a parenthesis". Putting this element in a non-capturing group allows us to control what the OR registers.
  - | : Or character. Matches (?:[a-zA-Z]\w*) or \(.
  - \( : Matches an opening parenthesis. Once we have checked if there is an opening parenthesis, we don't need to check anything beyond it for the purposes of our regex.

Now, remember our two groups, group one and group two? These are used in the substitution string, "\1*\2". The substitution string is not a true regex, but it still has certain special characters. In this case, \<number> will insert the group of that number. So our substitution string is saying: "Put group 1 in (which is either our function call or our number), then put in an asterisk (*), then put in our second group (either a variable or a parenthesis)"

I think that about sums it up!

108

answered Oct 08 '22 01:10

MutantOctopus

Related questions
                            
                                Internal Redirect in Flask
                            
                                Macro Recursion in Jinja2
                            
                                listcomp unable to access locals defined in code called by exec if nested in function
                            
                                PUT Request to REST API using Python
                            
                                Python, PEP-8, E122 continuation line missing indentation or outdented
                            
                                how do I make a 2.7 python context manager threadsafe
                            
                                Equivalent of Python's Counter in Ruby
                            
                                Python socket.error: [Errno 104] Connection reset by peer
                            
                                How to debug when using multiprocessing in pycharm
                            
                                Making a meaningful sentence from a given set of words [closed]
                            
                                Why is sorting a python list of tuples faster when I explicitly provide the key as the first element?
                            
                                SQLAlchemy: Using a CTE from a (sub)query w/ FROM clause specified as literal text
                            
                                Dependency rule tried to blank-out primary key column in SQL-Alchemy when trying to delete record
                            
                                Load directly gz file into pandas dataframe
                            
                                Weighted linear regression with Scikit-learn
                            
                                When is KeyboardInterrupt raised in Python?
                            
                                Paho MQTT client connection reliability (reconnect on disconnection)
                            
                                Python 3 Unit Testing - Assert Logger NOT called
                            
                                What is inter_byte_timeout (interCharTimeout) in pyserial?
                            
                                Destructor in metaclass Singleton object

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Add multiplication signs (*) between coefficients

Tags:

python

function

string

sympy

Luke Taylor

People also ask

1 Answers

MutantOctopus

Recent Activity

Donate For Us