Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Add multiplication signs (*) between coefficients

I have a program in which a user inputs a function, such as sin(x)+1. I'm using ast to try to determine if the string is 'safe' by whitelisting components as shown in this answer. Now I'd like to parse the string to add multiplication (*) signs between coefficients without them.

For example:

  • 3x-> 3*x
  • 4(x+5) -> 4*(x+5)
  • sin(3x)(4) -> sin(3x)*(4) (sin is already in globals, otherwise this would be s*i*n*(3x)*(4)

Are there any efficient algorithms to accomplish this? I'd prefer a pythonic solution (i.e. not complex regexes, not because they're pythonic, but just because I don't understand them as well and want a solution I can understand. Simple regexes are ok. )

I'm very open to using sympy (which looks really easy for this sort of thing) under one condition: safety. Apparently sympy uses eval under the hood. I've got pretty good safety with my current (partial) solution. If anyone has a way to make sympy safer with untrusted input, I'd welcome this too.

like image 258
Luke Taylor Avatar asked Mar 10 '16 23:03

Luke Taylor


People also ask

What symbols are used for multiplication?

Use the multiplication sign (×), not the letter x, to indicate the mathematical operation.

How do you make a multiplication sign in Python?

Multiplication and Division The sign we'll use in Python for multiplication is * and the sign we'll use for division is / .

How do you insert a multiplication sign in Excel?

Multiply numbers in a cell To do this task, use the * (asterisk) arithmetic operator. For example, if you type =5*10 in a cell, the cell displays the result, 50.


1 Answers

A regex is easily the quickest and cleanest way to get the job done in vanilla python, and I'll even explain the regex for you, because regexes are such a powerful tool it's nice to understand.

To accomplish your goal, use the following statement:

import re
# <code goes here, set 'thefunction' variable to be the string you're parsing>
re.sub(r"((?:\d+)|(?:[a-zA-Z]\w*\(\w+\)))((?:[a-zA-Z]\w*)|\()", r"\1*\2", thefunction)

I know it's a bit long and complicated, but a different, simpler solution doesn't make itself immediately obvious without even more hacky stuff than what's gone into the regex here. But, this has been tested against all three of your test cases and works out precisely as you want.

As a brief explanation of what's going on here: The first parameter to re.sub is the regular expression, which matches a certain pattern. The second is the thing we're replacing it with, and the third is the actual string to replace things in. Every time our regex sees a match, it removes it and plugs in the substitution, with some special behind-the-scenes tricks.

A more in-depth analysis of the regex follows:

  • ((?:\d+)|(?:[a-zA-Z]\w*\(\w+\)))((?:[a-zA-Z]\w*)|\() : Matches a number or a function call, followed by a variable or parentheses.
    • ((?:\d+)|(?:[a-zA-Z]\w*\(\w+\))) : Group 1. Note: Parentheses delimit a Group, which is sort of a sub-regex. Capturing groups are indexed for future reference; groups can also be repeated with modifiers (described later). This group matches a number or a function call.
      • (?:\d+) : Non-capturing group. Any group with ?: immediately after the opening parenthesis will not assign an index to itself, but still act as a "section" of the pattern. Ex. A(?:bc)+ will match "Abcbcbcbc..." and so on, but you cannot access the "bcbcbcbc" match with an index. However, without this group, writing "Abc+" would match "Abcccccccc..."
        • \d : Matches any numerical digit once. A regex of \d all its own will match, separately, "1", "2", and "3" of "123".
        • + : Matches the previous element one or more times. In this case, the previous element is \d, any number. In the previous example, \d+ on "123" will successfully match "123" as a single element. This is vital to our regex, to make sure that multi-digit numbers are properly registered.
      • | : Pipe character, and in a regex, it effectively says or: "a|b" will match "a" OR "b". In this case, it separates "a number" and "a function call"; match a number OR a function call.
      • (?:[a-zA-Z]\w*\(\w+\)) : Matches a function call. Also a non-capturing group, like (?:\d+).
        • [a-zA-Z] : Matches the first letter of the function call. There is no modifier on this because we only need to ensure the first character is a letter; A123 is technically a valid function name.
        • \w : Matches any alphanumeric character or an underscore. After the first letter is ensured, the following characters could be letters, numbers, or underscores and still be valid as a function name.
        • * : Matches the previous element 0 or more times. While initially seeming unnecessary, the star character effectively makes an element optional. In this case, our modified element is \w, but a function doesn't technically need any more than one character; A() is a valid function name. A would be matched by [a-zA-Z], making \w unnecessary. On the other end of the spectrum, there could be any number of characters following the first letter, which is why we need this modifier.
        • \( : This is important to understand: this is not another group. The backslash here acts much like an escape character would in a normal string. In a regex, any time you preface a special character, such as parentheses, +, or * with a backslash, it uses it like a normal character. \( matches an opening parenthesis, for the actual function call part of the function.
        • \w+ : Matches a number, letter or underscore one or more times. This ensures the function actually has a parameter going into it.
        • \) : Like \(, but matches a closing parenthesis
    • ((?:[a-zA-Z]\w*)|\() : Group 2. Matches a variable, or an opening parenthesis.
      • (?:[a-zA-Z]\w*) : Matches a variable. This is the exact same as our function name matcher. However, note that this is in a non-capturing group: this is important, because of the way the OR checks. The OR immediately following this looks at this group as a whole. If this was not grouped, the "last object matched" would be \w*, which would not be sufficient for what we want. It would say: "match one letter followed by more letters OR one letter followed by a parenthesis". Putting this element in a non-capturing group allows us to control what the OR registers.
      • | : Or character. Matches (?:[a-zA-Z]\w*) or \(.
      • \( : Matches an opening parenthesis. Once we have checked if there is an opening parenthesis, we don't need to check anything beyond it for the purposes of our regex.

Now, remember our two groups, group one and group two? These are used in the substitution string, "\1*\2". The substitution string is not a true regex, but it still has certain special characters. In this case, \<number> will insert the group of that number. So our substitution string is saying: "Put group 1 in (which is either our function call or our number), then put in an asterisk (*), then put in our second group (either a variable or a parenthesis)"

I think that about sums it up!

like image 108
MutantOctopus Avatar answered Oct 08 '22 01:10

MutantOctopus