Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I split a string of a mathematical expressions in python?

I made a program which convert infix to postfix in python. The problem is when I introduce the arguments. If i introduce something like this: (this will be a string)

( ( 73 + ( ( 34 - 72 ) / ( 33 - 3 ) ) ) + ( 56 + ( 95 - 28 ) ) )

it will split it with .split() and the program will work correctly. But I want the user to be able to introduce something like this:

((73 + ( (34- 72 ) / ( 33 -3) )) + (56 +(95 - 28) ) )

As you can see I want that the blank spaces can be trivial but the program continue splitting the string by parentheses, integers (not digits) and operands.

I try to solve it with a for but I don't know how to catch the whole number (73 , 34 ,72) instead one digit by digit (7, 3 , 3 , 4 , 7 , 2)

To sum up, what I want is split a string like ((81 * 6) /42+ (3-1)) into:

[(, (, 81, *, 6, ), /, 42, +, (, 3, -, 1, ), )]
like image 860
Fernaku Avatar asked Apr 13 '17 10:04

Fernaku


People also ask

How do you split a specific string in Python?

The split() method splits a string into a list. You can specify the separator, default separator is any whitespace. Note: When maxsplit is specified, the list will contain the specified number of elements plus one.

How do you split a string into two parts in Python?

Use Split () Function This function splits the string into smaller sections. This is the opposite of merging many strings into one. The split () function contains two parameters. In the first parameter, we pass the symbol that is used for the split.

How do you split a string using operators?

Method #1 : Using re. split() This task can be solved using the split functionality provided by Python regex library which has power to split the string according to certain conditions and in this case all numbers and operators.

How do you split a string with special characters in Python?

Use the re. split() method to split a string on all special characters. The re. split() method takes a pattern and a string and splits the string on each occurrence of the pattern.


4 Answers

Tree with ast

You could use ast to get a tree of the expression :

import ast

source = '((81 * 6) /42+ (3-1))'
node = ast.parse(source) 

def show_children(node, level=0):
    if isinstance(node, ast.Num):
        print(' ' * level + str(node.n))
    else:
        print(' ' * level + str(node))
    for child in ast.iter_child_nodes(node):
        show_children(child, level+1)

show_children(node)

It outputs :

<_ast.Module object at 0x7f56abbc5490>
 <_ast.Expr object at 0x7f56abbc5350>
  <_ast.BinOp object at 0x7f56abbc5450>
   <_ast.BinOp object at 0x7f56abbc5390>
    <_ast.BinOp object at 0x7f56abb57cd0>
     81
     <_ast.Mult object at 0x7f56abbd0dd0>
     6
    <_ast.Div object at 0x7f56abbd0e50>
    42
   <_ast.Add object at 0x7f56abbd0cd0>
   <_ast.BinOp object at 0x7f56abb57dd0>
    3
    <_ast.Sub object at 0x7f56abbd0d50>
    1

As @user2357112 wrote in the comments : ast.parse interprets Python syntax, not mathematical expressions. (1+2)(3+4) would be parsed as a function call and list comprehensions would be accepted even though they probably shouldn't be considered a valid mathematical expression.

List with a regex

If you want a flat structure, a regex could work :

import re

number_or_symbol = re.compile('(\d+|[^ 0-9])')
print(re.findall(number_or_symbol, source))
# ['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']

It looks for either :

  • multiple digits
  • or any character which isn't a digit or a space

Once you have a list of elements, you could check if the syntax is correct, for example with a stack to check if parentheses are matching, or if every element is a known one.

like image 125
Eric Duminil Avatar answered Oct 06 '22 21:10

Eric Duminil


You need to implement a very simple tokenizer for your input. You have the following types of tokens:

  • (
  • )
  • +
  • -
  • *
  • /
  • \d+

You can find them in your input string separated by all sorts of white space.

So a first step is to process the string from start to finish, and extract these tokens, and then do your parsing on the tokens, rather than on the string itself.

A nifty way to do this is to use the following regular expression: '\s*([()+*/-]|\d+)'. You can then:

import re

the_input='(3+(2*5))'
tokens = []
tokenizer = re.compile(r'\s*([()+*/-]|\d+)')
current_pos = 0
while current_pos < len(the_input):
  match = tokenizer.match(the_input, current_pos)
  if match is None:
     raise Error('Syntax error')
  tokens.append(match.group(1))
  current_pos = match.end()
print(tokens)

This will print ['(', '3', '+', '(', '2', '*', '5', ')', ')']

You could also use re.findall or re.finditer, but then you'd be skipping non-matches, which are syntax errors in this case.

like image 32
Horia Coman Avatar answered Oct 06 '22 20:10

Horia Coman


If you don't want to use re module, you can try this:

s="((81 * 6) /42+ (3-1))"

r=[""]

for i in s.replace(" ",""):
    if i.isdigit() and r[-1].isdigit():
        r[-1]=r[-1]+i
    else:
        r.append(i)
print(r[1:])

Output:

['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']
like image 5
McGrady Avatar answered Oct 06 '22 21:10

McGrady


It actual would be pretty trivial to hand-roll a simple expression tokenizer. And I'd think you'd learn more that way as well.

So for the sake of education and learning, Here is a trivial expression tokenizer implementation which can be extended. It works based upon the "maximal-much" rule. This means it acts "greedy", trying to consume as many characters as it can to construct each token.

Without further ado, here is the tokenizer:

class ExpressionTokenizer:
    def __init__(self, expression, operators):
        self.buffer = expression
        self.pos = 0
        self.operators = operators

    def _next_token(self):
        atom = self._get_atom()

        while atom and atom.isspace():
            self._skip_whitespace()
            atom = self._get_atom()

        if atom is None:
            return None
        elif atom.isdigit():
            return self._tokenize_number()
        elif atom in self.operators:
            return self._tokenize_operator()
        else:
            raise SyntaxError()

    def _skip_whitespace(self):
        while self._get_atom():
            if self._get_atom().isspace():
                self.pos += 1
            else:
                break

    def _tokenize_number(self):
        endpos = self.pos + 1
        while self._get_atom(endpos) and self._get_atom(endpos).isdigit():
            endpos += 1
        number = self.buffer[self.pos:endpos]
        self.pos = endpos
        return number

    def _tokenize_operator(self):
        operator = self.buffer[self.pos]
        self.pos += 1
        return operator

    def _get_atom(self, pos=None):
        pos = pos or self.pos
        try:
            return self.buffer[pos]
        except IndexError:
            return None

    def tokenize(self):
        while True:
            token = self._next_token()
            if token is None:
                break
            else:
                yield token

Here is a demo the usage:

tokenizer = ExpressionTokenizer('((81 * 6) /42+ (3-1))', {'+', '-', '*', '/', '(', ')'})
for token in tokenizer.tokenize():
    print(token)

Which produces the output:

(
(
81
*
6
)
/
42
+
(
3
-
1
)
)
like image 5
Christian Dean Avatar answered Oct 06 '22 19:10

Christian Dean