How can I split a string of a mathematical expressions in python?

Tags:

python

string

split

python-3.x

tokenize

I made a program that converts infix to postfix in Python. The problem arises when I enter the arguments. If I enter something like this (as a string):

( ( 73 + ( ( 34 - 72 ) / ( 33 - 3 ) ) ) + ( 56 + ( 95 - 28 ) ) )

it is split with .split() and the program works correctly. But I want the user to be able to enter something like this:

((73 + ( (34- 72 ) / ( 33 -3) )) + (56 +(95 - 28) ) )

As you can see, I want whitespace to be insignificant, while the program still splits the string into parentheses, integers (not digits) and operators.

I tried to solve it with a for loop, but I don't know how to capture whole numbers (73, 34, 72) instead of single digits (7, 3, 3, 4, 7, 2).

To sum up, I want to split a string like ((81 * 6) /42+ (3-1)) into:

[(, (, 81, *, 6, ), /, 42, +, (, 3, -, 1, ), )]

asked Apr 13 '17 10:04

Fernaku

4 Answers

Tree with `ast`

You could use ast to get a tree of the expression:

import ast

source = '((81 * 6) /42+ (3-1))'
node = ast.parse(source) 

def show_children(node, level=0):
    if isinstance(node, ast.Constant):  # ast.Num is deprecated since Python 3.8
        print(' ' * level + str(node.value))
    else:
        print(' ' * level + str(node))
    for child in ast.iter_child_nodes(node):
        show_children(child, level+1)

show_children(node)

It outputs:

<_ast.Module object at 0x7f56abbc5490>
 <_ast.Expr object at 0x7f56abbc5350>
  <_ast.BinOp object at 0x7f56abbc5450>
   <_ast.BinOp object at 0x7f56abbc5390>
    <_ast.BinOp object at 0x7f56abb57cd0>
     81
     <_ast.Mult object at 0x7f56abbd0dd0>
     6
    <_ast.Div object at 0x7f56abbd0e50>
    42
   <_ast.Add object at 0x7f56abbd0cd0>
   <_ast.BinOp object at 0x7f56abb57dd0>
    3
    <_ast.Sub object at 0x7f56abbd0d50>
    1

As @user2357112 wrote in the comments : ast.parse interprets Python syntax, not mathematical expressions. (1+2)(3+4) would be parsed as a function call and list comprehensions would be accepted even though they probably shouldn't be considered a valid mathematical expression.
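
To make that caveat concrete, here is a small sketch that parses a few strings in eval mode and reports the type of the top-level node. The helper name expression_kind is made up for this illustration and is not part of the ast module.

```python
import ast

def expression_kind(source):
    """Return the class name of the top-level AST node of a Python expression."""
    # mode='eval' parses a single expression; its root node is in .body
    return type(ast.parse(source.strip(), mode='eval').body).__name__

print(expression_kind('((81 * 6) /42+ (3-1))'))  # BinOp
print(expression_kind('(1+2)(3+4)'))             # Call
print(expression_kind('[x for x in (1, 2)]'))    # ListComp
```

The second and third inputs are accepted without complaint, so a tree built this way would still need a check that only arithmetic node types appear.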

List with a regex

If you want a flat structure, a regex could work:

import re

number_or_symbol = re.compile(r'(\d+|[^ 0-9])')  # raw string, so \d is not treated as a string escape
print(re.findall(number_or_symbol, source))
# ['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']

It looks for either:

one or more digits
or any single character which isn't a digit or a space

Once you have a list of elements, you could check if the syntax is correct, for example with a stack to check if parentheses are matching, or if every element is a known one.
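
A minimal sketch of such a check could look like the following. The function name check_tokens and the choice of ValueError are assumptions made for this example; it only verifies that every token is known and that parentheses are balanced, not that operators and numbers alternate correctly.

```python
def check_tokens(tokens, operators=frozenset('+-*/')):
    """Raise ValueError if a token is unknown or the parentheses do not match."""
    stack = []  # positions of '(' tokens that are still open
    for position, token in enumerate(tokens):
        if token == '(':
            stack.append(position)
        elif token == ')':
            if not stack:
                raise ValueError("unmatched ')' at token %d" % position)
            stack.pop()
        elif not (token.isdigit() or token in operators):
            raise ValueError("unknown token %r at token %d" % (token, position))
    if stack:
        raise ValueError("unmatched '(' at token %d" % stack[-1])

# Passes silently for the token list from the question
check_tokens(['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')'])
```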

answered Oct 06 '22 21:10

Eric Duminil

You need to implement a very simple tokenizer for your input. You have the following types of tokens:

(
)
+
-
*
/
\d+

You can find them in your input string separated by all sorts of white space.

So a first step is to process the string from start to finish, and extract these tokens, and then do your parsing on the tokens, rather than on the string itself.

A nifty way to do this is to use the following regular expression: '\s*([()+*/-]|\d+)'. You can then:

import re

the_input='(3+(2*5))'
tokens = []
tokenizer = re.compile(r'\s*([()+*/-]|\d+)')
current_pos = 0
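while current_pos < len(the_input):
    match = tokenizer.match(the_input, current_pos)
    if match is None:
        # No token starts here: an unknown character (trailing whitespace also ends up here)
        raise ValueError('Syntax error at position %d' % current_pos)
    tokens.append(match.group(1))
    current_pos = match.end()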
print(tokens)

This will print ['(', '3', '+', '(', '2', '*', '5', ')', ')']

You could also use re.findall or re.finditer, but then you'd be skipping non-matches, which are syntax errors in this case.
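
To illustrate the difference, the sketch below first shows findall silently dropping an invalid character, then a finditer variant that refuses to skip anything. The names TOKEN and tokenize_strict are made up for this example, and allowing trailing whitespace is a design choice of the sketch rather than something taken from the answer.

```python
import re

TOKEN = re.compile(r'\s*([()+*/-]|\d+)')

def tokenize_strict(text):
    """Tokenize with finditer, but refuse to skip over unmatched characters."""
    tokens = []
    expected_start = 0  # where the next match has to begin
    for match in TOKEN.finditer(text):
        if match.start() != expected_start:
            raise ValueError('unexpected input at position %d' % expected_start)
        tokens.append(match.group(1))
        expected_start = match.end()
    if text[expected_start:].strip():  # leftover that is not just whitespace
        raise ValueError('unexpected input at position %d' % expected_start)
    return tokens

print(TOKEN.findall('(3 + x2)'))      # ['(', '3', '+', '2', ')'] : the 'x' is silently dropped
print(tokenize_strict('(3+(2*5)) '))  # ['(', '3', '+', '(', '2', '*', '5', ')', ')']
```

Calling tokenize_strict('(3 + x2)') raises ValueError, because the match for 2 does not start where the previous match ended.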

answered Oct 06 '22 20:10

Horia Coman

If you don't want to use the re module, you can try this:

s="((81 * 6) /42+ (3-1))"

r=[""]

for i in s.replace(" ",""):
    if i.isdigit() and r[-1].isdigit():
        r[-1]=r[-1]+i
    else:
        r.append(i)
print(r[1:])

Output:

['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']

answered Oct 06 '22 21:10

McGrady

It actually would be pretty trivial to hand-roll a simple expression tokenizer. And I'd think you'd learn more that way as well.

So for the sake of education and learning, here is a trivial expression tokenizer implementation which can be extended. It works based upon the "maximal munch" rule. This means it acts "greedy", trying to consume as many characters as it can to construct each token.

Without further ado, here is the tokenizer:

class ExpressionTokenizer:
    def __init__(self, expression, operators):
        self.buffer = expression
        self.pos = 0
        self.operators = operators

    def _next_token(self):
        atom = self._get_atom()

        while atom and atom.isspace():
            self._skip_whitespace()
            atom = self._get_atom()

        if atom is None:
            return None
        elif atom.isdigit():
            return self._tokenize_number()
        elif atom in self.operators:
            return self._tokenize_operator()
        else:
            raise SyntaxError()

    def _skip_whitespace(self):
        while self._get_atom():
            if self._get_atom().isspace():
                self.pos += 1
            else:
                break

    def _tokenize_number(self):
        endpos = self.pos + 1
        while self._get_atom(endpos) and self._get_atom(endpos).isdigit():
            endpos += 1
        number = self.buffer[self.pos:endpos]
        self.pos = endpos
        return number

    def _tokenize_operator(self):
        operator = self.buffer[self.pos]
        self.pos += 1
        return operator

    def _get_atom(self, pos=None):
        if pos is None:  # `pos or self.pos` would also discard an explicit position 0
            pos = self.pos
        try:
            return self.buffer[pos]
        except IndexError:
            return None

    def tokenize(self):
        while True:
            token = self._next_token()
            if token is None:
                break
            else:
                yield token

Here is a demo of the usage:

tokenizer = ExpressionTokenizer('((81 * 6) /42+ (3-1))', {'+', '-', '*', '/', '(', ')'})
for token in tokenizer.tokenize():
    print(token)

Which produces the output:

(
(
81
*
6
)
/
42
+
(
3
-
1
)
)

answered Oct 06 '22 19:10

Christian Dean
