Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generating all possible "unique" RPN (Reverse Polish notation) expressions

I want to generate in Python all possible RPN (Reverse Polish notation) expressions, that use letters from an input list (such as ['a', 'b', 'c']) and contain operators ['+', '-', '*', '/'].

My idea was that we can add elements to the current expression until one of the following happens: either we've used all letters or expression is complete (i.e. we can't add more operators).

So I wrote the following functions:

1)

'''
The function returns True if we can add operator to current expression:
we scan the list and add +1 to counter when we meet a letter
and we add -1 when we meet an operator (it reduces
last two letters into 1 (say ab+ <--> a + b)
''' 
def can_add_operator(string_):
    n = 0
    for letter in string_:
        if letter not in ['+', '-', '*', '/']:
            n += 1
        else:
            n -= 1
    result = n > 1
    return result


    can_add_operator('ab*c+')) #False
    can_add_operator('ab*c')  #True

2)

'''
The function returns a list, that contains operators and
letters, one of which we can add to the current expression.
'''

def possible_elements(items, string_):
    #list of all letters that we have not used yet
    result =  [x for x in items if x not in string_]
    if can_add_operator(string_):
        result = ["+", "-", "*", "/"] + result
    return result

letters = ['a', 'b', 'c', 'd']
possible_elements(letters, 'ab*c') #['+', '-', '*', '/', 'd']
possible_elements(letters, '') #['a', 'b', 'c', 'd']
possible_elements(letters, 'ab*c+d+') #[]

3) Next I wrap it into recursion:

#exp -- current expression, base of recursion is exp = ''
def rec(exp, Final_sol = []):
    elements_to_try = possible_elements(letters, exp)
    for i in elements_to_try:
        if len(possible_elements(letters, exp + i)) == 0:
            Final_sol.append(exp + i)
        else:
            rec(exp+i, Final_sol)
    return Final_sol

#we start with an empty string
Final_sol = rec('')
print(len(Final_sol)) #7680

There are two difficulties with that function:

  1. The first one is that if there are repeated letters in list of letters, it won't return all possible results.

    For example if letters = ['a', 'b', 'a']:

    Final_sol = rec('')
    print(len(Final_sol)) #32
    str(Final_sol)
    >> "['ab+', 'ab-', 'ab*', 'ab/', 'ba+', 'ba-', 'ba*', 'ba/', 'ba+', 'ba-', 
    'ba*', 'ba/', 'ab+', 'ab-', 'ab*', 'ab/', 'ab+', 'ab-', 'ab*', 'ab/', 'ba+', 
    'ba-', 'ba*', 'ba/', 'ba+', 'ba-', 'ba*', 'ba/', 'ab+', 'ab-', 'ab*', 
     'ab/']"
    

    So the output missing 'ab+a+' and so on. But I do want to return all possible combinations in that case too.

  2. The second problem is that there are many "equivalent" strings in the output. Since we have the commutative and associative properties in prefix form, expressions like ab+c+/abc++/ca+b+ should be considered as equivalent: I want only one of each group in the output of the function rec().

How could we change the functions above to overcome such difficulties? What is the most elegant solution to problem?

like image 205
Dima Avatar asked Jan 27 '19 00:01

Dima


People also ask

Which expressions are also regarded as Reverse Polish Notation RPN?

Reverse Polish notation (RPN), also known as reverse Łukasiewicz notation, Polish postfix notation or simply postfix notation, is a mathematical notation in which operators follow their operands, in contrast to Polish notation (PN), in which operators precede their operands.

What is meant by reverse Polish notation RPN in stack?

Reverse Polish notation (RPN) is a method for representing expressions in which the operator symbol is placed after the arguments being operated on. Polish notation, in which the operator comes before the operands, was invented in the 1920s by the Polish mathematician Jan Lucasiewicz.

Why is reverse Polish notation RPN used to carry out the evaluation of expressions?

Reverse Polish notation (RPN) is a method for conveying mathematical expressions without the use of separators such as brackets and parentheses. In this notation, the operators follow their operands, hence removing the need for brackets to define evaluation priority.


2 Answers

The first one is that if there are repeated letters in list of letters, it won't return all possible results.

We can attack this problem by using a different approach to generate the permutations:

from itertools import permutations

variables = ['a', 'a', 'b', 'c']

operators = ['+', '-', '*', '/']

equations = set()

for permutation in permutations(variables):
    a, b, *rest = permutation

    operations = permutations(operators)

    for permutation in operations:

        equation = zip([a + b, *rest], permutation)

        equations.add("".join(variable + operator for variable, operator in equation))

Using a set() will eliminate any duplications caused by repeated variables.

The second problem is that there are many "equivalent" strings in the output. Since we have the commutative and associative properties

To deal with the commutative issue, we'll use pattern matching to reduce the equations:

import sys
import re

DEBUG = True

remove = set()

# Reduce commutative equivalents: ca*a-b/ same as ac*a-b/
if DEBUG:
    print("Reduce commutative equivalents:", file=sys.stderr)

for equation in equations:
    if equation not in remove:
        for match in re.finditer(r"(?=(.+)(\w)[+*])", equation):

            a, _ = match.span(1)
            _, d = match.span(2)

            equivalent = equation[:a] + match[2] + match[1] + equation[d:]

            if equivalent != equation and equivalent in equations:
                remove.add(equivalent)
                if DEBUG:
                    print(f"Removed {equivalent} same as {equation}", file=sys.stderr)

equations -= remove

Because we've built all equations as ab op c op d op, etc. I don't believe we generate the associative equivalents, but if we did, we could use a similar technique to try to thin them out:

remove = set()

# Reduce associative equivalents aa+b*c- same as ab*ab*+c-
if DEBUG:
    print("Reduce associative equivalents:", file=sys.stderr)

for equation in equations:
    if equation not in remove:
        for match in re.finditer(r"(?=(\w)([+])(\w)([*]))", equation):

            a, _ = match.span(1)
            _, d = match.span(4)

            equivalent = equation[:a] + match[3] + match[4] + match[1] + match[3] + match[4] + match[2] + equation[d:]

            if equivalent != equation and equivalent in equations:
                remove.add(equivalent)
                if DEBUG:
                    print(f"Removed {equivalent} same as {equation}", file=sys.stderr)

equations -= remove

And finally dump out our reduced set:

if DEBUG:
    print("Final equations:", file=sys.stderr)

print(equations)

OUTPUT

> python3 test.py
Reduce commutative equivalents:
Removed ac+a-b/ same as ca+a-b/
Removed ab*a/c- same as ba*a/c-
Removed cb*a/a- same as bc*a/a-
Removed ac+b-a/ same as ca+b-a/
Removed ba+c/a- same as ab+c/a-
Removed ba+a-c/ same as ab+a-c/
Removed ac+a/b- same as ca+a/b-
Removed ac+b/a- same as ca+b/a-
Removed ac*b-a/ same as ca*b-a/
Removed bc*a-a/ same as cb*a-a/
Removed ca*a-b/ same as ac*a-b/
Removed ba*a-c/ same as ab*a-c/
Removed cb+a/a- same as bc+a/a-
Removed ba+c-a/ same as ab+c-a/
Removed ca*a/b- same as ac*a/b-
Removed ca*b/a- same as ac*b/a-
Removed ba+a/c- same as ab+a/c-
Removed ab*c-a/ same as ba*c-a/
Removed ab*c/a- same as ba*c/a-
Removed cb+a-a/ same as bc+a-a/
Reduce associative equivalents:
Final equations:
{'ca+a-b/', 'cb*a+a-', 'aa/b-c*', 'ba/c-a*', 'cb/a-a*', 'ab+a*c/', 'aa/c+b-',
'bc/a-a+', 'aa*b+c-', 'ba*a/c-', 'ab+c/a*', 'ca-a/b+', 'ca-b+a*', 'bc*a/a-',
'bc/a+a*', 'ac+a/b*', 'bc+a*a-', 'ca/a-b+', 'ac-a*b+', 'ba-a*c/', 'ac/b-a*',
'ba-c+a*', 'ba+a-c*', 'aa+b/c-', 'ca-b*a/', 'ca+b-a/', 'ab+c/a-', 'ac*b+a-',
'aa+c-b/', 'aa*c/b-', 'ab/c*a+', 'ac+b/a*', 'aa+b*c/', 'ab-a*c+', 'ac+a-b*',
'cb-a+a*', 'cb*a/a+', 'ab-c/a+', 'ac*b+a/', 'ba*c/a+', 'ba/c+a*', 'aa-b*c+',
'aa/b+c*', 'ab-c*a+', 'ac+a*b/', 'ac/b+a-', 'aa*b-c+', 'ac-a+b/', 'aa-c*b+',
'ab+a-c/', 'aa-c+b/', 'ba+c*a/', 'ca-b*a+', 'ab-a/c*', 'aa-b/c+', 'ac*a+b/',
'ba/a+c-', 'ba-c/a+', 'cb/a+a*', 'ca+b/a*', 'aa/c*b+', 'ac-a+b*', 'ba-a+c*',
'ca+a*b/', 'aa+b/c*', 'aa/c-b+', 'bc*a/a+', 'ca+a/b-', 'ca+b/a-', 'ca*b-a/',
'ac/b*a-', 'aa*b/c+', 'ba/a*c+', 'bc/a*a+', 'ca-b+a/', 'ac/b+a*', 'aa*b/c-',
'bc-a+a/', 'ca/b-a*', 'ba-c*a/', 'cb*a-a/', 'ba-c/a*', 'aa*b+c/', 'ac*a-b/',
'ca*b/a+', 'aa+b-c*', 'ba/a-c*', 'ca-b/a+', 'ab/c-a+', 'cb+a/a*', 'aa-c/b*',
'ba+c*a-', 'cb*a+a/', 'aa*c/b+', 'ab/c+a*', 'ca+b-a*', 'aa+b-c/', 'ac-b*a/',
'ab*a-c/', 'ba-a*c+', 'ba*c+a-', 'bc/a*a-', 'ba*c-a+', 'ba/c*a+', 'ab-c+a/',
'ba*c+a/', 'ca*a-b+', 'bc+a/a-', 'aa+c*b-', 'ab+c*a-', 'ac-a/b+', 'ca+a-b*',
'aa+c-b*', 'ab/c*a-', 'ab+c-a/', 'bc+a/a*', 'ac-a/b*', 'ab/a-c*', 'ac/a-b+',
'bc-a/a+', 'ab+a*c-', 'ac/a-b*', 'ca*a+b-', 'ab/a-c+', 'ab-a*c/', 'cb/a*a-',
'ac/a+b*', 'bc-a/a*', 'ac-b+a*', 'ac*a/b-', 'ba*a+c-', 'ba/a-c+', 'bc/a+a-',
'aa/b-c+', 'cb+a-a*', 'ca-b/a*', 'ca+b*a-', 'ac*b/a-', 'ca-a+b/', 'ca/b*a-',
'ba+a/c*', 'cb-a*a+', 'ac+a*b-', 'aa*b-c/', 'aa*c-b/', 'ac/a*b+', 'aa-c+b*',
'ca*a+b/', 'ca/b+a-', 'ac*a/b+', 'aa+c/b-', 'ab/c+a-', 'ab+a/c-', 'cb-a+a/',
'ab*a-c+', 'ab-a+c*', 'ab+a/c*', 'ac/b-a+', 'ab*c+a/', 'ba/c+a-', 'ba/c*a-',
'cb-a*a/', 'ac+b*a-', 'ba+c-a*', 'ac/b*a+', 'cb/a*a+', 'cb-a/a+', 'bc*a+a/',
'ac*b/a+', 'cb+a*a-', 'ba*c-a/', 'ca-a*b/', 'ca-a*b+', 'ab/a*c-', 'ba-a+c/',
'ba*a/c+', 'bc-a+a*', 'ca+a/b*', 'ca*a/b+', 'aa*c+b-', 'ba*c/a-', 'bc/a-a*',
'ca/a+b*', 'ab-a+c/', 'ca/b*a+', 'ab-a/c+', 'cb*a-a+', 'aa-b/c*', 'ac-b/a+',
'aa*c-b+', 'ab*c+a-', 'cb/a-a+', 'ab/a+c*', 'ba+a*c-', 'ba*a+c/', 'ba-a/c*',
'aa/b+c-', 'ba/c-a+', 'ca/b-a+', 'ab*a/c+', 'bc+a-a*', 'bc*a-a+', 'ab+c*a/',
'ab-c*a/', 'ac*a+b-', 'ca/a+b-', 'ac/a*b-', 'ac+b-a*', 'ba/a+c*', 'ba-a/c+',
'ab*c/a+', 'cb/a+a-', 'ca/a-b*', 'ac-b/a*', 'ab/a*c+', 'ca*b+a/', 'ac-a*b/',
'aa/b*c+', 'aa/c-b*', 'ca/a*b+', 'bc-a*a/', 'ca+b*a/', 'aa*c+b/', 'ab*a+c/',
'bc+a*a/', 'ab-c/a*', 'ca-a+b*', 'aa-c*b/', 'cb-a/a*', 'aa+b*c-', 'ca+a*b-',
'aa-b+c*', 'ac/a+b-', 'ba-c+a/', 'ba-c*a+', 'ca*b-a+', 'ac-b+a/', 'aa-b*c/',
'aa-b+c/', 'ac*a-b+', 'ac+b*a/', 'ca/a*b-', 'bc+a-a/', 'bc-a*a+', 'ba+a*c/',
'ac*b-a+', 'aa/c+b*', 'ab/a+c-', 'ab/c-a*', 'ab-c+a*', 'ba+c/a*', 'ab*c-a+',
'ab+a-c*', 'cb+a*a/', 'ac-b*a+', 'ba/a*c-', 'ab*a+c-', 'ab+c-a*', 'bc*a+a-',
'aa/b*c-', 'ca*b+a-', 'ba*a-c+', 'ca/b+a*', 'aa-c/b+', 'aa+c/b*', 'ca-a/b*',
'aa/c*b-', 'aa+c*b/'}
> 

I'm not claiming a perfect solution, just illustrating some of the tools available to you to solve your problem.

like image 55
cdlane Avatar answered Sep 16 '22 18:09

cdlane


To create all the possible expressions we can consider every expression a binary expression tree and then the notation will be just a matter of traversing the tree differently. For example:

tree:                          *
                              / \
             +               -   c
            / \             / \
           a   b           a   b

infix:     a + b          (a - b) * c
postfix    a b +           a b - c *

Since all the required operators are binary, the resulting expression trees are full binary trees, meaning all non-leaf nodes have exactly two children. Another property of binary expression trees is that all operands are the leaves of the tree and all internal nodes are operators, and the number of internal nodes (operators) is one less than the number of the leaves (operands).

Now to create all the possible expressions, first we need all the structurally distinct full binary trees with len(operands) leaves or len(operands)-1 internal nodes.

I use a generator written by the answerer of this question: generate all structurally distinct full binary trees with n leaves.

The code below generates all structurally distinct full binary trees with n leaves. It outputs the tree structure with some notation that you can set in the function. This one is set to show subtrees wrapped in parentheses, operands as x and operators as o. For example for 2 operators and 3 operands:

(xo(xox))       ((xox)ox)
    o               o
   / \             / \
  x   o           o   x
     / \         / \
    x   x       x   x
from itertools import product

def expr_trees(n):
    if n == 1:
        yield 'x'

    for i in range(1, n):
        left = expr_trees(i)
        right = expr_trees(n-i)

        for l, r in product(left, right):
            yield '('+l+'o'+r+')'

for t in expr_trees(3):
    print(t)

Now to generate all the possible expressions we need to place all the permutations without repetition of operands on the leaves and all the permutations of length len(operands)-1 of operators with repetition, at the internal nodes of every tree structure. Here we alter the generator function to use the list of operators and operands and output postfix expressions:

from itertools import permutations, product

def expressions(opds, oprs, idx):
    if len(opds) == 1:
        yield opds[0]

    for i in range(1, len(opds)):
        left = expressions(opds[0:i], oprs, idx+1)

        right = expressions(opds[i:], oprs, idx+1)

        for l, r in product(left, right):
            yield l+r+oprs[idx]

operands = ['a', 'b', 'c']
operators = ['+', '-', '*', '/']

operatorProducts = product(operators, repeat=len(operands)-1)
operandPermutations = permutations(operands)

for opds, oprs in product(operandPermutations, operatorProducts):
    for t in expressions(opds, oprs, 0):
        print(t)

Now about the time complexity. As an example let's calculate the number of all structurally distinct expressions for ['a', 'b', 'c'].

As we saw earlier there are two full binary trees for three operands. The number of permutations of the operands is 3! = 6 and the number of permutations of operators is 4^2 because we choose 2 out of 4 with repetition allowed. Therefore we have:

number of expressions
    = number of trees * number of operand permutations * number of operator permutations
    = 2 * 6 * 16
    = 192

For the general formula the interesting part is the number of structurally distinct binary trees which is the nth Catalan number with n being the number of internal nodes of the tree. You can read more about it at the answer to Counting binary trees.

number of trees with n internal nodes = (1 / n+1) x (2n)! / (n! x n!)

Therefore the number of structurally distinct expressions with n operators or n+1 operands:

(n+1)! x 4^n x (1/n+1) x (2n)! / (n! x n!) = 4^n x (2n)! / n!

(excuse ugly math formulas because of lack of support here. x is multiplication. You can find nicer formatting at the links above.)

Note that n is the number operators or the number of operands - 1.

As you can see the number of possible expressions grows extremely fast with n.

1, 8, 192, 7680, 430080, 30965760, ...

Although there are many equivalent expressions, still they are a small portion of all the expressions and you should think of a practical limit for the number of operands.

That brings us to the next problem which is finding equivalent expressions. It might seem simple at first as one might think it's only about commutative property of + and * but there are cases of - and / changing the rest of the expression in complicated ways which is difficult to catch by just a simple RegExp, IMO. For example abc-- is equivalent of ab-c+ because of unary effect of minus on the parenthesized elements and a more complicated version with the inversion effect of division, abcde+-*/ which is equivalent of abcd-e-//. Adding repeated elements to the list of operands creates more equivalent expressions and makes it even more difficult to catch them all.

I find it much complicated to find all the equivalent expressions and in my opinion your best bet is to implement a function that expands, simplifies and sorts all the terms so that you have a simplified version of each group of equivalent expressions for comparison.

like image 23
jal Avatar answered Sep 19 '22 18:09

jal