Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regular Expression for Extracting Operands from Mathematical Expression

No question on SO addresses my particular problem. I know very little about regular expression. I am building an expression parser in Java using Regex Class for that purpose. I want to extract Operands, Arguments, Operators, Symbols and Function Names from expression and then save to ArrayList. Currently I am using this logic

String string = "2!+atan2(3+9,2+3)-2*PI+3/3-9-12%3*sin(9-9)+(2+6/2)" //This is just for testing purpose later on it will be provided by user
List<String> res = new ArrayList<>();
Pattern pattern = Pattern.compile((\\Q^\\E|\\Q/\\E|\\Q-\\E|\\Q-\\E|\\Q+\\E|\\Q*\\E|\\Q)\\E|\\Q)\\E|\\Q(\\E|\\Q(\\E|\\Q%\\E|\\Q!\\E)) //This string was build in a function where operator names were provided. Its mean that user can add custom operators and custom functions 
Matcher m = pattern.matcher(string);
int pos = 0;
while (m.find()) 
{
    if (pos != m.start()) 
    {
        res.add(string.substring(pos, m.start()))
    }
    res.add(m.group())
    pos = m.end();
}
if (pos != string.length()) 
{
     addToTokens(res, string.substring(pos));
}
for(String s : res)
{
     System.out.println(s);
}

Output:

2
!
+
atan2
(
3
+
9
,
2
+
3
)
-
2
*
PI
+
3
/
3
-
9
-
12
%
3
*
sin
(
9
-
9
)
+
(
2
+
6
/
2
)

Problem is that now Expression can contain Matrix with user defined format. I want to treat every Matrix as a Operand or Argument in case of functions.

Input 1:

String input_1 = "2+3-9*[{2+3,2,6},{7,2+3,2+3i}]+9*6"

Output Should be:

2
+
3
-
9
*
[{2+3,2,6},{7,2+3,2+3i}]
+
9
*
6

Input 2:

String input_2 = "{[2,5][9/8,func(2+3)]}+9*8/5"

Output Should be:

{[2,5][9/8,func(2+3)]}
+
9
*
8
/
5

Input 3:

String input_3 = "<[2,9,2.36][2,3,2!]>*<[2,3,9][23+9*8/8,2,3]>"

Output Should be:

<[2,9,2.36][2,3,2!]>
*
<[2,3,9][23+9*8/8,2,3]>

I want that now ArrayList should contain every Operand, Operators, Arguments, Functions and symbols at each index. How can I achieve my desired output using Regular expression. Expression validation is not required.

like image 621
Kamil Mahmood Avatar asked Oct 11 '15 20:10

Kamil Mahmood


2 Answers

I think you can try with something like:

(?<matrix>(?:\[[^\]]+\])|(?:<[^>]+>)|(?:\{[^\}]+\}))|(?<function>\w+(?=\())|(\d+[eE][-+]\d+)|(?<operand>\w+)|(?<operator>[-+\/*%])|(?<symbol>.)

DEMO

elements are captured in named capturing groups. If you don't need it, you can use short:

\[[^\]]+\]|<[^>]+>|\{[^\}]+\}|\d+[eE][-+]\d+|\w+(?=\()|\w+|[-+\/*%]|.


The \[[^\]]+\]|<[^>]+>|\{[^\}]+\} match opening bracket ({, [ or <), non clasing bracket characters, and closing bracket (},],>) so if there are no nested same-type brackets, there is no problem. Implementatin in Java:

public class Test {
    public static void main(String[] args) {
        String[] expressions = {"2!+atan2(3+9,2+3)-2*PI+3/3-9-12%3*sin(9-9)+(2+6/2)", "2+3-9*[{2+3,2,6},{7,2+3,2+3i}]+9*6",
        "{[2,5][9/8,func(2+3)]}+9*8/5","<[2,9,2.36][2,3,2!]>*<[2,3,9][23 + 9 * 8 / 8, 2, 3]>"};
        Pattern pattern = Pattern.compile("(?<matrix>(?:\\[[^]]+])|(?:<[^>]+>)|(?:\\{[^}]+}))|(?<function>\\w+(?=\\())|(?<operand>\\w+)|(?<operator>[-+/*%])|(?<symbol>.)");
        for(String expression : expressions) {
            List<String> elements = new ArrayList<String>();
            Matcher matcher = pattern.matcher(expression);
            while (matcher.find()) {
                elements.add(matcher.group());
            }
            for (String element : elements) {
                System.out.println(element);
            }
            System.out.println("\n\n\n");
        }
    }
}

Explanation of alternatives:

  • \[[^\]]+\]|<[^>]+>|\{[^\}]+\} - match opening bracket of given type, character which are not closing bracket of that type (everything byt not closing bracket), and closing bracket of that type,
  • \d+[eE][-+]\d+ = digit, followed by e or E, followed by operator + or -, followed by digits, to capture elements like 2e+3
  • \w+(?=\() - match one or more word characters (A-Za-z0-9_) if it is followed by ( for matching functions like sin,
  • \w+ - match one or more word characters (A-Za-z0-9_) for matching operands,
  • [-+\/*%] - match one character from character class, to match operators
  • . - match any other character, to match other symbols

Order of alternatives is quite important, as last alternative . will match any character, so it need to be last option. Similar case with \w+(?=\() and \w+, the second one will match everything like previous one, however if you don't wont to distinguish between functions and operands, the \w+ will be enough for all of them.

In longer exemple the part (?<name> ... ) in every alternative, is a named capturing group, and you can see in demo, how it group matched fragments in gorups like: operand, operator, function, etc.

like image 192
m.cekiera Avatar answered Sep 24 '22 15:09

m.cekiera


With regular expressions you cannot match any level of nested balanced parentheses.

For example, in your second example {[2,5][9/8,func(2+3)]} you need to match the opening brace with the close brace, but you need to keep track of how many opening and closing inner braces/parens/etc there are. That cannot be done with regular expressions.

If, on the other hand, you simplify your problem to remove any requirement for balancing, then you probably can handle with regular expressions.

like image 32
Jonah Graham Avatar answered Sep 23 '22 15:09

Jonah Graham