I need to split strings containing basic mathematical expressions, such as "(a+b)*c" or " (a - c) / d".
The delimiters are + - * / ( ) and space, and I need each of them as an independent token.
Basically the result should look like this:
"("
"a"
"+"
"b"
")"
"*"
"c"
And for the second example:
" "
"("
"a"
" "
"-"
...
I have read a lot of questions about similar problems with less complex delimiters, and the common answer was to use zero-width positive lookahead and lookbehind, like this: (?<=X | ?=X)
Here X represents the delimiters, but putting them in a character class like this: [\\Q+-*()\\E/\\s]
does not work in the desired way.
So how do I have to format the delimiters to make the split work the way I need it?
---Update---
Word-class characters and longer combinations of them, such as "ab", "c1" or "12", should not be split.
Or in short, I need the same result as StringTokenizer would give, given the parameters "-+*/() " and true.
Try splitting your data using
yourString.split("(?<=[\\Q+-*()\\E/\\s])|(?=[\\Q+-*()\\E/\\s])(?<!^)");
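I assume that the problem you had was not in the \\Q+-*()\\E part but in (?<=X | ?=X). It should be (?<=X)|(?=X), since you need a separate look-behind and look-ahead. Written as (?<=X | ?=X), the whole thing is a single look-behind containing an alternation, so the ?=X inside it is not treated as a look-ahead. The trailing (?<!^) keeps the look-ahead from matching at the very start of the string.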
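To check the corrected pattern against the StringTokenizer behaviour the question asks for, here is a minimal sketch. The class name TokenSplitDemo and its helper methods are made up for illustration; only String.split and java.util.StringTokenizer come from the JDK. It assumes the input is non-empty and uses plain spaces as its only whitespace, because \\s also matches tabs and newlines while the tokenizer's delimiter list does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; names are illustrative, not part of any library.
class TokenSplitDemo {
    // Delimiters from the question: + - * / ( ) and space.
    static final String DELIMS = "-+*/() ";

    // Split after any delimiter, or before any delimiter except at index 0.
    static final String SPLIT_REGEX =
            "(?<=[\\Q+-*()\\E/\\s])|(?=[\\Q+-*()\\E/\\s])(?<!^)";

    static List<String> splitWithRegex(String input) {
        return Arrays.asList(input.split(SPLIT_REGEX));
    }

    static List<String> splitWithTokenizer(String input) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = true makes every delimiter its own one-character token.
        StringTokenizer st = new StringTokenizer(input, DELIMS, true);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"(a+b)*c", " (a - c) / d"}) {
            List<String> viaRegex = splitWithRegex(input);
            System.out.println(viaRegex);
            // Reports whether both approaches produced the same token list.
            System.out.println(viaRegex.equals(splitWithTokenizer(input)));
        }
    }
}
```

For the two expressions from the question, both helpers should return the same token list, with each delimiter (including each space) as its own token.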
Demo for "_a+(ab-c1__)+12_" (BTW, _ will be replaced with a space in the code. SO shows two spaces as one, so I had to use __ to present them somehow):
String[] tokens = " a+(ab-c1  )+12 "
        .split("(?<=[\\Q+-*()\\E/\\s])|(?=[\\Q+-*()\\E/\\s])(?<!^)");
for (String token : tokens)
    System.out.println("\"" + token + "\"");
result
" "
"a"
"+"
"("
"ab"
"-"
"c1"
" "
" "
")"
"+"
"12"
" "
It is one thing if you are doing this as student work, but in practice this is more of a job for a lexical analyzer and parser. In C, you would use lex and yacc, or GNU flex and bison. In Java, you'd use ANTLR or JavaCC.
But start by writing a BNF grammar for your expected input (usually called the input language).
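As a rough illustration of what such a grammar might look like for the expressions in the question, here is a minimal hand-written sketch. The grammar, the class name ExprParser and the choice of postfix output are all assumptions made for this example; a generated ANTLR or JavaCC parser would look different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical sketch of a recursive-descent parser for the assumed grammar
//   expr   ::= term   { ("+" | "-") term }
//   term   ::= factor { ("*" | "/") factor }
//   factor ::= operand | "(" expr ")"
// where an operand is any run of non-delimiter characters, e.g. a, ab, c1, 12.
class ExprParser {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private ExprParser(String input) {
        StringTokenizer st = new StringTokenizer(input, "-+*/() ", true);
        while (st.hasMoreTokens()) {
            String t = st.nextToken();
            if (!t.equals(" ")) {
                tokens.add(t); // spaces carry no meaning in this grammar
            }
        }
    }

    // Parses the whole input and returns it in postfix (reverse Polish) order.
    static String toPostfix(String input) {
        ExprParser p = new ExprParser(input);
        String result = p.expr();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + p.peek());
        }
        return result;
    }

    // Returns the current token, or "" at end of input.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "";
    }

    private String expr() {
        String left = term();
        while (peek().equals("+") || peek().equals("-")) {
            String op = tokens.get(pos++);
            left = left + " " + term() + " " + op;
        }
        return left;
    }

    private String term() {
        String left = factor();
        while (peek().equals("*") || peek().equals("/")) {
            String op = tokens.get(pos++);
            left = left + " " + factor() + " " + op;
        }
        return left;
    }

    private String factor() {
        String t = peek();
        if (t.equals("(")) {
            pos++;
            String inner = expr();
            if (!peek().equals(")")) {
                throw new IllegalArgumentException("Expected ')'");
            }
            pos++;
            return inner;
        }
        if (t.isEmpty() || "+-*/)".contains(t)) {
            throw new IllegalArgumentException("Expected operand but found '" + t + "'");
        }
        pos++;
        return t;
    }
}
```

Each grammar rule maps to one method, which is why writing the grammar first makes the parser largely mechanical. For example, ExprParser.toPostfix("(a+b)*c") walks factor, expr and term in turn and rejects unbalanced input such as "(a+b".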