
How can I use String#split to split a string on the delimiters + - * / ( ) and space, and retain them as extra tokens?

I need to split strings containing basic mathematical expressions, such as:
"(a+b)*c"
or
" (a - c) / d"
The delimiters are + - * / ( ) and space, and I need each of them as an independent token. Basically, the result should look like this:

"("
"a"
"+"
"b"
")"
"*"
"c"

And for the second example:

" "
"("
"a"
" "
"-"
...

I read a lot of questions about similar problems with less complex delimiters, and the common answer was to use zero-width positive lookahead and lookbehind.
Like this: (?<=X | ?=X)
And X represents the delimiters, but putting them in a class like this:
[\\Q+-*()\\E/\\s]
does not work in the desired way.
So how do I have to format the delimiters to make the split work the way I need?

---Update---
Word-class characters and longer combinations should not be split,
such as "ab", "c1" or "12".
In short, I need the same result that StringTokenizer would give for the parameters "-+*/() " and true.
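
For reference, here is a minimal sketch of the StringTokenizer behavior described above. The class name TokenizerReference and the helper method tokenize are only illustrative; the delimiter string and the true flag are the ones given in the update.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerReference {

    // Hypothetical helper: collects every token, delimiters included.
    static List<String> tokenize(String input) {
        // The third argument (true) makes StringTokenizer return each
        // delimiter character as its own one-character token.
        StringTokenizer tokenizer = new StringTokenizer(input, "-+*/() ", true);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Print each token in quotes so that spaces stay visible.
        for (String token : tokenize(" (a - c) / d")) {
            System.out.println("\"" + token + "\"");
        }
    }
}
```

Any split-based solution should produce the same token sequence as this for inputs that contain only these delimiters.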

Thiemo Krause asked Nov 02 '22 21:11


2 Answers

Try splitting your data using

yourString.split("(?<=[\\Q+-*()\\E/\\s])|(?=[\\Q+-*()\\E/\\s])(?<!^)");

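I assume the problem you had was not in the \\Q+-*()\\E part but in (?<=X | ?=X): it should be (?<=X)|(?=X), that is, a look-behind and a look-ahead joined by alternation. The trailing (?<!^) stops the look-ahead from matching at the very start of the string, which on Java versions before 8 would otherwise produce an empty first token.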


Demo for "_a+(ab-c1__)+12_" (each _ stands for a space; SO shows two spaces as one, so I wrote __ here, but the code below uses real spaces):

String[] tokens = " a+(ab-c1  )+12 "
        .split("(?<=[\\Q+-*()\\E/\\s])|(?=[\\Q+-*()\\E/\\s])(?<!^)");
for (String token :  tokens)
    System.out.println("\"" + token + "\"");

Result:

" "
"a"
"+"
"("
"ab"
"-"
"c1"
" "
" "
")"
"+"
"12"
" "
Pshemo answered Nov 14 '22 02:11


It is one thing if you are doing this as student work, but in practice this is more of a job for a lexical analyzer and parser. In C, you would use lex and yacc or GNU flex and bison. In Java, you'd use ANTLR or JavaCC.

But start by writing a BNF grammar for your expected input (usually called the input language).
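
As a rough illustration of the lexing half of that approach, here is a hand-written sketch in plain Java. It is not ANTLR or JavaCC output, and the grammar in the comment is only an assumed, conventional expression grammar, since the question does not specify one. The class name ExpressionLexer and the method lex are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpressionLexer {

    // Assumed grammar (EBNF-style), not taken from the question:
    //   expr   ::= term   (("+" | "-") term)*
    //   term   ::= factor (("*" | "/") factor)*
    //   factor ::= operand | "(" expr ")"
    // This class covers only the lexing step that a parser would build on.

    private static final String DELIMITERS = "+-*/() ";

    // Splits the input into operands and single-character delimiter tokens.
    static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder operand = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (DELIMITERS.indexOf(ch) >= 0) {
                // A delimiter ends the operand collected so far, if any.
                if (operand.length() > 0) {
                    tokens.add(operand.toString());
                    operand.setLength(0);
                }
                tokens.add(String.valueOf(ch));
            } else {
                operand.append(ch);
            }
        }
        // Flush an operand that runs to the end of the input.
        if (operand.length() > 0) {
            tokens.add(operand.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : lex(" a+(ab-c1  )+12 ")) {
            System.out.println("\"" + token + "\"");
        }
    }
}
```

A generated lexer would also classify tokens (operand, operator, parenthesis, whitespace) instead of returning bare strings; this sketch skips that to stay close to the output the question asks for.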

Eric Jablow answered Nov 14 '22 03:11