
Split string by all spaces except those in brackets [duplicate]

Tags:

java

regex

Possible Duplicate:
Split a String based on regex

I've never been a regular expression guru, so I need your help! I have a string like this:

String s = "a [b c] d [e f g]";

I want to split this string using spaces as delimiters -- but I don't want to split on spaces that appear within the [] brackets. So, from the example above, I would like this array:

{"a", "[b c]", "d", "[e f g]"}

Any advice on what regex could be used in conjunction with split in order to achieve this?


Here's another example:

"[a b] c [[d e] f g]"

becomes

{"[a b]", "c", "[[d e] f g]"}
arshajii asked Oct 14 '12 at 17:10



3 Answers

I think this should work, using a negative lookahead: it splits on whitespace only when that whitespace is not followed by a closing bracket with no opening bracket in between:

"a [b c] d [e f g]".split("\\s+(?![^\\[]*\\])");

For nested brackets you will need to write a parser. Regexes cannot handle an unbounded nesting depth, and they get too complicated beyond one or two levels. My expression, for example, fails for

"[a b [c d] e] f g"
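For anyone who wants to try the expression on both inputs, here is a minimal sketch. The class name `LookaheadSplitDemo` and the helper `splitOutsideBrackets` are made up for illustration; only the pattern itself comes from the answer.

```java
import java.util.Arrays;

class LookaheadSplitDemo {
    // Pattern from the answer: whitespace that is NOT followed by
    // "any run of non-'[' characters, then ']'".
    static final String SPLIT_REGEX = "\\s+(?![^\\[]*\\])";

    // Hypothetical helper name; it simply delegates to String.split.
    static String[] splitOutsideBrackets(String input) {
        return input.split(SPLIT_REGEX);
    }

    public static void main(String[] args) {
        // Flat brackets: spaces inside [...] are left alone.
        // prints [a, [b c], d, [e f g]]
        System.out.println(Arrays.toString(splitOutsideBrackets("a [b c] d [e f g]")));

        // Nested brackets: the spaces before the inner '[' are followed by '['
        // before any ']', so the lookahead does not protect them.
        // prints [[a, b, [c d] e], f, g]
        System.out.println(Arrays.toString(splitOutsideBrackets("[a b [c d] e] f g")));
    }
}
```

The second call shows the failure: the outer group is cut into "[a", "b" and "[c d] e]" instead of staying in one piece.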
Bergi answered Sep 20 '22 at 06:09


You cannot do that with a single regex, simply because a regex cannot match opening brackets against closing brackets once they nest.

Regular expressions are not Turing-complete (they cannot count to an arbitrary nesting depth), so even if a pattern looks like it works, there will be an input where it fails.

So I'd rather suggest writing your own few lines of code, which can handle all cases.

You could create a very simple grammar for JavaCC or ANTLR, or use a simple stack-based parser.
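A stack-based splitter along those lines might look roughly like the sketch below. Everything in it is an assumption on my part rather than something from the answer: the class name `StackBracketSplitter`, the choice to throw `IllegalArgumentException` on unbalanced brackets, and the choice to skip empty tokens produced by runs of spaces.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class StackBracketSplitter {
    // Splits on spaces that are outside every [...] group.
    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        Deque<Integer> openPositions = new ArrayDeque<>(); // indices of unmatched '['
        StringBuilder current = new StringBuilder();

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '[') {
                openPositions.push(i);
            } else if (c == ']') {
                if (openPositions.isEmpty()) {
                    throw new IllegalArgumentException("Unmatched ']' at index " + i);
                }
                openPositions.pop();
            } else if (c == ' ' && openPositions.isEmpty()) {
                // Top-level space: close the current token, skipping empty ones.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                continue;
            }
            current.append(c); // brackets and nested spaces stay in the token
        }

        if (!openPositions.isEmpty()) {
            throw new IllegalArgumentException(
                    "Unmatched '[' at index " + openPositions.peek());
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [[a b], c, [[d e] f g]]
        System.out.println(split("[a b] c [[d e] f g]"));
    }
}
```

Keeping the positions of the open brackets on the stack, instead of a plain counter, is only there so that the error message can point at the offending bracket.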

jdevelop answered Sep 22 '22 at 06:09


As said in the other answers, you need a parser for that. Here is a string that fails with the previous regex solutions:

"[a b] c [a [d e] f g]"

EDIT:

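import java.util.ArrayList;
import java.util.List;

// Splits on spaces at bracket depth 0, so nested [...] groups stay intact.
// Assumes the brackets in s are balanced; runs of spaces yield empty tokens.
public static List<String> split(String s) {
    List<String> tokens = new ArrayList<String>();
    int depth = 0; // number of '[' currently open
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c == '[') {
            depth++;
        } else if (c == ']') {
            depth--;
        } else if (c == ' ' && depth == 0) {
            // Top-level space: finish the current token and drop the space.
            tokens.add(sb.toString());
            sb = new StringBuilder();
            continue;
        }
        sb.append(c);
    }
    tokens.add(sb.toString()); // the last token has no trailing space

    return tokens;
}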
Marco Martinelli answered Sep 21 '22 at 06:09