
Regular Expression for separating strings enclosed in parentheses [duplicate]

Tags:

java

string

regex

I have a String that contains 2 or 3 company names, each enclosed in parentheses. Each company name can itself contain words in parentheses. I need to separate the names using regular expressions, but I haven't found a way to do it.

My inputStr:

(Motor (Sport) (racing) Ltd.) (Motorsport racing (Ltd.)) (Motorsport racing Ltd.)
or 
(Motor (Sport) (racing) Ltd.) (Motorsport racing (Ltd.))

The expected result is:

str1 = Motor (Sport) (racing) Ltd.
str2 = Motorsport racing (Ltd.)
str3 = Motorsport racing Ltd.

My code:

String str1, str2, str3;
Pattern p = Pattern.compile("\\((.*?)\\)");
Matcher m = p.matcher(inputStr);
int index = 0;
while(m.find()) {

    String text = m.group(1);
    text = text != null && StringUtils.countMatches(text, "(") != StringUtils.countMatches(text, ")") ? text + ")" : text;

    if (index == 0) {
        str1= text;
    } else if (index == 1) {
        str2 = text;
    } else if (index == 2) {
        str3 = text;
    }

    index++;
}

This works great for str2 and str3 but not for str1.

Current result:

str1 = Motor (Sport)
str2 = Motorsport racing (Ltd.)
str3 = Motorsport racing Ltd.
Eqr444 asked May 08 '18 09:05



3 Answers

You can solve this problem without regex; refer to this question about how to find the outermost parentheses.

Here is an example:

import java.util.Stack;

public class Main {

    public static void main(String[] args) {
        String input = "(Motor (Sport) (racing) Ltd.) (Motorsport racing (Ltd.)) (Motorsport racing Ltd.)";
        for (int index = 0; index < input.length(); ) {
            if (input.charAt(index) == '(') {
                int close = findClose(input, index);  // find the matching closing parenthesis
                System.out.println(input.substring(index + 1, close));
                index = close + 1;  // skip content and nested parentheses
            } else {
                index++;
            }
        }
    }
    private static int findClose(String input, int start) {
        Stack<Integer> stack = new Stack<>();
        for (int index = start; index < input.length(); index++) {
            if (input.charAt(index) == '(') {
                stack.push(index);
            } else if (input.charAt(index) == ')') {
                stack.pop();
                if (stack.isEmpty()) {
                    return index;
                }
            }
        }
        // Only reached if the parentheses are unbalanced
        throw new IllegalArgumentException("Unbalanced parentheses starting at index " + start);
    }

}

Output:

Motor (Sport) (racing) Ltd.
Motorsport racing (Ltd.)
Motorsport racing Ltd.
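
Since only one kind of bracket is involved, the stack can also be replaced by a plain depth counter. The following is a minimal sketch of that variation, not part of the code above: the class name OuterGroups and the method split are made up for illustration, and throwing IllegalArgumentException on unbalanced input is an assumption about how you would want such input handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for illustration only
final class OuterGroups {

    // Returns the content of each top-level "(...)" group, without the outer parentheses
    static List<String> split(String input) {
        List<String> groups = new ArrayList<>();
        int depth = 0;   // number of parentheses currently open
        int start = -1;  // index just after the '(' that opened the current top-level group
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(') {
                if (depth == 0) {
                    start = i + 1;
                }
                depth++;
            } else if (c == ')') {
                if (depth == 0) {
                    // Assumption: unbalanced input is treated as an error
                    throw new IllegalArgumentException("Unmatched ')' at index " + i);
                }
                depth--;
                if (depth == 0) {
                    groups.add(input.substring(start, i));
                }
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Unclosed '(' in input");
        }
        return groups;
    }

    public static void main(String[] args) {
        String input = "(Motor (Sport) (racing) Ltd.) (Motorsport racing (Ltd.)) (Motorsport racing Ltd.)";
        split(input).forEach(System.out::println);
    }
}
```

Characters between the groups (here, the separating spaces) are never copied, because a substring is only taken when the depth returns to zero.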
xingbin answered Nov 12 '22 17:11



Assuming the parentheses nest at most two levels deep, we can do this without too much magic. I would go with this code:

List<String> matches = new ArrayList<>();
Pattern p = Pattern.compile("\\([^()]*(?:\\([^()]*\\)[^()]*)*\\)");
Matcher m = p.matcher(inputStr);
while (m.find()) {
    String fullMatch = m.group();
    matches.add(fullMatch.substring(1, fullMatch.length() - 1));
}

Explanation:

  • First we match a parenthesis: \\(
  • Then we match some non-parenthesis characters: [^()]*
  • Then, zero or more times ((?:...)*), we match some content within parentheses followed by more non-parenthesis characters: \\([^()]*\\)[^()]*. It's important that we don't allow any further parentheses inside the inner parentheses.
  • And then the closing parenthesis comes: \\)
  • m.group() returns the actual full match.
  • fullMatch.substring(1, fullMatch.length() - 1) removes the parentheses from the start and the end. You could do it with another group too. I just didn't want to make the regex uglier.
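
For completeness, here is a sketch of the "another group" variant mentioned in the last bullet. It is the same pattern with one capturing group wrapped around everything between the outer parentheses, so m.group(1) replaces the substring call. The class name CompanyNameRegex and the method extract are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, named for illustration only
final class CompanyNameRegex {

    // Same pattern as above, plus a capturing group around the content
    // between the outer parentheses
    private static final Pattern OUTER =
            Pattern.compile("\\(([^()]*(?:\\([^()]*\\)[^()]*)*)\\)");

    static List<String> extract(String input) {
        List<String> names = new ArrayList<>();
        Matcher m = OUTER.matcher(input);
        while (m.find()) {
            names.add(m.group(1)); // group 1 excludes the outer parentheses
        }
        return names;
    }

    public static void main(String[] args) {
        String inputStr = "(Motor (Sport) (racing) Ltd.) (Motorsport racing (Ltd.)) (Motorsport racing Ltd.)";
        extract(inputStr).forEach(System.out::println);
    }
}
```

Like the original pattern, this only covers the two-level case; input nested more deeply is outside what it is meant to handle.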
Tamas Rev answered Nov 12 '22 16:11



Why not just solve it using a stack? That takes only O(n) time.

  1. Scan the string one character at a time. Every time you come across a '(', push it onto the stack; every time you come across a ')', pop from the stack. Any other character that lies inside a company name goes into a buffer.
  2. If the stack is not empty when you push a '(', that parenthesis is inside a company name, so put it in the buffer as well.
  3. Similarly, if the stack isn't empty after popping, put the ')' in the buffer, as it is part of the company name.
  4. If the stack is empty after popping, the current company name has ended: the buffer holds the name, so store it and clear the buffer.

    String string = "(Motor (Sport) (racing) Ltd.) (Motorsport racing (Ltd.)) (Motorsport racing Ltd.)";
    List<String> result = new ArrayList<>();
    StringBuilder buffer = new StringBuilder();

    Stack<Character> stack = new Stack<>();
    for (int j = 0; j < string.length(); j++) {
        char c = string.charAt(j);
        if (c == '(') {
            // A nested '(' belongs to the company name; the outermost one does not
            if (!stack.empty()) {
                buffer.append('(');
            }
            stack.push('(');
        } else if (c == ')') {
            stack.pop();
            if (stack.empty()) {
                // The outermost ')' ends the current company name
                result.add(buffer.toString());
                buffer = new StringBuilder();
            } else {
                buffer.append(')');
            }
        } else if (!stack.empty()) {
            // Skip characters between the names, such as the separating spaces
            buffer.append(c);
        }
    }

    for (String name : result) {
        System.out.println(name);
    }
    
Napstablook answered Nov 12 '22 16:11
