
Java regex split string by comma but ignore quotes and also parentheses [duplicate]

Tags: java, string, regex

I'm stuck with this regex.

My input is:

  • "Crane device, (physical object)"(X1,x2,x4), not "Seen by research nurse (finding)", EntirePatellaBodyStructure(X1,X8), "Besnoitia wallacei (organism)", "Catatropis (organism)"(X1,x2,x4), not IntracerebralRouteQualifierValue, "Diospyros virginiana (organism)"(X1,x2,x4), not SuturingOfHandProcedure(X1)

and in the end I would like to get:

  • "Crane device, (physical object)"(X1,x2,x4)
  • not "Seen by research nurse (finding)"
  • EntirePatellaBodyStructure(X1,X8)
  • "Besnoitia wallacei (organism)"
  • "Catatropis (organism)"(X1,x2,x4)
  • not IntracerebralRouteQualifierValue
  • "Diospyros virginiana (organism)"(X1,x2,x4)
  • not SuturingOfHandProcedure(X1)

I've tried this regex:

(\'[^\']*\')|(\"[^\"]*\")|([^,]+)|\\s*,\\s*

It works as long as there is no comma inside parentheses.

Vadim Ivanov asked Oct 04 '22 01:10


2 Answers

RegEx

(\w+\s)?("[^"]+"|\w+)(\(\w\d(,\w\d)*\))?

Java Code

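import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = ... ;
// Optional leading word (e.g. "not "), then a quoted string or bare word,
// then an optional argument list such as (X1,x2,x4)
Matcher matcher = Pattern.compile(
          "(\\w+\\s)?(\"[^\"]+\"|\\w+)(\\(\\w\\d(,\\w\\d)*\\))?").matcher(input);
while (matcher.find()) {
    System.out.println(matcher.group());
}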

Output

"Crane device, (physical object)"(X1,x2,x4)
not "Seen by research nurse (finding)"
EntirePatellaBodyStructure(X1,X8)
"Besnoitia wallacei (organism)"
"Catatropis (organism)"(X1,x2,x4)
not IntracerebralRouteQualifierValue
"Diospyros virginiana (organism)"(X1,x2,x4)
not SuturingOfHandProcedure(X1)
Ravi K Thapliyal answered Oct 13 '22 10:10


Don't use regexes for this. Write a simple parser that keeps track of the number of parentheses encountered, and whether or not you are inside quotes. For more information, see: RegEx match open tags except XHTML self-contained tags
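
A minimal sketch of such a parser, assuming only double quotes delimit quoted text, quotes are never escaped, and parentheses outside quotes are balanced. The class name TopLevelSplitter and the method split are made up for illustration; they are not from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits on commas that sit outside double quotes
// and outside parentheses. Assumes no escaped quotes in the input.
final class TopLevelSplitter {

    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false; // true between a pair of double quotes
        int depth = 0;            // open parentheses seen outside quotes

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (!inQuotes && c == '(') {
                depth++;
            } else if (!inQuotes && c == ')' && depth > 0) {
                depth--;
            }

            if (c == ',' && !inQuotes && depth == 0) {
                // Top-level comma: finish the current item.
                addIfNotBlank(parts, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(parts, current); // last item has no trailing comma
        return parts;
    }

    private static void addIfNotBlank(List<String> parts, StringBuilder sb) {
        String item = sb.toString().trim();
        if (!item.isEmpty()) {
            parts.add(item);
        }
    }

    public static void main(String[] args) {
        String input = "\"Crane device, (physical object)\"(X1,x2,x4), "
                + "not \"Seen by research nurse (finding)\", "
                + "EntirePatellaBodyStructure(X1,X8)";
        for (String part : split(input)) {
            System.out.println(part);
        }
    }
}
```

For the shortened input in main, this prints the three items on separate lines, with the comma inside the quoted text and the commas inside (X1,x2,x4) left intact. Unlike the regex above, it does not depend on the shape of the argument list, so nested parentheses or longer identifiers inside them would also be kept together.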

We Are All Monica answered Oct 13 '22 09:10