
Regex for splitting a string delimited by | when not enclosed in double quotes

Tags: java, regex

I need a regex to count the number of columns in a pipe-delimited string in Java. Each column's data will either be enclosed in double quotes or be empty.

For example:

"1234"|"Name"||"Some description with ||| in it"|"Last Column"

The above should be counted as 5 columns, including the one empty column after the "Name" column.

Thanks

Nash asked Jun 11 '12 08:06



2 Answers

Here's one way to do it:

String input =
    "\"1234\"|\"Name\"||\"Some description with ||| in it\"|\"Last Column\"";
//  \_______/ \______/\/\_________________________________/ \_____________/    
//      1        2    3                 4                          5

int cols = input.replaceAll("\"[^\"]*\"", "")  // remove quoted fields "..."
                .replaceAll("[^|]", "")        // remove everything except |
                .length() + 1;                 // count the remaining |, add 1

System.out.println(cols);   // 5

IMO it's not very robust though. I wouldn't recommend using regular expressions if you plan on handling escaped quotes, for instance.
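
For comparison, here is a minimal sketch of a non-regex approach that scans the line once and tracks whether the current character is inside quotes. It assumes a backslash inside quotes escapes the next character, which the question does not actually specify. The class name PipeColumnCounter and the method countColumns are made up for illustration.

```java
public class PipeColumnCounter {

    // Counts pipe-delimited columns; pipes inside double quotes are ignored.
    // Assumption: inside quotes, a backslash escapes the next character.
    public static int countColumns(String line) {
        int columns = 1;              // n delimiters separate n + 1 columns
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes && c == '\\') {
                i++;                  // skip the escaped character
            } else if (c == '"') {
                inQuotes = !inQuotes; // entering or leaving a quoted field
            } else if (c == '|' && !inQuotes) {
                columns++;            // a real delimiter
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        String input =
            "\"1234\"|\"Name\"||\"Some description with ||| in it\"|\"Last Column\"";
        System.out.println(countColumns(input));   // 5
    }
}
```

If the data uses a different escape convention (for example doubled quotes), the escape branch would need to change accordingly.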

aioobe answered Oct 29 '22 21:10



A slightly improved version of the expression in aioobe's answer:

// Regex: "(?:[^"\\]+|\\.)*"|[^|]+  (each regex backslash is doubled again in the Java literal)
int cols = input.replaceAll("\"(?:[^\"\\\\]+|\\\\.)*\"|[^|]+", "")
                .length() + 1;

This handles backslash escapes inside quotes, and uses a single expression to remove everything except the delimiters.
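
To illustrate what handling escapes changes, here is a small self-contained sketch that runs the escape-aware pattern next to the quote-only pattern from the earlier answer. It assumes quotes inside a field are escaped with a backslash; the class name EscapedQuoteDemo, the method names, and the sample line are made up for illustration. Note that in a Java string literal each regex backslash has to be written as \\\\.

```java
public class EscapedQuoteDemo {

    // Regex: "(?:[^"\\]+|\\.)*"|[^|]+
    // Removes quoted fields (allowing backslash escapes) and any other non-pipe text.
    static final String ESCAPE_AWARE = "\"(?:[^\"\\\\]+|\\\\.)*\"|[^|]+";

    public static int countEscapeAware(String line) {
        return line.replaceAll(ESCAPE_AWARE, "").length() + 1;
    }

    // The two-step, quote-only approach: it does not know about escapes.
    public static int countQuoteOnly(String line) {
        return line.replaceAll("\"[^\"]*\"", "")
                   .replaceAll("[^|]", "")
                   .length() + 1;
    }

    public static void main(String[] args) {
        // Raw text: "5\" pipe | fitting"|"b"||"c"   (4 columns)
        String input = "\"5\\\" pipe | fitting\"|\"b\"||\"c\"";
        System.out.println(countEscapeAware(input));   // 4
        System.out.println(countQuoteOnly(input));     // 2 (escaped quote ends the field early)
    }
}
```

On input without escaped quotes, such as the example in the question, both methods return the same count.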

Qtax answered Oct 29 '22 21:10
