Regular expression to select all whitespace that isn't in quotes?

I'm not very good at regex. Can someone give me a regex (to use in Java) that will select all whitespace that isn't between two quotes? I am trying to remove all such whitespace from a string, so any solution that does so will work.

For example:

(this is a test "sentence for the regex")

should become

(thisisatest"sentence for the regex")

Sam Stern asked Mar 06 '12 04:03



1 Answer

Here's a single regex-replace that works:

\s+(?=([^"]*"[^"]*")*[^"]*$) 

which will replace:

(this is a test "sentence for the regex" foo bar) 

with:

(thisisatest"sentence for the regex"foobar) 
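Why it works: the lookahead only succeeds when the rest of the string contains an even number of quotes, which means the whitespace being matched is not inside a quoted section. A minimal Java sketch of that replacement (the class and method names are made up for illustration, and the quotes are assumed to be balanced and unescaped):

```java
class SimpleQuoteRegexDemo {

    // \s+ followed by a lookahead that allows only complete "..." pairs
    // (i.e. an even number of quotes) up to the end of the input.
    static final String REGEX = "\\s+(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: removes every whitespace run that sits outside quotes.
    static String stripUnquoted(String text) {
        return text.replaceAll(REGEX, "");
    }

    public static void main(String[] args) {
        String text = "(this is a test \"sentence for the regex\" foo bar)";
        System.out.println(stripUnquoted(text));
        // prints: (thisisatest"sentence for the regex"foobar)
    }
}
```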

Note that if the quotes can be escaped, this even more verbose regex will do the trick:

\s+(?=((\\[\\"]|[^\\"])*"(\\[\\"]|[^\\"])*")*(\\[\\"]|[^\\"])*$) 

which replaces the input:

(this is a test "sentence \"for the regex" foo bar) 

with:

(thisisatest"sentence \"for the regex"foobar) 

(Note that it also works with escaped backslashes: (thisisatest"sentence \\\"for the regex"foobar))

Needless to say (?), this really shouldn't be used to perform such a task: it makes one's eyes bleed, and it performs its task in quadratic time, while a simple linear solution exists.
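For comparison, here is one way the linear approach could look: a single pass that tracks whether the current position is inside quotes. This is only a sketch under stated assumptions, not a drop-in equivalent of the regex. The names are made up, a backslash is assumed to escape only a backslash or a quote, quotes are assumed to be balanced, and Character.isWhitespace is used as the whitespace test, which accepts a slightly different character set than \s.

```java
class QuoteAwareWhitespaceStripper {

    // Hypothetical helper: one left-to-right pass, O(n) in the input length.
    static String stripUnquotedWhitespace(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean inQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean hasNext = i + 1 < input.length();
            if (c == '\\' && hasNext
                    && (input.charAt(i + 1) == '\\' || input.charAt(i + 1) == '"')) {
                // Assumption: \\ and \" are escape pairs; copy both characters
                // and do not let the escaped quote toggle the quote state.
                out.append(c).append(input.charAt(i + 1));
                i++;
            } else if (c == '"') {
                // An unescaped quote opens or closes a quoted section.
                inQuotes = !inQuotes;
                out.append(c);
            } else if (inQuotes || !Character.isWhitespace(c)) {
                // Keep everything inside quotes and all non-whitespace outside.
                out.append(c);
            }
            // Whitespace outside quotes falls through and is dropped.
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "(this is a test \"sentence \\\"for the regex\" foo bar)";
        System.out.println(stripUnquotedWhitespace(text));
        // prints: (thisisatest"sentence \"for the regex"foobar)
    }
}
```

Each character is examined once, so there is no repeated rescanning of the remainder of the string as with the lookahead.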

EDIT

A quick demo:

String text = "(this is a test \"sentence \\\"for the regex\" foo bar)";
String regex = "\\s+(?=((\\\\[\\\\\"]|[^\\\\\"])*\"(\\\\[\\\\\"]|[^\\\\\"])*\")*(\\\\[\\\\\"]|[^\\\\\"])*$)";
System.out.println(text.replaceAll(regex, ""));
// output: (thisisatest"sentence \"for the regex"foobar)
Bart Kiers answered Sep 19 '22 13:09