
How can whitespace be trimmed from a Regex capture group?

The strings being examined resemble the following (notice the whitespace inside some of the brackets):

[name]  [address ] [ zip] [ phone number ]

The expression I am presently using...

\[([^\])]*)\]

...successfully captures each text within the brackets, but it also grabs the leading and trailing space so I end up with:

"name"  "address "  " zip"  " phone number "

But what I seek is:

"name"  "address"  "zip"  "phone number"

How can the regex be convinced to not capture the whitespace in these examples? (With the exception of embedded whitespace - such as that between the words in "phone number".)

(Note: I know I could just trim it from the captured variable after the expression is done, but I'm trying to do it within the context of the expression.)

Thanks for any ideas! Below is the exact code I'm using to test this:

NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\[([^\\])]*)\\]" options:0 error:nil];

NSString *string = @" [name] [address ] [ zip] [ phone number ] ";

NSString *modifiedString = [regex stringByReplacingMatchesInString:string options:0 range:NSMakeRange(0, [string length])
    withTemplate:@"\n\n[$1]"]; //note: adding brackets back here just to make it easy to see if the space has been trimmed properly from the captured value

NSLog(@"\n\n%@", modifiedString);
Monte Hurd asked Feb 16 '13 03:02


1 Answer

I'm going to go through this step by step.

First, the ([^\])]*) is incorrect. This means "a sequence of 0 or more characters, as long as possible, not containing ] or )."

For instance, for this string:

 [name] [address ) ] [ zip] [ phone number ] 

...the address part will be skipped over, because "address ) " contains a ), which [^\])]* (meaning "a sequence of zero or more characters, not including ) or ]") cannot match.

We want ([^\]]*) instead, which will not skip ).
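
To see the difference between the two character classes, here is a minimal sketch. It uses Python's re module as a stand-in for NSRegularExpression (an assumption on my part; for these simple patterns the two engines should behave the same), and the variable names are only for illustration:

```python
import re

# Sample input with a ')' inside one of the bracket pairs.
text = "[name] [address ) ] [ zip] [ phone number ]"

# Pattern from the question: the class excludes both ']' and ')'.
original = re.compile(r"\[([^\])]*)\]")
# Corrected pattern: the class excludes only ']'.
corrected = re.compile(r"\[([^\]]*)\]")

# findall returns the text of capture group 1 for every match.
original_captures = original.findall(text)
corrected_captures = corrected.findall(text)

print(original_captures)   # the "address ) " group is missing
print(corrected_captures)  # all four groups are found
```

Note that neither pattern trims anything yet; the spaces are still part of each capture.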

Next, we want to eat all the spaces around the capture. For that, we use two " *" sequences (a space followed by *), one on each side of the capture:

\[ *([^\]]*) *\]

Now we need to get tricky! The [^\]]* is greedy by default. That means it grabs any trailing spaces before the " *" after it gets a chance to, so they still end up in the capture! (Leading spaces are not a problem, because the first " *" consumes them before the capture starts.) We want to use the non-greedy version, [^\]]*?, instead. This means "a sequence of 0 or more characters, not containing ], as short as possible while conforming to the rest of the regular expression."

\[ *([^\]]*?) *\]
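
A quick way to compare the greedy and non-greedy versions is sketched below, again using Python's re module as a stand-in for NSRegularExpression (an assumption; the variable names are illustrative). The input is the test string from the question:

```python
import re

# Test string from the question.
text = " [name] [address ] [ zip] [ phone number ] "

# Greedy capture: trailing spaces are swallowed by the capture group.
greedy = re.compile(r"\[ *([^\]]*) *\]")
# Non-greedy capture: the group stops as early as the rest of the pattern allows.
lazy = re.compile(r"\[ *([^\]]*?) *\]")

greedy_captures = greedy.findall(text)
lazy_captures = lazy.findall(text)

# Same idea as the question's template replacement: put the brackets back
# around the capture so the trimming is easy to see.
rewritten = lazy.sub(r"[\1]", text)

print(greedy_captures)
print(lazy_captures)
print(rewritten)
```

In the Objective-C code from the question, the same pattern would be written with doubled backslashes, as @"\\[ *([^\\]]*?) *\\]". If tabs or other whitespace can appear inside the brackets, replacing each " *" with \s* is one option.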

Steven Fisher answered Nov 06 '22 20:11