
RegEx - greedy white space match

Tags:

regex

I am trying to determine the correct RegEx syntax to do the following. I have a line in a file, and I want to match every character before the first occurrence of white space.

So, for example, in the line:

123abc xyz foo bar

It is unclear to me why the following:

^.*\s

is matching up to the b in the word bar:

123abc xyz foo

It appears to me that the \s is greedy; however, I am not certain how to make it non-greedy so that it matches just 123abc. I have tried various forms of this regex in an attempt to make it non-greedy, such as ^.*\s? or something like it, but I have been unsuccessful. Thank you in advance.

vloche asked Jun 25 '12 19:06



People also ask

What is a greedy match in regex?

The standard quantifiers in regular expressions are greedy, meaning they match as much as they can, only giving back as necessary to match the remainder of the regex. By using a lazy quantifier, the expression tries the minimal match first.
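As a minimal sketch of that difference, the following uses Python's `re` module (an assumption; the page does not name a regex flavor) on an illustrative string that is not from the question:

```python
import re

text = "<a><b>"  # illustrative input, not from the question

# Greedy: .+ takes as much as it can, then gives back just enough for '>' to match.
greedy = re.search(r"<.+>", text).group()

# Lazy: .+? takes as little as it can, so it stops at the first '>'.
lazy = re.search(r"<.+?>", text).group()

print(greedy)  # <a><b>
print(lazy)    # <a>
```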

What is the regex for whitespace?

\s stands for “whitespace character”. Again, which characters this actually includes, depends on the regex flavor. In all flavors discussed in this tutorial, it includes [ \t\r\n\f]. That is: \s matches a space, a tab, a carriage return, a line feed, or a form feed.

How do I get rid of white space in regex?

You can easily trim unnecessary whitespace from the start and the end of a string or the lines in a text file by doing a regex search-and-replace. Search for ^[ \t]+ and replace with nothing to delete leading whitespace (spaces and tabs). Search for [ \t]+$ to trim trailing whitespace.
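A rough illustration of those two search-and-replace steps, assuming Python's `re` module; the helper name `trim_lines` is hypothetical:

```python
import re

def trim_lines(text: str) -> str:
    """Strip leading and trailing spaces/tabs from every line of text."""
    # re.MULTILINE makes ^ and $ match at every line boundary,
    # not only at the start and end of the whole string.
    text = re.sub(r"^[ \t]+", "", text, flags=re.MULTILINE)
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    return text

print(repr(trim_lines("  foo \t\n\tbar  \n")))  # 'foo\nbar\n'
```

Using [ \t] rather than \s keeps the line breaks themselves intact.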

How do I make a regex not greedy?

To make a quantifier non-greedy, simply follow it with a '?'. Any greedy quantifier can be made non-greedy by appending a '?' symbol to it: *?, +?, ??, {n,m}?, and {n,}?.


1 Answer

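That is because . matches any character, including a space, and * is greedy: .* runs to the end of the line and then backs off only as far as the last whitespace, which is where \s matches. You can try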

^[^ ]*\s

or

^\S*\s

instead.
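Here is a small sketch of how these two patterns behave on the line from the question, assuming Python's `re` module; the helper `first_match` is a hypothetical name. Note that both patterns include the whitespace character in the match, and that [^ ] excludes only the space character, so the two differ when the first separator is a tab.

```python
import re

def first_match(pattern: str, s: str) -> str:
    """Return the text matched by pattern at the start of s."""
    return re.match(pattern, s).group()

line = "123abc xyz foo bar"

# Both stop at the first space, but the match still contains that space.
print(repr(first_match(r"^\S*\s", line)))    # '123abc '
print(repr(first_match(r"^[^ ]*\s", line)))  # '123abc '

# Dropping the trailing \s leaves only the characters before the whitespace.
print(repr(first_match(r"^\S*", line)))      # '123abc'

# With a tab as the first separator, [^ ]* runs past the tab.
tabbed = "123abc\txyz foo"
print(repr(first_match(r"^\S*\s", tabbed)))    # '123abc\t'
print(repr(first_match(r"^[^ ]*\s", tabbed)))  # '123abc\txyz '
```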

Those are still greedy expressions. You can also write a non-greedy one:

^.*?\s

Your mistake was putting the ? in the wrong place: it has to follow the quantifier it modifies (.*?), not the \s.
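To see why the position of the ? matters, here is a sketch in Python's `re` module (an assumption; any flavor with lazy quantifiers should behave the same way) using the line from the question. In ^.*\s? the ? makes the \s optional, so .* is free to swallow the whole line.

```python
import re

line = "123abc xyz foo bar"

# ? after \s: the whitespace becomes optional and .* is still greedy.
optional_ws = re.match(r"^.*\s?", line).group()

# ? after *: the quantifier becomes lazy and stops at the first whitespace.
lazy = re.match(r"^.*?\s", line).group()

# A capture group keeps the whitespace out of the result.
captured = re.match(r"^(.*?)\s", line).group(1)

print(repr(optional_ws))  # '123abc xyz foo bar'
print(repr(lazy))         # '123abc '
print(repr(captured))     # '123abc'
```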

Examples (using GNU grep, which supports \s and \S as extensions):

$ echo aaaa bbb cccc dddd > re.txt
$ cat re.txt
aaaa bbb cccc dddd
$ egrep -o '^.*\s' re.txt
aaaa bbb cccc 
$ egrep -o '^\S*\s' re.txt
aaaa 
$ egrep -o '^[^ ]*\s' re.txt
aaaa 

And a non-greedy search with perl:

$ perl -ne 'print "$1\n" if /^(.*?)\s/' re.txt
aaaa
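
For comparison, a rough Python equivalent of the perl one-liner might look like the sketch below. The names `FIRST_FIELD` and `first_fields` are hypothetical, and `io.StringIO` stands in for re.txt so the snippet is self-contained.

```python
import io
import re

FIRST_FIELD = re.compile(r"^(.*?)\s")

def first_fields(lines):
    """Return the text before the first whitespace on each line that has any."""
    fields = []
    for line in lines:
        m = FIRST_FIELD.match(line)
        if m:  # lines with no whitespace at all are skipped, as in the perl version
            fields.append(m.group(1))
    return fields

# io.StringIO stands in for the re.txt file used above.
sample = io.StringIO("aaaa bbb cccc dddd\n")
print(first_fields(sample))  # ['aaaa']
```

Because each line read from a file keeps its trailing newline, a line holding a single word still matches: the \s matches the newline.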
Igor Chubin answered Sep 22 '22 11:09
