
Ruby regex what does the \1 mean for gsub

Tags: regex, ruby, gsub

What does the \1 do?

For example

"foo bar bag".gsub(/(bar)/,'car\1')

I believe it has something to do with how you use parentheses, but I'm not really sure. Could someone explain it to me? And can you do stuff like \2? If so, what would that do?

Tommy asked Apr 05 '13 04:04




2 Answers

Each group that you surround with parentheses in the search pattern corresponds to a number in the replacement string: \1 for the first group, \2 for the second, and so on.

In your example there's only one group, "(bar)", so wherever you put \1, the text matched inside those parentheses is swapped in. You can use \1 multiple times, which is handy if you want to repeat the matched text: you could legitimately write car\1\1\1, and "bar" would be swapped in three times.

\2 has no use here because the pattern contains only one group. However, if you had (bar)(jar), then \1 would represent "bar" and \2 would represent "jar".

You could even do things like this:

\1\2\1\2\2\1

which would become:

barjarbarjarjarbar
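
To see this in Ruby itself, here is a minimal sketch using the string from the question. The helper name repeat_groups is made up for this illustration; everything else is plain String#gsub with a single-quoted replacement string.

```ruby
# Hypothetical helper, named only for this illustration.
def repeat_groups(text)
  # \1 is the text captured by (bar), \2 the text captured by (jar).
  text.gsub(/(bar)(jar)/, '\1\2\1\2\2\1')
end

puts "foo bar bag".gsub(/(bar)/, 'car\1')      # prints: foo carbar bag
puts "foo bar bag".gsub(/(bar)/, 'car\1\1\1')  # prints: foo carbarbarbar bag
puts repeat_groups("barjar")                    # prints: barjarbarjarjarbar
```

The replacement strings are single-quoted so that Ruby leaves the backslash alone and gsub gets to see \1 and \2.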

Here's a real-world example where this comes in handy. Let's say you have a name list like this:

Jones, Tom  
Smith, Alan  
Smith, Dave  
Wilson, Bud

and you want to change it to this:

Tom Jones  
Alan Smith  
Dave Smith  
Bud Wilson

You could search for:

(.+), (.+)

and replace with:

\2 \1

You could also replace with:

\1: \2 \1  

Which would become:

Jones: Tom Jones  
Smith: Alan Smith  
Smith: Dave Smith  
Wilson: Bud Wilson
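
The same search and replace can be sketched in Ruby as follows. The constant NAMES and the two helper names are assumptions made for this example, and the list is assumed to have one name per line with no trailing spaces.

```ruby
# Sample data from the answer above, one "Last, First" entry per line.
NAMES = <<~LIST
  Jones, Tom
  Smith, Alan
  Smith, Dave
  Wilson, Bud
LIST

# Hypothetical helper: put the first name (\2) before the last name (\1).
# "." does not match a newline by default, so each line is handled separately.
def first_name_first(list)
  list.gsub(/(.+), (.+)/, '\2 \1')
end

# Hypothetical helper: reuse \1 twice within one replacement.
def labelled(list)
  list.gsub(/(.+), (.+)/, '\1: \2 \1')
end

puts first_name_first(NAMES)
puts labelled(NAMES)
```

The first puts should print the "Tom Jones" list and the second the "Jones: Tom Jones" list shown above.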
James Toomey answered Oct 04 '22 01:10


Generally speaking, \N is replaced with the text matched by the N-th group in the regular expression. The first group is referenced by \1, and only single-digit references, \1 through \9, are recognized in a replacement string.

Some examples:

# wrap every integer into brackets
'1 2 34'.gsub(/(\d+)/, '[\1]')
# => "[1] [2] [34]"

# gsub with two groups: swap pairs of integers
'<1,2> <3,4>'.gsub(/(\d+),(\d+)/, '\2,\1')
# => "<2,1> <4,3>"

# you can reference the same group more than once
'1 2 34'.gsub(/(\d+)/, '<\1,\1>')
# => "<1,1> <2,2> <34,34>"

# a slightly more complex example
'Jim Morrison'.sub(/([A-Z])[a-z]+ ([A-Z][a-z]+)/, '\2 \1.')
# => "Morrison J."
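
A few related variations, as a rough sketch rather than a complete reference. All the examples above use single-quoted replacement strings; in a double-quoted string the backslash has to be doubled. Groups can also be named and referenced with \k<name>, or read as $1, $2 inside a block. The helper names here are hypothetical.

```ruby
# Helper names below are hypothetical, chosen only for this illustration.

# Double-quoted replacement: the backslash has to be doubled, because "\1"
# on its own is the character with octal code 1, not a backreference.
def wrap_integers(text)
  text.gsub(/(\d+)/, "[\\1]")
end

# Named groups are referenced with \k<name> instead of a number.
def surname_first(name)
  name.sub(/(?<first>[A-Z])[a-z]+ (?<last>[A-Z][a-z]+)/, '\k<last> \k<first>.')
end

# Block form: the groups of the current match are available as $1, $2, ...
def swap_pairs(text)
  text.gsub(/(\d+),(\d+)/) { "#{$2},#{$1}" }
end

puts wrap_integers('1 2 34')        # prints: [1] [2] [34]
puts surname_first('Jim Morrison')  # prints: Morrison J.
puts swap_pairs('<1,2> <3,4>')      # prints: <2,1> <4,3>
```

The block form is often the easier choice when the replacement needs more than plain text, since ordinary Ruby string interpolation applies inside the block.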
toro2k answered Oct 04 '22 00:10