
What does \b really mean in Ruby regular expressions?

Tags: regex, ruby

I have a file with phrases such as "Canyon St / 27th Way" that I am trying to turn into "Canyon St and 27th Way" with Ruby regular expressions.

I used file = file.gsub(/(\b) \/ (\b)/, "#{$1} and #{$2}") to make the match, but I am a little stumped about what \b really means. Why does $1 seem to contain all of the characters before the word boundary that precedes the slash, and why does $2 seem to contain all of the characters after the word boundary that starts the next word?

Usually, I expect whatever is in parentheses in a regular expression to end up in $1 and $2, but I am not sure what parentheses around a word boundary would mean, because there is nothing between a word character and the whitespace character that follows it.

S. Miller asked May 15 '15 19:05



2 Answers

The parentheses aren't doing anything in this context. You could get the same result using /\b \/ \b/.

I think you are getting a little confused by $1 and $2. Those aren't actually doing anything either. They contribute nothing to the replacement string, because each group wraps only a word boundary, which is a zero-width position and not a character. What you have written is the logical equivalent of .gsub(/\b \/ \b/, " and ")
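To see this concretely, here is a minimal sketch using the phrase from the question. The helper names slash_match_parts and join_streets are made up for illustration. It shows that the groups capture empty strings, that the match itself consumes only " / ", and that the words on either side survive simply because gsub leaves unmatched text alone.

```ruby
# Hypothetical helper: returns the pieces of the first " / " match, or nil.
def slash_match_parts(text)
  m = text.match(/(\b) \/ (\b)/)
  return nil unless m

  {
    captures: m.captures,  # what $1 and $2 would hold after the match
    matched: m[0],         # the only text the pattern consumes
    before: m.pre_match,   # untouched text before the match
    after: m.post_match    # untouched text after the match
  }
end

# Hypothetical helper: the same substitution without the parentheses.
def join_streets(text)
  text.gsub(/\b \/ \b/, " and ")
end

parts = slash_match_parts("Canyon St / 27th Way")
puts parts[:captures].inspect             # ["", ""]
puts parts[:matched].inspect              # " / "
puts parts[:before]                       # Canyon St
puts parts[:after]                        # 27th Way
puts join_streets("Canyon St / 27th Way") # Canyon St and 27th Way
```

So "Canyon St" and "27th Way" were never in $1 and $2; they are just the parts of the string that the pattern did not touch.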

Rob Wagner answered Sep 28 '22 02:09


The $1 and $2 are not actually related to your regex match: a method's arguments are evaluated before the method is called, so

"#{$1} and #{$2}"

is evaluated before the regex is matched against your string. If you haven't done any earlier regex matches, these variables will be nil, so you're actually doing

file = file.gsub(/(\b) \/ (\b)/, " and ")

That is, you are replacing a slash surrounded by spaces with "and", also surrounded by spaces. After the call, $1 and $2 will be updated to empty strings, so you'll see the same behaviour when you process the next string.
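If you do want captured text in the replacement, here is a rough sketch of two forms that are evaluated per match rather than before the call. The helper names are invented for illustration, and the pattern is changed to capture the neighbouring words with \w+ (an assumption; the original pattern captures nothing).

```ruby
# Hypothetical helper: shows what the interpolated string becomes when no
# match has run yet in this method, so $1 and $2 are still nil.
def stale_replacement
  "#{$1} and #{$2}"
end

# Hypothetical helper: backreferences in a single-quoted replacement string
# are filled in by gsub itself, once per match.
def join_with_backrefs(text)
  text.gsub(/(\w+) \/ (\w+)/, '\1 and \2')
end

# Hypothetical helper: the block runs after each match, so $1 and $2 refer
# to the match currently being replaced.
def join_with_block(text)
  text.gsub(/(\w+) \/ (\w+)/) { "#{$1} and #{$2}" }
end

puts stale_replacement.inspect                   # " and "
puts join_with_backrefs("Canyon St / 27th Way")  # Canyon St and 27th Way
puts join_with_block("Canyon St / 27th Way")     # Canyon St and 27th Way
```

Note the single quotes around '\1 and \2': inside double quotes, "\1" is read as an escape sequence and not as a backreference.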

Frederick Cheung answered Sep 28 '22 02:09