How does 'String#gsub { }' (with a block) work?

Tags: string, ruby

When I do,

> "fooo".gsub("o") {puts "Found an 'o'"}
Found an 'o'
Found an 'o'
Found an 'o'
=> "f"

gsub removes all 'o's. How does this work?

I think gsub passes each matched character to the block, but since the block does nothing with the character itself (like catching it), the character is dropped.

I think this is the case because, when I do

> "fooo".gsub("o"){|ch| ch.upcase}
=> "fOOO"

the block is catching the character and turning it into uppercase. But when I do,

> "fooo".gsub("o", "u"){|ch| ch.upcase}
=> "fuuu"

How does Ruby handle the block in this case?

I found that Ruby plugs blocks into methods using yield. But I am still not sure about my explanation for the first and third code examples. Can anyone shed some more light on this?

asked Dec 18 '22 04:12 by vadasambar


1 Answer

The documentation of String#gsub explains how it works, depending on which arguments it receives:

gsub(pattern, replacement) → new_str
gsub(pattern, hash) → new_str
gsub(pattern) {|match| block } → new_str
gsub(pattern) → enumerator

Returns a copy of str with all occurrences of pattern substituted for the second argument. The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. \\d will match a backslash followed by d, instead of a digit.

If replacement is a String it will be substituted for the matched text. It may contain back-references to the pattern’s capture groups of the form \\d, where d is a group number, or \\k<n>, where n is a group name. If it is a double-quoted string, both back-references must be preceded by an additional backslash. However, within replacement the special match variables, such as $&, will not refer to the current match.

If the second argument is a Hash, and the matched text is one of its keys, the corresponding value is the replacement string.

In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately. The value returned by the block will be substituted for the match on each call.

The result inherits any tainting in the original string or any supplied replacement string.

When neither a block nor a second argument is supplied, an Enumerator is returned.
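The back-reference and match-variable rules quoted above can be sketched as follows. This is only an illustration; the sample string and the group names first and last are made up for the example.

```ruby
name = "John Smith"

# Numbered back-references in a single-quoted replacement string.
by_number = name.gsub(/(\w+) (\w+)/, '\2 \1')

# Named back-references use \k<name>.
by_name = name.gsub(/(?<first>\w+) (?<last>\w+)/, '\k<last> \k<first>')

# In a double-quoted replacement the backslash itself must be escaped.
double_quoted = name.gsub(/(\w+) (\w+)/, "\\2 \\1")

# In the block form, $1 and $2 refer to the current match.
by_block = name.gsub(/(\w+) (\w+)/) { "#{$2} #{$1}" }
```

All four calls should give the same swapped result.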

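The answer to your question looks straightforward now. When only one argument is passed (the pattern), "the value returned by the block will be substituted for the match on each call". In your first example the block's value is the return value of puts, which is nil; nil is converted to the empty string, so every 'o' is replaced with nothing. In your second example the block returns the uppercased match, so every 'o' becomes 'O'.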
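A minimal sketch of the one-argument block form, reusing the strings from the question. The variable names are only for illustration.

```ruby
# puts prints its argument and returns nil, so the block's value is nil.
# gsub converts that value to a string (nil.to_s is ""), removing each match.
with_puts = "fooo".gsub("o") { puts "Found an 'o'" }

# Returning nil directly behaves the same way, without the output.
with_nil = "fooo".gsub("o") { nil }

# Here the block returns the uppercased match, which becomes the replacement.
upcased = "fooo".gsub("o") { |ch| ch.upcase }
```

So the character is not dropped because the block failed to catch it; it is replaced by whatever the block returns.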

Two arguments plus a block is a case the documentation does not cover because it is not a valid combination. It seems that when two arguments are passed, String#gsub doesn't expect a block and ignores it, which is why your third example returns "fuuu".
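One way to check that the block is ignored is to count how often it runs. This is a sketch of observed behaviour, not something the documentation promises; the counter variable calls is made up for the example.

```ruby
calls = 0

# Both a replacement string and a block are supplied.
result = "fooo".gsub("o", "u") do |ch|
  calls += 1        # would be incremented once per match if the block ran
  ch.upcase
end
```

With the replacement string winning, result is "fuuu" and calls stays at 0.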

Update

The purpose of String#gsub is to do a "global substitution", i.e. find all occurrences of some string or pattern and replace them.

The first argument, pattern, is the string or pattern to search for. There is nothing special about it. It can be a string or a regular expression. String#gsub searches for it and finds zero or more matches (occurrences).

With only one argument and no block, String#gsub returns an Enumerator because it can find the pattern but it doesn't have a replacement to use.
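A small sketch of what that Enumerator can be used for. The variable names are only for illustration.

```ruby
enum = "fooo".gsub("o")
enum_class = enum.class

# The enumerator yields each match, so to_a collects the matches.
matches = enum.to_a

# Chaining with_index supplies the block later, together with a match counter.
numbered = "fooo".gsub("o").with_index { |match, i| "#{match}#{i}" }
```

Here matches holds the three "o" strings, and numbered is the original string with each match followed by its index.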

There are three ways to provide it the replacements for the matches (the first three cases described in the documentation quoted above):

  1. a String is used to replace all the matches; it is usually used to remove parts of a string (by providing the empty string as replacement) or to mask fragments of it (credit card numbers, passwords, email addresses etc.);
  2. a Hash is used to provide a different replacement for each match; it is useful when the matches are known in advance;
  3. a block is provided when the replacements depend on the matched substrings but the matches are not known in advance; for example, a block can convert each matching substring to uppercase and return it to let String#gsub use it as the replacement.
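The three forms side by side, as a minimal sketch. The sample strings and the replacements hash are made up for the example.

```ruby
# 1. String replacement: mask every digit.
masked = "card 1234-5678".gsub(/\d/, "*")

# 2. Hash replacement: each match is looked up as a key.
replacements = { "cat" => "feline", "dog" => "canine" }
swapped = "cat and dog".gsub(/cat|dog/, replacements)

# 3. Block replacement: the block's return value replaces each match.
shouted = "cat and dog".gsub(/cat|dog/) { |match| match.upcase }
```
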

answered Jan 05 '23 09:01 by axiac