 

How do the anchors \z and \G work in Ruby?

I am using Ruby 1.9.3. I am new to this platform.

From the docs I just got familiar with two anchors, \z and \G. I played a little with \z to see how it works, because its definition ("End" or "End of String") confused me: I can't understand what "End" means here. So I tried the small snippets below, but I still can't figure it out.

CODE

irb(main):011:0> str = "Hit him on the head me 2\n" + "Hit him on the head wit>
=> "Hit him on the head me 2\nHit him on the head with a 24\n"
irb(main):012:0> str =~ /\d\z/
=> nil

irb(main):013:0> str = "Hit him on the head me 24 2\n" + "Hit him on the head >
=> "Hit him on the head me 24 2\nHit him on the head with a 24\n"
irb(main):014:0> str =~ /\d\z/
=> nil

irb(main):018:0> str = "Hit1 him on the head me 24 2\n" + "Hit him on the head>
=> "Hit1 him on the head me 24 2\nHit him on the head with a11 11 24\n"
irb(main):019:0> str =~ /\d\z/
=> nil
irb(main):020:0>

Every time I got nil as the output. So how is \z evaluated? What does "End" mean? I think I have misunderstood the word "End" in the docs. Could anyone help me understand why I get this output?

Also, I couldn't find any example for the anchor \G. Could you give an example to help me visualize how \G is used in real-world programming?

EDIT

irb(main):029:0>
irb(main):030:0*  ("{123}{45}{6789}").scan(/\G(?!^)\{\d+\}/)
=> []
irb(main):031:0>  ('{123}{45}{6789}').scan(/\G(?!^)\{\d+\}/)
=> []
irb(main):032:0>

Thanks

Arup Rakshit asked Jan 14 '23 11:01


2 Answers

\z matches the very end of the input. You are trying to find a match where a digit (the final 4) occurs at the end of the input. The problem is that there is a newline at the end of the input, so there is no match. \Z matches either at the end of the input or just before a newline at the end of the input.

So:

/\d\z/

matches the "4" in:

"24"

and:

/\d\Z/

matches the "4" in the above example and the "4" in:

"24\n"

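A quick way to see the difference is to run the first string from the question, plus the two short inputs above, through irb. This is only a sketch: `String#chomp` is just one way to drop the trailing newline, and the `$` line is added purely for comparison, since in Ruby `$` means end of *line* rather than end of string.

```ruby
str = "Hit him on the head me 2\n" + "Hit him on the head with a 24\n"

# \z is the absolute end of the string; the character before it is "\n", not a digit
p str =~ /\d\z/       # => nil

# \Z may also match just before a single trailing newline, so the final "4" is found
p str[/\d\Z/]         # => "4"

# Stripping the trailing newline with String#chomp lets \z match as well
p str.chomp[/\d\z/]   # => "4"

# For comparison: $ is end of *line*, so it already matches the "2" ending line one
p str[/\d$/]          # => "2"

# The two short inputs from this answer
p "24"[/\d\z/]        # => "4"
p "24\n"[/\d\z/]      # => nil
p "24\n"[/\d\Z/]      # => "4"
```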
Check out this question for examples of using \G:
Examples of regex matcher \G (The end of the previous match) in Java would be nice


UPDATE: Real-World uses for \G

I came up with a more real-world example. Say you have a list of words separated by arbitrary characters that cannot be predicted well (or there are too many possibilities to list). You'd like to match these words, each word as its own match, up until a particular word, after which you don't want to match any more words. For example:

foo,bar.baz:buz'fuzz*hoo-har/haz|fil^bil!bak

You want to match each word until 'har'. You don't want to match 'har' or any of the words that follow. You can do this relatively easily using the following pattern:

/(?<=^|\G\W)\w+\b(?<!har)/

rubular

The first attempt matches at the beginning of the input: the ^ branch of the lookbehind succeeds with zero non-word characters, then 3 word characters ('foo') are matched, followed by a word boundary. Finally, a negative lookbehind ensures that the word just matched is not 'har'.

On the second attempt, matching picks up at the end of the last match. The \G\W branch of the lookbehind checks one non-word character (','), which is not part of the match because a lookbehind is a zero-width assertion, and then 3 word characters ('bar') are matched.

This continues until 'har' is matched, at which point the negative lookbehind is triggered and the match fails. Because all matches are supposed to be "attached" to the last successful match, no additional words will be matched.

The result is:

foo
bar
baz
buz
fuzz
hoo
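
If you want to try this outside rubular, the same pattern can be run with `String#scan`. According to Ruby's Regexp documentation, in methods like `String#scan` the \G anchor first matches at the beginning of the subject and then, on each following iteration, where the last match finished, which is the "attached" behaviour described above. This is a minimal sketch that assumes your Ruby's regex engine accepts \G inside a lookbehind, as rubular's did.

```ruby
# Sample input from above: words separated by assorted punctuation
str = "foo,bar.baz:buz'fuzz*hoo-har/haz|fil^bil!bak"

# Each word must follow either the start of the input or (\G) the end of the
# previous match plus one non-word character; (?<!har) rejects the word 'har'.
# Assumption: the regex engine accepts \G inside a lookbehind.
before_har = str.scan(/(?<=^|\G\W)\w+\b(?<!har)/)
puts before_har
```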

If you want to reverse it and have all words after 'har' (but, again, not including 'har'), you can use an expression like this:

/(?!^)(?<=har\W|\G\W)\w+\b/

rubular

This matches a word that is immediately preceded either by 'har' plus a non-word character or by the end of the last match plus a non-word character (except that we have to make sure not to match at the beginning of the input). The list of matches is:

haz
fil
bil
bak
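
Run through `String#scan`, the reversed pattern gives the list above. As before, this is a sketch that assumes the engine accepts \G inside a lookbehind; the alternatives inside the lookbehind have different lengths, which Ruby's engine allows only at the top level of a lookbehind.

```ruby
str = "foo,bar.baz:buz'fuzz*hoo-har/haz|fil^bil!bak"

# A word qualifies if it directly follows 'har' plus one non-word character,
# or the end of the previous match plus one non-word character.
# (?!^) keeps the match from starting at the beginning of the input.
after_har = str.scan(/(?!^)(?<=har\W|\G\W)\w+\b/)
puts after_har
```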

If you do want to match 'har' and all following words, you could use this:

/\bhar\b|(?!^)(?<=\G\W)\w+\b/

rubular

This produces the following matches:

har
haz
fil
bil
bak
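
The same kind of `String#scan` sketch works for this last pattern. The first alternative picks up 'har' itself; after that, every match has to be attached to the previous one by \G. The same lookbehind assumption applies.

```ruby
str = "foo,bar.baz:buz'fuzz*hoo-har/haz|fil^bil!bak"

# First alternative: the whole word 'har'. Second alternative: a word that is
# attached to the end of the previous match by one non-word character.
from_har = str.scan(/\bhar\b|(?!^)(?<=\G\W)\w+\b/)
puts from_har
```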
JDB answered Jan 20 '23 05:01


It sounds like you want to know how regexes work. Or do you want to know how regexes work in Ruby?

Check these out.

Regexp Class description

The Regex Coach - Great for testing regex matching

Regex cheat sheet

I understand \G to be a boundary matcher: it tells the next match to start at the end of the last match. Perhaps since you haven't made a match yet, you can't have a second.
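
The EDIT in the question can be checked with a short sketch. On the first search, \G sits at offset 0, which is also where ^ matches, so `(?!^)` fails there; because \G cannot match anywhere else during that search, `scan` stops and returns an empty array. Dropping `(?!^)` lets the chain start at the beginning. The string containing `x` is a made-up input, added only to show the chain breaking.

```ruby
braces = "{123}{45}{6789}"

# \G is at offset 0 on the first search, where ^ also matches,
# so (?!^) fails and scan finds nothing at all
p braces.scan(/\G(?!^)\{\d+\}/)          # => []

# Without (?!^), each match must start exactly where the previous one ended
p braces.scan(/\G\{\d+\}/)               # => ["{123}", "{45}", "{6789}"]

# Hypothetical input: the stray "x" breaks the chain, so later groups are skipped
p "{123}{45}x{6789}".scan(/\G\{\d+\}/)   # => ["{123}", "{45}"]

# Compare: without \G, scan is free to skip over the "x"
p "{123}{45}x{6789}".scan(/\{\d+\}/)     # => ["{123}", "{45}", "{6789}"]
```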

Here is the best example I can find. It's not in Ruby, but the concept should be the same.

I take it back; this might be more useful.

Zach answered Jan 20 '23 04:01