How to embed regular expressions in other regular expressions in Ruby

Tags:

regex

ruby

I have a string:

'A Foo'

and want to find "Foo" in it.

I have a regular expression:

/foo/

that I'm embedding into another case-insensitive regular expression, so I can build the pattern in steps:

foo_regex = /foo/
pattern = /A #{ foo_regex }/i

But it won't match correctly:

'A Foo' =~ pattern # => nil

If I embed the text directly into the pattern it works:

'A Foo' =~ /A foo/i # => 0

What's wrong?

asked Mar 27 '17 22:03

the Tin Man

On the surface it seems that embedding a pattern inside another pattern should simply work, but that rests on a mistaken assumption about how patterns work in Ruby: that they're simply strings. Using:

foo_regex = /foo/

creates a Regexp object:

/foo/.class # => Regexp
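To make the difference concrete, here is a minimal sketch contrasting a String fragment with a Regexp fragment. The names foo_string, string_pattern and regexp_pattern are introduced only for this illustration:

```ruby
foo_string = 'foo'  # plain String: interpolated as its raw characters
foo_regex  = /foo/  # Regexp object: interpolated together with its options

string_pattern = /A #{ foo_string }/i
regexp_pattern = /A #{ foo_regex }/i

p foo_string.class       # prints String
p foo_regex.class        # prints Regexp
p string_pattern         # prints /A foo/i
p regexp_pattern         # prints /A (?-mix:foo)/i

p('A Foo' =~ string_pattern) # prints 0
p('A Foo' =~ regexp_pattern) # prints nil
```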

A Regexp object knows which optional flags were used to create it:

( /foo/    ).options # => 0
( /foo/i   ).options # => 1
( /foo/x   ).options # => 2
( /foo/ix  ).options # => 3
( /foo/m   ).options # => 4
( /foo/im  ).options # => 5
( /foo/mx  ).options # => 6
( /foo/imx ).options # => 7

or, if you like binary:

'%04b' % ( /foo/    ).options # => "0000"
'%04b' % ( /foo/i   ).options # => "0001"
'%04b' % ( /foo/x   ).options # => "0010"
'%04b' % ( /foo/xi  ).options # => "0011"
'%04b' % ( /foo/m   ).options # => "0100"
'%04b' % ( /foo/mi  ).options # => "0101"
'%04b' % ( /foo/mx  ).options # => "0110"
'%04b' % ( /foo/mxi ).options # => "0111"

and remembers them whenever the Regexp is used, whether as a standalone pattern or embedded in another.
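Those numbers aren't arbitrary: each option is a single bit with a named constant on Regexp, and options is those bits ORed together. A minimal sketch; FLAG_BITS and flag_letters are hypothetical names for illustration, not part of Ruby:

```ruby
# Each option is one bit, exposed as a named constant on Regexp.
p Regexp::IGNORECASE # prints 1 (i)
p Regexp::EXTENDED   # prints 2 (x)
p Regexp::MULTILINE  # prints 4 (m)

# Combining flags ORs their bits together.
p(/foo/imx.options == (Regexp::IGNORECASE | Regexp::EXTENDED | Regexp::MULTILINE)) # prints true

# Hypothetical helper: recover the flag letters by testing each bit with &.
FLAG_BITS = { 'm' => Regexp::MULTILINE, 'i' => Regexp::IGNORECASE, 'x' => Regexp::EXTENDED }.freeze

def flag_letters(regexp)
  FLAG_BITS.select { |_letter, bit| (regexp.options & bit) != 0 }.keys.join
end

p flag_letters(/foo/)    # prints ""
p flag_letters(/foo/i)   # prints "i"
p flag_letters(/foo/mxi) # prints "mix"
```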

You can see this in action by inspecting the pattern after embedding:

/#{ /foo/  }/ # => /(?-mix:foo)/
/#{ /foo/i }/ # => /(?i-mx:foo)/

(?-mix:...) and (?i-mx:...) are how those options are represented in an embedded pattern.
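Interpolation inserts the same text that Regexp#to_s returns, which differs from both inspect (what irb displays) and source. A quick comparison, with re as an illustrative variable:

```ruby
re = /foo/i

p re.to_s    # prints "(?i-mx:foo)"  (the text interpolation inserts)
p re.inspect # prints "/foo/i"       (what irb displays)
p re.source  # prints "foo"          (the bare pattern text)

# Interpolating into a string or a pattern yields the #to_s form.
p("#{ re }" == re.to_s) # prints true
p(/#{ re }/.source)     # prints "(?i-mx:foo)"
```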

According to the Regexp documentation for Options:

i, m, and x can also be applied on the subexpression level with the (?on-off) construct, which enables options on, and disables options off for the expression enclosed by the parentheses.
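The same construct can be written by hand, which makes its effect easy to see in isolation:

```ruby
# (?i:...) turns case-insensitivity ON just for the group.
p('A Foo' =~ /A (?i:foo)/)   # prints 0

# (?-i:...) turns it OFF just for the group, even inside an /i pattern.
p('A Foo' =~ /A (?-i:foo)/i) # prints nil

# Outside the group the outer /i still applies, so "a" matches "A".
p('a foo' =~ /A (?-i:foo)/i) # prints 0
```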

So the embedded Regexp keeps its own options even inside the outer pattern: its (?-mix:...) group switches case-insensitivity back off for "foo", overriding the outer /i, so the overall pattern fails to match:

pattern = /A #{ foo_regex }/i # => /A (?-mix:foo)/i
'A Foo' =~ pattern # => nil
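Only the embedded part loses case-insensitivity; the rest of the outer pattern still honors /i. A small sketch using the same pattern (Regexp#casefold? reports whether the i flag is set on a Regexp):

```ruby
foo_regex = /foo/
pattern   = /A #{ foo_regex }/i # => /A (?-mix:foo)/i

p('A Foo' =~ pattern) # prints nil: "Foo" is not "foo" case-sensitively
p('a foo' =~ pattern) # prints 0: "a" matches "A" thanks to the outer /i
p pattern.casefold?   # prints true: the outer pattern is still flagged /i
```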

It's possible to make sure every sub-expression's options match those of the surrounding pattern, but that quickly becomes convoluted or messy:

foo_regex = /foo/i
pattern = /A #{ foo_regex }/i # => /A (?i-mx:foo)/i
'A Foo' =~ pattern # => 0
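The inner flags win in the other direction too: the embedded group applies its own options regardless of the outer pattern's flags, which is part of why keeping them in sync by hand gets messy:

```ruby
# The embedded /foo/i brings its own (?i-mx:...) group, so "foo" matches
# case-insensitively even though the outer pattern has no /i flag.
pattern = /A #{ /foo/i }/ # => /A (?i-mx:foo)/

p('A Foo' =~ pattern) # prints 0
p('a Foo' =~ pattern) # prints nil: the outer "A" is still case-sensitive
```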

Instead, use the source method, which returns the text of a pattern without its options:

/#{ /foo/.source  }/ # => /foo/
/#{ /foo/i.source }/ # => /foo/
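One caveat: because source returns only the text, every flag the embedded pattern relied on is discarded, not just i. A sketch with /x (extended mode, where unescaped whitespace in the pattern is ignored); spaced is a hypothetical variable name:

```ruby
spaced = /f o o/x # with /x, the spaces inside the pattern are ignored

p('foo' =~ spaced)                 # prints 0
p(/#{ spaced.source }/.inspect)    # prints "/f o o/"
p('foo' =~ /#{ spaced.source }/)   # prints nil: without /x the spaces are literal
p('f o o' =~ /#{ spaced.source }/) # prints 0
```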

The problem of the embedded pattern remembering its options also appears with other Regexp methods, such as union:

/#{ Regexp.union(%w[a b]) }/ # => /(?-mix:a|b)/

and again, source can help:

/#{ Regexp.union(%w[a b]).source }/ # => /a|b/
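Be aware that source strips only the outermost Regexp's options. When union is given Regexp objects rather than strings, each one is embedded with its own (?on-off:...) group, and those groups remain in the result's source. A short sketch with illustrative variable names:

```ruby
combined = Regexp.union(/dogs/, /cats/i)

p combined        # prints /(?-mix:dogs)|(?i-mx:cats)/
p combined.source # prints "(?-mix:dogs)|(?i-mx:cats)"

# So even inside an outer /i pattern, "dogs" stays case-sensitive.
outer = /#{ combined.source }/i
p('DOGS' =~ outer) # prints nil
p('CATS' =~ outer) # prints 0
```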

Knowing all that:

foo_regex = /foo/
pattern = /#{ foo_regex.source }/i # => /foo/i
'A Foo' =~ pattern # => 2
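There is one more thing the (?-mix:...) wrapper was quietly doing: it grouped the embedded pattern. A bare source such as a|b loses that grouping, so the alternation can swallow the surrounding text. The sketch below shows the pitfall and a hypothetical combine helper (not part of Ruby's API) that wraps each Regexp fragment's source in a non-capturing (?:...) group, escapes String fragments, and applies a single set of options to the whole pattern:

```ruby
alternatives = Regexp.union(%w[a b]) # /a|b/

# Without grouping, | splits the whole pattern: "xa" OR "by".
ungrouped = /x#{ alternatives.source }y/
p ungrouped           # prints /xa|by/
p('xby' =~ ungrouped) # prints 1: it matched "by", not "x" + "b" + "y"

# Hypothetical helper: the outer options govern every fragment;
# each fragment's own flags are deliberately discarded via #source.
def combine(*fragments, options: 0)
  parts = fragments.map do |fragment|
    # Regexps contribute their bare source inside a non-capturing group;
    # Strings are treated as literal text and escaped.
    fragment.is_a?(Regexp) ? "(?:#{ fragment.source })" : Regexp.escape(fragment)
  end
  Regexp.new(parts.join, options)
end

pattern = combine('A ', /foo/, options: Regexp::IGNORECASE)
p('A Foo' =~ pattern) # prints 0

grouped = combine('x', alternatives, 'y')
p grouped.source      # prints "x(?:a|b)y"
p('xby' =~ grouped)   # prints 0
p('xay' =~ grouped)   # prints 0
```

This is one possible design under the assumption that a single set of outer options should apply everywhere; if a fragment genuinely needs its own flags, embedding the Regexp itself (and accepting its (?on-off:...) group) is the way to keep them.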

answered Oct 06 '22 09:10
