Are Ruby 1.9 regular expressions as powerful as a context-free grammar?

I have this regular expression:

regex = %r{\A(?<foo> a\g<foo>a | b\g<foo>b | c)\Z}x

When I test it against several strings, it appears to be as powerful as a context-free grammar, because it handles the recursion properly.

regex.match("aaacaaa")
# => #<MatchData "aaacaaa" foo:"aaacaaa">
regex.match("aacaa")
# => #<MatchData "aacaa" foo:"aacaa">
regex.match("aabcbaa")
# => #<MatchData "aabcbaa" foo:"aabcbaa">
regex.match("aaacaa")
# => nil

The article "Fun with Ruby 1.9 Regular Expressions" has an example where the author arranges all the parts of a regex so that it looks like a context-free grammar, as follows:

sentence = %r{ 
    (?<subject>   cat   | dog   | gerbil    ){0} 
    (?<verb>      eats  | drinks| generates ){0} 
    (?<object>    water | bones | PDFs      ){0} 
    (?<adjective> big   | small | smelly    ){0} 

    (?<opt_adj>   (\g<adjective>\s)?     ){0} 

    The\s\g<opt_adj>\g<subject>\s\g<verb>\s\g<opt_adj>\g<object> 
}x
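
To see how the `{0}` definitions behave when the pattern is applied, here is a minimal sketch that reuses the regex above unchanged. The constant name `SENTENCE` and the helper `parse_sentence` are made up for this illustration and are not part of the article.

```ruby
# Grammar-style regex copied from the article. Each (?<name> ... ){0}
# defines a group without matching anything at that position; the last
# line calls the groups with \g<name>.
SENTENCE = %r{
    (?<subject>   cat   | dog   | gerbil    ){0}
    (?<verb>      eats  | drinks| generates ){0}
    (?<object>    water | bones | PDFs      ){0}
    (?<adjective> big   | small | smelly    ){0}

    (?<opt_adj>   (\g<adjective>\s)?     ){0}

    The\s\g<opt_adj>\g<subject>\s\g<verb>\s\g<opt_adj>\g<object>
}x

# Hypothetical helper: returns the named parts of the sentence,
# or nil when the text does not fit the "grammar".
def parse_sentence(text)
  md = SENTENCE.match(text)
  return nil unless md
  { subject: md[:subject], verb: md[:verb], object: md[:object] }
end

p parse_sentence("The cat drinks water")
p parse_sentence("The big dog eats smelly bones")
p parse_sentence("The cat barks water")   # "barks" is not a <verb>
```

The named groups remain available on the `MatchData` even though they were only ever entered through `\g<name>` calls.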

Between the author's technique for rearranging the parts of the regex and my example of recursive named capturing groups, does this mean Ruby 1.9 regular expressions are equivalent in power to a context-free grammar?

asked Jan 22 '12 05:01

Ken Bloom

1 Answer

This is one of the awesome things about the Oniguruma regexp engine used in Ruby 1.9: it has the power of a parser and is not restricted to recognizing regular languages. It also has positive and negative lookahead and lookbehind, which can even be used to recognize some languages that are not context-free. Take the following as an example:

regexp = /\A(?<AB>a\g<AB>b|){0}(?=\g<AB>c)a*(?<BC>b\g<BC>c|){1}\Z/

This regexp recognizes strings like “abc”, “aabbcc”, “aaabbbccc”, and so on – the number of “a”, “b”, and “c” must be equal, or it will not match.
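
A quick way to check that claim is to run the pattern over a few balanced and unbalanced strings. This is only a sketch: the pattern is the one above, while the constant `ABC` and the helper `abc?` are names invented here.

```ruby
# Same pattern as above. The lookahead checks that the a's and b's
# balance (followed by a c); the <BC> group then checks that the b's
# and c's balance.
ABC = /\A(?<AB>a\g<AB>b|){0}(?=\g<AB>c)a*(?<BC>b\g<BC>c|){1}\Z/

# Hypothetical helper: true when the whole string matches.
def abc?(string)
  !ABC.match(string).nil?
end

%w[abc aabbcc aaabbbccc aabbc abcc aabc].each do |s|
  puts "#{s}: #{abc?(s)}"
end
```

The first three strings should be accepted and the last three rejected, because in each of those one of the letter counts differs from the others.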

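(One limitation: a lookbehind must match a fixed-width pattern, so a recursive group call such as \g<AB> can’t be used inside a lookbehind. Calls inside a lookahead work, as the example above shows.)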

Although I haven’t peeked under the hood, Oniguruma seems to handle named groups by simple recursive descent, backtracking when something doesn’t match. I’ve observed that it can’t deal with left recursion. For example:

irb(main):013:0> regexp = /(?<A>\g<A>a|)/
SyntaxError: (irb):13: never ending recursion: /(?<A>\g<A>a|)/
    from C:/Ruby192/bin/irb:12:in `<main>'

I don’t remember my parsing theory very clearly, but I think that a non-deterministic top-down parser like this should be able to parse any context-free language. (“language”, not “grammar”; if your grammar has left recursion, you will have to convert it to right recursion.) If that is incorrect, please edit this post.
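
As a small illustration of that conversion, the left-recursive rule from the irb session (A → A a | empty) can be rewritten as the right-recursive A → a A | empty, which describes the same strings. The sketch below assumes that `Regexp.new` reports the problem as a `RegexpError` (a regex literal fails with a `SyntaxError` instead, as in the irb session); `compile_error`, `LEFT_ERROR`, and `RIGHT_RECURSIVE` are names chosen for this example.

```ruby
# Hypothetical helper: try to compile a pattern from a string and return
# the RegexpError if compilation fails, or nil if it succeeds.
def compile_error(source)
  Regexp.new(source)
  nil
rescue RegexpError => e
  e
end

# Left-recursive form: the group calls itself before consuming any input.
LEFT_ERROR = compile_error('(?<A>\g<A>a|)')

# Right-recursive form of the same rule: consume an "a", then recurse.
RIGHT_RECURSIVE = /\A(?<A>a\g<A>|)\z/

puts "left-recursive pattern rejected: #{!LEFT_ERROR.nil?}"
["", "a", "aaa", "aab"].each do |s|
  puts "#{s.inspect}: #{!RIGHT_RECURSIVE.match(s).nil?}"
end
```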

answered Sep 21 '22 12:09

Alex D

Tags: regex, ruby, theory, context-free-grammar, oniguruma
