
Composing regexp?

Tags:

regex

ruby

I would like to compose regexps, that is, reuse one regex inside a new regex.

Is that possible in Ruby?

For instance, I would like to simplify this assembly-like parsing:

LABELED_INSTR = /(\w+):(movi|addi)\s+(\w+),(\w+),(\w+)/
NON_LABELED_INSTR = /(movi|addi)\s+(\w+),(\w+),(\w+)/

I would like to factor out the shared part:

IMMEDIATE = /(movi|addi)/

But then I don't know how to reuse this regex in the two previous ones.

Any hints?

JCLL asked Dec 20 '11 09:12



1 Answer

Sure, regular expressions can be reused (or composed) within other regexes by interpolating them with #{...}. Here's an example that combines two regexes to make a third:

>> a = /boo/
=> /boo/
>> b = /foo/
=> /foo/
>> c = /#{a}|#{b}/
=> /(?-mix:boo)|(?-mix:foo)/
>> if "boo" =~ c
>>   puts "match!"
>> end
match!
=> nil
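
To see what interpolation actually inserts, here is a small sketch. Interpolating a Regexp calls its to_s, which wraps the pattern in a non-capturing group that carries its own flags, such as (?-mix:boo). The variable d is not from the session above; it is added here to show that Regexp.union can build the same alternation without interpolation.

```ruby
a = /boo/
b = /foo/

# Interpolation embeds each regex through Regexp#to_s, flags included.
c = /#{a}|#{b}/
puts a.to_s    # (?-mix:boo)
puts c.source  # (?-mix:boo)|(?-mix:foo)

# Regexp.union builds the same alternation from the two regexes.
d = Regexp.union(a, b)
puts d.source  # (?-mix:boo)|(?-mix:foo)
```

Because each embedded regex keeps its own flags, a case-insensitive piece stays case-insensitive even when the enclosing regex is not.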

Your example is pretty similar. Here, it would be:

IMMEDIATE = /(movi|addi)/
LABELED_INSTR = /(\w+):#{IMMEDIATE}\s+(\w+),(\w+),(\w+)/
NON_LABELED_INSTR = /#{IMMEDIATE}\s+(\w+),(\w+),(\w+)/
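
As a quick check, the composed regexes can be run against sample instructions. This is only a sketch: the input strings are made up for illustration, and the last group is assumed to be meant as (\w+). The capture group inside IMMEDIATE is still numbered as usual after interpolation, so the mnemonic appears among the captures.

```ruby
IMMEDIATE = /(movi|addi)/
LABELED_INSTR = /(\w+):#{IMMEDIATE}\s+(\w+),(\w+),(\w+)/
NON_LABELED_INSTR = /#{IMMEDIATE}\s+(\w+),(\w+),(\w+)/

# Hypothetical sample lines, not taken from the question.
labeled = LABELED_INSTR.match("loop:addi r1,r2,5")
p labeled.captures  # ["loop", "addi", "r1", "r2", "5"]

plain = NON_LABELED_INSTR.match("movi r3,r0,42")
p plain.captures    # ["movi", "r3", "r0", "42"]
```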
Chris Bunch answered Oct 02 '22 01:10