
Fastest way to check if a string matches a regexp in ruby?

What is the fastest way to check if a string matches a regular expression in Ruby?

My problem is that I have to "egrep" through a huge list of strings to find the ones that match a regexp given at runtime. I only care about whether the string matches the regexp, not where it matches, nor what the content of the matching groups is. I hope this assumption can be used to reduce the amount of time my code spends matching regexps.

I load the regexp with

pattern = Regexp.new(ptx).freeze 

I have found that string =~ pattern is slightly faster than string.match(pattern).

Are there other tricks or shortcuts that can be used to make this test even faster?

gioele asked Aug 09 '12 15:08


1 Answer

Starting with Ruby 2.4.0, you may use Regexp#match?:

pattern.match?(string) 
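
Applied to the scenario in the question, this could look like the minimal sketch below. The pattern source `ab+c` and the sample strings are hypothetical stand-ins for the runtime pattern and the real list; only `Regexp.new`, `freeze`, `Array#select`, and `Regexp#match?` are assumed from Ruby itself.

```ruby
# Hypothetical input: in the question the pattern source arrives at runtime.
ptx = 'ab+c'
pattern = Regexp.new(ptx).freeze

# Hypothetical list standing in for the "huge list of strings".
strings = ['xabbc', 'abd', 'abc', 'nothing']

# match? answers only "does it match?", which is all the filter needs.
matching = strings.select { |s| pattern.match?(s) }

puts matching.inspect
```

Because only a boolean is needed, nothing here reads the match position or the capture groups.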

Regexp#match? is explicitly listed as a performance enhancement in the release notes for 2.4.0, as it avoids object allocations performed by other methods such as Regexp#match and =~:

Regexp#match?
Added Regexp#match?, which executes a regexp match without creating a back reference object and changing $~ to reduce object allocation.
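
The difference described in the release note can be observed directly. The sketch below uses a hypothetical helper, `backref_demo`, that runs both forms inside a fresh method (where `$~` starts out as nil) and returns what `$~` holds after each call.

```ruby
# Hypothetical helper, not part of Ruby: reports the value of $~ after
# Regexp#match? and after =~, in a method frame where $~ starts as nil.
def backref_demo(pattern, string)
  pattern.match?(string)
  after_predicate = $~   # match? does not set the back reference

  string =~ pattern
  after_operator = $~    # =~ stores a MatchData object in $~

  [after_predicate, after_operator]
end

after_predicate, after_operator = backref_demo(/ab+c/, 'xabbc')
puts after_predicate.inspect   # nil
puts after_operator.class      # MatchData
```

How much time this saves depends on the pattern, the strings, and the Ruby version, so it is worth measuring on real data.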

Wiktor Stribiżew answered Sep 28 '22 00:09