How can I speed up my Perl regex matching?

Tags: regex, perl

I want to capture several pieces of text using the following regex:

$text_normal = qr{^(\/F\d+) FF (.*?) SCF SF (.*?) MV (\(.*?)SH$};

A sample of the string is like below:

my $text = '/F12345 FF FF this is SCF SF really MV (important stuff SH';

Can that be rewritten to speed up the matching?

asked Oct 20 '09 03:10

est

3 Answers

There's no single answer to optimizing a regex. You can watch what a particular regex is doing with the re pragma:

 use re 'debugcolor';

Once you see how it traverses the string, you can see where it is having problems and adjust your regex from there. You'll learn a little about the regex engine as you do that.
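As a minimal sketch of that workflow, the script below turns on the trace for the regex and sample string from the question. It uses `use re 'debug'`, the uncolored variant of the same pragma; the trace is written to STDERR, and its exact format differs between Perl versions, so none is reproduced here.

```perl
use strict;
use warnings;

# Regex and sample string taken from the question.
my $text = '/F12345 FF FF this is SCF SF really MV (important stuff SH';

my @captures;
{
    # The pragma is lexically scoped: only regexes compiled and run
    # inside this block are traced (the trace goes to STDERR).
    use re 'debug';
    my $text_normal = qr{^(\/F\d+) FF (.*?) SCF SF (.*?) MV (\(.*?)SH$};
    @captures = $text =~ $text_normal;
}

# The captured fields go to STDOUT, separate from the trace.
print join('|', @captures), "\n";
```

Redirecting STDERR to a file (for example `perl script.pl 2> trace.txt`) makes the trace easier to read alongside the pattern.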

You should also check out Mastering Regular Expressions, which tells you how regular expressions work and why some patterns are slower than others.

answered Sep 21 '22 16:09

brian d foy

Without seeing some sample data it is hard to say.

Generally, it is a good idea to avoid using .*. Look for any possible sources of unneeded backtracking, and eliminate them.

You might be able to get away with a split with a slice if your needs are simple.

 my @vals = (split / /, $string)[0,2,5,7];
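To show when the slice indices above line up, here is a small sketch using a hypothetical record in which every captured field is a single token with no embedded spaces. The record itself is an assumption made for illustration; it is not the sample from the question.

```perl
use strict;
use warnings;

# Hypothetical record: same keyword layout as the question's sample,
# but each captured field is one space-free token.
my $string = '/F1 FF alpha SCF SF beta MV (gamma SH';

# Positions after splitting on single spaces:
# 0:/F1 1:FF 2:alpha 3:SCF 4:SF 5:beta 6:MV 7:(gamma 8:SH
my @vals = (split / /, $string)[0, 2, 5, 7];

print join('|', @vals), "\n";
```

With the sample string from the question, the fields themselves contain spaces ("FF this is", "(important stuff "), so fixed positions would pick the wrong tokens and the regex is still needed there.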

answered Sep 22 '22 16:09

daotoad

This very much depends on the profile of the data you are scanning.

Identify the piece of your regular expression that filters out the most input, and test for that piece first with a separate, simpler regular expression.

For instance, if only 5% of your input data contained the 'MV' string, you could filter for this first and apply the full, more complex regular expression only when the simpler one matches.

So you would have:

if ( $text =~ / MV / ) {    # cheap pre-filter on the input line
  my $text_normal = qr{^(\/F\d+) FF (.*?) SCF SF (.*?) MV (\(.*?)SH$};
  if ( $text =~ $text_normal ) {
    # ... use the captures $1 through $4 here
  }
}
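A fuller sketch of the same pre-filter idea, run over a few lines of input. The `@lines` data is hypothetical, and `index` is swapped in for the simple regex as the literal check; whether either pre-filter actually helps depends on your data, since Perl's regex engine may already look for required literal substrings on its own, so it is worth benchmarking.

```perl
use strict;
use warnings;

my $text_normal = qr{^(\/F\d+) FF (.*?) SCF SF (.*?) MV (\(.*?)SH$};

# Hypothetical input: only the first line has the layout from the question.
my @lines = (
    '/F12345 FF FF this is SCF SF really MV (important stuff SH',
    '/F99999 FF nothing of interest here',
    'some unrelated line',
);

my @matched;
for my $text (@lines) {
    # Cheap literal check first: index() returns -1 when ' MV ' is absent.
    next if index($text, ' MV ') < 0;

    # Only candidate lines reach the full regex.
    if (my @fields = $text =~ $text_normal) {
        push @matched, \@fields;
    }
}

printf "%d of %d lines matched\n", scalar @matched, scalar @lines;
```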

answered Sep 18 '22 16:09

James Anderson
