
Why is lookahead (sometimes) faster than capturing?

This question is inspired by this other one.

Comparing s/,(\d)/$1/ to s/,(?=\d)//: the former captures the digit and puts it back, so that only the comma is removed; the latter uses a lookahead to check that the comma is followed by a digit without consuming that digit. Why is the latter sometimes faster, as discussed in this answer?

mpe asked Dec 03 '12 11:12



1 Answer

The two approaches do different things and carry different kinds of overhead. When you capture, Perl has to make a copy of the captured text. A look-ahead matches without consuming any text, but Perl has to mark the location where the look-ahead starts. You can see what's happening by using the re 'debug' pragma:

use re 'debug';
my $capture = qr/,(\d)/;
Compiling REx ",(\d)"
Final program:
   1: EXACT <,> (3)
   3: OPEN1 (5)
   5:   DIGIT (6)
   6: CLOSE1 (8)
   8: END (0)
anchored "," at 0 (checking anchored) minlen 2 
Freeing REx: ",(\d)"

use re 'debug';
my $lookahead = qr/,(?=\d)/;
Compiling REx ",(?=\d)"
Final program:
   1: EXACT <,> (3)
   3: IFMATCH[0] (8)
   5:   DIGIT (6)
   6:   SUCCEED (0)
   7: TAIL (8)
   8: END (0)
anchored "," at 0 (checking anchored) minlen 1 
Freeing REx: ",(?=\d)"
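
To make the difference in matching behavior concrete, here is a minimal sketch (not part of the original answer) that runs both patterns against the same string. The sample input and the helper name match_info are assumptions made for illustration, and the `//` operator needs Perl 5.10 or later. It reports how much text each pattern consumed and whether $1 was populated, then checks that both substitutions produce the same result.

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen only for illustration.
my $input = '1,234';

# Hypothetical helper: returns the text the pattern consumed and the
# value of $1 after a successful match (empty list if there is no match).
sub match_info {
    my ($re, $str) = @_;
    return unless $str =~ $re;
    my $matched = substr($str, $-[0], $+[0] - $-[0]);
    return ($matched, $1);
}

my ($cap_text, $cap_group) = match_info(qr/,(\d)/,   $input);
my ($la_text,  $la_group)  = match_info(qr/,(?=\d)/, $input);

printf "capture:    matched '%s', \$1 = %s\n", $cap_text, $cap_group // 'undef';
printf "look-ahead: matched '%s', \$1 = %s\n", $la_text,  $la_group  // 'undef';

# Both substitutions remove the comma and leave the digit in place.
(my $via_capture   = $input) =~ s/,(\d)/$1/g;
(my $via_lookahead = $input) =~ s/,(?=\d)//g;
print "both give: $via_capture\n" if $via_capture eq $via_lookahead;
```

With this input the capture version consumes `,2` and sets $1 to `2`, while the look-ahead version consumes only `,` and leaves $1 undefined. The final strings are identical, so any speed difference comes from how the match is carried out, not from what it produces.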

I'd expect look-ahead to be faster than capturing in most cases, but as noted in the other thread, regex performance can be data dependent.
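
Since the outcome depends on the data, one way to check a particular case is to time both substitutions with the core Benchmark module. The following is only a sketch under stated assumptions: the test string, its length, and the helper names strip_with_capture and strip_with_lookahead are made up for illustration, and the numbers it prints will vary with the Perl version and the input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data; substitute input that resembles your own.
my $text = join ' ', ('1,234,567 apples, 8,900 pears') x 1_000;

# Each helper works on a copy of its argument, so every run sees the same input.
sub strip_with_capture {
    my ($s) = @_;
    $s =~ s/,(\d)/$1/g;     # match comma + digit, put the digit back
    return $s;
}

sub strip_with_lookahead {
    my ($s) = @_;
    $s =~ s/,(?=\d)//g;     # match only the comma, digit is checked but not consumed
    return $s;
}

# A negative count asks Benchmark to run each sub for at least that many CPU seconds.
cmpthese(-1, {
    capture   => sub { strip_with_capture($text) },
    lookahead => sub { strip_with_lookahead($text) },
});
```

cmpthese prints a small table of rates and relative percentages. Trying it with inputs that have many, few, or no commas followed by digits should show how much the comparison shifts with the data.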

Michael Carman answered Sep 30 '22 13:09
