Positive lookbehind vs match reset (\K) regex feature

Q: What does positive lookahead mean?

A positive lookahead is written (?=...): a pair of parentheses, with the opening parenthesis followed by a question mark and an equals sign. It asserts that the enclosed pattern matches at the current position without consuming any characters. Any valid regular expression can be used inside a lookahead; lookbehinds are more restricted in many flavors.
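
As a minimal sketch of that syntax in Ruby (the helper name words_before_colon and the sample string are made up for illustration), a positive lookahead keeps the asserted text out of the match:

```ruby
# Hypothetical helper: returns every word that is immediately followed by a colon.
# (?=:) asserts that a colon comes next but does not consume it.
def words_before_colon(text)
  text.scan(/\w+(?=:)/)
end

p words_before_colon("name: Ann, age: 30")  # => ["name", "age"]
```

The colons are checked but never become part of the returned strings.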

Q: What is ?! in regex?

(?!n) is a negative lookahead: it succeeds at a position only if the pattern n does not match immediately after that position. It is a zero-width assertion, not a quantifier.

Q: What is a negative lookahead in a regular expression?

With a negative lookahead, the regex engine checks whether a particular element (a character, several characters, or a group) follows the item just matched. If that element is not present, the match is accepted; otherwise it is rejected.
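
A small Ruby sketch of the negative form (the helper name q_without_u and the sample string are illustrative assumptions, not taken from the question):

```ruby
# Hypothetical helper: returns each "q" that is NOT immediately followed by "u".
# (?!u) is zero-width, so the character after "q" is never part of the match.
def q_without_u(text)
  text.scan(/q(?!u)/)
end

p q_without_u("Iraq quit qat")  # => ["q", "q"]
```

The "q" in "quit" is rejected because a "u" follows it; the other two are accepted.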

Tags: python, regex, php, ruby, perl

I just learned about the apparently undocumented \K behavior in Ruby regex (thanks to this answer by anubhava). This feature (possibly named Keep?) also exists in PHP, Perl, and Python regex flavors. It is described elsewhere as "drops what was matched so far from the match to be returned."

"abc".match(/ab\Kc/)     # matches "c"

Is this behavior identical to the positive lookbehind marker as used below?

"abc".match(/(?<=ab)c/)  # matches "c"

If not, what differences do the two exhibit?

asked Jan 29 '16 19:01

user513951

1 Answer

It's easier to see the difference between \K and (?<=...) with the String#scan method.

A lookbehind is a zero-width assertion that doesn't consume characters and that is tested (backwards) from the current position:

> "abcdefg".scan(/(?<=.)./)
=> ["b", "c", "d", "e", "f", "g"]

The "keep" feature \K (which is not an anchor) marks a position in the pattern: everything matched so far by the part of the pattern to its left is dropped from the match result. The characters matched before \K are still consumed, however; they just don't appear in the result:

> "abcdefg".scan(/.\K./)
=> ["b", "d", "f"]

The behaviour is the same as without \K:

> "abcdefg".scan(/../)
=> ["ab", "cd", "ef"]

except that the characters before the \K are removed from the result.
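
To make the "consumed versus not consumed" distinction visible, here is a rough sketch that records where each reported match starts. The helper name match_offsets is hypothetical; it relies only on String#scan with a block and Regexp.last_match:

```ruby
# Hypothetical helper: collects [start_offset, matched_text] for every match.
# When given a block, String#scan sets the last match on each iteration.
def match_offsets(text, pattern)
  offsets = []
  text.scan(pattern) do
    m = Regexp.last_match
    offsets << [m.begin(0), m[0]]
  end
  offsets
end

# Lookbehind: nothing is consumed before the match, so every position
# after the first character yields a match.
p match_offsets("abcdefg", /(?<=.)./)
# => [[1, "b"], [2, "c"], [3, "d"], [4, "e"], [5, "f"], [6, "g"]]

# \K: the character before \K is consumed, so the next attempt starts
# after the whole two-character span and every other character is skipped.
p match_offsets("abcdefg", /.\K./)
# => [[1, "b"], [3, "d"], [5, "f"]]
```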

One interesting use of \K is to emulate a variable-length lookbehind, which Ruby does not allow (the same goes for PHP and Perl), or to avoid creating a capture group just to extract part of the match. For example, (?<=a.*)f. can be implemented using \K:

> "abcdefg".match(/a.*\Kf./)
=> #<MatchData "fg">

An alternative way would be to write /a.*(f.)/, but the \K avoids the need to create a capture group.
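
The following sketch puts the two approaches side by side and also shows \K in a substitution. The helper names (compiles?, with_keep, with_group, replace_kept) are made up for this example. Regexp.new is used for the lookbehind version so that a rejected pattern shows up as a rescuable RegexpError rather than a syntax error in a regex literal:

```ruby
# Hypothetical helper: reports whether Ruby accepts a pattern source.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# Extract "f" plus the next character when an "a" appears somewhere before it.
def with_keep(text)
  text[/a.*\Kf./]      # whole match, minus everything before \K
end

def with_group(text)
  text[/a.*(f.)/, 1]   # same text, but taken from capture group 1
end

# With \K, only the kept part of the match is replaced.
def replace_kept(text)
  text.sub(/a.*\Kf./, "XY")
end

p compiles?('(?<=a.*)f.')   # prints false if this Ruby rejects the lookbehind
p with_keep("abcdefg")      # => "fg"
p with_group("abcdefg")     # => "fg"
p replace_kept("abcdefg")   # => "abcdeXY"
```

The substitution case is where \K tends to be most convenient: the capture-group version would need the prefix captured as well and written back in the replacement string.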

Note that \K also exists in the Python regex module (the third-party module, not the built-in re module, which does not support \K), even though that module allows variable-length lookbehinds.

answered Sep 27 '22 20:09

Casimir et Hippolyte
