In regex, match either the end of the string or a specific character

Tags:

regex

pattern-matching

I have a string. The end is different, such as index.php?test=1&list=UL or index.php?list=UL&more=1. The one thing I'm looking for is &list=.

How can I match it, whether it's in the middle of the string or it's at the end? So far I've got [&|\?]list=.*?([&|$]), but the ([&|$]) part doesn't actually work; I'm trying to use that to match either & or the end of the string, but the end of the string part doesn't work, so this pattern matches the second example but not the first.


asked Aug 23 '12 00:08

Gary

2 Answers

Use:

/(&|\?)list=.*?(&|$)/

Note that when you use a bracket expression, every character within it (with some exceptions) is going to be interpreted literally. In other words, [&|$] matches the characters &, |, and $.
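As a quick check, here is a minimal sketch using Python's `re` module (the choice of Python is an assumption; the question does not name a language). It compares the question's pattern with the one above on both sample strings.

```python
import re

urls = ["index.php?test=1&list=UL", "index.php?list=UL&more=1"]

# The question's pattern: [&|$] only matches a literal &, | or $.
broken = re.compile(r"[&|\?]list=.*?([&|$])")
# The pattern from this answer: (&|$) is an alternation, so $ is an anchor.
fixed = re.compile(r"(&|\?)list=.*?(&|$)")

broken_hits = [bool(broken.search(u)) for u in urls]
fixed_hits = [m.group(0) for m in (fixed.search(u) for u in urls)]

print(broken_hits)  # [False, True]
print(fixed_hits)   # ['&list=UL', '?list=UL&']
```

Note that the fixed pattern still consumes the trailing & when one is present.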

answered Sep 28 '22 06:09

João Silva

In short

Any zero-width assertions inside [...] lose their meaning of a zero-width assertion. [\b] does not match a word boundary (it matches a backspace, or, in POSIX, \ or b), [$] matches a literal $ char, [^] is either an error or, as in ECMAScript regex flavor, any char. Same with \z, \Z, \A anchors.
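A small sketch of this behavior, assuming Python's `re` module (other flavors differ in the details, as noted above):

```python
import re

# [$] is a literal dollar sign, not an end-of-string anchor.
dollar_literal = re.search(r"[$]", "cost: $5")
dollar_as_anchor = re.search(r"a[$]", "a")  # no "$" char after "a"

# In Python, [\b] is the backspace character (\x08), not a word boundary.
backspace = re.fullmatch(r"[\b]", "\x08")
word_boundary = re.search(r"[\b]", "two words")

print(dollar_literal.group(0))    # $
print(dollar_as_anchor)           # None
print(backspace is not None)      # True
print(word_boundary)              # None
```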

You may solve the problem using any of the below patterns:

[&?]list=([^&]*)
[&?]list=(.*?)(?=&|$)
[&?]list=(.*?)(?![^&])

If you need to check for the "absolute", unambiguous end of the string, remember that different regex flavors express this anchor with different constructs:

[&?]list=(.*?)(?=&|$)  - OK for ECMA regex (JavaScript, default C++ `std::regex`)
[&?]list=(.*?)(?=&|\z) - OK for .NET, Go, Onigmo (Ruby), Perl, PCRE (PHP, base R), Boost, ICU (R `stringr`), Java/Android
[&?]list=(.*?)(?=&|\Z) - OK for Python
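To see why the distinction matters, here is a hedged Python sketch. The input with a trailing newline is a made-up example: in Python, `$` also matches just before a final newline, while `\Z` matches only at the very end.

```python
import re

# Hypothetical input that ends with a newline character.
s = "index.php?test=1&list=UL\n"

loose = re.search(r"[&?]list=(.*?)(?=&|$)", s)    # $ matches before the final \n
strict = re.search(r"[&?]list=(.*?)(?=&|\Z)", s)  # \Z needs the true end; "." cannot cross \n

print(loose.group(1))  # UL
print(strict)          # None
```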

Matching between a char sequence and a single char or end of string (current scenario)

The .*?([YOUR_SINGLE_CHAR_DELIMITER(S)]|$) pattern (suggested by João Silva) is rather inefficient since the regex engine checks for the patterns that appear to the right of the lazy dot pattern first, and only if they do not match does it "expand" the lazy dot pattern.

In these cases, it is recommended to use a negated character class (a bracket expression in POSIX terminology):

[&?]list=([^&]*)

Details:

[&?] - a positive character class matching either & or ? (note the relationships between chars/char ranges in a character class are OR relationships)
list= - a substring, char sequence
([^&]*) - Capturing group #1: zero or more (*) chars other than & ([^&]), as many as possible
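A minimal sketch of this pattern in Python, run against the two strings from the question. The helper name `list_value` is hypothetical and only serves the illustration.

```python
import re

LIST_PARAM = re.compile(r"[&?]list=([^&]*)")

def list_value(url):
    """Return the value of the list parameter, or None if it is absent."""
    m = LIST_PARAM.search(url)
    return m.group(1) if m else None

print(list_value("index.php?test=1&list=UL"))  # UL
print(list_value("index.php?list=UL&more=1"))  # UL
print(list_value("index.php?playlist=UL"))     # None
```

Because the value is captured with `[^&]*`, the same pattern works whether the parameter sits in the middle of the string or at its end.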

Checking for the trailing single char delimiter presence without returning it or end of string

Most regex flavors support lookarounds (JavaScript has always supported lookaheads, and lookbehinds since ECMAScript 2018). Lookarounds are constructs that only return true or false depending on whether their patterns match, without consuming any text. They are crucial when you expect consecutive matches that may start and end with the same char (see the original pattern: it may match a string starting and ending with &). Although this is not expected in a query string, it is a common scenario.

In that case, you can use two approaches:

A positive lookahead with an alternation containing positive character class: (?=[SINGLE_CHAR_DELIMITER(S)]|$)
A negative lookahead with just a negative character class: (?![^SINGLE_CHAR_DELIMITER(S)])

The negative lookahead solution is a bit more efficient because it does not contain an alternation group that adds complexity to the matching procedure. The OP solution would look like

[&?]list=(.*?)(?=&|$)

or

[&?]list=(.*?)(?![^&])
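The effect on consecutive matches can be sketched in Python. The query string with a repeated list parameter is a made-up example chosen to show the difference.

```python
import re

# Hypothetical input with three adjacent list parameters.
query = "&list=1&list=2&list=3"

# Consuming the trailing & steals the leading & of the next match.
consuming = [m.group(1) for m in re.finditer(r"[&?]list=(.*?)(&|$)", query)]

# Lookaheads only test for the delimiter and leave it in place.
positive = re.findall(r"[&?]list=(.*?)(?=&|$)", query)
negative = re.findall(r"[&?]list=(.*?)(?![^&])", query)

print(consuming)  # ['1', '3']
print(positive)   # ['1', '2', '3']
print(negative)   # ['1', '2', '3']
```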


Certainly, in case the trailing delimiters are multichar sequences, only a positive lookahead solution will work since [^yes] does not negate a sequence of chars, but the chars inside the class (i.e. [^yes] matches any char but y, e and s).
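A short Python sketch of that last point, using a made-up string in which the intended delimiter is the whole word yes:

```python
import re

# Hypothetical input; the intended delimiter is the sequence "yes".
text = "value=blue sky yes"

# Positive lookahead with the multichar delimiter stops at the sequence "yes".
by_sequence = re.search(r"value=(.*?)(?=yes|$)", text).group(1)

# [^yes] excludes the single chars y, e and s, so matching stops at the first "e".
by_class = re.search(r"value=([^yes]*)", text).group(1)
by_negative_lookahead = re.search(r"value=(.*?)(?![^yes])", text).group(1)

print(repr(by_sequence))            # 'blue sky '
print(repr(by_class))               # 'blu'
print(repr(by_negative_lookahead))  # 'blu'
```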

answered Sep 28 '22 05:09

Wiktor Stribiżew
