Regular expression negative lookbehind of non-fixed length

As the documentation says:

This is called a negative lookbehind assertion. Similar to positive lookbehind assertions, the contained pattern must only match strings of some fixed length.

So the following works. The intention is to match any comma outside {} but none inside {}:

In [188]:

import re
re.compile(r"(?<!\{),.").findall('a1,a2,a3,a4,{,a6}')
Out[188]:
[',a', ',a', ',a', ',{']

This also works, on a slightly different input:

In [189]:

re.compile(r"(?<!\{a5),.").findall('a1,a2,a3,a4,{a5,a6}')
# or this: re.compile(r"(?<!\{..),.").findall('a1,a2,a3,a4,{a5,a6}')
Out[189]:
[',a', ',a', ',a', ',{']

But if the input is 'a1,a2,a3,a4,{_some_length_not_known_in_advance,a6}', the lookbehind would have to match text of unknown length, which the documentation rules out. The following attempt does not work as intended:

In [190]:

re.compile(r"(?<![\{.*]),.").findall('a1,a2,a3,a4,{a5,a6}')
Out[190]:
[',a', ',a', ',a', ',{', ',a']
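
A minimal sketch of why that last attempt fails, assuming Python's standard re module (the variable names are illustrative only). The square brackets turn \{.* into a character class, so the lookbehind checks just one character, and writing the variable-length pattern without the brackets is rejected when the pattern is compiled:

```python
import re

text = 'a1,a2,a3,a4,{a5,a6}'

# [\{.*] is a character class: it matches ONE character that is '{', '.' or '*'.
# The lookbehind therefore inspects only the single character before each comma,
# so the comma inside the braces (preceded by '5') still matches.
char_class_matches = re.findall(r'(?<![\{.*]),.', text)
print(char_class_matches)  # [',a', ',a', ',a', ',{', ',a']

# Writing the variable-length pattern directly is rejected at compile time.
try:
    re.compile(r'(?<!\{.*),.')
    compiled = True
except re.error as exc:
    compiled = False
    print('re.error:', exc)
```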

Is there an alternative way to achieve this? Is negative lookbehind the wrong approach?

Is there a reason lookbehind was designed this way (only matching strings of some fixed length) in the first place?

CT Zhu asked Jun 07 '14 03:06


2 Answers

Any alternative to achieve this?

Yes. There is a brilliantly simple technique, and this situation is very similar to "regex-match a pattern unless..."

Here's your simple regex:

{[^}]*}|(,)

The left side of the alternation | matches each complete {...} block. We will ignore these matches. The right side matches commas and captures them in Group 1, and we know they are the right commas because they were not matched by the expression on the left.

Here is a demo that performs several tasks, so you can pick and choose (see the output at the bottom of the demo):

  1. Count the commas you want to match (not those between braces)
  2. Show the matches (commas... duh)
  3. Replace the right commas. Here we replace with SplitHere so we can perform task 4...
  4. Split on the commas, and display the split strings
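
The four tasks above can be sketched in Python roughly as follows. This is an illustrative sketch, not the original demo: the sample string is borrowed from the question, the braces are escaped for readability, and the variable names are assumptions.

```python
import re

text = 'a1,a2,a3,a4,{_some_length_not_known_in_advance,a6}'

# Left branch swallows whole {...} blocks; right branch captures a comma in Group 1.
pattern = re.compile(r'\{[^}]*\}|(,)')

# Tasks 1 and 2: keep only the matches where Group 1 actually captured a comma.
commas = [m for m in pattern.finditer(text) if m.group(1) is not None]
count = len(commas)
positions = [m.start(1) for m in commas]
print(count, positions)  # 4 [2, 5, 8, 11]

# Task 3: replace the wanted commas, leaving {...} blocks untouched.
def replace_comma(m):
    return 'SplitHere' if m.group(1) is not None else m.group(0)

replaced = pattern.sub(replace_comma, text)

# Task 4: split on the marker inserted in task 3.
parts = replaced.split('SplitHere')
print(parts)  # ['a1', 'a2', 'a3', 'a4', '{_some_length_not_known_in_advance,a6}']
```

The replacement function is what makes task 3 safe: a brace block matched by the left branch is written back unchanged, so only Group 1 commas are touched.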

Reference

How to match (or replace) a pattern except in situations s1, s2, s3...

zx81 answered Sep 19 '22 01:09


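Instead of a negative lookbehind, you can use a negative lookahead, provided the braces are balanced. The lookahead rejects any comma that is followed by a } with no { in between, which is exactly a comma inside a brace pair.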

,(?![^{]*\})

For example:

>>> re.findall(r',..(?![^{]*\})', 'a1,a2,a3,a4,{_some_unknown_length,a5,a6,a7}')
[',a2', ',a3', ',a4']
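
As a hedged sketch of the bare pattern (variable names are illustrative, and the braces are assumed to be balanced and not nested): the `,..` form in the example above consumes the two characters after each comma, so the comma directly before { is not reported there. Matching the comma alone reports all four outside commas and can be used to split directly:

```python
import re

text = 'a1,a2,a3,a4,{_some_unknown_length,a5,a6,a7}'

# A comma that is NOT followed by '}' without an intervening '{'.
outside_comma = re.compile(r',(?![^{]*\})')

# Positions of the commas that lie outside the braces.
positions = [m.start() for m in outside_comma.finditer(text)]
print(positions)  # [2, 5, 8, 11]

# Split on those commas only; the brace block stays in one piece.
parts = outside_comma.split(text)
print(parts)  # ['a1', 'a2', 'a3', 'a4', '{_some_unknown_length,a5,a6,a7}']
```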
hwnd answered Sep 19 '22 01:09