Regular expression negative lookbehind of non-fixed length

Q: What is Lookbehind in regex?

A lookbehind matches text only when it is preceded by a specified pattern, without including that pattern in the match. A positive lookbehind is written (?<=a)something and can be combined with any other regex construct. It matches "something" only where it is immediately preceded by "a". The negative form, (?<!a)something, matches "something" only where it is not preceded by "a".
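
A minimal sketch of the two forms in Python's re module. The sample string and variable names are made up for illustration:

```python
import re

# Illustrative input: one "something" preceded by "a", one preceded by "b".
text = 'asomething bsomething'

# Positive lookbehind: match "something" only right after an "a".
positive = [m.start() for m in re.finditer(r'(?<=a)something', text)]

# Negative lookbehind: match "something" only when NOT right after an "a".
negative = [m.start() for m in re.finditer(r'(?<!a)something', text)]

print(positive)  # start offsets of matches preceded by "a"
print(negative)  # start offsets of matches not preceded by "a"
```

In both cases the "a" is only checked, not consumed, so the reported offsets point at the "s" of "something".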

Q: Does JavaScript support negative Lookbehind?

Yes, in engines that implement ECMAScript 2018, which added both positive (?<=...) and negative (?<!...) lookbehind to the language. Chrome supported them first; Firefox and Safari added support later, so older browsers may still reject the syntax.

Tags: python, regex, python-2.7

As the documentation says:

This is called a negative lookbehind assertion. Similar to positive lookbehind assertions, the contained pattern must only match strings of some fixed length.

So this works. The intention is to match any , outside {} but none inside {}:

In [188]: re.compile("(?<!\{)\,.").findall('a1,a2,a3,a4,{,a6}')
Out[188]: [',a', ',a', ',a', ',{']

This also works, on a slightly different input:

In [189]: re.compile("(?<!\{a5)\,.").findall('a1,a2,a3,a4,{a5,a6}')
#or this: re.compile("(?<!\{..)\,.").findall('a1,a2,a3,a4,{a5,a6}')
Out[189]: [',a', ',a', ',a', ',{']

But if the input is 'a1,a2,a3,a4,{_some_length_not_known_in_advance,a6}', where the text between { and the comma has a length not known in advance, then according to the documentation the following won't work as intended:

In [190]: re.compile("(?<![\{.*])\,.").findall('a1,a2,a3,a4,{a5,a6}')
Out[190]: [',a', ',a', ',a', ',{', ',a']
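
A side note on the output above, as a small sketch rather than a definitive diagnosis: [\{.*] is a character class, so the lookbehind only checks that the single character before the comma is not {, . or *. That is why every comma matches. Writing the variable-width version without the square brackets is rejected by re when the pattern is compiled. The variable names here are illustrative.

```python
import re

text = 'a1,a2,a3,a4,{a5,a6}'

# [\{.*] is a character class: ONE character that is '{', '.' or '*'.
# No comma in the text is preceded by any of those, so all five match.
char_class_matches = re.findall(r"(?<![\{.*]),.", text)
print(char_class_matches)

# The variable-width form (no square brackets) does not compile in re.
try:
    re.compile(r"(?<!\{.*),.")
    compile_error = None
except re.error as exc:
    compile_error = exc

print(compile_error)  # re reports that the lookbehind is not fixed-width
```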

Any alternative to achieve this? Is negative lookbehind the wrong approach?

Is there a reason lookbehind was designed this way (matching only strings of some fixed length) in the first place?


asked Jun 07 '14 03:06

CT Zhu

2 Answers

Any alternative to achieve this?

Yes. There is a brilliantly simple technique, and this situation is very similar to "regex-match a pattern unless..."

Here's your simple regex:

{[^}]*}|(,)

The left side of the alternation | matches complete {...} groups, braces included. We will ignore these matches. The right side matches and captures commas to Group 1, and we know they are the right commas because they were not matched by the expression on the left.

Here is a demo that performs several tasks, so you can pick and choose (see the output at the bottom of the demo):

1. Count the commas you want to match (not those between braces)
2. Show the matches (commas... duh)
3. Replace the right commas. Here we replace with SplitHere so we can perform task 4...
4. Split on the commas, and display the split strings
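
The four tasks could be sketched in Python roughly as follows. This is a minimal illustration, not the original demo: the sample string is taken from the question, the variable names are made up, and the braces are escaped in the pattern, which is equivalent to the regex above.

```python
import re

# Sample input from the question; assumes a single level of {...} groups.
subject = 'a1,a2,a3,a4,{_some_length_not_known_in_advance,a6}'

# Left branch consumes whole {...} groups; right branch captures other commas.
pattern = re.compile(r'\{[^}]*\}|(,)')

# Tasks 1 and 2: keep only the matches where Group 1 took part.
commas = [m.group(1) for m in pattern.finditer(subject) if m.group(1)]
count = len(commas)

# Task 3: replace captured commas; put {...} groups back unchanged.
replaced = pattern.sub(
    lambda m: 'SplitHere' if m.group(1) else m.group(0), subject)

# Task 4: split on the marker inserted in task 3.
parts = replaced.split('SplitHere')

print(count)
print(commas)
print(replaced)
print(parts)
```

The key point is the check on Group 1: a match where Group 1 is empty came from the left branch, meaning a whole brace group was consumed, so it is skipped or restored as-is.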

Reference

How to match (or replace) a pattern except in situations s1, s2, s3...


answered Sep 19 '22 01:09

zx81

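Instead of a negative lookbehind, you can use a negative lookahead, provided the braces are balanced and not nested. The lookahead rejects any comma that is followed by a } with no { in between, which means the comma sits inside a brace group.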

,(?![^{]*\})

For example:

>>> re.findall(r',..(?![^{]*\})', 'a1,a2,a3,a4,{_some_unknown_length,a5,a6,a7}')
[',a2', ',a3', ',a4']
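
Note that the comma just before { is missing from the result above only because the two extra dots consume "{_", after which the lookahead sees the closing brace. With the bare pattern, all four outer commas match. A short sketch, assuming unnested braces and using illustrative variable names:

```python
import re

s = 'a1,a2,a3,a4,{_some_unknown_length,a5,a6,a7}'

# A comma is "outside" if no '}' can be reached without first crossing a '{'.
outside = re.compile(r',(?![^{]*\})')

# Offsets of the commas that lie outside the braces.
positions = [m.start() for m in outside.finditer(s)]

# Splitting on those commas keeps the {...} group in one piece.
fields = outside.split(s)

print(positions)
print(fields)
```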

answered Sep 19 '22 01:09

hwnd
