
Removing data between double squiggly brackets with nested sub brackets in python

I'm having some difficulty with this problem. I need to remove all data that's contained in squiggly brackets.

Like such:

Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there.

Becomes:

Hello there.

Here's my first try (I know it's terrible):

while True:
    firstStartBracket = text.find('{{')
    if firstStartBracket == -1:
        break
    firstEndBracket = text.find('}}')
    if firstEndBracket == -1:
        break
    secondStartBracket = text.find('{{', firstStartBracket + 2)
    lastEndBracket = firstEndBracket
    if secondStartBracket == -1 or secondStartBracket > firstEndBracket:
        text = text[:firstStartBracket] + text[lastEndBracket + 2:]
        continue
    innerBrackets = 2
    position = secondStartBracket
    while innerBrackets:
        print(innerBrackets)
        # every time we find a next start bracket before the ending, add 1 to inner brackets, else remove 1
        nextEndBracket = text.find('}}', position + 2)
        nextStartBracket = text.find('{{', position + 2)
        if nextStartBracket != -1 and nextStartBracket < nextEndBracket:
            innerBrackets += 1
            position = nextStartBracket
            # print(text[position-2:position+4])
        else:
            innerBrackets -= 1
            position = nextEndBracket
            # print(text[position-2:position+4])
            # print(nextStartBracket)
            # print(lastEndBracket)
            lastEndBracket = nextEndBracket
        print('pos', position)
    text = text[:firstStartBracket] + text[lastEndBracket + 2:]

It seems to work but runs out of memory quite fast. Is there any better way to do this (hopefully with regex)?

EDIT: I was not clear so I'll give another example. I need to allow for multiple top level brackets.

Like such:

Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend.

Becomes:

Hello there friend.

asked Feb 25 '16 00:02 by thewormsterror



2 Answers

This is a regex/generator based solution that works with any number of braces. This problem does not need an actual stack because there is only 1 type (well, pair) of token involved. The level fills the role that a stack would fill in a more complex parser.

import re

def _parts_outside_braces(text):
    level = 0
    for part in re.split(r'(\{\{|\}\})', text):
        if part == '{{':
            level += 1
        elif part == '}}':
            level = level - 1 if level else 0
        elif level == 0:
            yield part

x = 'Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there.  {{ second set {{ of }} braces }}'
print(''.join(_parts_outside_braces(x)))

Some more general points: the capture group in the regex is what makes the braces themselves show up in the output of re.split; without it you only get the text in between. There is also some tolerance for mismatched braces: a stray }} at level 0 is silently ignored, and an unclosed {{ simply swallows the rest of the string. A strict parser should raise an exception in both cases, that is, on a }} at level 0 and on reaching the end of the string with level > 0. A loose, web-browser style parser might instead want to display a stray }} as output.
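As a rough sketch of the strict and loose behaviours described above, the same level-counting loop can be wrapped in a function that either raises on mismatched braces or passes a stray }} through. The name strip_braced and its strict flag are made up for this illustration and are not part of the original answer.

```python
import re

def strip_braced(text, strict=True):
    """Return text with every {{...}} group (nested or not) removed.

    Hypothetical helper: 'strict' is an assumed flag, not from the answer.
    """
    level = 0
    kept = []
    for part in re.split(r'(\{\{|\}\})', text):
        if part == '{{':
            level += 1
        elif part == '}}':
            if level > 0:
                level -= 1
            elif strict:
                raise ValueError("unmatched '}}'")
            else:
                kept.append(part)  # loose mode: show the stray braces
        elif level == 0:
            kept.append(part)
    if strict and level > 0:
        raise ValueError("unclosed '{{'")
    return ''.join(kept)

s = 'Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend.'
print(strip_braced(s))
```

Note that the spaces on either side of each removed group are kept, so the result contains double spaces; collapse them separately if that matters.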

answered Sep 24 '22 06:09 by Jason S


You can use pyparsing module here. Solution based on this answer:

from pyparsing import nestedExpr


s = "Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend."

expr = nestedExpr('{{', '}}')
result = expr.parseString("{{" + s + "}}").asList()[0]
print(" ".join(item for item in result if not isinstance(item, list)))

Prints:

Hello there friend.

The regex approach below works only if there is a single top-level pair of braces.

To remove everything inside the double curly braces along with the braces themselves:

>>> import re
>>> 
>>> s = "Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there."
>>> re.sub(r"\{\{.*\}\} ", "", s)
'Hello there.'

The pattern \{\{.*\}\} followed by a space matches an opening pair of double curly braces, then any characters any number of times (intentionally left "greedy"), then a closing pair of double curly braces and a trailing space.
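To illustrate the single-top-level limitation, here is what the greedy pattern does on the question's second example, followed by one possible regex-only workaround: repeatedly deleting innermost groups until none remain. The workaround is a sketch and not part of the original answer; the names INNERMOST and remove_nested are invented for it.

```python
import re

s = "Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend."

# The greedy pattern spans from the first '{{' to the last '}} ',
# so the text between the two top-level groups is lost as well.
greedy = re.sub(r"\{\{.*\}\} ", "", s)
print(greedy)

# Assumed alternative: match only innermost groups, i.e. '{{ ... }}'
# spans that contain no further '{{' or '}}'.
INNERMOST = re.compile(r"\{\{(?:(?!\{\{|\}\}).)*\}\}", re.DOTALL)

def remove_nested(text):
    # Peel off one nesting level per pass until nothing is left to remove.
    while True:
        text, count = INNERMOST.subn("", text)
        if count == 0:
            return text

# Collapse the runs of spaces left behind by the removed groups.
cleaned = re.sub(r" {2,}", " ", remove_nested(s))
print(cleaned)
```

Each pass costs a full scan of the string, so deeply nested input takes one pass per nesting level; the single-pass level counter is likely the better fit for large texts.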

answered Sep 26 '22 06:09 by alecxe