Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extract string inside nested brackets

I need to extract strings from nested brackets like so:

[ this is [ hello [ who ] [what ] from the other side ] slim shady ]

Result (Order doesn't matter):

This is slim shady
Hello from the other side
Who 
What

Note, the string could have N brackets, and they will always be valid, but may or may not be nested. Also, the string doesn't have to start with a bracket.

The solutions I have found online to a similar problem suggest a regex, but I'm not sure it will work in this case.

I was thinking of implementing this similar to how we check if a string has all valid parentheses:

Walk through the string. If we see a [ we push its index on the stack, if we see a ], we substring from there to the current spot.

However, we'd need to erase that substring from the original string so we don't get it as part of any of the outputs. So, instead of pushing just pushing the index into the stack, I was thinking of creating a LinkedList as we go along, and when we find a [ we insert that Node on the LinkedList. This will allow us to easily delete the substring from the LinkedList.

Would this be a good approach or is there a cleaner, known solution?

EDIT:

'[ this is [ hello [ who ] [what ] from the other [side] ] slim shady ][oh my [g[a[w[d]]]]]'

Should return (Order doesn't matter):

this is slim shady
hello from the other
who 
what 
side
oh my
g
a
w
d

White spaces don't matter, that's trivial to remove afterwards. What matters is being able to distinguish the different contents within the brackets. Either by separating them in new lines, or having a list of strings.

like image 871
lorenzocastillo Avatar asked Jul 19 '16 10:07

lorenzocastillo


4 Answers

This code scans the text by character and pushes an empty list on to the stack for every opening [ and pops the last pushed list off the stack for every closing ].

text = '[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'

def parse(text):
    stack = []
    for char in text:
        if char == '[':
            #stack push
            stack.append([])
        elif char == ']':
            yield ''.join(stack.pop())
        else:
            #stack peek
            stack[-1].append(char)

print(tuple(parse(text)))

Output;

(' who ', 'what ', ' hello   from the other side ', ' this is  slim shady ')
(' who ', 'what ', 'side', ' hello   from the other  ', ' this is  slim shady ', 'd', 'w', 'a', 'g', 'oh my ')
like image 52
Nizam Mohamed Avatar answered Nov 15 '22 21:11

Nizam Mohamed


This can quite comfortably be solved using regex:

import re

s= '[ this is [ hello [ who ] [what ] from the other [side] ] slim shady ][oh my [g[a[w[d]]]]]'

result= []
pattern= r'\[([^[\]]*)\]' #regex pattern to find non-nested square brackets
while '[' in s: #while brackets remain
    result.extend(re.findall(pattern, s)) #find them all and add them to the list
    s= re.sub(pattern, '', s) #then remove them
result= filter(None, (t.strip() for t in result)) #strip whitespace and drop empty strings

#result: ['who', 'what', 'side', 'd', 'hello   from the other', 'w', 'this is  slim shady', 'a', 'g', 'oh my']
like image 36
Aran-Fey Avatar answered Nov 15 '22 19:11

Aran-Fey


a = '[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'
lvl = -1
words = []
for i in a:
    if i == '[' :
        lvl += 1
        words.append('')
    elif i == ']' :
        lvl -= 1
    else:
        words[lvl] += i

for word in words:
    print ' '.join(word.split())

This gives o/p -

this is slim shady

hello from the other side

who what

like image 27
Sahil Agarwal Avatar answered Nov 15 '22 20:11

Sahil Agarwal


You can represent your matches using a tree-like structure.

class BracketMatch:
    def __init__(self, refstr, parent=None, start=-1, end=-1):
        self.parent = parent
        self.start = start
        self.end = end
        self.refstr = refstr
        self.nested_matches = []
    def __str__(self):
        cur_index = self.start+1
        result = ""
        if self.start == -1 or self.end == -1:
            return ""
        for child_match in self.nested_matches:
            if child_match.start != -1 and child_match.end != -1:
                result += self.refstr[cur_index:child_match.start]
                cur_index = child_match.end + 1
            else:
                continue
        result += self.refstr[cur_index:self.end]
        return result

# Main script
haystack = '''[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'''
root = BracketMatch(haystack)
cur_match = root
for i in range(len(haystack)):
    if '[' == haystack[i]:
        new_match = BracketMatch(haystack, cur_match, i)
        cur_match.nested_matches.append(new_match)
        cur_match = new_match
    elif ']' == haystack[i]:
        cur_match.end = i
        cur_match = cur_match.parent
    else:
        continue
# Here we built the set of matches, now we must print them
nodes_list = root.nested_matches
# So we conduct a BFS to visit and print each match...
while nodes_list != []:
    node = nodes_list.pop(0)
    nodes_list.extend(node.nested_matches)
    print("Match: " + str(node).strip())

The output of that program will be:

Match: this is slim shady
Match: hello from the other side
Match: who
Match: what

like image 35
Rerito Avatar answered Nov 15 '22 19:11

Rerito