I need to extract strings from nested brackets like so:
[ this is [ hello [ who ] [what ] from the other side ] slim shady ]
Result (Order doesn't matter):
This is slim shady
Hello from the other side
Who
What
Note, the string could have N brackets, and they will always be valid, but may or may not be nested. Also, the string doesn't have to start with a bracket.
The solutions I have found online to a similar problem suggest a regex, but I'm not sure it will work in this case.
I was thinking of implementing this similar to how we check if a string has all valid parentheses:
Walk through the string. If we see a [ we push its index on the stack, if we see a ], we substring from there to the current spot.
However, we'd need to erase that substring from the original string so we don't get it as part of any of the outputs. So, instead of pushing just pushing the index into the stack, I was thinking of creating a LinkedList as we go along, and when we find a [ we insert that Node on the LinkedList. This will allow us to easily delete the substring from the LinkedList.
Would this be a good approach or is there a cleaner, known solution?
EDIT:
'[ this is [ hello [ who ] [what ] from the other [side] ] slim shady ][oh my [g[a[w[d]]]]]'
Should return (Order doesn't matter):
this is slim shady
hello from the other
who
what
side
oh my
g
a
w
d
White spaces don't matter, that's trivial to remove afterwards. What matters is being able to distinguish the different contents within the brackets. Either by separating them in new lines, or having a list of strings.
This code scans the text by character and pushes an empty list
on to the stack for every opening [
and pops the last pushed list
off the stack for every closing ]
.
text = '[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'
def parse(text):
stack = []
for char in text:
if char == '[':
#stack push
stack.append([])
elif char == ']':
yield ''.join(stack.pop())
else:
#stack peek
stack[-1].append(char)
print(tuple(parse(text)))
Output;
(' who ', 'what ', ' hello from the other side ', ' this is slim shady ')
(' who ', 'what ', 'side', ' hello from the other ', ' this is slim shady ', 'd', 'w', 'a', 'g', 'oh my ')
This can quite comfortably be solved using regex:
import re
s= '[ this is [ hello [ who ] [what ] from the other [side] ] slim shady ][oh my [g[a[w[d]]]]]'
result= []
pattern= r'\[([^[\]]*)\]' #regex pattern to find non-nested square brackets
while '[' in s: #while brackets remain
result.extend(re.findall(pattern, s)) #find them all and add them to the list
s= re.sub(pattern, '', s) #then remove them
result= filter(None, (t.strip() for t in result)) #strip whitespace and drop empty strings
#result: ['who', 'what', 'side', 'd', 'hello from the other', 'w', 'this is slim shady', 'a', 'g', 'oh my']
a = '[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'
lvl = -1
words = []
for i in a:
if i == '[' :
lvl += 1
words.append('')
elif i == ']' :
lvl -= 1
else:
words[lvl] += i
for word in words:
print ' '.join(word.split())
This gives o/p -
this is slim shady
hello from the other side
who what
You can represent your matches using a tree-like structure.
class BracketMatch:
def __init__(self, refstr, parent=None, start=-1, end=-1):
self.parent = parent
self.start = start
self.end = end
self.refstr = refstr
self.nested_matches = []
def __str__(self):
cur_index = self.start+1
result = ""
if self.start == -1 or self.end == -1:
return ""
for child_match in self.nested_matches:
if child_match.start != -1 and child_match.end != -1:
result += self.refstr[cur_index:child_match.start]
cur_index = child_match.end + 1
else:
continue
result += self.refstr[cur_index:self.end]
return result
# Main script
haystack = '''[ this is [ hello [ who ] [what ] from the other side ] slim shady ]'''
root = BracketMatch(haystack)
cur_match = root
for i in range(len(haystack)):
if '[' == haystack[i]:
new_match = BracketMatch(haystack, cur_match, i)
cur_match.nested_matches.append(new_match)
cur_match = new_match
elif ']' == haystack[i]:
cur_match.end = i
cur_match = cur_match.parent
else:
continue
# Here we built the set of matches, now we must print them
nodes_list = root.nested_matches
# So we conduct a BFS to visit and print each match...
while nodes_list != []:
node = nodes_list.pop(0)
nodes_list.extend(node.nested_matches)
print("Match: " + str(node).strip())
The output of that program will be:
Match: this is slim shady
Match: hello from the other side
Match: who
Match: what
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With