 

Split a string containing an alphabetical bullet list into a list


My string is text = "a) Baghdad, Iraq b) United Arab Emirates (possibly)"

I want to split it into a list like ["Baghdad, Iraq", "United Arab Emirates (possibly)"]

The code I have used does not give me the desired result:

re.split('\\s*([a-zA-Z\\d][).]|•)\\s*(?=[A-Z])', text)

Please help me with this.

Sharjeel Ali Shaukat asked Nov 22 '18 12:11



2 Answers

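You can get the desired result for your example with a list comprehension and a second regex: split on the bullet markers while capturing them, then filter out the empty strings and the captured markers.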

import re

text = "a) Baghdad, Iraq b) United Arab Emirates (possibly)"

# Split on the bullet markers; the capturing group keeps them in the result,
# so drop empty strings and the markers themselves with a second pattern.
data = [x for x in re.split(r'((?:^\s*[a-zA-Z0-9]\))|(?:\s+[a-zA-Z0-9]\)))\s*',
                            text) if x and not re.match(r"\s*[a-zA-Z0-9]\)", x)]
print(data)

Output:

['Baghdad, Iraq', 'United Arab Emirates (possibly)']

See https://regex101.com/r/wxEEQW/1
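
For context on why the pattern in the question does not give the desired list: re.split includes the text of any capturing group in its result, so the bullet markers come back alongside the items. Here is a minimal sketch, assuming the same sample text, that compares the question's pattern with a non-capturing variant. The names with_group and without_group are only illustrative.

```python
import re

text = "a) Baghdad, Iraq b) United Arab Emirates (possibly)"

# Pattern from the question: the capturing group (...) makes re.split
# return the matched bullets ("a)", "b)") as separate list elements.
with_group = re.split(r'\s*([a-zA-Z\d][).]|•)\s*(?=[A-Z])', text)
print(with_group)

# Same pattern with a non-capturing group (?:...): the bullets are dropped.
# The leading empty string (text before the first bullet) is filtered out.
without_group = [part for part in
                 re.split(r'\s*(?:[a-zA-Z\d][).]|•)\s*(?=[A-Z])', text)
                 if part]
print(without_group)
```

So switching the group to (?:...) and filtering the empty first element may be enough for input shaped like the example.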

Patrick Artner answered Oct 06 '22 00:10



You can simply use re.split with a shorter pattern, then strip trailing whitespace and drop the empty strings:

import re
text = "a) Baghdad, Iraq b) United Arab Emirates (possibly)"
# Raw string avoids invalid-escape warnings for \w, \) and \s
countries = list(filter(None, map(str.rstrip, re.split(r'\w\)\s', text))))

Output:

['Baghdad, Iraq', 'United Arab Emirates (possibly)']
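
One caveat, as a hedged sketch: the pattern \w\)\s also matches a word character followed by ")" in the middle of the text, for example at the end of a parenthetical that is followed by more words. The string sample and the helper names below are hypothetical, and the \b-anchored pattern is only one possible tightening, not part of the original answer.

```python
import re

def split_loose(s):
    # Pattern from the answer above: any word character followed by ")" and whitespace.
    return list(filter(None, map(str.rstrip, re.split(r'\w\)\s', s))))

def split_strict(s):
    # Assumed tightening: \b requires the bullet character to start a word,
    # so the "e)" closing "(maybe)" is no longer treated as a bullet.
    return list(filter(None, map(str.rstrip, re.split(r'\b[a-zA-Z0-9]\)\s+', s))))

# Hypothetical input with a parenthetical in the middle of an item.
sample = "a) Paris (maybe) France b) Rome"

print(split_loose(sample))   # the first item is cut inside "(maybe)"
print(split_strict(sample))  # the parenthetical stays intact
```

For the example in the question both versions return the same list, because "(possibly)" sits at the very end of the string with no whitespace after it.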
Ajax1234 answered Oct 05 '22 23:10
