I'm trying to extract from a string every code that appears in this list, in the order the codes appear in the string:
check_list = ['E1', 'E2', 'E7', 'E3', 'E9', 'E10', 'E12', 'IN1', 'IN2', 'IN4', 'IN10']
For example, for this string:
s1 = "apto E1-E10 tower 1-2 sanit"
I want to get ['E1', 'E10'].
s2 = "apto IN2-IN1-IN4-E12-IN10 mamp"
For this one I want: ['IN2', 'IN1', 'IN4', 'E12', 'IN10']
And then it gets tricky, because a prefix can be written once and followed by several hyphen-separated numbers:
s3 = "E-2-7-3-9-12; IN1-4-10 T 1-2 inst. hidr."
Here I want: ['E2', 'E7', 'E3', 'E9', 'E12', 'IN1', 'IN4', 'IN10']
Can you please give some advice to solve this?
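For comparison, a plain substring test against the list is not enough here: it returns codes in list order rather than string order, reports E1 when only E12 is present, and cannot expand the shorthand in s3. A quick check (the helper name naive is only for illustration):

```python
check_list = ['E1', 'E2', 'E7', 'E3', 'E9', 'E10', 'E12', 'IN1', 'IN2', 'IN4', 'IN10']

def naive(s):
    # Hypothetical helper: keep every code that occurs anywhere in s as a substring
    return [code for code in check_list if code in s]

s2 = "apto IN2-IN1-IN4-E12-IN10 mamp"
s3 = "E-2-7-3-9-12; IN1-4-10 T 1-2 inst. hidr."

print(naive(s2))  # ['E1', 'E12', 'IN1', 'IN2', 'IN4', 'IN10']
print(naive(s3))  # ['IN1']
```

Both results differ from what is wanted, which is why the prefix and the numbers have to be parsed separately.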
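The following should work. It finds each prefix (E or IN) followed by a run of digits and hyphens, splits every number out of that run, glues the prefix back on, and keeps only the codes that are in the check list: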
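import re

def extract_data(s):
    # Codes we are allowed to return
    check_set = {'E1', 'E2', 'E7', 'E3', 'E9', 'E10', 'E12',
                 'IN1', 'IN2', 'IN4', 'IN10'}
    result = []
    # Each match is a prefix (E or IN) followed by digits and hyphens, e.g. "E-2-7-3-9-12"
    for match in re.finditer(r'\b(E|IN)[-\d]+', s):
        # Split the run into its numbers and put the prefix back in front of each
        for digits in re.findall(r'\d+', match.group(0)):
            item = match.group(1) + digits
            if item in check_set:
                result.append(item)
    return result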
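To see what the outer pattern hands to the inner findall, here is a small trace on s3. Note that "T 1-2" produces no match, because T is not one of the prefixes and "1-2" on its own has none (the name trace is just for illustration):

```python
import re

s3 = "E-2-7-3-9-12; IN1-4-10 T 1-2 inst. hidr."

# For each outer match, record the full run, the captured prefix,
# and the numbers the inner findall pulls out of that run.
trace = [
    (m.group(0), m.group(1), re.findall(r'\d+', m.group(0)))
    for m in re.finditer(r'\b(E|IN)[-\d]+', s3)
]
for row in trace:
    print(row)
# ('E-2-7-3-9-12', 'E', ['2', '7', '3', '9', '12'])
# ('IN1-4-10', 'IN', ['1', '4', '10'])
```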
Examples:
>>> extract_data("apto E1-E10 tower 1-2 sanit")
['E1', 'E10']
>>> extract_data("apto IN2-IN1-IN4-E12-IN10 mamp")
['IN2', 'IN1', 'IN4', 'E12', 'IN10']
>>> extract_data("E-2-7-3-9-12; IN1-4-10 T 1-2 inst. hidr.")
['E2', 'E7', 'E3', 'E9', 'E12', 'IN1', 'IN4', 'IN10']
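If the list of codes changes often, one possible variation (a sketch, not part of the answer above) builds the prefix alternation from the list itself instead of hard-coding E|IN. It assumes every code is a prefix followed by trailing digits; the function name extract_codes is hypothetical:

```python
import re

def extract_codes(s, check_list):
    # Hypothetical generalization: derive the prefixes (E, IN, ...) from check_list
    check_set = set(check_list)
    # Prefix = code with its trailing digits stripped, e.g. 'IN10' -> 'IN'
    prefixes = {code.rstrip('0123456789') for code in check_list}
    # Try longer prefixes first so one that starts with a shorter prefix is not cut short
    alternation = '|'.join(re.escape(p) for p in sorted(prefixes, key=len, reverse=True))
    # Group 1 is the prefix, group 2 is the run of digits and hyphens after it
    pattern = re.compile(r'\b(' + alternation + r')([-\d]+)')
    result = []
    for match in pattern.finditer(s):
        prefix = match.group(1)
        for digits in re.findall(r'\d+', match.group(2)):
            code = prefix + digits
            if code in check_set:
                result.append(code)
    return result

check_list = ['E1', 'E2', 'E7', 'E3', 'E9', 'E10', 'E12', 'IN1', 'IN2', 'IN4', 'IN10']
print(extract_codes("E-2-7-3-9-12; IN1-4-10 T 1-2 inst. hidr.", check_list))
```

Searching only group 2 for numbers keeps digits that might be part of a prefix from being picked up by mistake.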