
Extract integers with specific length between separators

Given a list of strings like:

L = ['1759@1@83@0#[email protected]@[email protected]#1094@[email protected]@14.4', 
     '[email protected]@[email protected]', 
     '[email protected]@[email protected]#1101@2@40@0#1108@2@30@0',
     '1430@[email protected]@2.15#1431@[email protected]@60.29#1074@[email protected]@58.8#1109',
     '1809@[email protected]@292.66#1816@[email protected]@95.44#1076@[email protected]@1110.61']

I need to extract all 4-digit integers that sit between the separators # or @, including the first and last integers of each string, which have a separator on only one side. Floats must not be matched.

My solution is a bit overcomplicated: I replace the separators with spaces and then apply this pattern:

import re

pat = r'(?<!\S)\d{4}(?!\S)'
out = [re.findall(pat, re.sub('[#@]', ' ', x)) for x in L]
print(out)
"""
[['1759', '1362', '1094'], 
 ['1356'], 
 ['1354', '1101', '1108'], 
 ['1430', '1431', '1074', '1109'], 
 ['1809', '1816', '1076']]
"""

Is it possible to change the regex so that the re.sub replacement step is not needed? Is there another solution with better performance?

jezrael asked Dec 18 '22 18:12



1 Answer

To also match the first and last occurrences, which have no leading or trailing separator, you could use negative lookarounds:

(?<![^#])\d{4}(?![^@])

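(?<![^#]) means "not preceded by a character other than #", so it succeeds either at the start of the string or right after a #. That makes it a near synonym for (?:^|#), except that it does not consume the separator. The same applies to the negative lookahead: (?![^@]) succeeds either before an @ or at the end of the string.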
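
As a minimal sketch of how the pattern could be applied, the snippet below runs it with re.findall on a few strings. The sample strings are made up for illustration and only mimic the shape of the data in the question (records separated by #, fields separated by @). The second pattern, pat_any, is not part of the answer; it is an assumed variant for the case where a 4-digit integer may have either separator on either side.

```python
import re

# Hypothetical sample strings shaped like the question's data
# (records separated by '#', fields by '@'); the values are invented.
samples = [
    '1759@1@83@0#1362@2@16.5@8.25#1094@1@14.4@14.4',
    '1356@2@30@0',
    '1430@3@6.45@2.15#1109',
]

# Pattern from the answer: 4 digits preceded by '#' or the start of the
# string, and followed by '@' or the end of the string.
pat = re.compile(r'(?<![^#])\d{4}(?![^@])')
out = [pat.findall(s) for s in samples]
print(out)

# Assumed variant (not from the answer): accept '#' or '@' on both sides.
pat_any = re.compile(r'(?<![^#@])\d{4}(?![^#@])')
out_any = pat_any.findall('12@1234#5678@90.1234@1234.5')
print(out_any)
```

Because the lookarounds only inspect the neighbouring characters without consuming them, no separator ends up in the matches and no re.sub pre-processing step is needed.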

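
On the performance question, the approaches can be measured directly rather than guessed at. The sketch below uses timeit to compare the re.sub version from the question, the lookaround pattern, and a regex-free split. The sample strings, the function names, and the split-based variant are assumptions for illustration; the split version additionally assumes that the wanted integer is always the first field of each #-separated record, which holds for the sample strings but is not guaranteed in general.

```python
import re
import timeit

# Hypothetical sample strings shaped like the question's data; values invented.
samples = [
    '1759@1@83@0#1362@2@16.5@8.25#1094@1@14.4@14.4',
    '1356@2@30@0',
    '1430@3@6.45@2.15#1109',
] * 200

pat_sub = re.compile(r'(?<!\S)\d{4}(?!\S)')
pat_look = re.compile(r'(?<![^#])\d{4}(?![^@])')


def with_sub(strings):
    # Approach from the question: turn separators into spaces first.
    return [pat_sub.findall(re.sub('[#@]', ' ', s)) for s in strings]


def with_lookarounds(strings):
    # Approach from the answer: lookarounds, no pre-processing.
    return [pat_look.findall(s) for s in strings]


def with_split(strings):
    # Assumed alternative: take the first field of every '#'-separated
    # record and keep it if it consists of exactly 4 digits.
    result = []
    for s in strings:
        firsts = (rec.split('@', 1)[0] for rec in s.split('#'))
        result.append([f for f in firsts if len(f) == 4 and f.isdigit()])
    return result


# Print the measured time for each approach; results depend on the machine
# and on the real data, so no expected numbers are given here.
for fn in (with_sub, with_lookarounds, with_split):
    seconds = timeit.timeit(lambda: fn(samples), number=20)
    print(f'{fn.__name__}: {seconds:.4f} s')
```

Running this on the real list L instead of the invented samples gives a more meaningful comparison, since relative timings depend on string length and on how many fields each record has.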

revo answered Jan 28 '23 20:01
