Given a list of strings like:
L = ['1759@1@83@0#[email protected]@[email protected]#1094@[email protected]@14.4',
'[email protected]@[email protected]',
'[email protected]@[email protected]#1101@2@40@0#1108@2@30@0',
'1430@[email protected]@2.15#1431@[email protected]@60.29#1074@[email protected]@58.8#1109',
'1809@[email protected]@292.66#1816@[email protected]@95.44#1076@[email protected]@1110.61']
I need to extract all 4-digit integers between the separators # or @, including the first and last integers in each string. Floats should be excluded.
My solution is a bit overcomplicated: I replace the separators with spaces and then apply this pattern:
import re
pat = r'(?<!\S)\d{4}(?!\S)'
out = [re.findall(pat, re.sub('[#@]', ' ', x)) for x in L]
print(out)
"""
[['1759', '1362', '1094'],
['1356'],
['1354', '1101', '1108'],
['1430', '1431', '1074', '1109'],
['1809', '1816', '1076']]
"""
Is it possible to change the regex so that re.sub is not needed for the replacement? Is there another solution with better performance?
To allow the first and last occurrences, which have no leading or trailing separator, you can use negative lookarounds:
(?<![^#])\d{4}(?![^@])
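(?<![^#]) is a near synonym for (?:^|#): it succeeds at the start of the string or right after a #, but unlike the group it does not consume the separator. The same applies to the negative lookahead (?![^@]), which succeeds right before an @ or at the end of the string.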
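As a rough sketch of how this plugs into the original list comprehension without re.sub: the sample strings below are made up to resemble the question's data (they are not the original list), and the second pattern, named either here, is an assumed variant that accepts # or @ on both sides rather than something taken from the answer.

```python
import re

# Hypothetical sample strings shaped like the question's data
# (made-up values, not the original list).
samples = [
    '1759@1@83@0#1362@0.2@200@2.2#1094@0.5@14.4@14.4',
    '1430@2.15@2.15#1431@60.29@60.29#1074@58.8@58.8#1109',
    '1354@1200@0.5#1101@2@40@0',
]

# Pattern from the answer: 4 digits preceded by '#' or the string start,
# and followed by '@' or the string end.
strict = re.compile(r'(?<![^#])\d{4}(?![^@])')

# Assumed variant: accept either separator (or a string boundary) on both sides.
either = re.compile(r'(?<![^#@])\d{4}(?![^#@])')

strict_out = [strict.findall(s) for s in samples]
either_out = [either.findall(s) for s in samples]

print(strict_out)
print(either_out)
```

With these samples the two patterns differ only on the third string, where 1200 sits between two @ separators: the answer's pattern skips it, while the variant keeps it. Compiling the pattern once and calling findall directly avoids building an intermediate string per item, which is likely, though not measured here, to be somewhat faster than the re.sub approach.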