Say I have a string s containing letters and two delimiters, 1 and 2. I want to split the string in the following way: if a substring t falls between 1 and 2, return t; otherwise, return each letter on its own.
So if s = 'ab1cd2efg1hij2k', the expected output is ['a', 'b', 'cd', 'e', 'f', 'g', 'hij', 'k'].
I tried to use regular expressions:
import re
s = 'ab1cd2efg1hij2k'
re.findall( r'(1([a-z]+)2|[a-z])', s )
[('a', ''),
('b', ''),
('1cd2', 'cd'),
('e', ''),
('f', ''),
('g', ''),
('1hij2', 'hij'),
('k', '')]
From there I can do [ x[x[-1]!=''] for x in re.findall( r'(1([a-z]+)2|[a-z])', s ) ] to get my answer, but I still don't understand the output. The documentation says that findall returns a list of tuples if the pattern has more than one group. However, my pattern only contains one group. Any explanation is welcome.
If you want an 'or' match without it being split into match groups, add '?:' right after the opening parenthesis of the group that contains the 'or'. This makes the group non-capturing.
Without '?:'
re.findall('(test (word1|word2))', 'test word1')
Output:
[('test word1', 'word1')]
With '?:'
re.findall('(test (?:word1|word2))', 'test word1')
Output:
['test word1']
Further explanation: https://www.ocpsoft.org/tutorials/regular-expressions/or-in-regex/
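To tie this back to the question: every pair of plain parentheses is a capturing group, so the pattern in the question actually has two groups (the outer one and the inner ([a-z]+)), which is why findall returns 2-tuples. The sketch below checks the group count with the compiled pattern's groups attribute. The variable names two_groups and one_group are just for illustration. Note that making the outer group non-capturing does not by itself give the desired list, because findall then returns only the contents of the remaining group, with an empty string whenever that group did not take part in the match.

```python
import re

s = 'ab1cd2efg1hij2k'

# The pattern from the question: two pairs of capturing parentheses,
# hence two groups, hence a list of 2-tuples from findall.
two_groups = re.compile(r'(1([a-z]+)2|[a-z])')
print(two_groups.groups)      # 2

# Outer parentheses made non-capturing with ?: leaves a single group.
one_group = re.compile(r'(?:1([a-z]+)2|[a-z])')
print(one_group.groups)       # 1

# With exactly one group, findall returns that group's text for each match,
# and '' for matches where the group did not participate (single letters).
print(one_group.findall(s))   # ['', '', 'cd', '', '', '', 'hij', '']
```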
I am 5 years too late to the party, but I think I might have found an elegant solution to the ugly, tuple-ridden output of re.findall() with multiple capture groups.
In general, if you end up with an output which looks something like this:
[('pattern_1', '', ''), ('', 'pattern_2', ''), ('pattern_1', '', ''), ('', '', 'pattern_3')]
Then you can bring it into a flat list with this little trick:
["".join(x) for x in re.findall(all_patterns, iterable)]
The expected output will be like so:
['pattern_1', 'pattern_2', 'pattern_1', 'pattern_3']
It was tested on Python 3.7. Hope it helps!
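As a rough sketch of how this could apply to the string from the question: the pattern below is my own variation (an assumption, not the one from the question), with one group per alternative so that exactly one element of each tuple is non-empty. Joining each tuple then gives the flat list.

```python
import re

s = 'ab1cd2efg1hij2k'

# Hypothetical pattern: one capturing group per alternative, so each match
# fills exactly one slot of the tuple and leaves the other as ''.
pattern = r'1([a-z]+)2|([a-z])'

matches = re.findall(pattern, s)
print(matches[:3])   # [('', 'a'), ('', 'b'), ('cd', '')]

# Joining each tuple drops the empty strings and keeps the matched text.
flat = [''.join(m) for m in matches]
print(flat)          # ['a', 'b', 'cd', 'e', 'f', 'g', 'hij', 'k']
```

The join only works cleanly because the alternatives are mutually exclusive; if two groups could both match in the same tuple, their texts would be concatenated.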