I'm trying to split these lines:
<label>Olympic Games</label>
<title>Next stop</title>
Into:
["<label>", "Olympic Games", "</label>"]
["<title>", "Next stop", "</title>"]
In Python I can use regular expressions, but my attempt doesn't split anything:
line.split("<\*>")
Using lookarounds and a capture group to keep the text after splitting:
re.split(r'(?<=>)(.+?)(?=<)', '<label>Olympic Games</label>')
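To check that this handles both sample lines from the question, here is a minimal sketch. The names TAG_TEXT, lines and parts are only for illustration. Because re.split keeps whatever the capture group matched, the text between the tags comes back as its own list item.

```python
import re

# Match the text that sits right after a '>' and right before a '<'.
# The capture group makes re.split keep that text in the result.
TAG_TEXT = re.compile(r'(?<=>)(.+?)(?=<)')

lines = ['<label>Olympic Games</label>', '<title>Next stop</title>']
parts = [TAG_TEXT.split(line) for line in lines]

for p in parts:
    print(p)
```

Note that .+? needs at least one character, so an empty element such as <label></label> would come back unsplit as a single item.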
This regex works for me:
<(label|title)>([^<]*)</(label|title)>
or, as cwallenpoole suggested:
<(label|title)>([^<]*)</(\1)>
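I tested it on http://www.regexpal.com/
I have used three capturing groups. If you don't need the captures, make the groups non-capturing with (?:...) rather than simply removing the (): the alternation still needs grouping, and the \1 version needs its first group to stay a capturing group.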
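In Python, the backreference version could be used roughly like this. It is a sketch only: split_line and PAIR are hypothetical names, and it assumes each line holds exactly one label or title element. The three-item list is rebuilt from the captured tag name and text.

```python
import re

# \1 forces the closing tag name to be the same as the opening one.
PAIR = re.compile(r'<(label|title)>([^<]*)</\1>')

def split_line(line):
    """Return [open_tag, text, close_tag], or None if the line does not match."""
    m = PAIR.fullmatch(line)
    if m is None:
        return None
    name, text = m.groups()
    return [f'<{name}>', text, f'</{name}>']

print(split_line('<label>Olympic Games</label>'))
print(split_line('<title>Next stop</title>'))
```

A line with mismatched tags, such as <label>x</title>, gives None here, which the first pattern without \1 would not catch.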
What is wrong with your regex <\*> is that it matches only one thing: the literal text <*>. You have escaped the * using \*, so what you are saying is: a <, then a literal *, and then a >.
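A quick way to see this for yourself is the sketch below. It also shows that str.split treats its argument as a plain separator string rather than a regex, which is a second reason the original call did nothing. The last pattern, <[^>]*>, is not from the question; it is just one assumed way to match a whole tag.

```python
import re

line = '<label>Olympic Games</label>'

# str.split looks for the separator as literal text, not as a regex.
# The line contains no such text, so it comes back as a single item.
unsplit = line.split(r'<\*>')

# As a regex, <\*> matches only the three literal characters <*>.
literal_hit = re.findall(r'<\*>', 'a<*>b')
no_hit = re.findall(r'<\*>', line)

# One pattern that does match tags: '<', any run of non-'>' characters, then '>'.
tags = re.findall(r'<[^>]*>', line)

print(unsplit, literal_hit, no_hit, tags)
```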