I have a message that I am trying to split on its timestamps.
import re
message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."
split_message = re.split(r'[a-zA-Z]{3} (0[1-9]|[1-2][0-9]|3[0-1]), ([0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC', message)
print(split_message)
Expected Output:
["This is update 1", "This is update 2", "This is update 3"]
Actual Output:
['', '10', '17', 'This is update 1.', '10', '15', 'This is update 2.', '10', '15', 'This is update 3.']
Not sure what I am missing.
You are using capturing groups, which is why their contents also appear in the result list. Use non-capturing groups instead (they begin with ?:):
import re
message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."
split_message = re.split(r"[a-zA-Z]{3} (?:0[1-9]|[1-2][0-9]|3[0-1]), (?:[0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC", message)
print(split_message)
You will, however, always get an empty string as the first entry, because the message starts with a match, so the text before the first split point is empty:
['', 'This is update 1.', 'This is update 2.', 'This is update 3.']
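If you want exactly the list from the question, one option is to filter out empty strings and strip the trailing period from each piece. This is only a minimal sketch: the names TIMESTAMP and updates are made up for illustration, and it assumes every update ends with a single period that you want removed.

```python
import re

message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."

# Same pattern as above, with non-capturing groups, compiled once for reuse.
TIMESTAMP = re.compile(
    r"[a-zA-Z]{3} (?:0[1-9]|[1-2][0-9]|3[0-1]), (?:[0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC"
)

parts = TIMESTAMP.split(message)

# Drop the empty leading entry and remove the trailing "." from each update.
updates = [part.rstrip(".") for part in parts if part]

print(updates)
# ['This is update 1', 'This is update 2', 'This is update 3']
```

Note that rstrip(".") removes every trailing period, so an update ending in "..." would lose all three.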
As stated in the docs:
If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
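To see that rule in isolation, here is a small sketch on a toy string (the string and variable names are just for illustration). It splits on a single digit, once with a capturing group and once with a non-capturing one.

```python
import re

text = "a1b2c"

# Capturing group: the matched digits are returned between the pieces.
with_groups = re.split(r"(\d)", text)

# Non-capturing group: only the pieces between matches are returned.
without_groups = re.split(r"(?:\d)", text)

print(with_groups)     # ['a', '1', 'b', '2', 'c']
print(without_groups)  # ['a', 'b', 'c']
```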