I'm using the re module in Python and trying to split a chunk of text on periods and exclamation marks. However, when I split it, I get None in the result:
a = "This is my text...I want it to split by periods. I also want it to split \
by exclamation marks! Is that so much to ask?"
This is my code:
re.split('((?<=\w)\.(?!\..))|(!)',a)
Note that I have (?<=\w)\.(?!\..) because I want it to avoid ellipses. Nevertheless, the above code spits out:
['This is my text...I want it to split by periods', '.', None, ' \
I also want it to split by exclamation marks', None, '!', \
' Is that so much to ask?']
As you can see, wherever there is a period or exclamation mark, a None has been added to my list. Why is this, and how do I get rid of it?
re.split() is a common way to split a string on several delimiters at once, because the delimiters are described by a regular expression. It splits the string at each occurrence of the pattern and returns the pieces between the matches as a list. If the pattern contains capturing groups, the text matched by those groups is included in the list as well.
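As a minimal sketch of splitting on several delimiters at once, the example below uses a character class. The sample string and the variable names are made up for illustration and are not from the question.

```python
import re

# Hypothetical sample text, not taken from the question.
text = "one, two; three!four"

# Split on a comma, semicolon, or exclamation mark,
# and also consume any whitespace that follows the delimiter.
parts = re.split(r"[,;!]\s*", text)

print(parts)  # ['one', 'two', 'three', 'four']
```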
Try the following:
re.split(r'((?<=\w)\.(?!\..)|!)', a)
You get the None because you have two capturing groups, and all groups are included as a part of the re.split() result. So any time you match a . the second capture group is None, and any time you match a ! the first capture group is None.
Here is the result:
['This is my text...I want it to split by periods',
'.',
' I also want it to split by exclamation marks',
'!',
' Is that so much to ask?']
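To see the difference side by side, here is a small sketch that runs both patterns on the string from the question. The variable names two_groups and one_group are only for illustration.

```python
import re

a = ("This is my text...I want it to split by periods. I also want it to split "
     "by exclamation marks! Is that so much to ask?")

# Two capturing groups: every match contributes two entries to the result,
# and the group that did not take part in the match shows up as None.
two_groups = re.split(r'((?<=\w)\.(?!\..))|(!)', a)

# One capturing group around the whole alternation: every match contributes
# exactly one entry, which is the delimiter that matched.
one_group = re.split(r'((?<=\w)\.(?!\..)|!)', a)

print(None in two_groups)  # True
print(None in one_group)   # False
print(len(two_groups), len(one_group))  # 7 5
```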
If you don't want to include '.' and '!' in your result, just remove the parentheses that surround the entire expression: r'(?<=\w)\.(?!\..)|!'
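A short sketch of the version without capturing parentheses, using the string from the question. The list comprehension at the end is an alternative I'm adding for illustration, in case you would rather keep your original pattern and just drop the None entries. The names no_delims and filtered are made up.

```python
import re

a = ("This is my text...I want it to split by periods. I also want it to split "
     "by exclamation marks! Is that so much to ask?")

# No capturing groups: the delimiters are dropped from the result.
no_delims = re.split(r'(?<=\w)\.(?!\..)|!', a)
print(no_delims)
# ['This is my text...I want it to split by periods',
#  ' I also want it to split by exclamation marks',
#  ' Is that so much to ask?']

# Alternative: keep the original two-group pattern and filter out the None values.
filtered = [part for part in re.split(r'((?<=\w)\.(?!\..))|(!)', a)
            if part is not None]
print(filtered)
# ['This is my text...I want it to split by periods', '.',
#  ' I also want it to split by exclamation marks', '!',
#  ' Is that so much to ask?']
```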