re.split with multiple alternatives | (or) returns None in Python

Tags:

python

I'm using a regular expression in Python to split a chunk of text by periods and by exclamation marks. However, when I split it, I get a "None" in the result.

a = "This is my text...I want it to split by periods. I also want it to split \
by exclamation marks! Is that so much to ask?"

This is my code:

re.split('((?<=\w)\.(?!\..))|(!)',a)

Note that I use (?<=\w)\.(?!\..) because I want the split to skip ellipses. Nevertheless, the above code spits out:

['This is my text...I want it to split by periods', '.', None,
 ' I also want it to split by exclamation marks', None, '!',
 ' Is that so much to ask?']

As you can see, where a period or exclamation mark is, it has added a special "None" into my list. Why is this and how do I get rid of it?

Terence Chow asked Jul 03 '12 22:07




1 Answer

Try the following:

re.split(r'((?<=\w)\.(?!\..)|!)', a)

You get the None because your pattern has two capturing groups, and re.split() includes every capturing group in its result at each split point, whether or not that group took part in the match.

So any time you match a . the second capture group is None, and any time you match a ! the first capture group is None.

Here is the result:

['This is my text...I want it to split by periods',
 '.',
 ' I also want it to split by exclamation marks',
 '!',
 ' Is that so much to ask?']
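
To see where the None entries come from, here is a small sketch that compares the two patterns on the text from the question. The names two_groups, one_group, with_none and without_none are only for illustration and do not appear in the original post.

```python
import re

a = ("This is my text...I want it to split by periods. I also want it to split "
     "by exclamation marks! Is that so much to ask?")

# Pattern from the question: two capturing groups, one per alternative.
two_groups = r'((?<=\w)\.(?!\..))|(!)'
# Pattern from the answer: a single capturing group wrapping both alternatives.
one_group = r'((?<=\w)\.(?!\..)|!)'

# Every match reports one value per capturing group; a group that did not
# take part in the match is reported as None.
for m in re.finditer(two_groups, a):
    print(repr(m.group(0)), m.groups())

# re.split() inserts all of those group values between the pieces.
with_none = re.split(two_groups, a)
without_none = re.split(one_group, a)
print(with_none)
print(without_none)
```

With a single group there is only one slot per split point, and that slot is always filled by whichever alternative matched.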

If you don't want to include '.' and '!' in your result, just remove the parentheses that surround the entire expression: r'(?<=\w)\.(?!\..)|!'
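
A minimal sketch of that variant, using the same text as the question. The second half shows an alternative that is not part of this answer: keeping the original two-group pattern and filtering the None placeholders out afterwards. The names parts and filtered are hypothetical.

```python
import re

a = ("This is my text...I want it to split by periods. I also want it to split "
     "by exclamation marks! Is that so much to ask?")

# No capturing group at all: the delimiters are dropped from the result.
parts = re.split(r'(?<=\w)\.(?!\..)|!', a)
print(parts)

# Alternative (an assumption, not from the answer): keep the two-group
# pattern from the question and discard the None entries afterwards.
filtered = [p for p in re.split(r'((?<=\w)\.(?!\..))|(!)', a) if p is not None]
print(filtered)
```

Filtering works, but fixing the grouping in the pattern avoids producing the placeholders in the first place.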

Andrew Clark answered Sep 17 '22 05:09
