
Split Strings into words with multiple word boundary delimiters

I think what I want to do is a fairly common task but I've found no reference on the web. I have text with punctuation, and I want a list of the words.

"Hey, you - what are you doing here!?" 

should be

['hey', 'you', 'what', 'are', 'you', 'doing', 'here'] 

But Python's str.split() only works with one argument, so I have all words with the punctuation after I split with whitespace. Any ideas?
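For illustration, here is a small sketch of the behaviour described above, using the sentence from the question. The variable names `text` and `tokens` are made up for this example.

```python
text = "Hey, you - what are you doing here!?"

# Splitting on whitespace alone leaves punctuation attached to the words,
# and the lone "-" survives as a token of its own.
tokens = text.split()
print(tokens)
# ['Hey,', 'you', '-', 'what', 'are', 'you', 'doing', 'here!?']
```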

ooboo asked Jun 29 '09 17:06




2 Answers

re.split()

re.split(pattern, string[, maxsplit=0])

Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list. (Incompatibility note: in the original Python 1.5 release, maxsplit was ignored. This has been fixed in later releases.)

>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split(r'(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
>>> re.split(r'\W+', 'Words, words, words.', 1)
['Words', 'words, words.']
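Applied to the sentence from the question, this might look like the sketch below. It is one possible way to use `re.split()` here, not the only one; the names `text`, `parts`, and `words` are chosen for the example, and the lowercasing step is an assumption based on the output the question asks for.

```python
import re

text = "Hey, you - what are you doing here!?"

# \W+ matches one or more non-word characters, so each run of
# punctuation and spaces acts as a single delimiter.
parts = re.split(r"\W+", text)

# The trailing "!?" produces an empty string at the end of the list,
# so filter out empty pieces; lowercase to match the desired output.
words = [p.lower() for p in parts if p]
print(words)
# ['hey', 'you', 'what', 'are', 'you', 'doing', 'here']
```

Note that `\W+` also treats apostrophes as delimiters, so a word like "don't" would come back as two pieces.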
gimel answered Sep 22 '22 15:09



A case where regular expressions are justified:

import re

DATA = "Hey, you - what are you doing here!?"
print(re.findall(r"[\w']+", DATA))
# Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']
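The question asks for lowercase words, which `re.findall()` alone does not give. A minimal sketch of one way to get there follows; the helper name `words` is hypothetical and not part of the `re` module.

```python
import re

def words(text):
    """Return the lowercase words in text, keeping apostrophes inside words."""
    # [\w']+ matches runs of word characters and apostrophes, so the
    # match itself is the word and no empty strings need filtering.
    return [w.lower() for w in re.findall(r"[\w']+", text)]

print(words("Hey, you - what are you doing here!?"))
# ['hey', 'you', 'what', 'are', 'you', 'doing', 'here']
print(words("Don't stop!"))
# ["don't", 'stop']
```

Matching the words directly, rather than splitting on the delimiters, avoids the empty trailing element that splitting can leave behind.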
RichieHindle answered Sep 19 '22 15:09
