I have the following string.
words = "this is a  book and i  like it"
What I want is that when I split it on a single space, I get the following.
wordList = words.split(" ")
print(wordList)
# desired output: ['this', 'is', 'a', ' book', 'and', 'i', ' like', 'it']
A plain words.split(" ") splits the string, but where there is a double space it drops both spaces and gives 'book' and 'like' (plus an empty string for the extra space). What I need is ' book' and ' like', keeping the extra spaces intact in the split output for double, triple, ... n spaces.
Python String split() Method: The split() method splits a string into a list. You can specify the separator; the default separator is any whitespace. Note: when maxsplit is specified, the list will contain at most the specified number of elements plus one.
Use the re.sub() method to replace multiple spaces with a single space, e.g. result = re.sub(' +', ' ', my_str).
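To make the behaviour described above concrete, here is a small sketch comparing the built-in options. It assumes the sample string has two spaces before "book" and "like", as the desired output implies; none of these calls keeps the extra space attached to the following word.

```python
import re

# Assumption: the sample string has two spaces before "book" and "like".
words = "this is a  book and i  like it"

# No separator: any run of whitespace counts as a single separator.
no_sep = words.split()

# Explicit single-space separator: every space splits, so a double
# space leaves an empty string between the neighbouring words.
one_space = words.split(" ")

# maxsplit=2 performs at most two splits, so at most three elements.
limited = words.split(" ", 2)

# re.sub collapses each run of spaces into a single space.
collapsed = re.sub(" +", " ", words)

print(no_sep)
print(one_space)
print(limited)
print(collapsed)
```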
You can split on a space that is preceded by a non-whitespace character, using the lookbehind (?<=...) syntax:
import re
re.split("(?<=\\S) ", words)
# ['this', 'is', 'a', ' book', 'and', 'i', ' like', 'it']
Or, similarly, use a negative lookbehind (a space that is not preceded by whitespace):
re.split("(?<!\\s) ", words)
# ['this', 'is', 'a', ' book', 'and', 'i', ' like', 'it']
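As a rough sketch, the two lookbehind patterns can be wrapped in helper functions to check them on other inputs. The function names are hypothetical and used only for illustration. The two patterns agree on the sample string but differ when the input starts with a space.

```python
import re

def split_keep_extra_spaces(text):
    """Split on a space only when a non-whitespace character precedes it."""
    return re.split(r"(?<=\S) ", text)

def split_negative_lookbehind(text):
    """Split on a space only when no whitespace character precedes it."""
    return re.split(r"(?<!\s) ", text)

# A triple space: the first space is the separator, the other two stay.
triple = split_keep_extra_spaces("a   book")

# Leading space: the positive lookbehind cannot match at position 0,
# while the negative lookbehind can, which yields a leading empty string.
lead_positive = split_keep_extra_spaces(" a b")
lead_negative = split_negative_lookbehind(" a b")

print(triple)         # ['a', '  book']
print(lead_positive)  # [' a', 'b']
print(lead_negative)  # ['', 'a', 'b']
```

If the input may start with spaces, the positive lookbehind is probably the safer choice of the two.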
Just another regex solution: if you need to split on the single left-most whitespace character, use \s? to match one or zero whitespace characters, and then capture the zero or more remaining whitespace characters together with the subsequent non-whitespace characters.
One very important step: run rstrip on the input string before running the regex to remove all trailing whitespace. Otherwise the regex backtracks over trailing whitespace that is never followed by a non-whitespace character, and performance degrades considerably.
import re
words = "this is a  book and i  like it"
print(re.findall(r'\s?(\s*\S+)', words.rstrip()))
# => ['this', 'is', 'a', ' book', 'and', 'i', ' like', 'it']
See a Python demo. re.findall returns only the captured substrings, and since there is just one capturing group, the result is a list of those captures.
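To illustrate the role of the capturing group, the sketch below runs the same pattern with and without the parentheses. The helper name is hypothetical, and the sample string is assumed to contain double spaces before "book" and "like".

```python
import re

def findall_keep_extra_spaces(text):
    """Return words with any extra leading spaces kept (hypothetical helper)."""
    # rstrip first, so the pattern never has to scan trailing whitespace.
    return re.findall(r"\s?(\s*\S+)", text.rstrip())

# Assumption: double spaces before "book" and "like".
sample = "this is a  book and i  like it"
captured = findall_keep_extra_spaces(sample)

# Without a group, findall returns whole matches, separator space included.
whole_match = re.findall(r"\s?\s*\S+", "a  b")

# With one group, findall returns only the group, so the separator is dropped.
group_only = re.findall(r"\s?(\s*\S+)", "a  b")

print(captured)
print(whole_match)  # ['a', '  b']
print(group_only)   # ['a', ' b']
```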
Also, here is a regex demo. Details:
\s? - 1 or 0 (due to the ? quantifier) whitespace characters
(\s*\S+) - Capturing group #1 matching:
  \s* - zero or more (due to the * quantifier) whitespace characters
  \S+ - 1 or more (due to the + quantifier) non-whitespace characters.