Python: split a string on exactly one space, so that a double space gives " word", not "word"

Q: How do you split a string on the basis of space in Python?

The split() method splits a string into a list. You can specify the separator; the default separator is any run of whitespace. Note: when maxsplit is specified, the list will contain at most the specified number of elements plus one.

Q: How do you change from double space to single space in Python?

Use the re.sub() method to replace multiple spaces with a single space, e.g. result = re.sub(' +', ' ', my_str).

Tags:

python

string

regex

split

I have the following string.

words = "this is a  book and i  like it"

What I want is that when I split it on a single space, I get the following:

wordList = words.split(" ")
print(wordList)
# desired output: ['this','is','a',' book','and','i',' like','it']

A simple words.split(" ") splits the string, but in the case of a double space it removes both spaces, which gives 'book' and 'like'. What I need is ' book' and ' like', keeping the extra spaces intact in the split output for double, triple, ... n spaces.
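For reference, here is a small sketch of what the two built-in forms of split return for this input. As far as the standard behavior goes, split(" ") keeps an empty string for each extra space, while split() with no argument collapses runs of whitespace. The variable names by_space and by_whitespace are only for illustration.

```python
words = "this is a  book and i  like it"

# Splitting on a literal single space yields an empty string
# wherever two spaces are adjacent.
by_space = words.split(" ")
print(by_space)
# ['this', 'is', 'a', '', 'book', 'and', 'i', '', 'like', 'it']

# Splitting with no argument treats any run of whitespace as one separator.
by_whitespace = words.split()
print(by_whitespace)
# ['this', 'is', 'a', 'book', 'and', 'i', 'like', 'it']
```

Neither form produces ' book' or ' like', which is why the answers reach for a regex.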

asked May 11 '17 12:05

Qaisar Rajput

2 Answers

You can split on a space that is preceded by a non-whitespace character, using the lookbehind (?<=...) syntax:

import re

re.split("(?<=\\S) ", words)
# ['this', 'is', 'a', ' book', 'and', 'i', ' like', 'it']

Or similarly, use a negative lookbehind (a space that is not preceded by whitespace):

re.split("(?<!\\s) ", words)
# ['this', 'is', 'a', ' book', 'and', 'i', ' like', 'it']
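The two patterns above give the same result for the sample string, but they can differ when the string starts with a space. A minimal sketch, assuming a hypothetical helper name split_single_space that is not part of the original answer:

```python
import re

def split_single_space(text):
    """Split on one space that directly follows a non-whitespace character."""
    return re.split(r"(?<=\S) ", text)

words = "this is a  book and i  like it"
print(split_single_space(words))
# ['this', 'is', 'a', ' book', 'and', 'i', ' like', 'it']

# Three spaces: the first one is the separator, the other two stay with the word.
print(split_single_space("a   b"))
# ['a', '  b']

# Leading space: the positive lookbehind keeps it attached to the first word,
# while the negative lookbehind splits on it and yields an empty first item.
positive = re.split(r"(?<=\S) ", " a b")
negative = re.split(r"(?<!\s) ", " a b")
print(positive)  # [' a', 'b']
print(negative)  # ['', 'a', 'b']
```

Which variant is preferable depends on whether a leading space should count as a separator.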

answered Sep 19 '22 22:09

Psidom

Just another regex solution: if you need to split on a single left-most whitespace character, use \s? to match one or zero whitespace characters, and then capture zero or more remaining whitespace characters and the subsequent non-whitespace characters.

One very important step: run rstrip on the input string before applying the regex to remove all trailing whitespace. Otherwise the regex has to backtrack through the trailing whitespace without finding a match, and its performance degrades greatly.

import re
words = "this is a  book and i  like it"
print(re.findall(r'\s?(\s*\S+)', words.rstrip()))
# => ['this', 'is', 'a', ' book', 'and', 'i', ' like', 'it']

re.findall returns just the captured substrings, and since there is only one capturing group, the result is a list of those captures.

Details:

\s? - 1 or 0 (due to ? quantifier) whitespaces
(\s*\S+) - Capturing group #1 matching
- \s* - zero or more (due to the * quantifier) whitespace
- \S+ - 1 or more (due to + quantifier) non-whitespace symbols.
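To tie the pieces above together, here is a hedged sketch that wraps the findall pattern in a helper and tries it on a few inputs. The name split_keep_extra is hypothetical and not from the answer. Note that \s also matches tabs and newlines, so this approach only behaves like a literal-space split when the text is separated by spaces.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
PATTERN = re.compile(r"\s?(\s*\S+)")

def split_keep_extra(text):
    # Strip trailing whitespace first, as the answer recommends.
    return PATTERN.findall(text.rstrip())

print(split_keep_extra("this is a  book and i  like it"))
# ['this', 'is', 'a', ' book', 'and', 'i', ' like', 'it']

# Triple space: one whitespace character is consumed, two are kept.
print(split_keep_extra("a   b"))    # ['a', '  b']

# Trailing whitespace is removed by rstrip and produces no extra item.
print(split_keep_extra("a b   "))   # ['a', 'b']

# Tabs count as whitespace for \s, unlike a split on a literal space.
print(split_keep_extra("a\t\tb"))   # ['a', '\tb']
```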

answered Sep 17 '22 22:09

Wiktor Stribiżew
