Python: split a string on exactly one space, so that a double space gives " word" instead of "word"

I have the following string.

words = "this is a  book and i  like it"

What I want is that when I split it on a single space, I get the following:

wordList = words.split(" ")
print(wordList)
# desired: ['this', 'is', 'a', ' book', 'and', 'i', ' like', 'it']

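A plain words.split(" ") does split the string, but at a double space it produces an empty string followed by 'book' (and words.split() with no argument drops the extra space altogether), so I end up with 'book' and 'like'. What I need is ' book' and ' like': any spaces beyond the first should stay attached to the following word in the output, whether there are two, three, or n spaces.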
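For comparison, this is what the two built-in forms of str.split return on the question's string (written with the double spaces that the desired output implies). It is standard str.split behavior, shown only to make the problem concrete:

```python
words = "this is a  book and i  like it"

# With an explicit " " separator, every single space is a delimiter,
# so a double space leaves an empty string between two delimiters.
print(words.split(" "))
# ['this', 'is', 'a', '', 'book', 'and', 'i', '', 'like', 'it']

# With no argument, any run of whitespace counts as one separator,
# so the extra space disappears entirely.
print(words.split())
# ['this', 'is', 'a', 'book', 'and', 'i', 'like', 'it']
```

Neither form keeps the extra space attached to the next word, which is why the answers turn to regular expressions.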

asked May 11 '17 at 12:05 by Qaisar Rajput

2 Answers

You can split on a space that is preceded by a non-whitespace character, using the positive lookbehind (?<=...) syntax:

import re

words = "this is a  book and i  like it"
print(re.split(r"(?<=\S) ", words))
# ['this', 'is', 'a', ' book', 'and', 'i', ' like', 'it']
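To make the lookbehind rule concrete, here is a minimal non-regex sketch of the same split: a space is a delimiter only when a previous character exists and that character is not whitespace. The helper name split_on_space_after_nonspace is hypothetical, and the sketch uses str.isspace() as the counterpart of \s; it is an illustration, not a replacement for the regex.

```python
def split_on_space_after_nonspace(text):
    # Hypothetical helper: mimics re.split(r"(?<=\S) ", text) without regex.
    parts = []
    start = 0
    for i, ch in enumerate(text):
        # A " " is a delimiter only if there is a previous character and it
        # is not whitespace; this is the condition (?<=\S) checks.
        if ch == " " and i > 0 and not text[i - 1].isspace():
            parts.append(text[start:i])
            start = i + 1  # the delimiter space itself is dropped
    parts.append(text[start:])  # whatever follows the last delimiter
    return parts


print(split_on_space_after_nonspace("this is a  book and i  like it"))
# ['this', 'is', 'a', ' book', 'and', 'i', ' like', 'it']
```

Only the first space of a run is consumed; the remaining spaces fail the "previous character is not whitespace" test and therefore stay at the front of the next piece.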

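Or, similarly, use a negative lookbehind, which splits on a space that is not preceded by whitespace (the two patterns differ only for a space at the very start of the string):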

print(re.split(r"(?<!\s) ", words))
# ['this', 'is', 'a', ' book', 'and', 'i', ' like', 'it']
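The two lookbehinds agree on the question's string, but they are not identical at the start of a string: with no preceding character, (?<=\S) cannot succeed there, while (?<!\s) can. A small check, using an input chosen only for illustration:

```python
import re

words = "this is a  book and i  like it"
positive = re.split(r"(?<=\S) ", words)
negative = re.split(r"(?<!\s) ", words)
# Same result on the question's input.
print(positive == negative)  # True

# A leading space: only the negative lookbehind splits on it,
# which produces an empty first element.
text = " leading space"
print(re.split(r"(?<=\S) ", text))  # [' leading', 'space']
print(re.split(r"(?<!\s) ", text))  # ['', 'leading', 'space']
```

If the input may start with spaces and those should stay attached to the first word, the positive lookbehind is the safer of the two.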
answered Sep 19 '22 at 22:09 by Psidom


Here is another regex solution: to split on a single leftmost whitespace character, use \s? to match one or zero whitespace characters, then capture zero or more remaining whitespace characters followed by the non-whitespace characters.

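One very important step: run rstrip on the input string before applying the regex, to remove all trailing whitespace. The result is the same either way, but without it the engine backtracks through the remaining trailing whitespace at every position in that run, so matching time grows roughly quadratically with the length of the run.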

import re
words = "this is a  book and i  like it"
print(re.findall(r'\s?(\s*\S+)', words.rstrip()))
# => ['this', 'is', 'a', ' book', 'and', 'i', ' like', 'it']

re.findall returns only the captured substrings: since the pattern has exactly one capturing group, the result is a list of what that group captured, without the separator matched by \s?.
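The effect of the capturing group on re.findall can be seen by varying only the grouping. This sketch compares the answer's pattern with a non-capturing variant and a two-group variant; the short "a  b" input is only for illustration:

```python
import re

words = "this is a  book and i  like it"

# One capturing group: findall returns only what group 1 captured,
# so the optional separator matched by \s? is left out.
one_group = re.findall(r'\s?(\s*\S+)', words)
print(one_group)
# ['this', 'is', 'a', ' book', 'and', 'i', ' like', 'it']

# Non-capturing group: findall returns the whole match instead,
# separator included.
no_group = re.findall(r'\s?(?:\s*\S+)', words)
print(no_group)
# ['this', ' is', ' a', '  book', ' and', ' i', '  like', ' it']

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(r'(\s?)(\s*\S+)', "a  b")
print(two_groups)
# [('', 'a'), (' ', ' b')]
```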

Pattern details:

  • \s? - one or zero (due to the ? quantifier) whitespace characters
  • (\s*\S+) - Capturing group #1, matching
    • \s* - zero or more (due to the * quantifier) whitespace characters
    • \S+ - one or more (due to the + quantifier) non-whitespace characters
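The advice to call rstrip first can be checked with a rough timing sketch. The wrapper split_keep_extra and the padding sizes are hypothetical choices for illustration; absolute timings depend on the machine and Python version, so none are shown. What the sketch is meant to show is that the results match, while the time without rstrip grows much faster than the padding does.

```python
import re
import timeit

PATTERN = re.compile(r'\s?(\s*\S+)')


def split_keep_extra(text, strip_trailing=True):
    # Hypothetical wrapper around the answer's pattern.
    if strip_trailing:
        text = text.rstrip()  # drop trailing whitespace before matching
    return PATTERN.findall(text)


words = "this is a  book and i  like it"

for pad in (500, 1000, 2000):
    padded = words + " " * pad
    # Trailing whitespace is never followed by \S, so it yields no matches:
    # stripping it changes the work done, not the result.
    assert split_keep_extra(padded) == split_keep_extra(padded, strip_trailing=False)
    slow = timeit.timeit(lambda: split_keep_extra(padded, strip_trailing=False), number=3)
    fast = timeit.timeit(lambda: split_keep_extra(padded), number=3)
    print(f"pad={pad}: without rstrip {slow:.4f}s, with rstrip {fast:.4f}s")
```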
answered Sep 17 '22 at 22:09 by Wiktor Stribiżew