In Python, I have a lot of strings containing spaces. I would like to remove all spaces from the text, except those inside quotation marks.
Example input:
This is "an example text" containing spaces.
And I want to get:
Thisis"an example text"containingspaces.
Using line.split() (and joining the pieces back together) does not work, I think, because it removes all of the spaces from the text, including the quoted ones.
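A quick demonstration of the problem, assuming the split-and-join idiom is what was tried: str.split() with no arguments discards every run of whitespace, so the quoted spaces disappear too.

```python
s = 'This is "an example text" containing spaces.'

# str.split() with no arguments splits on runs of whitespace and drops them,
# so joining the pieces removes every space, quoted or not.
no_spaces = "".join(s.split())
print(no_spaces)  # Thisis"anexampletext"containingspaces.
```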
What do you recommend?
For the simple case where only double quotes (") are used as quote characters and every quote is closed:
>>> import re
>>> s = 'This is "an example text" containing spaces.'
>>> re.sub(r' (?=(?:[^"]*"[^"]*")*[^"]*$)', "", s)
'Thisis"an example text"containingspaces.'
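Since the question mentions a lot of strings, one option is to compile the pattern once and reuse it. This is a minimal sketch; the names SPACE_OUTSIDE_QUOTES and strip_unquoted_spaces are hypothetical, and it still assumes quotes are balanced. Because the lookahead rescans the rest of the string for every space, it may be slow on very long lines.

```python
import re

# Compile once and reuse: a space followed by an even number of double quotes
# (up to the end of the string) lies outside any quoted section.
SPACE_OUTSIDE_QUOTES = re.compile(r' (?=(?:[^"]*"[^"]*")*[^"]*$)')

def strip_unquoted_spaces(text):
    """Remove spaces that are not inside double quotes (hypothetical helper)."""
    return SPACE_OUTSIDE_QUOTES.sub("", text)

lines = [
    'This is "an example text" containing spaces.',
    'a b "c d" e "f g" h',
]
cleaned = [strip_unquoted_spaces(line) for line in lines]
print(cleaned)  # ['Thisis"an example text"containingspaces.', 'ab"c d"e"f g"h']
```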
Explanation:
[ ]      # Match a space
(?=      # only if an even number of quotes follows --> lookahead
 (?:     # This is true when the following can be matched:
  [^"]*" # Any number of non-quote characters, then a quote, then
  [^"]*" # the same thing again to get an even number of quotes.
 )*      # Repeat zero or more times.
 [^"]*   # Match any remaining non-quote characters
 $       # and then the end of the string.
)        # End of lookahead.
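The commented breakdown above can also be used directly as the pattern with re.VERBOSE, which keeps the explanation next to the regex. In verbose mode unescaped whitespace is ignored, which is why the literal space is written as the character class [ ] (whitespace inside a class is kept).

```python
import re

# Same pattern as above, written with re.VERBOSE so each piece is commented.
pattern = re.compile(r'''
    [ ]          # Match a space
    (?=          # only if an even number of quotes follows (lookahead)
     (?:         # This is true when the following can be matched:
      [^"]*"     # any number of non-quote characters, then a quote, then
      [^"]*"     # the same thing again to get an even number of quotes.
     )*          # Repeat zero or more times.
     [^"]*       # Match any remaining non-quote characters
     $           # and then the end of the string.
    )            # End of lookahead.
''', re.VERBOSE)

s = 'This is "an example text" containing spaces.'
print(pattern.sub("", s))  # Thisis"an example text"containingspaces.
```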