
Python regex replace text in quotes except quotes themselves

Tags: python, regex

I have a test string, for example:

content = 'I opened my mouth, "Good morning!" I said cheerfully'

I want to use a regex to remove the text between double speech marks, but not the speech marks themselves, so that it returns:

'I opened my mouth, "" I said cheerfully'

I am using the following code

content = re.sub(r'".*"'," ",content)

But this removes the double speech marks as well. What pattern should I use to keep the speech marks but remove the text inside them?

Greg Hornby asked Nov 30 '22 02:11


2 Answers

Use '""' as the replacement string:

>>> import re
>>> content = 'I opened my mouth, "Good morning!" I said cheerfully'
>>> content = re.sub(r'".*"', '""', content)
>>> print(content)
I opened my mouth, "" I said cheerfully

BTW, .* is greedy: it matches as much as possible. To match as little as possible (non-greedy), use .*?, or use [^"]* so that the match cannot cross a double quote.

>>> content = 'I opened my mouth, "Good morning!" I said cheerfully. "How is everyone?"'
>>> content = re.sub(r'".*?"', '""', content)
>>> print(content)
I opened my mouth, "" I said cheerfully. ""
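
To compare the three patterns mentioned above on the same two-quotation string, here is a minimal sketch (the variable names are only illustrative):

```python
import re

text = 'I opened my mouth, "Good morning!" I said cheerfully. "How is everyone?"'

# Greedy: .* runs from the first quote to the last quote in the string.
greedy = re.sub(r'".*"', '""', text)

# Non-greedy: .*? stops at the nearest closing quote.
lazy = re.sub(r'".*?"', '""', text)

# Negated character class: [^"]* can never cross a quote.
negated = re.sub(r'"[^"]*"', '""', text)

print(greedy)   # I opened my mouth, ""
print(lazy)     # I opened my mouth, "" I said cheerfully. ""
print(negated)  # I opened my mouth, "" I said cheerfully. ""
```

One difference to keep in mind: . does not match a newline unless re.DOTALL is passed, while [^"]* does, so the last two patterns can behave differently when a quotation spans several lines.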
falsetru answered Dec 05 '22 12:12


You could also use lookarounds:

(?<=")([^"]+)(?=")

content = re.sub(r'(?<=")([^"]+)(?=")', '', content)

Two notes:

  • .* will match everything up to the last double quote in your string, instead of the next one. This is why I've used [^"]+.
  • Importantly, this will not work when the string contains two or more double-quoted substrings, unless you advance the index at which the next search begins. So, for example, with

    I opened my mouth, "Good morning!" I said cheerfully. "How is everyone?"

    the lookarounds do not consume the quotes, so the text between the two quotations also sits between a pair of double quotes. In order not to match I said cheerfully., you must advance the search index by one, past the closing quote, after Good morning! is found.
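
The pitfall in the second note can be reproduced with a short sketch. The second pattern, which consumes the quotes and writes them back, is shown as one possible workaround and is not part of the lookaround approach itself:

```python
import re

text = 'I opened my mouth, "Good morning!" I said cheerfully. "How is everyone?"'

# Lookarounds do not consume the quotes, so the closing quote of one
# quotation also satisfies the lookbehind for the text that follows it.
lookaround = re.sub(r'(?<=")[^"]+(?=")', '', text)

# Consuming both quotes and re-inserting them keeps the pairs aligned.
consumed = re.sub(r'"[^"]*"', '""', text)

print(lookaround)  # I opened my mouth, """"
print(consumed)    # I opened my mouth, "" I said cheerfully. ""
```

With a single quotation in the string, both patterns give the same result. They only diverge once a second quotation appears.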

aliteralmind answered Dec 05 '22 12:12