So I have a test string for example
content = 'I opened my mouth, "Good morning!" I said cheerfully'
I want to use regex to remove text in between double speech marks, but not the speech marks themselves. So it will return
'I opened my mouth, "" I said cheerfully'
I am using the following code
content = re.sub(r'".*"'," ",content)
But this removes the double speech marks aswell. What pattern should I use to keep the speech marks but remove the text inside them.
Use '""'
as the replacement string:
>>> content = 'I opened my mouth, "Good morning!" I said cheerfully'
>>> content = re.sub(r'".*"', '""', content)
>>> print(content)
I opened my mouth, "" I said cheerfully
BTW, .*
matches as much as possible (greedy). To match non-greedy fashion, use .*?
or [^"]*
.
>>> content = 'I opened my mouth, "Good morning!" I said cheerfully. "How is everyone?"'
>>> content = re.sub(r'".*?"', '""', content)
>>> print(content)
I opened my mouth, "" I said cheerfully. ""
You could also use lookarounds:
(?<=")([^"]+)(?=")
Debuggex Demo
content = re.sub(r'(?<=")([^"]+)(?=")', '', content)
Two notes:
.*
will capture everything up to the last double-quote in your string, instead of the next one. This is why I've made it [^"]+
.Importantly, this will not work when two doubly-quoted sub-strings are in the overall string, unless you increment the index at which the next search begins. So, for example, with
I opened my mouth, "Good morning!" I said cheerfully. "How is everyone?"
In order to not capture I said cheerfully.
, you must increment the index by one after `Good morning!" is found.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With