
regex: remove all text within "double-quotes" (multiline included)

Tags:

regex

php

I'm having a hard time removing text within double-quotes, especially those spread over multiple lines:

$file=file_get_contents('test.html');

$replaced = preg_replace('/"(\n.)+?"/m','', $file);

I want to remove ALL text within double-quotes (the quotes themselves included). Some of the text within them will be spread over multiple lines.

I read that newlines can be \r\n and \n as well.

siliconpi asked May 20 '11 16:05


2 Answers

Another edit: daalbert's solution is best: a quote, followed by one or more non-quote characters, followed by a closing quote.

I would make one slight modification if you're parsing HTML: allow zero or more non-quote characters, so that empty values such as "" are matched too. The regex then becomes:

"[^"]*"
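A minimal PHP sketch of this pattern with preg_replace, assuming the HTML is already in a string (the sample text below is made up rather than read from test.html). A negated class such as [^"] also matches \r and \n, so quotes spanning lines are removed with either line-ending style and no extra modifier.

```php
<?php
// Hypothetical sample input: a single-line quote, a quote spanning a
// Unix "\n" break, one spanning a Windows "\r\n" break, and an empty "".
$html = "a=\"one\" b=\"two\nlines\" c=\"three\r\nlines\" d=\"\" end";

// [^"]* matches any run of non-quote characters, newlines included,
// so each match stops at the next closing quote.
$replaced = preg_replace('/"[^"]*"/', '', $html);

echo $replaced, "\n"; // a= b= c= d= end
```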

EDIT:

On second thought, here's a better one:

"[\S\s]*?"

This says: "a quote, followed by either a non-whitespace character or a whitespace character (together, any character including newlines) any number of times, non-greedily, ending with a quote".

The pattern below uses capture groups where none are needed, and its wildcard (.) hides the fact that the wildcard matches everything except the newline character. It's clearer to say "either a non-whitespace char or a whitespace char" :) Not that it makes any difference to the result.
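A sketch comparing these options in PHP (the sample string is made up). "[\S\s]*?" and the nested pattern "(.*?(\s)*?)*?" later in this answer remove the same text, while a plain ".*?" cannot cross a line break unless the s modifier is added. The m modifier used in the question only changes how ^ and $ behave, so it does not let . match newlines.

```php
<?php
// Hypothetical input with one quote that spans a "\r\n" line break.
$text = "x=\"first\" y=\"second\r\npart\" z";

// [\S\s] matches any character, so the lazy quantifier can cross newlines.
$classed = preg_replace('/"[\S\s]*?"/', '', $text);

// Nested pattern from this answer: . cannot cross the line break,
// but the inner \s matches \r and \n, so it also spans lines.
$nested = preg_replace('/"(.*?(\s)*?)*?"/', '', $text);

// Without the s modifier, . cannot match "\n", so the multi-line quote stays.
$dotPlain = preg_replace('/".*?"/', '', $text);

// With the s modifier (dotall), . matches newlines too.
$dotAll = preg_replace('/".*?"/s', '', $text);

var_dump($classed, $nested, $dotPlain, $dotAll);
```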


There are many regexes that can solve your problem, but here's one:

"(.*?(\s)*?)*?"

This reads as:

find a quote, optionally followed by (any number of non-newline characters, non-greedily, followed by any number of whitespace characters, non-greedily), repeated any number of times non-greedily, and then a closing quote.

Greedy means the quantifier first runs to the end of the string and tries to match from there; if it can't find a match, it gives back one character and tries again, and so on. Non-greedy is the reverse: it consumes as few characters as possible while still satisfying the pattern.
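A sketch of the difference on one line of made-up input. Both patterns use the [\S\s] class, so only the greediness of the quantifier changes:

```php
<?php
// Hypothetical input with two separate quoted strings.
$text = 'say "hi" and "bye" now';

// Greedy: [\S\s]* runs to the end of the string, then backs off until it
// reaches the LAST quote, swallowing everything in between.
$greedy = preg_replace('/"[\S\s]*"/', '', $text);

// Lazy: [\S\s]*? grows one character at a time and stops at the FIRST
// closing quote, so each quoted string is removed on its own.
$lazy = preg_replace('/"[\S\s]*?"/', '', $text);

echo $greedy, "\n"; // say  now
echo $lazy, "\n";   // say  and  now
```

On a whole HTML file the greedy version would remove everything from the first quote to the last one, which is rarely what you want.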

A great resource on regex: http://www.regular-expressions.info
A great tool for testing regexes: http://regexpal.com/

Remember that your regex may need small changes depending on which language's regex engine you're using.

vinnybad answered Nov 09 '22 01:11


Try this expression:

"[^"]+"

Also make sure you replace globally (in many languages that means a g flag; PHP's preg_replace needs no flag, because it replaces every match unless you pass its optional $limit argument).
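A sketch contrasting "[^"]+" with the zero-or-more variant suggested in the other answer (sample input made up). With +, an empty pair "" cannot match on its own, so its second quote gets paired with the next opening quote and the wrong text is removed:

```php
<?php
// Hypothetical input containing an empty pair of quotes.
$text = "a=\"x\" b=\"\" c=\"multi\nline\"";

// One or more non-quotes: the empty "" cannot match, so a match starts at
// its SECOND quote and runs to the opening quote of "multi...",
// removing ' c=' by mistake and leaving the multi-line value behind.
$plus = preg_replace('/"[^"]+"/', '', $text);

// Zero or more non-quotes: the empty "" is removed as a pair, as intended.
// (No limit argument is passed, so every match is replaced.)
$star = preg_replace('/"[^"]*"/', '', $text);

var_dump($plus, $star);
```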

Andrew Hare answered Nov 08 '22 23:11