I have a string like this one (an empty paragraph) saved from my heavily edited and post-processed TinyMCE input.
This is how it looks after echo, in the page's HTML source in the browser:
<p> </p>
Now, I need to remove those empty paragraphs.
I have already tried:
$output = str_ireplace("<p> </p>", "", $string);
$output = preg_replace("/<p> <\/p>/", "", $string);
$output = preg_replace("/<p>[ \t\n\r]*<\/p>/", "", $string);
$output = preg_replace("/<p>[\s]*<\/p>/", "", $string);
and many more variations, with no luck. It's still there, intact. I have also tried mb_ereg_replace and matching
which apparently isn't the case.
On the other hand, this works:
$output = preg_replace("/<p>.*<\/p>/", "", $string);
but of course it also strips paragraphs with actual content.
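A minimal sketch of why that pattern is too broad, using a made-up sample string (the `\u{...}` escape needs PHP 7.0+). `.` matches any character except a newline and `*` is greedy, so a single match can run from the first `<p>` to the last `</p>` on the line. Even the lazy `.*?` still removes the real paragraph, because it accepts any content at all.

```php
<?php
// Hypothetical sample: one real paragraph followed by an "empty" one.
$string = "<p>Hello</p><p>\u{00A0}</p>";

// Greedy: one match runs from the first <p> to the last </p>,
// so the real paragraph disappears too.
$greedy = preg_replace('/<p>.*<\/p>/', '', $string);

// Lazy: each paragraph is matched separately, but "Hello" is still
// removed because .*? accepts any content.
$lazy = preg_replace('/<p>.*?<\/p>/', '', $string);

var_dump($greedy, $lazy); // both are empty strings
```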
What else could that "space-like" character be? How am I supposed to match it?
SOLVED: Thanks to Ibizaman and this thread, I've found the character. It is the non-breaking space, Unicode code point U+00A0 (decimal 160). See http://unicodelookup.com/#160/1
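A short sketch of why the ASCII-only attempts fail, assuming the input is UTF-8 (the sample string is made up to mimic the TinyMCE output). U+00A0 is encoded as the two bytes C2 A0, so a pattern containing a plain space (0x20), or `\s` without the `u` modifier, does not match it. `bin2hex()` makes the bytes visible.

```php
<?php
// Hypothetical reproduction of the stored string: <p>, U+00A0, </p>.
$string = "<p>\u{00A0}</p>";

// The "space" is really two bytes, c2 a0, in UTF-8.
echo bin2hex($string), "\n"; // 3c703ec2a03c2f703e

// A literal ASCII space never matches, so nothing is replaced.
$unchanged = str_ireplace("<p> </p>", "", $string);

// Searching for the actual character does remove the paragraph.
$fixed = str_replace("<p>\u{00A0}</p>", "", $string);
```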
This works:
$output = preg_replace("/<p>[\x{00A0}\s]*<\/p>/u", "", $string);
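A usage sketch of the working pattern on made-up input that mixes a real paragraph, a non-breaking-space paragraph, and paragraphs with ordinary whitespace. The `u` modifier makes PCRE treat the pattern and subject as UTF-8, which is what lets `\x{00A0}` stand for the code point rather than a single byte. The function name `removeEmptyParagraphs` is hypothetical.

```php
<?php
// Hypothetical helper: removes <p> elements containing only whitespace or U+00A0.
function removeEmptyParagraphs(string $html): string
{
    return preg_replace('/<p>[\x{00A0}\s]*<\/p>/u', '', $html);
}

// Hypothetical sample input.
$string = "<p>Hello</p><p>\u{00A0}</p><p> \n\t</p><p>\u{00A0}\u{00A0} </p>";
echo removeEmptyParagraphs($string), "\n"; // <p>Hello</p>
```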
As pointed out by mcrumley, this might work even better:
"/<p>[\p{Zs}\s]*<\/p>/iu"
A whitespace is any character or series of characters that represents horizontal or vertical space. When rendered, a whitespace character does not correspond to a visible mark, but it typically does occupy an area on a page. Common whitespace characters include the space, the tab, and the newline. For more information, see Whitespace character.
That works only if there is exactly one whitespace character between wordA and wordB. I need to match any number of whitespace characters between wordA and wordB:
wordA (10 or more whitespace characters) wordB -> wordA wordb wordc
Same for wordA (1 whitespace character) wordB -> wordA wordb wordc
...
Your regex should work as-is, assuming it is doing what you want it to.
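Quantifying the whitespace class covers both cases: `\s+` matches one or more whitespace characters, so the same pattern accepts one space or ten. A minimal sketch in PHP; `wordA`/`wordB` are the placeholder words from the text, and collapsing each run to a single space is an assumed follow-up goal.

```php
<?php
// \s+ = one or more whitespace characters (space, tab, newline, ...).
$pattern = '/wordA\s+wordB/';

$one = "wordA wordB";
$ten = "wordA" . str_repeat(' ', 10) . "wordB";

var_dump(preg_match($pattern, $one)); // int(1)
var_dump(preg_match($pattern, $ten)); // int(1)

// Collapsing every whitespace run to a single space normalizes both inputs.
$normalized = preg_replace('/\s+/', ' ', $ten); // "wordA wordB"
```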
let whiteSpace = "Whitespace. Whitespace everywhere!";
let spaceRegex = /\s/g;
whiteSpace.match(spaceRegex);
This match call would return [" ", " "]. Change the regex countWhiteSpace to look for multiple whitespace characters in a string. Your regex should use the global flag.
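PHP's PCRE functions have no `g` flag; `preg_match_all()` plays that role by returning every match instead of only the first. A sketch of the same exercise in PHP, reusing the sample sentence from the text; the second string is made up to show that `\s+` counts each run of whitespace once.

```php
<?php
$whiteSpace = "Whitespace. Whitespace everywhere!";

// preg_match_all() finds every match, like a /g regex in JavaScript.
$count = preg_match_all('/\s/', $whiteSpace, $matches);
// $count === 2, $matches[0] === [" ", " "]

// With \s+ each run of whitespace is a single match.
$runs = preg_match_all('/\s+/', "a  b\t\tc", $runMatches);
// $runs === 2, $runMatches[0] === ["  ", "\t\t"]
```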
We can also use the String.Split() method to replace any run of whitespace characters with a single space. The idea is to split the string using whitespace characters as delimiters and join the non-empty pieces with a single space.
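`String.Split()` is a .NET method; in PHP, which the rest of this page uses, the same split-and-join idea can be sketched with `preg_split()` and the `PREG_SPLIT_NO_EMPTY` flag (which drops the empty pieces produced by leading or trailing whitespace), followed by `implode()`. The helper name is hypothetical. Note that this also trims whitespace at both ends of the string.

```php
<?php
// Hypothetical helper: collapse any run of whitespace to a single space.
function collapseWhitespace(string $s): string
{
    // Split on runs of whitespace, discarding empty pieces at the ends.
    $words = preg_split('/\s+/', $s, -1, PREG_SPLIT_NO_EMPTY);
    // Re-join the non-empty pieces with exactly one space.
    return implode(' ', $words);
}

echo collapseWhitespace("  wordA \t\n wordB   wordC "), "\n"; // wordA wordB wordC
```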
You can use Unicode character properties to match all kinds of spaces. \p{Zs} is "Space separator" and includes space, non-breaking space, thin space, etc. You can also use \pZ to match all separators, including line separator and paragraph separator. See http://www.php.net/manual/en/regexp.reference.unicode.php for details.
$output = preg_replace("/<p>[\p{Zs}\s]*<\/p>/iu", "", $string);
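A sketch of the `\p{Zs}` version on made-up input: besides U+00A0 it also catches other Unicode space separators, such as U+2009 THIN SPACE and U+3000 IDEOGRAPHIC SPACE, and the `i` modifier lets it remove upper-case `<P>` tags as well.

```php
<?php
$pattern = '/<p>[\p{Zs}\s]*<\/p>/iu';

// Hypothetical input: no-break space, thin space, ideographic space, upper-case tags.
$string = "<p>Keep me</p><p>\u{00A0}</p><P>\u{2009}</P><p>\u{3000} </p>";

echo preg_replace($pattern, '', $string), "\n"; // <p>Keep me</p>
```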
Since you don't know which character is being output, first inspect $string with functions that print its Unicode code points (see this SO question).
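One way to do that inspection, sketched with core functions only and assuming the string is valid UTF-8: split it into characters with `preg_split('//u', ...)` and decode each character's bytes into a code point. The helper name `codePoints` is hypothetical; where the mbstring extension is available, `mb_ord()` (PHP 7.2+) does the per-character decoding for you.

```php
<?php
// Hypothetical helper: list the code points of a UTF-8 string as "U+XXXX".
// Assumes valid UTF-8; uses only core functions (no mbstring needed).
function codePoints(string $s): array
{
    $out = [];
    // The empty pattern with /u splits between characters, not bytes.
    foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $b = array_values(unpack('C*', $ch)); // the character's bytes
        switch (count($b)) {
            case 1: $cp = $b[0]; break;
            case 2: $cp = (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F); break;
            case 3: $cp = (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F); break;
            default: $cp = (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                         | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
        }
        $out[] = sprintf('U+%04X', $cp);
    }
    return $out;
}

echo implode(' ', codePoints("<p>\u{00A0}</p>")), "\n";
// U+003C U+0070 U+003E U+00A0 U+003C U+002F U+0070 U+003E
```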
Or, you can approach it from the other direction and keep only paragraphs that contain at least one letter or digit:
$output = preg_replace("/<p>[^a-zA-Z0-9]*<\/p>/", "", $string);
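A sketch of this approach on made-up input, with the empty string as the replacement so that matching paragraphs are dropped. Because the class only excludes ASCII letters and digits, a paragraph holding only punctuation is removed too, and so (without the `u` modifier) is one written only in non-ASCII letters such as "é".

```php
<?php
// Removes every <p>...</p> whose content has no ASCII letter or digit.
$pattern = '/<p>[^a-zA-Z0-9]*<\/p>/';

// Hypothetical sample input.
$string = "<p>Hello</p><p>\u{00A0}</p><p> - </p>";
echo preg_replace($pattern, '', $string), "\n"; // <p>Hello</p>

// Caveat: a paragraph holding only non-ASCII letters is removed too.
var_dump(preg_replace($pattern, '', "<p>\u{00E9}</p>")); // string(0) ""
```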
Disclaimer: I already posted this as a comment, but since it solved the problem, I think it's better placed in an answer for future reference.