I have something like this:
$mytext="that&#039;s really &quot;confusing&quot; and &lt;absolutly&gt; silly";
echo substr($mytext,0,6);
The output in this case will be: that&#
instead of that&#039;s
What I want is to count each HTML entity as 1 character and then substr, because I always end up with broken HTML or some obscure characters at the end of the text.
Please don't suggest that I HTML-decode it, substr it and then encode it again; I want a clean method :)
Thanks
There are two ways of doing this:
(1) You can decode the HTML entities, substr() and then encode; or
(2) You can use a regular expression.
(1) uses html_entity_decode() and htmlentities():
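// ENT_QUOTES is needed so that &#039; is decoded on PHP versions before 8.1,
// where the default flags leave single-quote entities untouched.
$s = html_entity_decode($mytext, ENT_QUOTES);
$sub = substr($s, 0, 6);
echo htmlentities($sub, ENT_QUOTES);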
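As a minimal sketch, approach (1) could be wrapped in a reusable function. The name entity_substr is hypothetical and not part of PHP. It assumes UTF-8 input and that the mbstring extension is available, and it swaps in mb_substr() for substr() so that multibyte characters are not cut in half, and htmlspecialchars() for htmlentities() so that only the HTML-significant characters are re-encoded.

```php
<?php
// Hypothetical helper (not from the original answer): decode, cut, re-encode.
function entity_substr(string $html, int $start, int $length): string
{
    // ENT_QUOTES makes sure both &quot; and &#039; are decoded.
    $plain = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
    // mb_substr() counts characters rather than bytes (assumes mbstring is loaded).
    $cut = mb_substr($plain, $start, $length, 'UTF-8');
    // Re-encode only &, <, > and both quote characters.
    return htmlspecialchars($cut, ENT_QUOTES, 'UTF-8');
}

$mytext = "that&#039;s really &quot;confusing&quot; and &lt;absolutly&gt; silly";
echo entity_substr($mytext, 0, 6), "\n"; // prints that&#039;s
```

Because the cut happens on the decoded text, an entity can never be split, whatever start and length are passed.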
(2) might be something like:
if (preg_match('!^([^&]|&(?:.*?;)){0,6}!s', $mytext, $match)) {
    echo $match[0];
}
What this is saying is: find me up to 6 occurrences (matching the length passed to substr() in the question) of the preceding expression from the beginning of the string. The preceding expression is either:
any character that isn't an ampersand; or
an ampersand, followed by anything up to and including a semicolon (i.e. an HTML entity).
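This isn't perfect: a bare ampersand that is not part of an entity is treated as the start of one, so everything up to the next semicolon gets counted as a single character. For that reason I would favour (1).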
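If the regular-expression route is still preferred, a stricter pattern avoids the bare-ampersand problem. The following is only a sketch under stated assumptions: entity_truncate is a hypothetical name, the input is assumed to be valid UTF-8, and an entity is assumed to be a named (&name;), decimal (&#123;) or hexadecimal (&#x1F;) reference. Anything else, including a lone "&", counts as one ordinary character.

```php
<?php
// Hypothetical helper: keep the first $length characters, counting each entity as one.
function entity_truncate(string $html, int $length): string
{
    // Try a well-formed entity first; otherwise fall back to any single character.
    $unit = '&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);|.';
    // s: "." also matches newlines; u: treat subject and pattern as UTF-8.
    $pattern = '/^(?:' . $unit . '){0,' . max(0, $length) . '}/su';
    // preg_match() returns false on invalid UTF-8, in which case '' is returned.
    return preg_match($pattern, $html, $match) ? $match[0] : '';
}

$mytext = "that&#039;s really &quot;confusing&quot; and &lt;absolutly&gt; silly";
echo entity_truncate($mytext, 6), "\n"; // prints that&#039;s
```

Unlike approach (1), this leaves the original encoding of the kept text untouched, but it only truncates from the start of the string, and PCRE caps the repeat count at 65535.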