How to substr html entities properly?

Question

I have something like this:

$mytext="that&#039;s really &quot;confusing&quot; and &lt;absolutly&gt; silly";
echo substr($mytext,0,6);

The output in this case is "that&#" instead of "that's".

What I want is to count each HTML entity as one character and then take the substring, because I always end up with broken HTML or some obscure characters at the end of the text.

Please don't suggest decoding it, taking the substring and then encoding it again; I want a clean method :)

Thanks

cletus · Accepted Answer

There are two ways of doing this:

(1) You can decode the HTML entities, take the substr() and then re-encode; or
(2) You can use a regular expression.

(1) uses html_entity_decode() and htmlentities():

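// ENT_QUOTES so &#039; is decoded too (before PHP 8.1 the default flags left single quotes alone)
$s = html_entity_decode($mytext, ENT_QUOTES, 'UTF-8');
// mb_substr() counts characters, not bytes (requires the mbstring extension)
$sub = mb_substr($s, 0, 6, 'UTF-8');
echo htmlentities($sub, ENT_QUOTES, 'UTF-8'); // that&#039;s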

(2) might be something like:

if (preg_match('!^([^&]|&(?:.*?;)){0,6}!s', $mytext, $match)) {
  echo $match[0]; // that&#039;s
}

What this says is: match up to 6 occurrences of the parenthesized expression, starting at the beginning of the string, so that each entity counts as a single character. That expression is either:

any character that isn't an ampersand; or
an ampersand followed by anything up to and including the next semicolon (i.e. an HTML entity).

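This isn't perfect (for example, a bare & that isn't part of an entity will swallow everything up to the next semicolon), so I would favour (1).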
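If you'd rather stay with the regex route, one way to tighten (2) is to count only well-formed entities (&quot;, &#039;, &#x27;) as a single unit and everything else, including a stray &, as one character. This is a sketch assuming UTF-8 input and PHP 7+; substr_entities() is a hypothetical helper name, not a built-in, and the entity pattern is a simplification of real HTML syntax (legacy entities written without a trailing semicolon are counted character by character).

```php
<?php
// Hypothetical helper (not a PHP built-in): returns the first $length
// "visible" characters of an HTML-escaped string, counting each
// well-formed entity (&quot;, &#039;, &#x27;) as a single character.
// Assumes UTF-8 input.
function substr_entities(string $html, int $length): string
{
    if ($length <= 0) {
        return '';
    }
    // One unit = a well-formed entity, or else any single character
    // (the u flag makes "." match one UTF-8 code point, s lets it match "\n").
    // Note: PCRE rejects {m,n} bounds above 65535.
    $unit = '(?:&(?:[a-z][a-z0-9]*|#[0-9]+|#x[0-9a-f]+);|.)';
    $pattern = '/^' . $unit . '{0,' . $length . '}/isu';

    // preg_match() returns false (not 1) if $html is not valid UTF-8.
    if (preg_match($pattern, $html, $match) === 1) {
        return $match[0];
    }
    return '';
}

$mytext = "that&#039;s really &quot;confusing&quot; and &lt;absolutly&gt; silly";
echo substr_entities($mytext, 6), "\n";                   // that&#039;s
echo substr_entities($mytext, 19), "\n";                  // that&#039;s really &quot;conf
echo substr_entities("Tom & Jerry; cartoons", 5), "\n";   // Tom &
```

Because a stray & only ever counts as one character here, the last call returns "Tom &" rather than treating "& Jerry;" as an entity.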


Tags: php, html-entities

Emily
