Edit:
I ended up using CakePHP's truncate() function. It's much faster and supports Unicode :D
But the question still remains: how can I make the function auto-detect full stops (.) and cut the text just after one? So basically $length would be semi-ignored. If the cut would leave an incomplete sentence, more words would be appended until the sentence finishes (or removed back to the previous full stop, depending on which sentence end is closer to the cut-off point).
Edit 2: I found out how to detect full stops. I replaced:
if (!$exact) {
$spacepos = mb_strrpos($truncate, ' ');
...
with
if (!$exact) {
$spacepos = mb_strrpos($truncate, '.');
...
Edit - problem:
When I have tags like <img> that have dots inside their attributes, the text gets cut off inside the tag:
$text = '<p>Abc def abc def abc def abc def. Abc def <img src="test.jpg" /></p><p>abc def abc def abc def abc def.</p>';
echo htmlentities(truncate($text));
How can I fix that? I'll open a bounty because the original question was already answered...
This snippet solves what you're looking for and lists its failure cases (a full stop does not always mark the end of a sentence, and other punctuation can end a sentence).
It scans characters up to $maxLen and then effectively 'throws away' the partial sentence after the last full stop it finds.
In your case, you'd use this function just before you return $new_text.
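A minimal sketch of that idea, assuming plain text. cutToLastSentence() is a hypothetical name; it works on bytes rather than characters and also treats '!' and '?' as sentence ends, which is one way to address the "other punctuation" failure case. A full stop inside an abbreviation or a number (such as "3.5") is still taken as a sentence end. Because ASCII punctuation bytes never occur inside a UTF-8 multibyte sequence, cutting just after one never splits a character; only the no-punctuation fallback can.

```php
<?php
// Hypothetical helper: look at the first $maxLen bytes only, then throw away
// the partial sentence after the last '.', '!' or '?' in that window.
function cutToLastSentence(string $text, int $maxLen): string
{
    if (strlen($text) <= $maxLen) {
        return $text;
    }
    $window = substr($text, 0, $maxLen);
    $last = -1;
    foreach (['.', '!', '?'] as $mark) {
        $pos = strrpos($window, $mark);
        if ($pos !== false && $pos > $last) {
            $last = $pos;
        }
    }
    // No sentence end in the window: return the raw window (this may split a
    // multibyte character, since the cut is by bytes).
    return $last === -1 ? $window : substr($window, 0, $last + 1);
}

echo cutToLastSentence('Is it? Yes! More text here', 15), "\n";
```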
To resolve the "full-stop in tag" issue, you can use something similar to the following to detect if the stop is within a tag:
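$str_len = strlen($summary);
$pos_stop = strrpos($summary, '.');
while ($pos_stop !== false) {
    // Last '<' at or before the full stop, and the first '>' after that '<'.
    $pos_tag_open = strrpos($summary, '<', -($str_len - $pos_stop));
    $pos_tag_close = ($pos_tag_open === false) ? false : strpos($summary, '>', $pos_tag_open);
    if ($pos_tag_open === false || ($pos_tag_close !== false && $pos_tag_close < $pos_stop)) {
        break; // Not inside a tag.
    }
    // Inside tag! Search for the next nearest prior full-stop (and re-check it).
    $pos_stop = strrpos($summary, '.', -($str_len - $pos_tag_open));
}
if ($pos_stop !== false) {
    echo htmlentities(substr($summary, 0, $pos_stop + 1));
}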
Obviously, this code can be optimized (and pulled out into its own function), but you get the idea. I have a feeling there is a regex that might handle this a bit more efficiently.
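One way to do that optimization is a single left-to-right pass that tracks whether the current position is inside a tag, instead of repeated strrpos()/strpos() calls. A sketch, where lastFullStopOutsideTags() is a hypothetical helper; it assumes '<' and '>' only appear as tag delimiters (literal angle brackets in text and attribute values escaped as &lt; and &gt;) and does not special-case comments or scripts.

```php
<?php
// Hypothetical helper: one pass over the string, remembering the byte offset
// of the last full stop seen while outside a tag. '<' opens, '>' closes.
function lastFullStopOutsideTags(string $html): ?int
{
    $inTag = false;
    $last = null;
    $len = strlen($html);
    for ($i = 0; $i < $len; $i++) {
        $ch = $html[$i];
        if ($ch === '<') {
            $inTag = true;
        } elseif ($ch === '>') {
            $inTag = false;
        } elseif ($ch === '.' && !$inTag) {
            $last = $i;
        }
    }
    return $last;
}

$summary = '<p>Abc def abc def abc def abc def. Abc def <img src="test.jpg" /></p><p>abc def';
$pos = lastFullStopOutsideTags($summary);
if ($pos !== null) {
    echo htmlentities(substr($summary, 0, $pos + 1)), "\n";
}
```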
Edit:
Indeed, there is a regex that can do this, using negative lookahead:
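$text = '<p>Abc def abc def abc def abc def. Abc def <img src="test.jpg" />abc</p>';
// Full stops NOT followed by a '>' before the next '<', i.e. not inside a tag.
$count = preg_match_all("/\.(?!([^<]+)?>)/", $text, $arr, PREG_OFFSET_CAPTURE);
if ($count > 0) {
    // Byte offset of the last full stop outside a tag.
    $offset = $arr[0][$count-1][1];
    echo substr($text, 0, $offset + 1)."\n";
}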
This should be relatively efficient, at least in comparison with truncate(), which also uses preg_match internally.
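If the regex is applied to text that has already been cut to a length limit, there is one extra case: the limit can fall inside a tag, leaving something like '<img src="test.j' at the end. That dot is not followed by a '>', so the negative lookahead would accept it. A minimal sketch that drops such an unfinished trailing tag first; truncateHtmlAtSentence() is a hypothetical helper, $maxLen is a byte limit, and it does not re-close tags (such as the <p>) left open in the result.

```php
<?php
// Hypothetical helper combining a byte limit with the negative-lookahead regex.
function truncateHtmlAtSentence(string $html, int $maxLen): string
{
    $window = substr($html, 0, $maxLen);
    // If the limit fell inside a tag, remove that unfinished tag; otherwise a
    // dot in it (e.g. "test.j") has no '>' after it and would be accepted.
    $window = preg_replace('/<[^>]*$/', '', $window);
    // Full stops that are not followed by a '>' before the next '<'.
    $count = preg_match_all('/\.(?!([^<]+)?>)/', $window, $matches, PREG_OFFSET_CAPTURE);
    if (!$count) {
        return ''; // no full stop outside a tag within the limit
    }
    $offset = $matches[0][$count - 1][1];
    return substr($window, 0, $offset + 1);
}

echo truncateHtmlAtSentence('<p>Abc def abc def abc def abc def. Abc def <img src="test.jpg" /></p>', 60), "\n";
```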