 

Truncate HTML text while taking into consideration "full stops" (in CakePHP TextHelper->truncate)

Edit:

I ended up using CakePHP's truncate() function. It's much faster and supports unicode :D

But the question still remains:

How can I make the function auto-detect full stops (.) and cut just after one? $length would then be only a rough target: if the truncated text ends in an incomplete sentence, more words would be appended until the sentence finishes (or words would be removed, depending on whether the next or the previous full stop is closer to the cut-off).

Edit 2: I found out how to detect full stops. I replaced:

 if (!$exact) {
   $spacepos = mb_strrpos($truncate, ' ');

 ...

with

 if (!$exact) {
    $spacepos = mb_strrpos($truncate, '.');
 ...

edit - problem:

When I have tags like img that have dots inside their attributes, the text gets cut off inside the tag:

 $text = '<p>Abc def abc def abc def abc def. Abc def <img src="test.jpg" /></p><p>abc def abc def abc def abc def.</p>';

 echo htmlentities(truncate($text));

How can I fix that? I'll open a bounty because the original question was already answered...

Alex asked Feb 23 '23 23:02


2 Answers

This snippet solves what you're looking for, and lists its failures (a full stop may not mark the end of a sentence, and other punctuation can end sentences).

It will scan characters up to $maxLen and then effectively 'throw away' a partial sentence after the last full stop it finds.

In your case, you'd use this function just before you return $new_text.
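The snippet itself is not included in this copy of the answer, so here is a minimal sketch of the behaviour described above, under stated assumptions: the function name trimToLastSentence and the sample string are hypothetical, only the $maxLen parameter comes from the answer, and it works on plain text using byte-oriented string functions (mb_substr and mb_strrpos would be the multibyte-safe equivalents).

```php
<?php
// Hypothetical helper illustrating the description above; this is NOT the
// answer's original snippet.
function trimToLastSentence(string $text, int $maxLen): string
{
    // Nothing to trim if the text already fits.
    if (strlen($text) <= $maxLen) {
        return $text;
    }

    // Only look at the first $maxLen bytes.
    $cut = substr($text, 0, $maxLen);

    // Find the last full stop inside that window.
    $pos = strrpos($cut, '.');

    // No full stop at all: fall back to the hard cut.
    // (With substr this could split a multibyte character.)
    if ($pos === false) {
        return $cut;
    }

    // Keep everything up to and including the full stop,
    // throwing away the partial sentence after it.
    return substr($cut, 0, $pos + 1);
}

echo trimToLastSentence('First sentence. Second sentence that runs long.', 30), "\n";
```

As the answer warns, this treats every "." as a sentence end, so abbreviations and decimal numbers will produce early cuts.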

Alex answered Feb 25 '23 13:02


To resolve the "full-stop in tag" issue, you can use something similar to the following to detect if the stop is within a tag:

$str_len       = strlen($summary);
$pos_stop      = strrpos($summary, '.');
$pos_tag_open  = strrpos($summary, '<', -($str_len - $pos_stop));
$pos_tag_close = strpos($summary, '>', $pos_tag_open);

if (($pos_tag_open < $pos_stop) && ($pos_stop < $pos_tag_close)) {
  // Inside tag! Search for the next nearest prior full-stop.
  $pos_stop = strrpos($summary, '.', -($str_len - $pos_tag_open));
}

echo htmlentities(substr($summary, 0, $pos_stop + 1));

Obviously, this code can be optimized (and pulled out into its own function), but you get the idea. I have a feeling there is a regex that might handle this a bit more efficiently.
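One way the idea could be pulled out into its own function is sketched below. The name lastStopOutsideTags is hypothetical, and the sketch differs from the code above in two ways that are my assumptions rather than part of the answer: it loops, so several dots inside one tag (for example src="a.b.jpg") are all skipped, and it returns false when no usable full stop exists.

```php
<?php
// Hypothetical helper: returns the byte offset of the last full stop that
// is not inside an HTML tag, or false if there is none.
function lastStopOutsideTags(string $html)
{
    $pos = strrpos($html, '.');

    while ($pos !== false) {
        // Look only at the text before the candidate full stop.
        $before = substr($html, 0, $pos);
        $open   = strrpos($before, '<');
        $close  = strrpos($before, '>');

        // We are inside a tag if a '<' was seen and no '>' follows it.
        $inTag = $open !== false && ($close === false || $close < $open);
        if (!$inTag) {
            return $pos;
        }

        // Skip the whole tag and look for an earlier full stop.
        $pos = strrpos(substr($html, 0, $open), '.');
    }

    return false;
}

$text = '<p>Abc def abc def abc def abc def. Abc def <img src="test.jpg" /></p>';
$pos  = lastStopOutsideTags($text);
if ($pos !== false) {
    echo htmlentities(substr($text, 0, $pos + 1)), "\n";
}
```

Note that cutting this way can leave opening tags such as the p element unclosed, so the result may still need to be passed through something that balances tags.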

Edit:

Indeed, there is a regex that can do this, using negative lookahead:

$text = '<p>Abc def abc def abc def abc def. Abc def <img src="test.jpg" />abc</p>';

$count = preg_match_all("/\.(?!([^<]+)?>)/", $text, $arr, PREG_OFFSET_CAPTURE);
$offset = $arr[0][$count-1][1];

echo substr($text, 0, $offset + 1)."\n";

This should be relatively efficient, at least in comparison with truncate() which also uses preg_match internally.
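If the text might contain no full stop outside a tag, $count is 0 and the $arr lookup above would fail. A minimal sketch of a guarded wrapper follows, assuming it is acceptable to return the text unchanged in that case; the function name cutAtLastStopRegex is hypothetical and the pattern is the same one used above.

```php
<?php
// Hypothetical wrapper around the regex approach, with a guard for the
// case where no full stop outside a tag is found.
function cutAtLastStopRegex(string $text): string
{
    // Match every '.' that is not followed by "...>" without an
    // intervening '<', i.e. every '.' that is not inside a tag.
    $count = preg_match_all('/\.(?!([^<]+)?>)/', $text, $matches, PREG_OFFSET_CAPTURE);

    // No match (0) or regex error (false): leave the text untouched.
    if (!$count) {
        return $text;
    }

    // Offset of the last matching full stop.
    $offset = $matches[0][$count - 1][1];

    return substr($text, 0, $offset + 1);
}

$text = '<p>Abc def abc def abc def abc def. Abc def <img src="test.jpg" />abc</p>';
echo cutAtLastStopRegex($text), "\n";
```

Like the pattern it wraps, this assumes that a literal ">" only appears as the end of a tag; a ">" in ordinary text after a full stop would cause that stop to be skipped.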

Justin ᚅᚔᚈᚄᚒᚔ answered Feb 25 '23 11:02