Truncate HTML text while taking "full stops" into consideration (in CakePHP TextHelper->truncate)

Question

Edit:

I ended up using CakePHP's truncate() function. It's much faster and supports Unicode :D

But the question still remains:

How can I make the function auto-detect full stops (.) and cut the text just after one? In other words, $length would be semi-ignored: if the truncated text would end in an incomplete sentence, more words would be appended until the sentence finishes (or removed, depending on whether the next or the previous full stop is closer to the cut-off).
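A minimal sketch of the "extend or shrink to the nearest full stop" behaviour, assuming plain text (no HTML) and treating '.' as the only sentence terminator. truncateToNearestSentence() is a hypothetical helper, not a CakePHP API; it uses the mbstring functions so that $length counts characters rather than bytes.

```php
<?php
// Hypothetical helper (not a CakePHP API): cut $text at whichever full stop is
// closest to $length, extending past $length to finish the sentence or shrinking
// back to the previous one. Assumes plain text and '.' as the only sentence end.
function truncateToNearestSentence(string $text, int $length): string
{
    if (mb_strlen($text) <= $length) {
        return $text;                              // nothing to cut
    }

    $head = mb_substr($text, 0, $length);          // what a hard cut would keep
    $prev = mb_strrpos($head, '.');                // last full stop inside the cut
    $next = mb_strpos($text, '.', $length);        // first full stop after the cut

    if ($prev === false && $next === false) {
        return $head;                              // no full stop at all: hard cut
    }
    if ($next === false) {
        return mb_substr($text, 0, $prev + 1);
    }
    if ($prev === false) {
        return mb_substr($text, 0, $next + 1);
    }

    $shrinkBy = $length - ($prev + 1);             // characters removed by going back
    $extendBy = ($next + 1) - $length;             // characters added by going forward
    $end      = ($shrinkBy <= $extendBy) ? $prev : $next;

    return mb_substr($text, 0, $end + 1);
}

echo truncateToNearestSentence('One two three. Four five six seven. Eight.', 20), "\n"; // prints "One two three."
echo truncateToNearestSentence('One two three. Four five six seven. Eight.', 30), "\n"; // prints "One two three. Four five six seven."
```

On a tie the shorter result wins; changing `<=` to `<` would prefer finishing the sentence instead.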

Edit 2: I found out how to detect full stops. I replaced:

 if (!$exact) {
   $spacepos = mb_strrpos($truncate, ' ');

 ...

with

 if (!$exact) {
    $spacepos = mb_strrpos($truncate, '.');
 ...

Edit 3 (problem):

When I have tags like <img> that have dots inside their attributes, the text gets cut off inside the tag:

 $text = '<p>Abc def abc def abc def abc def. Abc def <img src="test.jpg" /></p><p>abc def abc def abc def abc def.</p>';

 echo htmlentities(truncate($text));
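A minimal reproduction of why this happens, using plain strrpos() on a byte prefix (the text is ASCII, so byte and character offsets agree). The 65-byte cut-off is an arbitrary value picked for illustration so that the window ends inside the <img> tag.

```php
<?php
// Minimal reproduction: searching a cut-off prefix for its last '.' can land on
// the dot inside an attribute value such as src="test.jpg".
$text = '<p>Abc def abc def abc def abc def. Abc def <img src="test.jpg" /></p><p>abc def abc def abc def abc def.</p>';

$prefix = substr($text, 0, 65);      // hypothetical cut-off, chosen for illustration
$stop   = strrpos($prefix, '.');     // 58: the '.' in "test.jpg", not the sentence end at 34

echo htmlentities(substr($text, 0, $stop + 1)), "\n"; // output stops in the middle of the <img> tag
```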

How can I fix that? I'll open a bounty because the original question was already answered...

Alex · Accepted Answer

This snippet solves what you're looking for, and lists its limitations: full stops do not always mark the end of a sentence, and other punctuation can end sentences too.

It will scan characters up to $maxLen and then effectively 'throw away' a partial sentence after the last full stop it finds.

In your case, you'd use this function just before you return $new_text.
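A rough sketch of the behaviour described, not the original snippet: cutToLastFullStop() is a hypothetical name, it works on plain text, and it counts characters with mbstring.

```php
<?php
// Hypothetical sketch of the behaviour described above (not the original
// snippet): keep at most $maxLen characters, then drop the partial sentence
// after the last full stop in that window. '.' is assumed to end a sentence,
// so abbreviations such as "e.g." and the '?' / '!' terminators are not handled.
function cutToLastFullStop(string $text, int $maxLen): string
{
    if (mb_strlen($text) <= $maxLen) {
        return $text;                              // already short enough
    }

    $window = mb_substr($text, 0, $maxLen);        // characters we may keep
    $stop   = mb_strrpos($window, '.');            // last full stop in the window

    // No full stop in the window: fall back to the hard cut.
    return ($stop === false) ? $window : mb_substr($window, 0, $stop + 1);
}

echo cutToLastFullStop('First sentence. Second one is longer.', 25), "\n"; // prints "First sentence."
```

Note that this is not tag-aware, so on HTML input it has the same problem with dots inside attributes that the question describes.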

Justin ᚅᚔᚈᚄᚒᚔ · Answer

To resolve the "full-stop in tag" issue, you can use something similar to the following to detect if the stop is within a tag:

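$str_len  = strlen($summary);
$pos_stop = strrpos($summary, '.');   // last full stop, or false if there is none

if ($pos_stop !== false) {
  // Nearest '<' before the full stop, and the first '>' after that '<'.
  $pos_tag_open  = strrpos($summary, '<', -($str_len - $pos_stop));
  $pos_tag_close = ($pos_tag_open === false) ? false : strpos($summary, '>', $pos_tag_open);

  if ($pos_tag_open !== false && $pos_tag_close !== false
      && $pos_tag_open < $pos_stop && $pos_stop < $pos_tag_close) {
    // Inside tag! Search for the next nearest prior full-stop.
    $pos_stop = strrpos($summary, '.', -($str_len - $pos_tag_open));
  }
}

// Only cut when a usable full stop was found; otherwise keep the text as-is.
echo htmlentities($pos_stop === false ? $summary : substr($summary, 0, $pos_stop + 1));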

Obviously, this code can be optimized (and pulled out into its own function), but you get the idea. I have a feeling there is a regex that might handle this a bit more efficiently.
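Pulled out into its own function, the same idea can also loop: after stepping back past a tag, it checks again whether the new full stop sits inside an earlier tag (for example two <img> tags in a row), which a single check does not. lastStopOutsideTag() is a hypothetical name; like the code above, it works on byte offsets and assumes '<' and '>' only appear as tag delimiters.

```php
<?php
// Hypothetical helper: byte offset of the last '.' in $html that is not inside
// a tag, or false if there is none. Assumes '<' and '>' only delimit tags.
function lastStopOutsideTag(string $html)
{
    $len  = strlen($html);
    $stop = strrpos($html, '.');

    while ($stop !== false) {
        // Nearest '<' before the stop, and the first '>' after that '<'.
        $open  = strrpos($html, '<', -($len - $stop));
        $close = ($open === false) ? false : strpos($html, '>', $open);

        // No tag opened before the stop, or that tag closed before it: done.
        if ($open === false || ($close !== false && $close < $stop)) {
            return $stop;
        }

        // Inside a tag (or an unclosed one): retry before the tag's '<'.
        $stop = strrpos($html, '.', -($len - $open));
    }

    return false;
}

$summary = '<p>Hi. <img src="a.png"><img src="b.png"></p>';
$stop    = lastStopOutsideTag($summary);

if ($stop !== false) {
    echo htmlentities(substr($summary, 0, $stop + 1)), "\n"; // prints &lt;p&gt;Hi.
}
```

Each pass moves the search strictly to the left of the current tag, so the loop always terminates.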

Edit:

Indeed, there is a regex that can do this, using negative lookahead:

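$text = '<p>Abc def abc def abc def abc def. Abc def <img src="test.jpg" />abc</p>';

// Match every '.' that is not followed by "non-'<' characters, then '>'",
// i.e. every full stop that is not inside a tag.
$count = preg_match_all('/\.(?!([^<]+)?>)/', $text, $arr, PREG_OFFSET_CAPTURE);

if ($count > 0) {
  $offset = $arr[0][$count - 1][1];   // byte offset of the last such full stop
  echo substr($text, 0, $offset + 1) . "\n";
}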

This should be relatively efficient, at least in comparison with truncate() which also uses preg_match internally.
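To combine the regex with a length limit, it helps to run it on the full text rather than on an already-truncated prefix: if the prefix ends inside a tag (say right after src="test.), there is no '>' left for the lookahead to find, and that '.' would wrongly count as outside a tag. A hedged sketch; truncateAtStopOutsideTags() is a hypothetical name and $maxLen is measured in bytes, matching the PREG_OFFSET_CAPTURE offsets.

```php
<?php
// Hypothetical wrapper around the regex above: keep at most $maxLen bytes,
// ending at the last full stop that is not inside a tag. The pattern runs on
// the whole string, so the lookahead still sees the '>' of a tag the cut-off
// would split. Like the answer above, it does not close tags left open.
function truncateAtStopOutsideTags(string $html, int $maxLen): string
{
    $count = preg_match_all('/\.(?!([^<]+)?>)/', $html, $matches, PREG_OFFSET_CAPTURE);

    $cut = null;
    for ($i = 0; $i < (int) $count; $i++) {
        $end = $matches[0][$i][1] + 1;            // length that keeps this full stop
        if ($end > $maxLen) {
            break;                                // matches are in ascending order
        }
        $cut = $end;
    }

    // No full stop outside a tag within $maxLen: hard cut (may split a tag).
    return substr($html, 0, $cut ?? $maxLen);
}

$text = '<p>Abc def abc def abc def abc def. Abc def <img src="test.jpg" /></p><p>abc def abc def abc def abc def.</p>';
echo truncateAtStopOutsideTags($text, 65), "\n"; // prints <p>Abc def abc def abc def abc def.
```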

Tags: arrays, string, php, truncate, cakephp

