
PHP limit text string NOT including html tags?

Tags: php, substr

Here's what's NOT working for me:

<?php
 $string = 'I have a dog and his name is <a href="http://www.jackismydog.com">Jack</a> and I love him very much because he\'s my favorite dog in the whole wide world and nothing could make me not love him, I think.';
 $limited = substr($string, 0, 100).'...';
 echo $limited;
?>

I want to limit the VISIBLE text to 100 characters, but using substr() is also including the non-visible text in the limit (<a href="http://www.jackismydog.com"> and </a>) which takes up 41 of those available 100 characters.

Is there a way to limit the text so that the word "Jack" from the link would be included in the limit, but not <a href="http://www.jackismydog.com"> or </a>?

Edit: I want to keep the link in the string, just not count its length towards the limit.

asked Jul 01 '10 21:07 by Jamie Carter


3 Answers

A function to truncate words in HTML code:

//+ Jonas Raoni Soares Silva
//@ http://jsfromhell.com
function truncate($text, $length, $suffix = '&hellip;', $isHTML = true) {
    $i = 0;
    $simpleTags=array('br'=>true,'hr'=>true,'input'=>true,'img'=>true,'image'=>true,'link'=>true,'meta'=>true);
    $tags = array();
    if($isHTML){
        preg_match_all('/<[^>]+>([^<]*)/', $text, $m, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
        foreach($m as $o){
            if($o[0][1] - $i >= $length)
                break;
            $t = substr(strtok($o[0][0], " \t\n\r\0\x0B>"), 1);
            // test if the tag is unpaired, then we mustn't save them
            if($t[0] != '/' && (!isset($simpleTags[$t])))
                $tags[] = $t;
            elseif(end($tags) == substr($t, 1))
                array_pop($tags);
            $i += $o[1][1] - $o[0][1];
        }
    }

    // output without closing tags
    $output = substr($text, 0, $length = min(strlen($text),  $length + $i));
    // closing tags
    $output2 = (count($tags = array_reverse($tags)) ? '</' . implode('></', $tags) . '>' : '');

    // Find last space or HTML tag (solving problem with last space in HTML tag eg. <span class="new">)
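    // end() takes its argument by reference, so store the intermediate results in variables first
    $parts = preg_split('/<.*>| /', $output, -1, PREG_SPLIT_OFFSET_CAPTURE);
    $last = end($parts);
    $pos = (int)$last[1];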
    // Append closing tags to output
    $output.=$output2;

    // Get everything until last space
    $one = substr($output, 0, $pos);
    // Get the rest
    $two = substr($output, $pos, (strlen($output) - $pos));
    // Extract all tags from the last bit
    preg_match_all('/<(.*?)>/s', $two, $tags);
    // Add suffix if needed
    if (strlen($text) > $length) { $one .= $suffix; }
    // Re-attach tags
    $output = $one . implode($tags[0]);

    //added to remove  unnecessary closure
    $output = str_replace('</!-->','',$output); 

    return $output;
}

Source: http://snippets.dzone.com/posts/show/7125

answered Nov 05 '22 21:11 by Chris Harrison


The easiest way would be to actually parse this into a DOM structure. You could use DOMDocument for that. Then you could simply go through the elements and make any changes to content.
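To make the DOM idea concrete, here is a rough sketch of how it might look. The function names truncateHtmlDom() and trimNode() are made up for illustration, and the code assumes plain ASCII input (it counts bytes with strlen(), like the substr() call in the question). It walks the text nodes in document order, cuts the one that crosses the limit, and removes everything after it.

```php
<?php
// Hypothetical helper, not part of the original answer: limit the VISIBLE
// text to $limit characters while keeping the surrounding markup intact.
function truncateHtmlDom(string $html, int $limit, string $suffix = '...'): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    // Wrap the fragment in a <div> so it can be located again after parsing.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $wrapper = $doc->getElementsByTagName('div')->item(0);

    // Nothing to do if the visible text already fits.
    if (strlen($wrapper->textContent) <= $limit) {
        return $html;
    }

    $remaining = $limit;
    trimNode($wrapper, $remaining, $suffix);

    // Serialize only the wrapper's children, not the wrapper itself.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Hypothetical recursive walker: spends $remaining on text nodes and
// removes every node that comes after the cut-off point.
function trimNode(DOMNode $node, int &$remaining, string $suffix): void
{
    // Copy the child list first, because nodes are removed while looping.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($remaining <= 0) {
            $node->removeChild($child);
        } elseif ($child instanceof DOMText) {
            $len = strlen($child->data);
            if ($len >= $remaining) {
                // This text node crosses the limit: cut it and add the suffix.
                $child->data = substr($child->data, 0, $remaining) . $suffix;
                $remaining = 0;
            } else {
                $remaining -= $len;
            }
        } elseif ($child instanceof DOMElement) {
            trimNode($child, $remaining, $suffix);
        }
    }
}

$string = 'I have a dog and his name is <a href="http://www.jackismydog.com">Jack</a> and I love him very much because he\'s my favorite dog in the whole wide world and nothing could make me not love him, I think.';
echo truncateHtmlDom($string, 100);
```

Because the parser builds the tree, open tags get closed automatically when the result is serialized. For non-ASCII text you would probably want the mb_* functions and an explicit encoding hint for loadHTML(), which this sketch leaves out.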

Another approach would be to do a two-pass regex search and replace - first use the regex to find contents of tags, then use the regex to replace the contents with shortened contents. This can be achieved with your usual preg_* functions.
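One possible reading of the regex idea, sketched below, is not a literal two-pass search and replace: it splits the string into tag and text tokens with preg_split() and only counts the text tokens. The name truncateHtmlTokens() and the list of void tags are assumptions for illustration, and the simple pattern <[^>]+> will break on markup that has a ">" inside an attribute value.

```php
<?php
// Hypothetical helper, not part of the original answer: count only the text
// between tags towards $limit and close any tags left open at the cut.
function truncateHtmlTokens(string $html, int $limit, string $suffix = '...'): string
{
    // With PREG_SPLIT_DELIM_CAPTURE the result alternates:
    // even indexes are text, odd indexes are the captured tags.
    $tokens = preg_split('/(<[^>]+>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $void = ['br', 'hr', 'img', 'input', 'link', 'meta']; // tags that never get closed
    $open = [];        // stack of currently open tag names
    $out = '';
    $remaining = $limit;

    foreach ($tokens as $i => $token) {
        if ($i % 2 === 1) {
            // Tag token: update the stack, copy it through without counting it.
            if (preg_match('/^<\s*(\/?)\s*([a-zA-Z][a-zA-Z0-9]*)/', $token, $m)) {
                $name = strtolower($m[2]);
                if ($m[1] === '/') {
                    if (end($open) === $name) {
                        array_pop($open);
                    }
                } elseif (!in_array($name, $void, true) && substr($token, -2) !== '/>') {
                    $open[] = $name;
                }
            }
            $out .= $token;
            continue;
        }

        // Text token: this is the only part that uses up the limit.
        if (strlen($token) > $remaining) {
            $out .= substr($token, 0, $remaining) . $suffix;
            while ($open) {
                $out .= '</' . array_pop($open) . '>';
            }
            return $out;
        }
        $out .= $token;
        $remaining -= strlen($token);
    }

    return $out; // the whole string fit within the limit
}

echo truncateHtmlTokens('<b>Hello world</b>', 5); // prints <b>Hello...</b>
```

This trades robustness for simplicity: it avoids the DOM extension, but it trusts the input to be reasonably well-formed.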

answered Nov 05 '22 22:11 by Jani Hartikainen


Not easily - you could of course use strip_tags to de-htmlise the string, but other than that there's no easy fix.
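For reference, the strip_tags() route would look something like the snippet below, using the string from the question. Note that it throws the link away entirely, which is not what the edit to the question asks for.

```php
<?php
$string = 'I have a dog and his name is <a href="http://www.jackismydog.com">Jack</a> and I love him very much because he\'s my favorite dog in the whole wide world and nothing could make me not love him, I think.';

// Remove all markup first, so only visible characters are counted.
$plain = strip_tags($string);

// Truncate only when the visible text is actually longer than the limit.
$limited = strlen($plain) > 100 ? substr($plain, 0, 100) . '...' : $plain;

echo $limited;
```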

answered Nov 05 '22 22:11 by John Parker