
Truncate string with HTML tags in it

I have a string that contains HTML tags. I'm looking for a piece of code that truncates this string so that the result:

  • is 100 characters long,
  • contains no image tags (<img />),
  • keeps all other HTML tags,
  • does not count white space or the characters of the HTML tags toward the 100-character length.

For example, the string is:

<img>Something</img><b>Just an Example</b> Plain Text <br><a href="#">stackoverflow</a>

So the result should be:

Just an Example Plain Text stackoverflow (where "stackoverflow" is still a link).

This result has 35 characters, not counting white space.
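To make the counting rule concrete, here is a minimal sketch that counts only the characters that are neither markup nor white space. The name countVisibleChars is hypothetical, and the sketch assumes UTF-8 input and that a text-only <img>...</img> pair should be dropped together with its text, as in the example above.

```php
<?php
// Hypothetical helper: counts characters that are neither markup nor white space.
function countVisibleChars(string $html): int
{
    // Remove <img> tags; the optional group also removes a text-only
    // <img>text</img> pair, as used in the example string.
    $html = preg_replace('~<img\b[^>]*>(?:[^<]*</img>)?~i', '', $html) ?? '';

    // Drop the remaining tags and turn entities into the characters they stand for.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // preg_match_all() returns the number of matches, here one per non-whitespace character.
    return (int) preg_match_all('/\S/u', $text);
}

$example = '<img>Something</img><b>Just an Example</b> Plain Text <br><a href="#">stackoverflow</a>';
echo countVisibleChars($example), "\n"; // prints 35
```

A helper like this is only useful for checking a result; it does not truncate anything itself.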

I tried the solution from this question, but didn't get the required result. Any help would be appreciated.

Arpit Rawat asked Dec 14 '11 12:12





1 Answer

How about a function? Here's mine, AbstractHTMLContents. It takes two parameters:

  • the input HTML content,
  • the limit (the maximum number of characters, 100 by default).

Here's the code:

function AbstractHTMLContents($html, $maxLength = 100)
{
    // Elements that never take a closing tag, so they must not go on the stack.
    $voidTags = array('area', 'base', 'br', 'col', 'embed', 'hr', 'input',
                      'link', 'meta', 'source', 'track', 'wbr');
    $printedLength = 0; // characters of text emitted so far (white space included)
    $position = 0;      // byte offset into $html
    $tags = array();    // stack of currently open tag names
    $newContent = '';

    // Drop image tags. [^>]* (not [^>]+) also matches a bare <img>; the optional
    // group removes a text-only <img>text</img> pair together with its text.
    $html = preg_replace('~<img\b[^>]*>(?:[^<]*</img>)?~i', '', $html);

    while ($printedLength < $maxLength
        && preg_match('{</?([a-z][a-z0-9]*)\b[^>]*>|&#?[a-zA-Z0-9]+;}i', $html, $match, PREG_OFFSET_CAPTURE, $position)) {
        list($tag, $tagPosition) = $match[0];

        // Text leading up to the tag. PREG_OFFSET_CAPTURE reports byte offsets,
        // so slice with substr() and measure in characters with mb_strlen().
        $str = substr($html, $position, $tagPosition - $position);
        $strLength = mb_strlen($str, 'UTF-8');
        if ($printedLength + $strLength > $maxLength) {
            $newstr = mb_substr($str, 0, $maxLength - $printedLength, 'UTF-8');
            // Drop the last, possibly cut-off, word.
            $newContent .= preg_replace('~\s+\S+$~', '', $newstr);
            $printedLength = $maxLength;
            break;
        }
        $newContent .= $str;
        $printedLength += $strLength;

        if ($tag[0] == '&') {
            // An entity counts as one character.
            $newContent .= $tag;
            $printedLength++;
        } else {
            $tagName = strtolower($match[1][0]);
            if ($tag[1] == '/') {
                // Closing tag: emit it only if it matches the innermost open tag.
                if (end($tags) === $tagName) {
                    array_pop($tags);
                    $newContent .= $tag;
                }
            } else {
                $newContent .= $tag;
                // Remember real opening tags; void and self-closing tags need no close.
                if (substr($tag, -2, 1) != '/' && !in_array($tagName, $voidTags)) {
                    $tags[] = $tagName;
                }
            }
        }

        // Continue after the tag (byte length, to match the byte offsets).
        $position = $tagPosition + strlen($tag);
    }

    // Print any remaining text.
    if ($printedLength < $maxLength && $position < strlen($html)) {
        $rest = substr($html, $position);
        $newstr = mb_substr($rest, 0, $maxLength - $printedLength, 'UTF-8');
        // Trim a partial last word only if the text was actually cut.
        if ($newstr !== $rest) {
            $newstr = preg_replace('~\s+\S+$~', '', $newstr);
        }
        $newContent .= $newstr;
    }

    // Close any tags that are still open.
    while (!empty($tags)) {
        $newContent .= sprintf('</%s>', array_pop($tags));
    }

    return $newContent;
}

It seems to give the result you expect.
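One thing to note: the function counts every character of the text, spaces included, while the question asks for white space to be left out of the 100. If that matters, the text-slicing step could use a helper along these lines. This is only a sketch; cutByNonWhitespace is a hypothetical name, not part of the function above, and it assumes plain UTF-8 text with no tags in it.

```php
<?php
// Hypothetical helper: returns the longest prefix of $text that contains
// at most $limit non-whitespace characters.
function cutByNonWhitespace(string $text, int $limit): string
{
    if ($limit <= 0) {
        return '';
    }
    // PCRE caps counted repetitions at 65535, so clamp the limit.
    $limit = min($limit, 65535);

    // Each repetition consumes optional white space followed by exactly one
    // non-whitespace character, so at most $limit of them are matched.
    if (preg_match('/^(?:\s*\S){0,' . $limit . '}/u', $text, $m)) {
        return $m[0];
    }
    return ''; // preg_match() failed, for example on invalid UTF-8
}

echo cutByNonWhitespace('Just an Example Plain Text', 6), "\n"; // prints "Just an"
```

Trailing white space after the last counted character is not included in the result, and a word can still be cut in the middle, so the word-trimming step would have to stay.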

Akshaya K Sahu answered Oct 05 '22 04:10