Close open HTML tags in a string

Tags: string, regex, php

I have a string that ends up looking something like this:

<p>This is some text and here is a <strong>bold text then the post stop here....</p>

Because the function returns a teaser (summary) of the text, it cuts the text off after a certain number of words. In this case the strong tag is left unclosed, while the whole string is still wrapped in a paragraph.

Is it possible to convert the above result/output to the following:

<p>This is some text and here is a <strong>bold text then the post stop here....</strong></p>

I do not know where to begin. I found a function on the web that does this with a regex, but it puts the closing tag after the end of the string, so the result won't validate: I want all opening and closing tags to sit inside the paragraph tags. This is what that function produces, which is also wrong:

<p>This is some text and here is a <strong>bold text then the post stop here....</p></strong>

The unclosed tag could be strong, italic, or anything else, which is why I cannot simply append a fixed closing tag by hand inside the function. Is there a pattern that can do this for me?

asked Sep 28 '10 06:09

Ahmad Fouad

2 Answers

Here is a function I've used before, which works pretty well:

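function closetags($html) {
    // Collect every opening tag, skipping void elements and self-closing tags.
    preg_match_all('#<(?!(?:meta|img|br|hr|input)\b)\b([a-z]+)(?: .*)?(?<![/|/ ])>#iU', $html, $result);
    $openedtags = $result[1];
    // Collect every closing tag.
    preg_match_all('#</([a-z]+)>#iU', $html, $result);
    $closedtags = $result[1];
    $len_opened = count($openedtags);
    // Same number of opening and closing tags: assume nothing is left open.
    if (count($closedtags) == $len_opened) {
        return $html;
    }
    // Walk the opening tags from last to first so the innermost tag is closed first.
    $openedtags = array_reverse($openedtags);
    for ($i=0; $i < $len_opened; $i++) {
        if (!in_array($openedtags[$i], $closedtags)) {
            // No closing tag left for this one: append it to the end of the string.
            $html .= '</'.$openedtags[$i].'>';
        } else {
            // Already closed: use up one matching closing tag.
            unset($closedtags[array_search($openedtags[$i], $closedtags)]);
        }
    }
    return $html;
}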

Personally, though, I would not do this with a regular expression but with a library such as Tidy. That would look something like the following:

$str = '<p>This is some text and here is a <strong>bold text then the post stop here....</p>';
$tidy = new Tidy();
$clean = $tidy->repairString($str, array(
    'output-xml' => true,
    'input-xml' => true
));
echo $clean;
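
One caveat about closetags(): it appends every missing closing tag to the very end of the string, so for the input in the question, where the closing p tag is already present, the closing strong tag still lands after it. A possible way around that, if Tidy is not available, is to walk the tags with a stack and emit the missing closing tags just before the closing tag that exposes them. The following is a minimal sketch under that assumption, not a full HTML parser: the function name closeTagsInPlace and the list of void elements are illustrative choices, and comments, script blocks, and a literal "<" in the text are not handled.

```php
<?php
// Hypothetical helper (not from the answer above): closes unclosed tags
// in place, so that nesting stays valid.
function closeTagsInPlace(string $html): string
{
    // Elements that never take a closing tag (assumed list, extend as needed).
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'];
    $stack = [];   // names of the tags that are currently open
    $out = '';

    // Split into tags and text, keeping the tags as separate tokens.
    $tokens = preg_split('#(<[^>]*>)#', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    foreach ($tokens as $token) {
        if (preg_match('#^<\s*([a-z][a-z0-9]*)\b[^>]*?(/?)>$#i', $token, $m)) {
            // Opening tag: remember it unless it is void or self-closing.
            $name = strtolower($m[1]);
            if (($m[2] ?? '') !== '/' && !in_array($name, $void, true)) {
                $stack[] = $name;
            }
            $out .= $token;
        } elseif (preg_match('#^</\s*([a-z][a-z0-9]*)\s*>$#i', $token, $m)) {
            $name = strtolower($m[1]);
            if (in_array($name, $stack, true)) {
                // Close everything that was opened after this tag first.
                while (($top = array_pop($stack)) !== $name) {
                    $out .= '</' . $top . '>';
                }
                $out .= $token;
            }
            // A closing tag that was never opened is dropped.
        } else {
            $out .= $token;   // plain text
        }
    }

    // Close whatever is still open at the end of the string.
    while ($stack) {
        $out .= '</' . array_pop($stack) . '>';
    }
    return $out;
}

echo closeTagsInPlace('<p>This is some text and here is a <strong>bold text then the post stop here....</p>'), PHP_EOL;
// <p>This is some text and here is a <strong>bold text then the post stop here....</strong></p>
```

Using a stack rather than two flat lists of tag names is what lets the closing strong tag be placed before the closing p tag, at the cost of a little more code.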

answered Oct 25 '22 08:10

alexn

A small modification to the original answer: while the original answer closed tags correctly, I found that my truncation could leave me with chopped-up tags. For example:

This text has some <b>in it</b>

Truncating to 20 characters results in:

This text has some <

The following code builds on the function from the other answer and fixes this.

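function truncateHTML($html, $length)
{
    $head = substr($html, 0, $length);

    // Only extend the cut when it landed inside a tag, i.e. the last "<"
    // before the cut has no ">" after it.
    $lastOpen = strrpos($head, "<");
    $lastClose = strrpos($head, ">");
    if ($lastOpen !== false && ($lastClose === false || $lastClose < $lastOpen))
    {
        $pos = strpos($html, ">", strlen($head));
        if ($pos !== false)
        {
            // Keep everything up to and including the end of the chopped tag.
            $head = substr($html, 0, $pos + 1);
        }
        else
        {
            // The tag never ends: drop the dangling fragment.
            $head = substr($head, 0, $lastOpen);
        }
    }
    $html = $head;

    // Collect opening tags (skipping void and self-closing ones) and closing tags.
    preg_match_all('#<(?!(?:meta|img|br|hr|input)\b)\b([a-z]+)(?: .*)?(?<![/|/ ])>#iU', $html, $result);
    $openedtags = $result[1];

    preg_match_all('#</([a-z]+)>#iU', $html, $result);
    $closedtags = $result[1];

    $len_opened = count($openedtags);

    if (count($closedtags) == $len_opened)
    {
        return $html;
    }

    // Close the remaining tags, innermost first.
    $openedtags = array_reverse($openedtags);
    for ($i=0; $i < $len_opened; $i++)
    {
        if (!in_array($openedtags[$i], $closedtags))
        {
            $html .= '</'.$openedtags[$i].'>';
        }
        else
        {
            unset($closedtags[array_search($openedtags[$i], $closedtags)]);
        }
    }

    return $html;
}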


$str = "This text has <b>bold</b> in it";
print "Test 1 - Truncate with no tag: " . truncateHTML($str, 5) . "<br>\n";
print "Test 2 - Truncate at start of tag: " . truncateHTML($str, 15) . "<br>\n";
print "Test 3 - Truncate in the middle of a tag: " . truncateHTML($str, 16) . "<br>\n";
print "Test 4 - Length longer than the text: " . truncateHTML($str, 300) . "<br>\n";

Hope it helps someone out there.
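
A related variation, offered as an assumption-laden sketch rather than as part of the answer above: count only the visible text toward the limit, so that tags never use up the teaser length and the cut can never fall inside a tag. The function name teaserHtml and the void-element list are hypothetical, lengths are counted in bytes as with substr() above, and comments or a literal "<" in the text are not treated specially.

```php
<?php
// Hypothetical helper: keep at most $limit bytes of visible text,
// copy tags through untouched, and close whatever is still open.
function teaserHtml(string $html, int $limit): string
{
    // Elements that never take a closing tag (assumed list).
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'];
    $stack = [];   // names of the tags that are currently open
    $out = '';
    $used = 0;     // bytes of visible text copied so far

    // Split into tags and text, keeping the tags as separate tokens.
    $tokens = preg_split('#(<[^>]*>)#', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    foreach ($tokens as $token) {
        if (preg_match('#^<\s*([a-z][a-z0-9]*)\b[^>]*?(/?)>$#i', $token, $m)) {
            // Opening tag: track it unless it is void or self-closing.
            $name = strtolower($m[1]);
            if (($m[2] ?? '') !== '/' && !in_array($name, $void, true)) {
                $stack[] = $name;
            }
            $out .= $token;
        } elseif (preg_match('#^</\s*([a-z][a-z0-9]*)\s*>$#i', $token, $m)) {
            // Closing tag: close anything opened after it, then the tag itself.
            $name = strtolower($m[1]);
            if (in_array($name, $stack, true)) {
                while (($top = array_pop($stack)) !== $name) {
                    $out .= '</' . $top . '>';
                }
                $out .= $token;
            }
        } else {
            // Text: copy only as much as still fits, then stop reading.
            $remaining = $limit - $used;
            if (strlen($token) >= $remaining) {
                $out .= substr($token, 0, max(0, $remaining));
                break;
            }
            $out .= $token;
            $used += strlen($token);
        }
    }

    // Close every tag that is still open, innermost first.
    while ($stack) {
        $out .= '</' . array_pop($stack) . '>';
    }
    return $out;
}

echo teaserHtml('This text has <b>bold</b> in it', 16), PHP_EOL;
// This text has <b>bo</b>
```

Because the limit applies to text only, a call such as teaserHtml($str, 16) keeps 16 bytes of readable text regardless of how much markup surrounds it.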

answered Oct 25 '22 10:10

Markus
