Puzzle: Splitting An HTML String Correctly

Question

I'm trying to split an HTML string by a token in order to create a blog preview without displaying the full post. It's a little harder than I first thought. Here are the problems:

- A user will be creating the HTML through a WYSIWYG editor (CKEditor), so the markup isn't guaranteed to be pretty or consistent.
- The token, read_more(), can be placed anywhere in the string, including nested inside a paragraph tag.
- The first half of the split needs to be valid HTML for all reasonable uses of the token.
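One way to test the last requirement on a candidate preview is a tag-balance check. The helper below, tagsBalanced(), is hypothetical and not part of the question or the answer. It assumes there are no comments, CDATA sections, or angle brackets inside attribute values, and it only knows an abbreviated list of void elements, so treat it as a rough sketch rather than an HTML validator.

```php
<?php
// Hypothetical helper: returns true when every non-void tag in $html is
// closed, in the right order. Assumes no comments, CDATA, or "<" / ">"
// inside attribute values; the void-element list is abbreviated.
function tagsBalanced(string $html): bool
{
    $void = ['br', 'hr', 'img', 'input', 'meta', 'link', 'wbr'];
    $stack = [];
    // Group 1 is "/" for a closing tag, group 2 is the tag name.
    preg_match_all('~<(/?)([a-z][a-z0-9]*)\b[^<>]*>~i', $html, $tags, PREG_SET_ORDER);
    foreach ($tags as $tag) {
        $name = strtolower($tag[2]);
        if (in_array($name, $void, true) || substr($tag[0], -2) === '/>') {
            continue; // void or self-closing: nothing to match
        }
        if ($tag[1] !== '/') {
            $stack[] = $name; // opening tag
        } elseif (array_pop($stack) !== $name) {
            return false; // closing tag does not match the innermost open tag
        }
    }
    return $stack === []; // anything left on the stack was never closed
}

var_dump(tagsBalanced('<p>Some <b>text</b> here.</p>')); // bool(true)
var_dump(tagsBalanced('<p>Some text '));                  // bool(false)
```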

Examples of possible uses:

<p>Some text here. read_more()</p>

<p>Some text read_more() here.</p>

<p>read_more()</p>

<p>  read_more()</p>

read_more()

So far, I've tried just splitting the string on the token, but it leaves invalid HTML. Regex is perhaps another option. What strategy would you use to solve this and make it as bulletproof as possible? Any code snippets or hints would also be appreciated (I'm using PHP).
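To see why a plain split fails, here is the second example cut at the token with explode(): the first half keeps the opening <p>, but its </p> ends up in the discarded half.

```php
<?php
// Splitting on the token alone: the preview keeps "<p>" but loses "</p>".
$post = '<p>Some text read_more() here.</p>';
list($preview, $rest) = explode('read_more()', $post, 2);

var_dump($preview); // string(13) "<p>Some text "
var_dump($rest);    // string(10) " here.</p>"
```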

mvds · Accepted Answer

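function stripmore($in)
{
    // No marker: nothing to trim (explode() would return only one element).
    if (strpos($in, "read_more()") === false) {
        return $in;
    }

    list($p1, $p2) = explode("read_more()", $in, 2);

    // Remove the text between tags after the marker, then any text
    // before the first tag and after the last one.
    $pass1 = preg_replace('~>[^<>]+<~', '><', $p2);
    $pass2 = preg_replace(array('~^[^<>]+~', '~[^<>]+$~'), '', $pass1);

    // Repeatedly delete now-empty pairs such as <b></b> or <span class="x"></span>.
    // Single quotes matter: in a double-quoted PHP string "\1" is an octal
    // escape (chr(1)), not a regex backreference.
    do {
        $pass2 = preg_replace('~<([a-z][a-z0-9]*)\b[^<>]*></\1>~i', '', $pass2, -1, $count);
    } while ($count > 0);

    return $p1."read_more()".$pass2;
}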

This strips all text after the read_more() marker, then repeatedly removes the tag pairs left empty by that step, while keeping the closing tag of any element that starts before the marker and ends after it:

<p>Some text here. read_more()</p>
      ==> <p>Some text here. read_more()</p>

<p>Some <b>text</b> read_more() <b>here</b>.</p>
      ==> <p>Some <b>text</b> read_more()</p>

<p>Some <b>text read_more() here</b>.</p>
      ==> <p>Some <b>text read_more()</b></p>
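For comparison, a different strategy (not the answer's) works only on the text before the marker: walk its tags with a stack and append closing tags for whatever is still open. The function name previewBefore() and its $marker parameter are hypothetical, and the sketch assumes properly nested tags, no comments or angle brackets inside attribute values, and an abbreviated void-element list.

```php
<?php
// Alternative sketch (not the answer's code): keep everything before the
// marker and append closing tags for elements still open at that point.
// Assumes well-nested tags, no comments or "<" / ">" inside attributes,
// and an abbreviated void-element list.
function previewBefore(string $html, string $marker = 'read_more()'): string
{
    $pos = strpos($html, $marker);
    if ($pos === false) {
        return $html; // no marker: show the whole post
    }
    $head = substr($html, 0, $pos);

    $void = ['br', 'hr', 'img', 'input', 'meta', 'link', 'wbr'];
    $open = [];
    preg_match_all('~<(/?)([a-z][a-z0-9]*)\b[^<>]*>~i', $head, $tags, PREG_SET_ORDER);
    foreach ($tags as $tag) {
        $name = strtolower($tag[2]);
        if (in_array($name, $void, true) || substr($tag[0], -2) === '/>') {
            continue; // void or self-closing: never needs a closing tag
        }
        if ($tag[1] === '/') {
            array_pop($open); // assumes it closes the innermost open element
        } else {
            $open[] = $name;
        }
    }

    // Close the still-open elements, innermost first.
    $closing = '';
    foreach (array_reverse($open) as $name) {
        $closing .= "</$name>";
    }
    return $head . $marker . $closing;
}

echo previewBefore('<p>Some <b>text read_more() here</b>.</p>'), "\n";
// <p>Some <b>text read_more()</b></p>
```

Because it never looks at the text after the marker, this variant needs no repeated empty-pair cleanup; like the regex approach above, it still depends on the editor producing reasonably well-formed markup.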


Tags: string, regex, php, html-parsing

VirtuosiMedia

1 Answer

mvds
