How to get ID using a specific word in regex?

Tags:

php

My string:

<div class="sect1" id="s9781473910270.i101">       
<div class="sect2" id="s9781473910270.i102">
<h1 class="title">1.2 Summations and Products[label*summation]</h1>
<p>text</p> 
</div>
</div>           
<div class="sect1" id="s9781473910270.i103">
<p>sometext [ref*summation]</p>
</div>

<div class="figure" id="s9781473910270.i220">
<div class="metadata" id="s9781473910270.i221">
</div>
<p>fig1.2 [label*somefigure]</p>
<p>sometext [ref*somefigure]</p>
</div>

Objective: In the string above, [label*string] and [ref*string] are cross-references. Each [ref*string] should be replaced with an <a> element that has class and href attributes: href is the id of the div in which the related [label*string] resides, and class comes from the class of that div.

As mentioned above, the class and href of the <a> element come from the class and id of the related div. However, if a div with class="metadata" exists, it must be ignored: its class name and id should not be used.

Expected output:

<div class="sect1" id="s9781473910270.i101">       
<div class="sect2" id="s9781473910270.i102">
<h1 class="title">1.2 Summations and Products[label*summation]</h1>
<p>text</p> 
</div>
</div>             
<div class="sect1" id="s9781473910270.i103">
<p>sometext <a class="section-ref" href="s9781473910270.i102">1.2</a></p>
</div>


<div class="figure" id="s9781473910270.i220">
<div class="metadata" id="s9781473910270.i221">
<p>fig1.2 [label*somefigure]</p>
</div>
<p>sometext <a class="fig-ref" href="s9781473910270.i220">fig 1.2</a></p>          
</div>

How can I do this in a simpler way, without using a DOM parser?

My idea is to store each label* string and its id in an array, then loop over the ref* strings: when a ref* string matches a label* string, it is replaced using the related id and class. So I have tried this regex to get the label* strings and their related id and class name.

asked Jun 05 '15 09:06

Learning

1 Answer

This approach uses the HTML structure to retrieve the needed elements with DOMXPath. Regular expressions are then used, in a second step, to extract information from text nodes or attributes:

$classRel = ['sect2'  => 'section-ref',
             'figure' => 'fig-ref'];

libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html); // or $dom->loadHTMLFile($url); 

$xp = new DOMXPath($dom);

// make a custom php function available for the XPath query
// (it isn't really necessary, but it is more rigorous than writing
// "contains(@class, 'myClass')" )
$xp->registerNamespace("php", "http://php.net/xpath");

function hasClass($classNode, $className) {
    if (!empty($classNode))
        return in_array($className, preg_split('~\s+~', $classNode[0]->value, -1, PREG_SPLIT_NO_EMPTY));
    return false;
}

$xp->registerPHPFunctions('hasClass');


// The XPath query will find the first ancestor of a text node with '[label*'
// that is a div tag with an id and a class attribute,
// if the class attribute doesn't contain the "metadata" class.

$labelQuery = <<<'EOD'
//text()[contains(., 'label*')]
/ancestor::div
[@id and @class and not(php:function('hasClass', @class, 'metadata'))][1]
EOD;

$idNodeList = $xp->query($labelQuery);

$links = [];

// For each div node, a new link node is created in the associative array $links.
// The keys are labels. 
foreach($idNodeList as $divNode) {

    // The pattern extracts the first text part in group 1 and the label in group 2
    if (preg_match('~(\S+) .*? \[label\* ([^]]+) ]~x', $divNode->textContent, $m)) {
        $divClass = $divNode->getAttribute('class');
        $links[$m[2]] = $dom->createElement('a');
        $links[$m[2]]->setAttribute('href', $divNode->getAttribute('id'));
        // fall back to the div's own class when it has no entry in $classRel
        // (avoids an undefined-index notice)
        $links[$m[2]]->setAttribute('class', $classRel[$divClass] ?? $divClass);
        $links[$m[2]]->nodeValue = $m[1];
    }
}


if ($links) { // if $links is empty no need to do anything

    $refNodeList = $xp->query("//text()[contains(., '[ref*')]");

    foreach ($refNodeList as $refNode) {
        // split the text on the [ref*...] markers; each reference name is kept as a captured delimiter
        $parts = preg_split('~\[ref\*([^]]+)]~', $refNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);

        // create a fragment to receive text parts and links
        $frag = $dom->createDocumentFragment();

        foreach ($parts as $k=>$part) {
            if ($k%2 && isset($links[$part])) { // delimiters are always odd items
                $clone = $links[$part]->cloneNode(true);
                $frag->appendChild($clone);
            } elseif ($part !== '') {
                $frag->appendChild($dom->createTextNode($part));
            }
        }

        $refNode->parentNode->replaceChild($frag, $refNode);
    }
}

$result = '';

$childNodes = $dom->getElementsByTagName('body')->item(0)->childNodes;

foreach ($childNodes as $childNode) {
    $result .= $dom->saveXML($childNode);
}

echo $result;
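
To see what the two regular expressions in the code above do on their own, here is a minimal standalone sketch. The sample strings are made up for illustration (the first is the heading text from the question, the second is a hypothetical sentence with two references); only the patterns come from the code above.

```php
<?php
// Heading text from the question's sample HTML.
$text = '1.2 Summations and Products[label*summation]';

// Same label pattern as above. With the x modifier, whitespace in the
// pattern is ignored. Group 1: first run of non-space characters,
// group 2: the label name between "[label*" and "]".
if (preg_match('~(\S+) .*? \[label\* ([^]]+) ]~x', $text, $m)) {
    echo $m[1], ' => ', $m[2], "\n"; // prints "1.2 => summation"
}

// Hypothetical sentence with two references.
$sentence = 'see [ref*summation] and [ref*somefigure].';

// PREG_SPLIT_DELIM_CAPTURE keeps the captured reference names in the
// result: plain text sits at even indexes, reference names at odd ones.
$parts = preg_split('~\[ref\*([^]]+)]~', $sentence, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($parts);
// $parts is ['see ', 'summation', ' and ', 'somefigure', '.']
```

This is why the replacement loop tests `$k%2`: every odd item is a reference name to look up in `$links`, and every even item is ordinary text to copy as a text node.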

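The comment about `contains(@class, 'myClass')` being less rigorous can be illustrated with a small sketch. The three-div document and the `idsFor` helper below are hypothetical, and the stricter query uses the common pure XPath 1.0 idiom (padding the class attribute with spaces) rather than the registered PHP function, as one possible alternative.

```php
<?php
// Hypothetical sample: only divs "a" and "c" really carry the "metadata" class.
$html = '<div class="metadata" id="a"></div>'
      . '<div class="metadata-extra" id="b"></div>'
      . '<div class="x metadata" id="c"></div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// Hypothetical helper: returns the id attribute of every node matched by $query.
function idsFor(DOMXPath $xp, string $query): array {
    $ids = [];
    foreach ($xp->query($query) as $node) {
        $ids[] = $node->getAttribute('id');
    }
    return $ids;
}

// Substring test: also matches "metadata-extra".
$loose = idsFor($xp, "//div[contains(@class, 'metadata')]");

// Token test: pad the normalized class list with spaces and look for " metadata ".
$strict = idsFor($xp, "//div[contains(concat(' ', normalize-space(@class), ' '), ' metadata ')]");

echo implode(',', $loose), "\n";  // prints "a,b,c"
echo implode(',', $strict), "\n"; // prints "a,c"
```

The token test gives the same kind of whole-word class matching as `hasClass`, without needing `registerPHPFunctions`.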
answered Oct 17 '22 06:10

Casimir et Hippolyte
