I have a huge html code to scan. Until now i have been using preg_match_all
to extract desired parts from it. The problem from the start was that it was extremely cpu time consuming. We finally decided to use some other method for extraction. I read in some articles that preg_match
can be compared in performance with strpos
. They claim that strpos
beats regex scanner up to 20 times in efficiency. I thought i will try this method but i dont really know how to get started.
Lets say i have this html string:
<li id="ncc-nba-16451" class="che10"><a href="/en/star">23 - Star</a></li>
<li id="ncd-bbt-5674" class="che10"><a href="/en/moon">54 - Moon</a></li>
<li id="ertw-cxda-c6543" class="che10"><a href="/en/sun">34,780 - Sun</a></li>
I want to extract only number from each id and only text (letters) from content of a
tags. so i do this preg_match_all
scan:
'/<li.*?id=".*?([\d]+)".*?<a.*?>.*?([\w]+)<\/a>/s'
here you can see the result: LINK
Now if i would want to replace my method to strpos
functionality how the approach would look like? I understand that strpos
returns a index of start where match took place. But how can i use it to:
Thank you for all the help and tips ;)
Using DOM
$html = '
<html>
<head></head>
<body>
<li id="ncc-nba-16451" class="che10"><a href="/en/star">23 - Star</a></li>
<li id="ncd-bbt-5674" class="che10"><a href="/en/moon">54 - Moon</a></li>
<li id="ertw-cxda-c6543" class="che10"><a href="/en/sun">34,780 - Sun</a></li>
</body>
</html>';
$dom_document = new DOMDocument();
$dom_document->loadHTML($html);
$rootElement = $dom_document->documentElement;
$getId = $rootElement->getElementsByTagName('li');
$res = [];
foreach($getId as $tag)
{
$data = explode('-',$tag->getAttribute('id'));
$res['li_id'][] = end($data);
}
$getNode = $rootElement->getElementsByTagName('a');
foreach($getNode as $tag)
{
$res['a_node'][] = $tag->parentNode->textContent;
}
print_r($res);
Output :
Array
(
[li_id] => Array
(
[0] => 16451
[1] => 5674
[2] => c6543
)
[a_node] => Array
(
[0] => 23 - Star
[1] => 54 - Moon
[2] => 34,780 - Sun
)
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With