Here is some simple HTML:
<table>
<tr><th>Name</th><th>Price</th><th>Country</th></tr>
<tr><td><a href="bbb/111">Apple</a></td><td>500</td><td>America</td></tr>
<tr><td><a href="bbb/222">Samsung</a></td><td>400</td><td>Korea</td></tr>
<tr><td><a href="bbb/333">Nokia</a></td><td>300</td><td>Finland</td></tr>
<tr><td><a href="bbb/444">HTC</a></td><td>200</td><td>Taiwan</td></tr>
<tr><td><a href="bbb/555">Blackberry</a></td><td>100</td><td>America</td></tr>
</table>
What I want to do is scrape each company name and its price, like this:
Apple 500 / Samsung 400 / Nokia 300 / HTC 200 / Blackberry 100
So I am using PHP's built-in DOM parser. I know there are many third-party PHP parser libraries, but people say it is better to use the native parser, so I wrote this:
$source_n = file_get_contents($html);
$dom = new DOMDocument();
@$dom->loadHTML($source_n);
$stacks = $dom->getElementsByTagName('table')->item(0)->textContent;
echo $stacks;
It outputs one long string of values, like this:
Name Price Country Apple 500 America Samsung 400 Korea ......
I don't think this approach is useful: I would have to split the string with explode(), and the code would get even messier than it is now.
How can I scrape this more elegantly? Is there an easy reference?
Use DOMXPath::query. Gather all the names first:
$selector = new DOMXPath($dom);
$results = $selector->query('//td/a');
foreach($results as $node) {
echo $node->nodeValue . PHP_EOL;
}
Then get the prices by changing the query to select the second cell of each row:
$results = $selector->query('//td[2]');
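The two queries above return names and prices as separate lists. To produce the "Apple 500 / Samsung 400 / ..." output from the question, one option is to loop over the rows and run relative XPath queries inside each row. This is only a sketch: the inline $html copy of the table, the variable names, and the use of libxml_use_internal_errors() in place of @ are assumptions made here so the snippet runs on its own.

```php
<?php
// Inline copy of the question's table (assumed here so the sketch is self-contained).
$html = <<<'HTML'
<table>
<tr><th>Name</th><th>Price</th><th>Country</th></tr>
<tr><td><a href="bbb/111">Apple</a></td><td>500</td><td>America</td></tr>
<tr><td><a href="bbb/222">Samsung</a></td><td>400</td><td>Korea</td></tr>
<tr><td><a href="bbb/333">Nokia</a></td><td>300</td><td>Finland</td></tr>
<tr><td><a href="bbb/444">HTC</a></td><td>200</td><td>Taiwan</td></tr>
<tr><td><a href="bbb/555">Blackberry</a></td><td>100</td><td>America</td></tr>
</table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of silencing with @
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$pairs = [];

// tr[td] matches only rows that contain <td> cells, which skips the <th> header row.
foreach ($xpath->query('//table[1]//tr[td]') as $row) {
    // The leading "." makes each query relative to the current row.
    $name  = trim($xpath->evaluate('string(./td[1]/a)', $row));
    $price = trim($xpath->evaluate('string(./td[2])', $row));
    $pairs[] = $name . ' ' . $price;
}

$line = implode(' / ', $pairs);
echo $line . PHP_EOL;
```

Reading both cells from the same row keeps each name attached to its own price, even if some other row were missing a link or a cell.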
The best solution I have found for parsing HTML is Symfony's DomCrawler component. Together with the CssSelector component, it lets you filter HTML with CSS selectors, much as you would select elements by class in JavaScript. For example, to get all p elements that are direct children of body, do:
$crawler = $crawler->filter('body > p');