I know we can use PHP's DOM extension to parse HTML, and I found a lot of questions about it here on Stack Overflow, but I have a specific requirement. I have HTML content like this:
<p class="Heading1-P">
<span class="Heading1-H">Chapter 1</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 1</span>
</p>
<p class="Heading1-P">
<span class="Heading1-H">Chapter 2</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 2</span>
</p>
<p class="Heading1-P">
<span class="Heading1-H">Chapter 3</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 3</span>
</p>
I want to parse the above HTML and save the content into two different arrays, $heading and $content, like this:
$heading = array('Chapter 1','Chapter 2','Chapter 3');
$content = array('This is chapter 1','This is chapter 2','This is chapter 3');
I can achieve this easily using jQuery, but I am not sure that's the right way. It would be great if someone could point me in the right direction. Thanks in advance.
I used DOMDocument and DOMXPath to get the solution:
<?php
$dom = new DOMDocument();
$test='<p class="Heading1-P">
<span class="Heading1-H">Chapter 1</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 1</span>
</p>
<p class="Heading1-P">
<span class="Heading1-H">Chapter 2</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 2</span>
</p>
<p class="Heading1-P">
<span class="Heading1-H">Chapter 3</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 3</span>
</p>';
$dom->loadHTML($test);
$xpath = new DOMXPath($dom);
$heading = parseToArray($xpath, 'Heading1-H');
$content = parseToArray($xpath, 'Normal-H');
var_dump($heading);
echo "<br/>";
var_dump($content);
echo "<br/>";

// Return the text of every <span> whose class attribute equals $class.
function parseToArray($xpath, $class)
{
    $resultarray = array();
    $xpathquery = "//span[@class='" . $class . "']";
    $elements = $xpath->query($xpathquery);
    // query() returns false (not null) when the expression is malformed.
    if ($elements === false) {
        return $resultarray;
    }
    foreach ($elements as $element) {
        // textContent yields one string per span, even with nested markup.
        $resultarray[] = $element->textContent;
    }
    return $resultarray;
}
Live result: http://saji89.codepad.org/2TyOAibZ
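One caveat with comparing @class to an exact string: it stops matching as soon as a span carries more than one class (for example class="Heading1-H first"). Below is a minimal sketch of a token-based match instead. The helper name spanTextsByClass and the extra "first" class in the sample markup are my own assumptions for illustration, and the sketch assumes $class contains no quote characters.

```php
<?php
// Hypothetical helper (not part of the answer above): collect the trimmed
// text of every <span> that has $class among its space-separated classes.
// Assumes $class contains no quote characters.
function spanTextsByClass(DOMXPath $xpath, string $class): array
{
    // Pad the class attribute with spaces so only whole class tokens match.
    $query = sprintf(
        "//span[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );
    $nodes = $xpath->query($query);
    if ($nodes === false) {
        return []; // malformed expression
    }
    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Sample markup; the second class "first" is made up for this example.
$html = '<p class="Heading1-P"><span class="Heading1-H first">Chapter 1</span></p>'
      . '<p class="Normal-P"><span class="Normal-H"> This is chapter 1 </span></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$heading = spanTextsByClass($xpath, 'Heading1-H');
$content = spanTextsByClass($xpath, 'Normal-H');
print_r($heading);
print_r($content);
```

The trim() call is a design choice: it drops the whitespace that indented markup tends to leave around the text, which keeps the arrays in the shape the question asks for.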
Take a look at PHP Simple HTML DOM Parser.
Its syntax is similar to jQuery's, so you can easily select any element you want by ID or class.
// include/require the simple html dom parser file
$html_string = '
<p class="Heading1-P">
<span class="Heading1-H">Chapter 1</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 1</span>
</p>
<p class="Heading1-P">
<span class="Heading1-H">Chapter 2</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 2</span>
</p>
<p class="Heading1-P">
<span class="Heading1-H">Chapter 3</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 3</span>
</p>';
// Initialize both arrays so they exist even if no span matches.
$heading = array();
$content = array();
$html = str_get_html($html_string);
foreach ($html->find('span') as $element) {
    if ($element->class === 'Heading1-H') {
        $heading[] = $element->innertext;
    } elseif ($element->class === 'Normal-H') {
        $content[] = $element->innertext;
    }
}