I am trying to parse the table shown here into a multi-dimensional php array. I am using the following code but for some reason its returning an empty array. After searching around on the web, I found this site which is where I got the parseTable() function from. From reading the comments on that website, I see that the function works perfectly. So I'm assuming there is something wrong with the way I'm getting the HTML code from file_get_contents(). Any thoughts on what I'm doing wrong?
<?php
$data = file_get_contents('http://flow935.com/playlist/flowhis.HTM');
function parseTable($html)
{
// Find the table
preg_match("/<table.*?>.*?<\/[\s]*table>/s", $html, $table_html);
// Get title for each row
preg_match_all("/<th.*?>(.*?)<\/[\s]*th>/", $table_html[0], $matches);
$row_headers = $matches[1];
// Iterate each row
preg_match_all("/<tr.*?>(.*?)<\/[\s]*tr>/s", $table_html[0], $matches);
$table = array();
foreach($matches[1] as $row_html)
{
preg_match_all("/<td.*?>(.*?)<\/[\s]*td>/", $row_html, $td_matches);
$row = array();
for($i=0; $i<count($td_matches[1]); $i++)
{
$td = strip_tags(html_entity_decode($td_matches[1][$i]));
$row[$row_headers[$i]] = $td;
}
if(count($row) > 0)
$table[] = $row;
}
return $table;
}
$output = parseTable($data);
print_r($output);
?>
I want my output array to look something like this:
1 --> 11:33AM --> DEV --> IN THE DARK 2 --> 11:29AM --> LIL' WAYNE --> SHE WILL 3 --> 11:26AM --> KARDINAL OFFISHALL --> NUMBA 1 (TIDE IS HIGH)
php $htmlContent = file_get_contents("http://teskusman.esy.es/index.html"); $DOM = new DOMDocument(); $DOM->loadHTML($htmlContent); $Header = $DOM->getElementsByTagName('th'); $Detail = $DOM->getElementsByTagName('td'); //#Get header name of the table foreach($Header as $NodeHeader) { $aDataTableHeaderHTML[] = trim($ ...
The file_get_contents() reads a file into a string. This function is the preferred way to read the contents of a file into a string. It will use memory mapping techniques, if this is supported by the server, to enhance performance.
The function returns the read data or false on failure. This function may return Boolean false , but may also return a non-Boolean value which evaluates to false . Please read the section on Booleans for more information.
Don't cripple yourself parsing HTML with regexps! Instead, let an HTML parser library worry about the structure of the markup for you.
I suggest you to check out Simple HTML DOM (http://simplehtmldom.sourceforge.net/). It is a library specifically written to aid in solving this kind of web scraping problems in PHP. By using such a library, you can write your scraping in much less lines of codes without worrying about creating working regexps.
In principle, with Simple HTML DOM you just write something like:
$html = file_get_html('http://flow935.com/playlist/flowhis.HTM');
foreach($html->find('tr') as $row) {
// Parse table row here
}
This can be then extended to capture your data in some format, for instance to create an array of artists and corresponding titles as:
<?php
require('simple_html_dom.php');
$table = array();
$html = file_get_html('http://flow935.com/playlist/flowhis.HTM');
foreach($html->find('tr') as $row) {
$time = $row->find('td',0)->plaintext;
$artist = $row->find('td',1)->plaintext;
$title = $row->find('td',2)->plaintext;
$table[$artist][$title] = true;
}
echo '<pre>';
print_r($table);
echo '</pre>';
?>
We can see that this code can be (trivially) changed to reformat the data in any other way as well.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With