 

Crawling an HTML page using PHP?

This website lists over 250 courses in one list. I want to get the name of each course and insert it into my MySQL database using PHP. The courses are listed like this:

<td> computer science</td>
<td> media studies</td>
…

Is there a way to do that in PHP, instead of facing a data-entry nightmare?

getaway asked Dec 29 '22 06:12



2 Answers

Regular expressions work well.

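// $url holds the address of the course-list page;
// file_get_contents() needs allow_url_fopen enabled to fetch URLs
$page = file_get_contents($url);
$lines = preg_split("/\r?\n/", $page);
foreach ($lines as $text) {
    $matches = array();
    // Allow indentation around the cell; capture the text inside <td>...</td>
    if (preg_match('/^\s*<td>(.*)<\/td>\s*$/', $text, $matches)) {
        $course = trim($matches[1]);
        // insert $course into the database
    }
}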

See the documentation for preg_match.
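The loop above relies on every <td> sitting on its own line. A variant that scans the whole page at once with preg_match_all may be more forgiving of layout. This is a minimal sketch, not part of the original answer: the function name extractCells is hypothetical, and the sample string stands in for the downloaded page.

```php
<?php
// Hypothetical helper: return the trimmed text of every <td>...</td> cell.
function extractCells(string $html): array
{
    // /s lets . match newlines, /i ignores tag case;
    // .*? is non-greedy so each cell is captured separately
    preg_match_all('/<td[^>]*>(.*?)<\/td>/is', $html, $matches);
    // $matches[1] holds the captured cell contents; drop inner tags and whitespace
    return array_map(function ($cell) {
        return trim(strip_tags($cell));
    }, $matches[1]);
}

$sample = "<tr><td> computer science</td>\n<td> media studies</td></tr>";
print_r(extractCells($sample));
```

Each returned string can then be inserted into the database in place of $matches[1].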

Peter C answered Dec 31 '22 11:12



How to parse HTML has been asked and answered countless times before. While regular expressions will work for your specific use case, in general it is better and more reliable to use a proper parser for this task. Here is how to do it with PHP's DOM extension:

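// Real-world HTML is rarely valid; keep libxml warnings out of the output
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTMLFile('http://courses.westminster.ac.uk/CourseList.aspx');
foreach ($dom->getElementsByTagName('td') as $cell) {
    // nodeValue is the cell's text content; trim the surrounding whitespace
    echo trim($cell->nodeValue), PHP_EOL;
}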
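To try the DOM approach without fetching the live page, the same loop can run over an HTML string with loadHTML and collect the names into an array instead of echoing them. This is a sketch under assumptions: the function name courseNames and the sample markup are hypothetical, and it skips empty cells on the guess that the real table may contain some.

```php
<?php
// Hypothetical helper: collect the text of every non-empty <td> in an HTML string.
function courseNames(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $names = [];
    foreach ($dom->getElementsByTagName('td') as $cell) {
        $text = trim($cell->textContent);
        if ($text !== '') {
            $names[] = $text;
        }
    }
    return $names;
}

$html = '<table><tr><td> computer science</td></tr><tr><td> media studies</td></tr></table>';
print_r(courseNames($html));
```

Returning an array keeps parsing separate from the database step, so the same list can be passed straight to an insert routine.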

To insert the data into MySQL, you should use the mysqli extension. Examples are plentiful on Stack Overflow, so please use the search function.
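A minimal sketch of the insert step with a mysqli prepared statement, which also takes care of quoting names that contain characters such as apostrophes. The table courses, its column name, the function insertCourses, and the connection credentials are all hypothetical placeholders.

```php
<?php
// Hypothetical helper: insert each course name with one prepared statement.
// Assumes a table like: CREATE TABLE courses (name VARCHAR(255) NOT NULL)
function insertCourses(mysqli $db, array $courses): int
{
    $stmt = $db->prepare('INSERT INTO courses (name) VALUES (?)');
    $name = '';
    // bind_param binds by reference, so assigning to $name changes the bound value
    $stmt->bind_param('s', $name);
    $inserted = 0;
    foreach ($courses as $course) {
        $name = $course;
        $stmt->execute();
        $inserted += $stmt->affected_rows;
    }
    $stmt->close();
    return $inserted;
}

// Example usage (placeholder credentials):
// mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
// $db = new mysqli('localhost', 'user', 'password', 'mydb');
// insertCourses($db, ['computer science', 'media studies']);
```

Preparing the statement once and executing it per row avoids re-parsing the SQL for each of the 250 or so courses.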

Gordon answered Dec 31 '22 12:12
