PHP Scrape Article Excerpt like Readability

Question

I've seen this question, but it doesn't really satisfy what I'm looking for. Its answers either lift the text from the meta description tag or generate an excerpt from an article body you already have.

What I want to do is actually get the first few sentences of an article, like Readability does. What's the best method for this? HTML parsing? Here's what I'm currently using, but it is not very reliable.

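function guessExcerpt($url) {
    $html = file_get_contents_curl($url);
    if ($html === false || $html === '') {
        return null; // request failed or returned an empty body
    }

    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from malformed real-world HTML

    // Look for <meta name="description" content="...">
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('name')) === 'description') {
            return $meta->getAttribute('content');
        }
    }

    return null; // no description meta tag found
}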

function file_get_contents_curl($url) {
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

    $data = curl_exec($ch);
    curl_close($ch);

    return $data;
}

Muhammad Abrar · Accepted Answer

Here is a port of Readability in PHP: https://github.com/andreskrey/readability.php. Give it a try. The extraction result should be similar to Readability's, because it implements the same algorithm.

require 'lib/Readability.inc.php';

$html = file_get_contents_curl($url);

$Readability     = new Readability($html, $html_input_charset); // default charset is utf-8
$ReadabilityData = $Readability->getContent();

$title   = $ReadabilityData['title'];
$content = $ReadabilityData['content'];

Then you can use the first few sentences of $content as the excerpt.
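As a rough sketch of that last step, the function below strips the markup from the extracted content and keeps the first few sentences. The name excerptFromHtml and its $maxSentences parameter are hypothetical and not part of any Readability port. The sentence splitter is a naive heuristic that breaks after ".", "!" or "?" followed by whitespace, so abbreviations such as "Dr. Smith" will be split in the wrong place.

```php
<?php
/**
 * Hypothetical helper: return the first $maxSentences sentences of an
 * HTML fragment as plain text. Sentence detection is a simple heuristic.
 */
function excerptFromHtml(string $html, int $maxSentences = 3): string
{
    // Remove <script> and <style> blocks entirely, including their contents.
    $html = (string) preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);

    // Replace remaining tags with spaces so adjacent paragraphs do not run together.
    $text = (string) preg_replace('/<[^>]+>/', ' ', $html);

    // Decode entities such as &amp; and collapse runs of whitespace.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text = trim((string) preg_replace('/\s+/', ' ', $text));

    if ($text === '') {
        return '';
    }

    // Split after sentence-ending punctuation that is followed by whitespace.
    $sentences = preg_split('/(?<=[.!?])\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($sentences === false) {
        $sentences = [$text]; // fall back to the whole text if the split fails
    }

    return implode(' ', array_slice($sentences, 0, $maxSentences));
}
```

With the variables from the snippet above, a call might look like $excerpt = excerptFromHtml($content, 2);. For long articles you may also want to cap the result by character count.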

Tags: php, web-scraping

Alfo
