
How to get the Open Graph Protocol tags of a webpage with PHP?

Tags: html, regex, php

PHP has a simple function for reading the meta tags of a webpage (get_meta_tags), but it only works for meta tags with a name attribute. The Open Graph Protocol, which uses the property attribute instead, is becoming more and more popular. What is the easiest way to get the Open Graph values from a webpage? For example:

<meta property="og:url" content=""> 
<meta property="og:title" content=""> 
<meta property="og:description" content=""> 
<meta property="og:type" content=""> 

The basic approach I can see is to fetch the page via cURL and parse it with a regex. Any ideas?
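For the fetching half of that idea, a minimal sketch might look like the following. The helper name fetchHtml and its timeout parameter are assumptions made for illustration, not something from the question. It relies only on the standard cURL extension functions, and it leaves parsing to whichever approach the answers below suggest.

```php
<?php
// Hypothetical helper: downloads a page and returns its HTML as a string.
// Requires the cURL extension; the name and parameters are illustrative.
function fetchHtml(string $url, int $timeoutSeconds = 10): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds, // give up after this many seconds
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_exec() returns false on failure; surface the reason.
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }

    return $body;
}

// Usage (needs network access, so it is left commented out here):
// $html = fetchHtml('https://example.com/');
```

The returned string can then be passed to a parser as $html or $str.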

Googlebot asked Sep 17 '11 12:09



3 Answers

This is really simple and well done with the OpenGraph class from https://github.com/scottmac/opengraph:

$graph = OpenGraph::fetch('http://www.avessotv.com.br/bastidores-pantene-institute-experience-pg.html');
print_r($graph);

This will return:

OpenGraph Object
(
    [_values:OpenGraph:private] => Array
        (
            [type] => article
            [video] => http://www.avessotv.com.br/player/flowplayer/flowplayer-3.2.7.swf?config=%7B%27clip%27%3A%7B%27url%27%3A%27http%3A%2F%2Fwww.avessotv.com.br%2Fmedia%2Fprogramas%2Fpantene.flv%27%7D%7D
            [image] => /wp-content/thumbnails/9025.jpg
            [site_name] => Programa Avesso - Bastidores
            [title] => Bastidores “Pantene Institute Experience” P&G
            [url] => http://www.avessotv.com.br/bastidores-pantene-institute-experience-pg.html
            [description] => Confira os bastidores do Pantene Institute Experience, da Procter &#038; Gamble. www.pantene.com.br Mais imagens:
        )

    [_position:OpenGraph:private] => 0
)
Guilherme Viebig answered Sep 16 '22 13:09


When parsing data from HTML, you really shouldn't use regex. Take a look at the DOMXPath::query method.

[EDIT] A better XPath query was given by Stefan Gehrig, so the actual code can be shortened to:

// Collect libxml parse warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html); // $html holds the page source, fetched beforehand
$xpath = new DOMXPath($doc);
// Select only the meta elements whose property attribute starts with "og:".
$query = '//*/meta[starts-with(@property, \'og:\')]';
$metas = $xpath->query($query);
$rmetas = array();
foreach ($metas as $meta) {
    $property = $meta->getAttribute('property');
    $content = $meta->getAttribute('content');
    $rmetas[$property] = $content;
}
var_dump($rmetas);

Instead of the original version, which selected every meta element and filtered by property in PHP:

$doc = new DomDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$query = '//*/meta';
$metas = $xpath->query($query);
$rmetas = array();
foreach ($metas as $meta) {
    $property = $meta->getAttribute('property');
    $content = $meta->getAttribute('content');
    if(!empty($property) && preg_match('#^og:#', $property)) {
        $rmetas[$property] = $content;
    }
}
var_dump($rmetas);
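
As a self-contained illustration of the XPath approach, here is a minimal sketch that wraps it in a function and runs it on an inline page. The function name extractOpenGraph and the sample markup are assumptions made for this example, and the query is written as //meta rather than //*/meta, which should select the same elements in an ordinary HTML document.

```php
<?php
// Hypothetical helper (not from the answer above): returns all og:* meta
// tags of an HTML string as an associative array of property => content.
function extractOpenGraph(string $html): array
{
    // Collect libxml parse warnings internally instead of emitting them,
    // and restore the previous setting afterwards.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $tags = [];
    foreach ($xpath->query('//meta[starts-with(@property, "og:")]') as $meta) {
        $tags[$meta->getAttribute('property')] = $meta->getAttribute('content');
    }

    return $tags;
}

// Sample markup, invented for illustration.
$html = <<<HTML
<!DOCTYPE html>
<html><head>
<meta name="description" content="ignored: no property attribute">
<meta property="og:title" content="Example title">
<meta content="https://example.com/page" property="og:url">
</head><body></body></html>
HTML;

$og = extractOpenGraph($html);
print_r($og);
```

Because the DOM parser reads attributes by name, the second og tag is picked up even though content comes before property.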
Tom answered Sep 20 '22 13:09


How about:

preg_match_all('~<\s*meta\s+property="(og:[^"]+)"\s+content="([^"]*)~i', $str, $matches);

So, yes: grab the page any way you can and parse it with a regex.
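
A short sketch of how the matches from that pattern might be turned into an associative array is shown here. The sample markup is invented for illustration. As written, the pattern expects property to come directly before content and both to use double quotes, so the third tag in the sample is not matched.

```php
<?php
// Sample markup, invented for illustration.
$str = '<meta property="og:title" content="Example title">' . "\n"
     . '<meta property="og:type" content="article">' . "\n"
     . '<meta content="missed" property="og:url">'; // attribute order reversed

preg_match_all(
    '~<\s*meta\s+property="(og:[^"]+)"\s+content="([^"]*)~i',
    $str,
    $matches
);

// $matches[1] holds the property names, $matches[2] the content values.
$og = array_combine($matches[1], $matches[2]);
print_r($og);
```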

zerkms answered Sep 20 '22 13:09