
querying metadata from HTML page with SPARQL returns nothing

I am having trouble with either HTML::HTML5::Microdata::Parser, RDF::Query, or SPARQL syntax and semantics: my query returns nothing. I am interested in this fragment of a news site page:

<div class="authors">
Autoři: <span itemprop="author" itemscope itemtype="http://schema.org/Person"><a rel="author" itemprop="url" class="name" href="http://vice.idnes.cz/novinari.aspx?idnov=2504" ><span itemprop="name">Zdeňka Trachtová</span></a></span>
,
<span itemprop="author" itemscope itemtype="http://schema.org/Person"><a rel="author" itemprop="url"  href="http://vice.idnes.cz/novinari.aspx?idnov=3495" ><span itemprop="additionalName">san</span></a><span class="h" itemprop="name">Sabina Netrvalová</span></span>
</div>

Here is my test code:

#!/usr/bin/env perl

use strict;
use warnings;
use Data::Dumper;
use HTML::HTML5::Microdata::Parser;
use RDF::Query;
use IO::Handle;
use LWP::Simple;


STDOUT->binmode(":utf8");
STDERR->binmode(":utf8");

my $htmldoc = LWP::Simple::get(
    "http://zpravy.idnes.cz/zacinaji-zapisy-do-prvnich-trid-dn3-/domaci.aspx?c=A160114_171615_domaci_zt");
# LWP::Simple::get returns undef on failure and does not set $@.
die "Could not fetch URL.\n" unless defined $htmldoc;

my $microdata = HTML::HTML5::Microdata::Parser->new (
    $htmldoc, $ARGV[0],
    {auto_config => 1, tdb_service => 1, xhtml_meta => 1, xhtml_rel => 1});
print STDERR "microdata->graph:\n", Dumper($microdata->graph), "\n";

my $query = RDF::Query->new(<<'SPARQL');
PREFIX schema: <http://schema.org/>
SELECT *
WHERE {
   ?author a schema:Person .
}
SPARQL

my $people = $query->execute($microdata->graph);
print STDERR "authors from RDF:\n", Dumper($people), "\n";
while (my $person = $people->next) {
    print STDERR "people: ", $person, "\n";
}

The options passed to HTML::HTML5::Microdata::Parser were just my last-ditch effort to make this work. (I have basically zero idea what I am doing.)

Any ideas how to make this work and get the authors' names?

wilx asked Jan 22 '16 09:01




1 Answer

Just use Mojo::UserAgent and Mojo::DOM:

use strict;
use warnings;
use utf8;
use v5.10;

BEGIN {
    binmode *STDOUT, ':utf8';
    binmode *STDERR, ':utf8';
}

use Mojo::UserAgent;

my $url = "http://zpravy.idnes.cz/zacinaji-zapisy-do-prvnich-trid-dn3-/domaci.aspx?c=A160114_171615_domaci_zt";

my $dom = Mojo::UserAgent->new->get($url)->res->dom;

# Process all authors
for my $span ($dom->find('span[itemprop=author]')->each) {
    say $span->all_text;
}

Outputs:

Zdeňka Trachtová
san Sabina Netrvalová

For a short 8-minute tutorial on these modules, check out Mojocast episode 5.
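Note that all_text returns every text node under the author span, which is why the second line of output includes the additionalName value "san". If only the schema.org name is wanted, a narrower selector may help. The following is a minimal sketch, not a definitive solution: it inlines the markup quoted in the question so that it runs without network access, and the helper name author_names is made up for illustration.

```perl
use strict;
use warnings;
use utf8;
use v5.10;

use Mojo::DOM;

binmode STDOUT, ':utf8';

# Author markup copied from the question, inlined so no HTTP request is needed.
my $html = <<'HTML';
<div class="authors">
Autoři: <span itemprop="author" itemscope itemtype="http://schema.org/Person"><a rel="author" itemprop="url" class="name" href="http://vice.idnes.cz/novinari.aspx?idnov=2504" ><span itemprop="name">Zdeňka Trachtová</span></a></span>
,
<span itemprop="author" itemscope itemtype="http://schema.org/Person"><a rel="author" itemprop="url"  href="http://vice.idnes.cz/novinari.aspx?idnov=3495" ><span itemprop="additionalName">san</span></a><span class="h" itemprop="name">Sabina Netrvalová</span></span>
</div>
HTML

# Hypothetical helper: return the text of every itemprop="name" element
# nested inside an itemprop="author" span.
sub author_names {
    my ($markup) = @_;
    my $dom = Mojo::DOM->new($markup);
    return $dom->find('span[itemprop=author] [itemprop=name]')
               ->map('text')
               ->each;
}

say for author_names($html);
```

With the markup above this prints the two names without the "san" prefix. On the live page the same selector could be applied to the $dom object returned by Mojo::UserAgent, assuming the site still uses this markup.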

Miller answered Sep 18 '22 13:09
