I seem to be having some issue either using HTML::HTML5::Microdata::Parser or RDF::Query or with SPARQL syntax and semantics. I am interested in this bit from a news site page.
<div class="authors">
Autoři: <span itemprop="author" itemscope itemtype="http://schema.org/Person"><a rel="author" itemprop="url" class="name" href="http://vice.idnes.cz/novinari.aspx?idnov=2504" ><span itemprop="name">Zdeňka Trachtová</span></a></span>
,
<span itemprop="author" itemscope itemtype="http://schema.org/Person"><a rel="author" itemprop="url" href="http://vice.idnes.cz/novinari.aspx?idnov=3495" ><span itemprop="additionalName">san</span></a><span class="h" itemprop="name">Sabina Netrvalová</span></span>
</div>
Here is my test code:
#! env perl
use strict;
use Data::Dumper;
use HTML::HTML5::Microdata::Parser;
use RDF::Query;
use IO::Handle;
use LWP::Simple;
STDOUT->binmode(":utf8");
STDERR->binmode(":utf8");
my $htmldoc = LWP::Simple::get(
"http://zpravy.idnes.cz/zacinaji-zapisy-do-prvnich-trid-dn3-/domaci.aspx?c=A160114_171615_domaci_zt");
die "Could not fetch URL. $@" unless defined $htmldoc;
my $microdata = HTML::HTML5::Microdata::Parser->new (
$htmldoc, $ARGV[0],
{auto_config => 1, tdb_service => 1, xhtml_meta => 1, xhtml_rel => 1});
print STDERR "microdata->graph:\n", Dumper($microdata->graph), "\n";
my $query = RDF::Query->new(<<'SPARQL');
PREFIX schema: <http://schema.org/>
SELECT *
WHERE {
?author a schema:Person .
}
SPARQL
my $people = $query->execute($microdata->graph);
print STDERR "authors from RDF:\n", Dumper($people), "\n";
while (my $person = $people->next) {
print STDERR "people: ", $person, "\n";
}
The options to the HTML::HTML5::Microdata::Parser were just my last-ditch effort to make this work. (I have basically zero idea what I am doing.)
Any ideas how to make this work and get the authors' names?
NULLs or unbound variables
The term “NULL” does not actually occur in the SPARQL spec, perhaps because it doesn't have a good reputation in the database world. Instead, the spec talks about bound and unbound variables.
SPARQL allows a query to consist of triple patterns, conjunctions, disjunctions, and optional patterns. Implementations exist for multiple programming languages. Tools such as ViziQuer can connect to a SPARQL endpoint and semi-automatically construct queries for it.
SQL queries data by accessing tables in relational databases, while SPARQL queries data by accessing a web of Linked Data. (Of course, SPARQL can be used to access relational data as well, but it was designed to merge disparate sources of data.)
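To make the bound/unbound distinction and optional patterns a bit more concrete, here is a minimal sketch using RDF::Trine and RDF::Query. The Turtle data, the example.org URIs and the name "Alice" are made up for illustration and do not come from the page in the question. The assumption is that a variable left unbound by an OPTIONAL pattern simply shows up as an undefined entry in the result row.

```perl
use strict;
use warnings;
use v5.10;

use RDF::Trine;
use RDF::Query;

# Hypothetical data: two people, only one of whom has a schema:name.
my $turtle = <<'TTL';
@prefix schema: <http://schema.org/> .
<http://example.org/a> a schema:Person ; schema:name "Alice" .
<http://example.org/b> a schema:Person .
TTL

# Load the Turtle into an in-memory model.
my $model = RDF::Trine::Model->temporary_model;
RDF::Trine::Parser->new('turtle')
    ->parse_into_model('http://example.org/', $turtle, $model);

# ?name is only bound for people that actually have a schema:name.
my $query = RDF::Query->new(<<'SPARQL');
PREFIX schema: <http://schema.org/>
SELECT ?person ?name
WHERE {
  ?person a schema:Person .
  OPTIONAL { ?person schema:name ?name }
}
SPARQL

my %name_for;
my $rows = $query->execute($model);
while (my $row = $rows->next) {
    # Each row behaves like a hash keyed by variable name.
    my $name = defined $row->{name} ? $row->{name}->literal_value : '(unbound)';
    $name_for{ $row->{person}->uri_value } = $name;
}

say "$_ => $name_for{$_}" for sort keys %name_for;
```

Note that each result row is a set of variable bindings rather than a plain string, so the individual nodes have to be pulled out by variable name before printing.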
Just use Mojo::UserAgent and Mojo::DOM:
use strict;
use warnings;
use utf8;
use v5.10;
BEGIN {
binmode *STDOUT, ':utf8';
binmode *STDERR, ':utf8';
}
use Mojo::UserAgent;
my $url = "http://zpravy.idnes.cz/zacinaji-zapisy-do-prvnich-trid-dn3-/domaci.aspx?c=A160114_171615_domaci_zt";
my $dom = Mojo::UserAgent->new->get($url)->res->dom;
# Process all authors
for my $span ($dom->find('span[itemprop=author]')->each) {
say $span->all_text;
}
Outputs:
Zdeňka Trachtová
san Sabina Netrvalová
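Note that the second line of the output above also picks up the additionalName ("san"). If only the name is wanted, a narrower selector may help. This is a minimal sketch that runs against a trimmed copy of the markup quoted in the question instead of the live page, so it assumes the real page still has that structure:

```perl
use strict;
use warnings;
use utf8;
use v5.10;

use Mojo::DOM;

binmode STDOUT, ':utf8';

# Trimmed copy of the markup from the question (assumption: the live page matches it).
my $html = <<'HTML';
<div class="authors">
Autoři: <span itemprop="author" itemscope itemtype="http://schema.org/Person"><a rel="author" itemprop="url" href="http://vice.idnes.cz/novinari.aspx?idnov=2504"><span itemprop="name">Zdeňka Trachtová</span></a></span>,
<span itemprop="author" itemscope itemtype="http://schema.org/Person"><a rel="author" itemprop="url" href="http://vice.idnes.cz/novinari.aspx?idnov=3495"><span itemprop="additionalName">san</span></a><span class="h" itemprop="name">Sabina Netrvalová</span></span>
</div>
HTML

my $dom = Mojo::DOM->new($html);

# Select only the itemprop="name" element nested inside each author item.
my @names = $dom->find('span[itemprop=author] [itemprop=name]')->map('text')->each;

say for @names;
```

With the snippet above this should print just the two names, without the "san" prefix. Against the real site, the same selector can be applied to the $dom obtained from Mojo::UserAgent.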
For a short 8-minute tutorial on these modules, check out Mojocast episode 5.