Querying Project Gutenberg catalog.rdf via SPARQL

Tags: rdf, sparql

I'm having difficulty structuring a SPARQL query for the Project Gutenberg catalog (available at Gutenberg Feeds, toward the bottom of the page). I know this comes from a fundamental gap in my understanding of how SPARQL and RDF actually work, and from conflating them with SQL. I've tried several tutorials, but I still can't work out how to piece the WHERE clause together against what seems to be a multidimensional dataset.

I have imported catalog.rdf into a TDB database (from the Jena project), and am using the tdbquery tool to set up my query initially, before I wrap it into a command-line tool that allows searching by author or title.

Here is what I have so far:

$ cat gutenquery.tq
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dcmitype: <http://purl.org/dc/dcmitype/>
PREFIX cc: <http://web.resource.org/cc/>
PREFIX pgterms: <http://www.gutenberg.org/rdfterms/>

SELECT ?title ?author
WHERE {
    ?book dc:title ?title  ;
          dc:creator ?author
}
LIMIT 10

$ ./tdbquery --loc=/var/db/gutenberg/ --file=gutenquery.tq
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
| title                                                                                               | author                                                       |
======================================================================================================================================================================
| "The Belgian Curtain\nEurope after Communism"^^rdf:XMLLiteral                                       | "Vaknin, Samuel, 1961-"^^rdf:XMLLiteral                      |
| "Fairy Tales; Their Origin and Meaning\nWith Some Account of Dwellers in Fairyland"^^rdf:XMLLiteral | "Bunce, John Thackray, 1828-1899"^^rdf:XMLLiteral            |
| "The World English Bible (WEB): Zephaniah"^^rdf:XMLLiteral                                          | "Anonymous"^^rdf:XMLLiteral                                  |
| "Lectures of Col. R. G. Ingersoll - Latest"^^rdf:XMLLiteral                                         | "Ingersoll, Robert Green, 1833-1899"^^rdf:XMLLiteral         |
| "Selections from Erasmus\nPrincipally from his Epistles"^^rdf:XMLLiteral                            | "Erasmus, Desiderius, 1469-1536"^^rdf:XMLLiteral             |
| "East and West\nPoems"^^rdf:XMLLiteral                                                              | "Harte, Bret, 1836-1902"^^rdf:XMLLiteral                     |
| "The Enormous Room"^^rdf:XMLLiteral                                                                 | "Cummings, E. E. (Edward Estlin), 1894-1962"^^rdf:XMLLiteral |
| "The Enormous Room"^^rdf:XMLLiteral                                                                 | _:b0                                                         |
| "Actes et Paroles, Volume 4\nDepuis l'Exil 1876-1885"^^rdf:XMLLiteral                               | "Hugo, Victor, 1802-1885"^^rdf:XMLLiteral                    |
| "L'Île Des Pingouins"^^rdf:XMLLiteral                                                               | "France, Anatole, 1844-1924"^^rdf:XMLLiteral                 |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------

A typical entry from PG looks like this, although not all fields are present in all records:

<pgterms:etext rdf:ID="etext7250">
  <dc:publisher>&pg;</dc:publisher>
  <dc:title rdf:parseType="Literal">A Connecticut Yankee in King Arthur's Court, Part 9.</dc:title>
  <dc:creator rdf:parseType="Literal">Twain, Mark, 1835-1910</dc:creator>
  <pgterms:friendlytitle rdf:parseType="Literal">A Connecticut Yankee in King Arthur's Court, Part </pgterms:friendlytitle>
  <dc:language><dcterms:ISO639-2><rdf:value>en</rdf:value></dcterms:ISO639-2></dc:language>
  <dc:subject>
    <rdf:Bag>
      <rdf:li><dcterms:LCSH><rdf:value>Americans -- Great Britain -- Fiction</rdf:value></dcterms:LCSH></rdf:li>
      <rdf:li><dcterms:LCSH><rdf:value>Arthurian romances -- Adaptations</rdf:value></dcterms:LCSH></rdf:li>
      <rdf:li><dcterms:LCSH><rdf:value>Britons -- Fiction</rdf:value></dcterms:LCSH></rdf:li>
      <rdf:li><dcterms:LCSH><rdf:value>Fantasy fiction</rdf:value></dcterms:LCSH></rdf:li>
      <rdf:li><dcterms:LCSH><rdf:value>Kings and rulers -- Fiction</rdf:value></dcterms:LCSH></rdf:li>
      <rdf:li><dcterms:LCSH><rdf:value>Knights and knighthood -- Fiction</rdf:value></dcterms:LCSH></rdf:li>
      <rdf:li><dcterms:LCSH><rdf:value>Satire</rdf:value></dcterms:LCSH></rdf:li>
      <rdf:li><dcterms:LCSH><rdf:value>Time travel -- Fiction</rdf:value></dcterms:LCSH></rdf:li>
    </rdf:Bag>
  </dc:subject>
  <dc:subject><dcterms:LCC><rdf:value>PS</rdf:value></dcterms:LCC></dc:subject>
  <dc:created><dcterms:W3CDTF><rdf:value>2004-07-07</rdf:value></dcterms:W3CDTF></dc:created>
  <dc:rights rdf:resource="&lic;" />
</pgterms:etext>

In addition to, e.g., dc:creator and dc:title, I'd like to get the value of the rdf:ID attribute on pgterms:etext (the "STUFF IN HERE" part of rdf:ID="STUFF IN HERE"):

<pgterms:etext rdf:ID="etext7250">

I'd also like to combine the entries in the list under dc:subject, and so on. Basically, I want the command-line query to return all the information on a book as a single coherent entry.

So, my questions:

  1. How can I combine the attribute value from pgterms:etext rdf:ID with the rest of the query?
  2. How can I combine the entries under dc:subject's list into one string?
  3. Since not all fields show up for every record, should I use the OPTIONAL() clause to surround fields that don't always appear?
  4. How can I limit my query based on a user-specified string? Am I supposed to use FILTER() for that?

Thank you so much. I have been able to construct queries that return single-layer information, but anything beyond that (attributes, nested values, and so on) is nigh inscrutable to me. This is very different from standard SQL, and a much more involved project than I first thought.

Sdaz MacSkibbons asked Jul 25 '10 06:07


1 Answer

How can I combine the attribute value from pgterms:etext rdf:ID with the rest of the query?

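The rdf:ID becomes the URI of the book in your knowledge base: rdf:ID="etext7250" resolves against the document's base URI to a URI ending in #etext7250. In your query that URI is already bound to ?book, so putting ?book into your SELECT clause will bring it back.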
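As a rough sketch of that idea, the query below returns the book URI and also derives the bare identifier from it. It assumes the store supports SPARQL 1.1 (BIND and STRAFTER are not in SPARQL 1.0) and that each pgterms:etext element was loaded as a resource of type pgterms:etext. The variable name ?id is just illustrative.

```sparql
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
PREFIX pgterms: <http://www.gutenberg.org/rdfterms/>

SELECT ?book ?id ?title
WHERE {
    # A <pgterms:etext rdf:ID="..."> element yields a resource typed pgterms:etext
    ?book a pgterms:etext ;
          dc:title ?title .
    # Keep only the fragment after '#', e.g. "etext7250" (SPARQL 1.1 only)
    BIND(STRAFTER(STR(?book), "#") AS ?id)
}
LIMIT 10
```

On an engine without SPARQL 1.1 support, dropping the BIND line and ?id still leaves ?book holding the full URI, and the fragment can be split off in the client.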

How can I combine the entries under dc:subject's list into one string?

I am not sure about this. You can put dc:subject into your query and then iterate over the results in your client.
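If the engine supports SPARQL 1.1 aggregates, one possible alternative to client-side iteration is GROUP_CONCAT. The sketch below is based only on the sample record in the question: it assumes LCSH headings hang off dc:subject either directly or as members of an rdf:Bag (whose rdf:li items become rdf:_1, rdf:_2, and so on), and that each heading node carries its text in rdf:value. It deliberately ignores the dcterms:LCC entry.

```sparql
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?book (GROUP_CONCAT(STR(?subject); separator="; ") AS ?subjects)
WHERE {
    ?book dc:subject ?node .
    {
        # Assumed case: a single heading attached directly to dc:subject
        ?node a dcterms:LCSH ;
              rdf:value ?subject .
    } UNION {
        # Several headings wrapped in an rdf:Bag; ?member matches rdf:_1, rdf:_2, ...
        ?node ?member ?heading .
        ?heading a dcterms:LCSH ;
                 rdf:value ?subject .
    }
}
GROUP BY ?book
LIMIT 10
```

The order of the concatenated headings is not guaranteed, so sort in the client if the order matters.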

Since not all fields show up for every record, should I use the OPTIONAL() clause to surround fields that don't always appear?

Yes
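Note that OPTIONAL is a keyword followed by a group in braces rather than a function call. A minimal sketch, treating dc:creator and pgterms:friendlytitle as the fields that may be missing (which fields are actually optional in the catalog is an assumption here):

```sparql
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
PREFIX pgterms: <http://www.gutenberg.org/rdfterms/>

SELECT ?book ?title ?author ?friendly
WHERE {
    # Required: every returned row has a title
    ?book dc:title ?title .
    # Each optional field gets its own block, so one missing
    # value does not unbind the other
    OPTIONAL { ?book dc:creator ?author }
    OPTIONAL { ?book pgterms:friendlytitle ?friendly }
}
LIMIT 10
```

Rows for books without a creator or friendly title still come back, with those variables left unbound.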

How can I limit my query based on a user-specified string? Am I supposed to use FILTER() for that?

Yes, specifically FILTER regex()
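One detail matters for this dataset: the output in the question shows titles and authors typed as rdf:XMLLiteral, and regex() expects a plain string, so the variable likely needs to be wrapped in str() first. A sketch of an author search, with "twain" standing in for the user-supplied string:

```sparql
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?book ?title ?author
WHERE {
    ?book dc:title   ?title ;
          dc:creator ?author .
    # str() strips the rdf:XMLLiteral datatype; "i" makes the match case-insensitive
    FILTER regex(str(?author), "twain", "i")
}
LIMIT 10
```

Rows where ?author is a blank node (such as the _:b0 row in the question's output) should be dropped by this filter, since str() is not defined for blank nodes. When the search string comes from the command line, escape quotes, backslashes, and regex metacharacters before splicing it into the query text.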

Jeremy French answered Sep 28 '22 00:09