
How to access the Wikidata SPARQL interface from Java?

I am trying to query all instances of an entity from Wikidata. I found out that currently the only way to do this is to use the SPARQL endpoint.

I found an example query that does roughly what I want and ran it successfully from the web interface. Unfortunately, I can't execute it from my Java code. I am using the openRDF SPARQL library. Here is the relevant code:

SPARQLRepository sparqlRepository = new SPARQLRepository(
        "https://query.wikidata.org/");
SPARQLConnection sparqlConnection = new SPARQLConnection(
        sparqlRepository);

String query = "SELECT ?s ?desc ?authorlabel (COUNT(DISTINCT ?sitelink) as ?linkcount) WHERE { "
        + "?s wdt:P31 wd:Q571 . "
        + "?sitelink schema:about ?s . "
        + "?s wdt:P50 ?author . "
        + "OPTIONAL { ?s rdfs:label ?desc filter (lang(?desc) = \"en\"). } "
        + "OPTIONAL { "
        + "?author rdfs:label ?authorlabel filter (lang(?authorlabel) = \"en\"). "
        + "} "
        + "} GROUP BY ?s ?desc ?authorlabel ORDER BY DESC(?linkcount)";

TupleQuery tupleQuery = sparqlConnection.prepareTupleQuery(
        QueryLanguage.SPARQL, query);
System.out.println("Result for tupleQuery" + tupleQuery.evaluate());

And here is the response I'm receiving:

Exception in thread "main" org.openrdf.query.QueryEvaluationException: <html>
<head><title>405 Not Allowed</title></head>
<body bgcolor="white">
<center><h1>405 Not Allowed</h1></center>
<hr><center>nginx/1.9.4</center>
</body>
</html>
    at org.openrdf.repository.sparql.query.SPARQLTupleQuery.evaluate(SPARQLTupleQuery.java:59)
    at main.Test.main(Test.java:72)
Caused by: org.openrdf.repository.RepositoryException: <html>
<head><title>405 Not Allowed</title></head>
<body bgcolor="white">
<center><h1>405 Not Allowed</h1></center>
<hr><center>nginx/1.9.4</center>
</body>
</html>
    at org.openrdf.http.client.HTTPClient.handleHTTPError(HTTPClient.java:953)
    at org.openrdf.http.client.HTTPClient.sendTupleQueryViaHttp(HTTPClient.java:718)
    at org.openrdf.http.client.HTTPClient.getBackgroundTupleQueryResult(HTTPClient.java:602)
    at org.openrdf.http.client.HTTPClient.sendTupleQuery(HTTPClient.java:367)
    at org.openrdf.repository.sparql.query.SPARQLTupleQuery.evaluate(SPARQLTupleQuery.java:52)
    ... 1 more

Normally I would assume that this means I need an API key of sorts, but the Wikidata API appears to be completely open. Did I make a mistake setting up my connection?

Andreas Hartmann, asked May 23 '16 19:05


2 Answers

The proper endpoint URL for Wikidata is https://query.wikidata.org/sparql - you're missing the last bit.

In addition, I noticed a few glitches in your code. First of all, you're doing this:

SPARQLConnection sparqlConnection = new SPARQLConnection(sparqlRepository);

This should be this:

RepositoryConnection sparqlConnection = sparqlRepository.getConnection();

Always retrieve your connection object from the Repository object using getConnection() - this means resources are shared and the Repository can close 'dangling' connections if necessary.

Secondly, you can't print out the result of a query like this, because evaluate() returns a TupleQueryResult that has to be iterated, not a string:

System.out.println("Result for tupleQuery" + tupleQuery.evaluate());

If you wish to print out the result to System.out you should instead do something like this:

tupleQuery.evaluate(new SPARQLResultsTSVWriter(System.out));

Or (if you wish to customize the result a bit more):

for (BindingSet bs : QueryResults.asList(tupleQuery.evaluate())) {
    System.out.println(bs);
}

For what it's worth - with the above changes the query request runs, but it appears your query is too 'heavy' for Wikidata - at least I got a timeout error from the server. Try a simpler query though, and you'll see the code works.
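
As a rough illustration of what a simpler query could look like, here is a minimal sketch. The class name LighterQuery and the choice of pattern (books with an author, no sitelink count, a LIMIT) are my own assumptions for a first smoke test, not something the Wikidata documentation prescribes. Joining the fragments with newlines also keeps adjacent fragments from fusing into a single token.

```java
// Hypothetical helper: builds a small test query for the Wikidata endpoint.
// The wd: and wdt: prefixes are predefined by the Wikidata query service.
final class LighterQuery {

    // Returns a query for books (Q571) and their authors (P50), capped at `limit` rows.
    static String build(int limit) {
        return String.join("\n",
                "SELECT ?s ?author WHERE {",
                "  ?s wdt:P31 wd:Q571 .",   // instance of: book
                "  ?s wdt:P50 ?author .",   // has an author
                "}",
                "LIMIT " + limit);
    }
}
```

The returned string can be passed to prepareTupleQuery(QueryLanguage.SPARQL, ...) in place of the original query.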

Jeen Broekstra, answered Sep 28 '22 09:09


When I go to https://query.wikidata.org/ and have a look at Tools > SPARQL REST endpoint, I see (emphasis added):

SPARQL endpoint

SPARQL queries can be submitted directly to the SPARQL endpoint with a GET request to https://query.wikidata.org/sparql?query={SPARQL} (POST and other method requests will be denied with a "403 Forbidden"). The result is returned as XML by default, or as JSON if either the query parameter format=json or the header Accept: application/sparql-results+json are provided.
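
To make the quoted description concrete, here is a minimal sketch of such a GET request using only the JDK (Java 11 or later assumed). The class and method names are hypothetical, and the User-Agent value is a placeholder I made up; treat this as one possible way to call the endpoint, not the official client.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for sending a SPARQL query to Wikidata with GET.
final class WikidataGet {

    static final String ENDPOINT = "https://query.wikidata.org/sparql";

    // Builds ENDPOINT?query=<URL-encoded SPARQL>.
    static URI buildUri(String sparql) {
        return URI.create(ENDPOINT + "?query="
                + URLEncoder.encode(sparql, StandardCharsets.UTF_8));
    }

    // Sends the query with GET, asks for JSON via the Accept header,
    // and returns the raw response body.
    static String fetchJson(String sparql) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(buildUri(sparql))
                .header("Accept", "application/sparql-results+json")
                .header("User-Agent", "example-sparql-client/0.1") // placeholder, an assumption
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```

Calling WikidataGet.fetchJson with a small query should return the result set as a JSON string, provided the endpoint is reachable.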

It looks like you're using a different URL (yours lacks the final /sparql), so you're probably not actually hitting that endpoint.

That said, since you can visit the URL that you are using (presumably using GET), it sounds like your API call might be doing a POST, so you may want to check how the query is going over the network, too.
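
One way to check the HTTP method without a packet sniffer is to point the client at a local stand-in server that records what it receives. The sketch below uses the JDK's built-in com.sun.net.httpserver.HttpServer (Java 9 or later assumed); the class name MethodLoggingServer and its methods are hypothetical. It answers every request with an empty 204, so the SPARQL client will most likely fail to parse the reply, but the recorded method is the point here.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical diagnostic server: records "<METHOD> <path>" for each request.
final class MethodLoggingServer implements AutoCloseable {

    private final HttpServer server;
    private final List<String> seen = new CopyOnWriteArrayList<>();

    MethodLoggingServer() throws IOException {
        // Port 0 lets the operating system pick any free port.
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            seen.add(exchange.getRequestMethod() + " "
                    + exchange.getRequestURI().getPath());
            exchange.getRequestBody().readAllBytes(); // drain any POST body
            exchange.sendResponseHeaders(204, -1);    // empty reply, no body
            exchange.close();
        });
        server.start();
    }

    // Base URL to hand to the client under test, e.g. baseUrl() + "/sparql".
    String baseUrl() {
        return "http://127.0.0.1:" + server.getAddress().getPort();
    }

    // Requests recorded so far, in arrival order.
    List<String> seen() {
        return seen;
    }

    @Override
    public void close() {
        server.stop(0);
    }
}
```

Create the repository with baseUrl() + "/sparql" instead of the Wikidata URL, run the query once, and then look at seen() to find out whether the library sent a GET or a POST.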

There's an example of using this endpoint from Jena in the question "Use Jena to query wikidata". The OP of that question had the same issue you're running into (the wrong query URL).

Joshua Taylor, answered Sep 28 '22 09:09