I am trying to query all instances of an entity from Wikidata. As far as I can tell, the only way to do this at the moment is through the SPARQL endpoint.
I found an example query that does roughly what I want and ran it successfully from the web interface. Unfortunately, I can't get it to run from my Java code. I am using the openRDF SPARQL library. Here is the relevant code:
SPARQLRepository sparqlRepository = new SPARQLRepository(
        "https://query.wikidata.org/");
SPARQLConnection sparqlConnection = new SPARQLConnection(
        sparqlRepository);
// Each fragment ends with a space so adjacent tokens don't run together.
String query = "SELECT ?s ?desc ?authorlabel (COUNT(DISTINCT ?sitelink) as ?linkcount) WHERE { "
        + "?s wdt:P31 wd:Q571 . "
        + "?sitelink schema:about ?s . "
        + "?s wdt:P50 ?author "
        + "OPTIONAL { ?s rdfs:label ?desc filter (lang(?desc) = \"en\"). } "
        + "OPTIONAL { "
        + "?author rdfs:label ?authorlabel filter (lang(?authorlabel) = \"en\"). "
        + "} "
        + "} GROUP BY ?s ?desc ?authorlabel ORDER BY DESC(?linkcount)";
TupleQuery tupleQuery = sparqlConnection.prepareTupleQuery(
        QueryLanguage.SPARQL, query);
System.out.println("Result for tupleQuery" + tupleQuery.evaluate());
And here is the response I'm receiving:
Exception in thread "main" org.openrdf.query.QueryEvaluationException: <html>
<head><title>405 Not Allowed</title></head>
<body bgcolor="white">
<center><h1>405 Not Allowed</h1></center>
<hr><center>nginx/1.9.4</center>
</body>
</html>
at org.openrdf.repository.sparql.query.SPARQLTupleQuery.evaluate(SPARQLTupleQuery.java:59)
at main.Test.main(Test.java:72)
Caused by: org.openrdf.repository.RepositoryException: <html>
<head><title>405 Not Allowed</title></head>
<body bgcolor="white">
<center><h1>405 Not Allowed</h1></center>
<hr><center>nginx/1.9.4</center>
</body>
</html>
at org.openrdf.http.client.HTTPClient.handleHTTPError(HTTPClient.java:953)
at org.openrdf.http.client.HTTPClient.sendTupleQueryViaHttp(HTTPClient.java:718)
at org.openrdf.http.client.HTTPClient.getBackgroundTupleQueryResult(HTTPClient.java:602)
at org.openrdf.http.client.HTTPClient.sendTupleQuery(HTTPClient.java:367)
at org.openrdf.repository.sparql.query.SPARQLTupleQuery.evaluate(SPARQLTupleQuery.java:52)
... 1 more
Normally I would assume that this means I need an API key of sorts, but the Wikidata API appears to be completely open. Did I make a mistake setting up my connection?
The proper endpoint URL for Wikidata is https://query.wikidata.org/sparql (you're missing the last bit).
In addition, I noticed a few glitches in your code. First of all, you're doing this:
SPARQLConnection sparqlConnection = new SPARQLConnection(sparqlRepository);
This should be this:
RepositoryConnection sparqlConnection = sparqlRepository.getConnection();
Always retrieve your connection object from the Repository object using getConnection(); this means resources are shared and the Repository can close 'dangling' connections if necessary.
Secondly: you can't print out the result of a query like this:
System.out.println("Result for tupleQuery" + tupleQuery.evaluate());
If you wish to print out the result to System.out, you should instead do something like this:
tupleQuery.evaluate(new SPARQLResultsTSVWriter(System.out));
Or (if you wish to customize the result a bit more):
for (BindingSet bs : QueryResults.asList(tupleQuery.evaluate())) {
    System.out.println(bs);
}
For what it's worth - with the above changes the query request runs, but it appears your query is too 'heavy' for Wikidata - at least I got a timeout error from the server. Try a simpler query though, and you'll see the code works.
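As a rough illustration of what a "simpler query" could look like, here is a sketch that drops the sitelink count and grouping and caps the result size. The class name LighterQuery, the build helper and the limit of 10 are made up for this example, and whether this variant stays under the server's timeout is an assumption, not something tested here. It also relies on the endpoint predefining the wdt:, wd: and rdfs: prefixes, as the original query does.

```java
final class LighterQuery {
    // Hypothetical helper: assembles a reduced version of the original query.
    // String.join puts a newline between fragments, so adjacent tokens
    // cannot run together the way they can with plain "+" concatenation.
    static String build(int limit) {
        return String.join("\n",
                "SELECT ?s ?desc ?author WHERE {",
                "  ?s wdt:P31 wd:Q571 .",
                "  ?s wdt:P50 ?author .",
                "  OPTIONAL { ?s rdfs:label ?desc FILTER (lang(?desc) = \"en\") }",
                "}",
                "LIMIT " + limit);
    }

    public static void main(String[] args) {
        // Prints the query text; pass it to prepareTupleQuery as before.
        System.out.println(build(10));
    }
}
```

The resulting string can be passed to prepareTupleQuery in place of the original query.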
When I go to https://query.wikidata.org/ and have a look at Tools > SPARQL REST endpoint, I see (emphasis added):
SPARQL endpoint
SPARQL queries can be submitted directly to the SPARQL endpoint with a GET request to https://query.wikidata.org/sparql?query={SPARQL} (POST and other method requests will be denied with a "403 Forbidden"). The result is returned as XML by default, or as JSON if either the query parameter format=json or the header Accept: application/sparql-results+json are provided.
It looks like you're using a different URL (it doesn't look like you have the final sparql on there), so you're probably not actually hitting that endpoint.
That said, since you can visit the URL that you are using (presumably using GET), it sounds like your API call might be doing a POST, so you may want to check how the query is going over the network, too.
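To see what a plain GET against that endpoint looks like without going through openRDF, here is a minimal sketch using only the JDK's java.net.http classes (Java 11+). The class name WikidataGetRequest and the buildRequest helper are invented for this example; the endpoint URL, the query parameter and the Accept header come from the text quoted above. The sketch only builds the request, so it runs without network access.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class WikidataGetRequest {
    // Endpoint from the quoted documentation; note the trailing "/sparql".
    static final String ENDPOINT = "https://query.wikidata.org/sparql";

    // Hypothetical helper: URL-encodes the query, appends it as the "query"
    // parameter, and asks for JSON results via the Accept header.
    static HttpRequest buildRequest(String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(ENDPOINT + "?query=" + encoded))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request =
                buildRequest("SELECT ?s WHERE { ?s wdt:P31 wd:Q571 } LIMIT 10");
        // Shows the HTTP method and the full URL that would be requested.
        System.out.println(request.method() + " " + request.uri());
        // Sending it needs network access, e.g.:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Comparing the printed method and URL with what your library actually sends (for example in a proxy or network trace) should show whether it is using POST or the wrong path.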
There's an example of using this endpoint from Jena in the question "Use Jena to query wikidata". The OP of that question actually had the same issue you're running into (the wrong query URL).