I'm following this guide on querying from Wikidata.
I can get a certain entity (if I know its code) with:
from wikidata.client import Client
client = Client()
entity = client.get('Q20145', load=True)
entity
>>><wikidata.entity.Entity Q20145 'IU'>
entity.description
>>>m'South Korean singer-songwriter, record producer, and actress'
But how can I get the RDF triples of that entity? That is, all the outgoing and incoming edges in the form of (subject, predicate, object).
Looks like this SO question managed to get the triples, but only from a data dump here. I'm trying to get it from the library itself.
If you only need the outgoing edges, you can retrieve them directly from https://www.wikidata.org/wiki/Special:EntityData/Q20145.nt and parse them with rdflib:
from rdflib import Graph
g = Graph()
g.parse('https://www.wikidata.org/wiki/Special:EntityData/Q20145.nt', format="nt")
for subj, pred, obj in g:
    print(subj, pred, obj)
To get both the incoming and the outgoing edges, you need to query the database. On Wikidata, this is done with the Wikidata Query Service and the query language SPARQL. The SPARQL expression to get all edges is as simple as DESCRIBE wd:Q20145.
With Python, you can retrieve the results of the query with the following code:
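import requests

endpoint_url = "https://query.wikidata.org/sparql"
# Identify your client; replace 'MyBot' with a descriptive name
headers = {'User-Agent': 'MyBot'}
payload = {
    'query': 'DESCRIBE wd:Q20145',
    'format': 'json'
}
r = requests.get(endpoint_url, params=payload, headers=headers)
r.raise_for_status()  # stop early on an HTTP error instead of failing in r.json()
results = r.json()
# Each binding holds one triple under the keys "subject", "predicate" and "object"
triples = []
for result in results["results"]["bindings"]:
    triples.append((result["subject"], result["predicate"], result["object"]))
print(triples)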
This gives you the complete result, which reflects the complex underlying data model. To query the incoming and outgoing edges separately, replace DESCRIBE wd:Q20145 with either
CONSTRUCT {?s ?p ?o} WHERE {BIND(wd:Q20145 AS ?s) ?s ?p ?o}
to get only the outgoing edges, or
CONSTRUCT {?s ?p ?o} WHERE {BIND(wd:Q20145 AS ?o) ?s ?p ?o}
to get only the incoming edges.
Depending on your goal, you may want to filter out some triples, e.g. statement triples, and simplify others. One way to get a cleaner result is to replace the last four lines of the code above with:
triples = []
for result in results["results"]["bindings"]:
    # Strip the namespace prefixes so that only IDs such as Q20145 or P31 remain
    subject = result["subject"]["value"].replace('http://www.wikidata.org/entity/', '')
    obj = result["object"]["value"].replace('http://www.wikidata.org/entity/', '')
    predicate = result["predicate"]["value"].replace('http://www.wikidata.org/prop/direct/', '')
    # Skip triples that involve statement nodes
    if 'statement/' in subject or 'statement/' in obj:
        continue
    triples.append((subject, predicate, obj))
print(triples)
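If you would rather run the single DESCRIBE query and separate the two directions afterwards, the bindings can be partitioned on the client side. The following is only a sketch: `split_edges` is a hypothetical helper name, and the `sample` list is hand-written to mimic the JSON shape used above (the `Q0` subject and the statement node ID are placeholders, not real query output).

```python
from typing import Dict, List, Tuple

ENTITY_NS = "http://www.wikidata.org/entity/"
Triple = Tuple[str, str, str]


def split_edges(bindings: List[dict], qid: str) -> Dict[str, List[Triple]]:
    """Partition bindings into triples leaving and entering the entity `qid`.

    Triples that do not touch the entity directly (for example triples
    between statement nodes) are dropped.
    """
    entity_iri = ENTITY_NS + qid
    edges: Dict[str, List[Triple]] = {"outgoing": [], "incoming": []}
    for b in bindings:
        triple = (b["subject"]["value"], b["predicate"]["value"], b["object"]["value"])
        if triple[0] == entity_iri:
            edges["outgoing"].append(triple)
        elif triple[2] == entity_iri:
            edges["incoming"].append(triple)
    return edges


# Hand-written sample in the assumed binding shape; not real query output.
sample = [
    {   # outgoing: Q20145 --P31--> Q5
        "subject": {"type": "uri", "value": ENTITY_NS + "Q20145"},
        "predicate": {"type": "uri", "value": "http://www.wikidata.org/prop/direct/P31"},
        "object": {"type": "uri", "value": ENTITY_NS + "Q5"},
    },
    {   # incoming: placeholder subject Q0 --P175--> Q20145
        "subject": {"type": "uri", "value": ENTITY_NS + "Q0"},
        "predicate": {"type": "uri", "value": "http://www.wikidata.org/prop/direct/P175"},
        "object": {"type": "uri", "value": ENTITY_NS + "Q20145"},
    },
    {   # statement-node triple (placeholder ID): touches neither end directly
        "subject": {"type": "uri", "value": ENTITY_NS + "statement/Q20145-placeholder"},
        "predicate": {"type": "uri", "value": "http://www.wikidata.org/prop/statement/P31"},
        "object": {"type": "uri", "value": ENTITY_NS + "Q5"},
    },
]

edges = split_edges(sample, "Q20145")
print(len(edges["outgoing"]), "outgoing,", len(edges["incoming"]), "incoming")
```

With real data you would pass `results["results"]["bindings"]` instead of `sample`. Comparing full IRIs rather than shortened IDs avoids accidental matches on IDs that merely contain `Q20145` as a substring.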
But how can I get the RDF triples of that entity?
By using a SPARQL DESCRIBE query (source), you get a single result RDF graph containing all the outgoing and incoming edges in the form of (subject, predicate, object). This can be achieved using the following Python example code (source):
# JSON must be imported as well, otherwise setReturnFormat(JSON) raises a NameError
from SPARQLWrapper import SPARQLWrapper, JSON

queryString = """DESCRIBE wd:Q20145"""
sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
sparql.setQuery(queryString)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for result in results["results"]["bindings"]:
    print(result)
If you want to get only the outgoing edges, use CONSTRUCT {?s ?p ?o} WHERE {BIND(wd:Q20145 AS ?s) ?s ?p ?o}
and for the incoming edges, use CONSTRUCT {?s ?p ?o} WHERE {BIND(wd:Q20145 AS ?o) ?s ?p ?o}
(thanks to @UninformedUser).
Example code:
# JSON must be imported as well, otherwise setReturnFormat(JSON) raises a NameError
from SPARQLWrapper import SPARQLWrapper, JSON

queryString = """CONSTRUCT {?s ?p ?o} WHERE {BIND(wd:Q20145 AS ?s) ?s ?p ?o}"""
sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
sparql.setQuery(queryString)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for result in results["results"]["bindings"]:
    print(result)
The results with DESCRIBE and CONSTRUCT can be seen here and here respectively.
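The three query shapes used in this thread differ only in the keyword and in which variable the entity is bound to, so they can be generated from an entity ID. This is a minimal sketch: `build_queries` is a hypothetical helper, not part of SPARQLWrapper or any Wikidata library, and the ID check is only a basic sanity check.

```python
import re


def build_queries(qid: str) -> dict:
    """Return the DESCRIBE and CONSTRUCT query strings for a Wikidata item ID."""
    # Reject anything that does not look like an item ID (e.g. "Q20145"),
    # so arbitrary text never ends up inside the query string.
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    node = f"wd:{qid}"
    return {
        # incoming and outgoing edges in one graph
        "all": f"DESCRIBE {node}",
        # entity bound as subject: outgoing edges only
        "outgoing": f"CONSTRUCT {{?s ?p ?o}} WHERE {{BIND({node} AS ?s) ?s ?p ?o}}",
        # entity bound as object: incoming edges only
        "incoming": f"CONSTRUCT {{?s ?p ?o}} WHERE {{BIND({node} AS ?o) ?s ?p ?o}}",
    }


queries = build_queries("Q20145")
for name, query in queries.items():
    print(name, "->", query)
```

Any of the returned strings can be passed to `sparql.setQuery(...)` in place of the literal query string.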