Example python script that uses DBPedia?

I am writing a Python script to extract "entity names" from a collection of thousands of news articles from a few countries and languages.

I would like to make use of the amazing DBpedia structured knowledge, for example to look up the names of "artists in Egypt" and the names of "companies in Canada".

(If this information were in SQL form, I would have no problem.)

I would prefer to download the DBpedia content and use it offline. Any ideas on what is needed to do so, and how to query it locally from Python?

jaz asked Sep 20 '11 15:09


2 Answers

DBpedia content is in RDF format. The dumps can be downloaded from the DBpedia downloads page.

DBpedia is a large RDF dataset, and to handle that amount of data you need triple store technology. For DBpedia you will need a native triple store; I recommend either Virtuoso or 4store. I personally prefer 4store.

Once you have your triple store set up with DBpedia loaded into it, you can use SPARQL to query the DBpedia RDF triples. There are Python libraries that can help you with that, but 4store and Virtuoso can return results as JSON, so you can easily get by without any libraries.

A simple urllib script like this one (Python 2) ...

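# Python 2: urllib2 does not exist in Python 3.
import sys
import traceback
import urllib
import urllib2

def query(q, epr, f='application/json'):
    """Run SPARQL query q against endpoint epr; return the raw response body."""
    try:
        # URL-encode the query and send it as a GET parameter.
        params = urllib.urlencode({'query': q})
        opener = urllib2.build_opener(urllib2.HTTPHandler)
        request = urllib2.Request(epr + '?' + params)
        # Ask the endpoint for the result format f (JSON by default).
        request.add_header('Accept', f)
        request.get_method = lambda: 'GET'
        url = opener.open(request)
        return url.read()
    except Exception:
        traceback.print_exc(file=sys.stdout)
        raise  # bare raise keeps the original traceback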

... can help you run SPARQL queries. For instance:

>>> q1 = """
... select ?birthPlace where {
... <http://dbpedia.org/resource/Claude_Monet> <http://dbpedia.org/property/birthPlace> ?birthPlace .
...  }"""
>>> print query(q1,"http://dbpedia.org/sparql")

{ "head": { "link": [], "vars": ["birthPlace"] },
  "results": { "distinct": false, "ordered": true, "bindings": [
    { "birthPlace": { "type": "literal", "xml:lang": "en", "value": "Paris, France" }} ] } }
>>> 
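
Since the endpoint returns JSON, pulling the plain values out of the response takes only a few lines. The following is a minimal Python 3 sketch, not part of the script above: `binding_values` is a hypothetical helper name, and `sample` simply repeats the response shown above so that the example runs without a network connection.

```python
import json

def binding_values(result_text, var):
    """Return the plain values bound to `var` in a SPARQL JSON result."""
    data = json.loads(result_text)
    # Each binding maps a variable name to a dict with "type" and "value".
    return [row[var]["value"]
            for row in data["results"]["bindings"]
            if var in row]

# Assumption: this is the response text from the Claude Monet query above.
sample = '''{ "head": { "link": [], "vars": ["birthPlace"] },
  "results": { "distinct": false, "ordered": true, "bindings": [
    { "birthPlace": { "type": "literal", "xml:lang": "en", "value": "Paris, France" }} ] } }'''

print(binding_values(sample, "birthPlace"))
```

The same helper should work on whatever `query(...)` returns, as long as the endpoint was asked for `application/json`.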

I hope this gives you an idea of how to start.

Manuel Salvadores answered Nov 16 '22 20:11


In Python 3, the same function looks like this using the requests library:

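import sys

import requests

def query(q, epr, f='application/json'):
    """Run SPARQL query q against endpoint epr; return the response text."""
    try:
        params = {'query': q}
        # requests URL-encodes params and appends them to epr.
        resp = requests.get(epr, params=params, headers={'Accept': f})
        return resp.text
    except Exception as e:
        print(e, file=sys.stdout)
        raise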
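
To tie this back to the lookups in the question ("artists in Egypt", "companies in Canada"), here is a rough sketch of how such a query string might be built. `build_entity_query` is a hypothetical helper, and the class and property names used (`dbo:Artist`, `dbo:birthPlace`, `dbo:Company`, `dbo:locationCountry`) are assumptions about the DBpedia ontology that you should check against the data you actually load.

```python
def build_entity_query(entity_class, country, place_property="birthPlace",
                       lang="en", limit=100):
    """Build a SPARQL query for labels of entities of a class linked to a country.

    Assumption: entity_class and place_property are terms in the
    http://dbpedia.org/ontology/ namespace, and country is the local
    name of a DBpedia resource (for example "Egypt").
    """
    return f"""PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?name WHERE {{
  ?entity a dbo:{entity_class} ;
          dbo:{place_property} <http://dbpedia.org/resource/{country}> ;
          rdfs:label ?name .
  FILTER (lang(?name) = "{lang}")
}} LIMIT {limit}"""

artists_q = build_entity_query("Artist", "Egypt")
companies_q = build_entity_query("Company", "Canada",
                                 place_property="locationCountry")
print(artists_q)
```

Either string can then be passed as `q` to `query(q, epr)`, where `epr` is the SPARQL endpoint of your local triple store or `http://dbpedia.org/sparql`.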

N. Alonso answered Nov 16 '22 22:11