I have a list with movie titles and want to look these up in DBpedia for meta information like "director". But I have trouble to identify the correct movie with SPARQL, because the titles sometimes don't exactly match.
How can I get the best match for a movie title from DBpedia using SPARQL?
Some problematic examples:
My current approach is to query the DBpedia endpoint for all movies and then filter by checking for single tokens (without punctuations), order by title and return the first result. E.g.:
SELECT ?resource ?title ?director WHERE {
?resource foaf:name ?title .
?resource rdf:type schema:Movie .
?resource dbo:director ?director .
FILTER (
contains(lcase(str(?title)), "die") &&
contains(lcase(str(?title)),"hard")
)
}
ORDER BY (?title)
LIMIT 1
This approach is very slow and also sometimes fails, e.g.:
SELECT ?resource ?title ?director WHERE {
?resource foaf:name ?title .
?resource rdf:type schema:Movie .
?resource dbo:director ?director .
FILTER (
contains(lcase(str(?title)), "hachi")
)
}
ORDER BY (?title)
LIMIT 10
where the correct result is on second place:
resource title director
http://dbpedia.org/resource/Chachi_420 "Chachi 420"@en http://dbpedia.org/resource/Kamal_Haasan
http://dbpedia.org/resource/Hachi:_A_Dog's_Tale "Hachi: A Dog's Tale"@en http://dbpedia.org/resource/Lasse_Hallström
http://dbpedia.org/resource/Hachiko_Monogatari "Hachikō Monogatari"@en http://dbpedia.org/resource/Seijirō_Kōyama
http://dbpedia.org/resource/Thachiledathu_Chundan "Thachiledathu Chundan"@en http://dbpedia.org/resource/Shajoon_Kariyal
Any ideas how to solve this problem? Or even better: How to query for best matches to a string with SPARQL in general?
Thanks!
I adapted the regex-approach mentioned in the comments and came up with a solution that works pretty well, better than anything I could get with bif:contains:
SELECT ?resource ?title ?match strlen(str(?title)) as ?lenTitle strlen(str(?match)) as ?lenMatch
WHERE {
?resource foaf:name ?title .
?resource rdf:type schema:Movie .
?resource dbo:director ?director .
bind( replace(LCASE(CONCAT('x',?title)), "^x(die)*(?:.*?(hard))*(?:.*?(with))*.*$", "$1$2$3") as ?match )
}
ORDER BY DESC(?lenMatch) ASC(?lenTitle)
LIMIT 5
It's not perfect, so I'm still open for suggestions.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With