 

SPARQL: how to find similar strings?

I'm using Jena to query data stored in an ontology. Some of the objects are identified by a string. However, the exact string is sometimes not available, because I am processing scanned documents that may contain OCR errors. Therefore, I'd like to find the most similar strings. Is there a way to use SPARQL for this purpose? Can I somehow calculate the Levenshtein distance in SPARQL?

If this is not possible, I can still calculate the Levenshtein distance in Java. However, an efficient approach would still need SPARQL to filter out irrelevant strings first.
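A minimal sketch of the Java side, assuming plain JDK with no Jena dependency. It computes the Levenshtein distance with the standard dynamic-programming recurrence (keeping only two rows in memory) and adds a cheap pre-check: the distance can never be smaller than the difference in string length, so candidates whose length differs by more than the allowed number of edits can be discarded without running the full computation. The class and method names are illustrative, not part of Jena. The same length bound could also be written in a SPARQL 1.1 FILTER using `STRLEN` and `ABS` to narrow candidates before any Java code runs.

```java
// Illustrative helper (not a Jena API): plain-JDK Levenshtein distance.
public class Levenshtein {

    // Classic edit distance (insert, delete, substitute, each cost 1),
    // using two rows of the DP table instead of the full matrix.
    // Note: charAt works on UTF-16 code units, so a character outside the
    // Basic Multilingual Plane counts as two units.
    public static int distance(String a, String b) {
        int n = a.length();
        int m = b.length();
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        // Row 0: turning the empty prefix of a into b[0..j) costs j inserts.
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i; // turning a[0..i) into "" costs i deletes
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            // Reuse the arrays: the finished row becomes the previous row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }

    // Necessary (not sufficient) condition for distance(a, b) <= maxDistance:
    // at least |len(a) - len(b)| inserts or deletes are always required.
    public static boolean mayBeWithin(String a, String b, int maxDistance) {
        return Math.abs(a.length() - b.length()) <= maxDistance;
    }
}
```

For example, `Levenshtein.distance("kitten", "sitting")` is 3, and a typical OCR slip such as `"lnvoice"` for `"Invoice"` has distance 1.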

asked Dec 21 '22 at 00:12 by Pedro



1 Answer

SPARQL can't do this directly, but you could implement the Levenshtein distance function in Java and use it in a SPARQL FILTER clause. See "Extensions in ARQ" for details about using extension functions.
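Because a FILTER only needs a yes/no answer, the Java function can take a threshold and give up early instead of always computing the exact distance. A minimal sketch in plain Java follows; the class and method names are hypothetical. It relies on the fact that the minimum of each DP row never decreases from one row to the next, so once every entry in a row exceeds the threshold, the final distance must exceed it too. In Jena, logic like this would presumably be wrapped in a class extending `org.apache.jena.sparql.function.FunctionBase2` and registered under a URI of your choosing via `FunctionRegistry`, so a query could call something like a hypothetical `FILTER(ext:withinDistance(?label, "lnvoice", 2))`. That wrapper is not shown here; check the ARQ extension documentation for the exact registration steps in your Jena version.

```java
// Hypothetical helper for a boolean FILTER function: is the Levenshtein
// distance between a and b at most maxDistance? Plain JDK, no Jena types.
public class BoundedLevenshtein {

    public static boolean withinDistance(String a, String b, int maxDistance) {
        // The distance is at least the length difference, so reject early.
        if (Math.abs(a.length() - b.length()) > maxDistance) {
            return false;
        }
        int n = a.length();
        int m = b.length();
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Every later cell is >= the minimum of this row, so if the whole
            // row is already over the limit, the final distance is too.
            if (rowMin > maxDistance) {
                return false;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m] <= maxDistance;
    }
}
```

The early exit matters most when most candidate strings are far from the query, which is the usual case when scanning many labels for an OCR-damaged match.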

answered Jan 02 '23 at 23:01 by Gregory Williams
