How can I select a random sample from DBpedia using the SPARQL endpoint?
This query
SELECT ?s WHERE { ?s ?p ?o . FILTER ( 1 > bif:rnd (10, ?s, ?p, ?o) ) } LIMIT 10
(found here) seems to work OK on most SPARQL endpoints, but on http://dbpedia.org/sparql the result gets cached (so it always returns the same 10 nodes).
If I try it from Jena, I get the following exception:
Unresolved prefixed name: bif:rnd
And I can't find out what the 'bif' namespace is.
Any idea on how to solve this?
Mulone
In SPARQL 1.1 you can do:
SELECT ?s
WHERE {
?s ?p ?o
}
ORDER BY RAND()
LIMIT 10
I don't know offhand how many stores will optimise, or even implement, this yet though.
[see comment below, this doesn't quite work]
An alternative is:
SELECT (SAMPLE(?s) AS ?ss) WHERE { ?s ?p ?o } GROUP BY ?s
But I'd think that's even less likely to be optimised.
bif:rnd is not part of the SPARQL standard and is therefore not portable across SPARQL endpoints. You can use LIMIT, ORDER BY and OFFSET to simulate a random sample with a standard query. Something like ...
SELECT * WHERE { ?s ?p ?o }
ORDER BY ?s OFFSET $some_random_number$ LIMIT 10
Where some_random_number is a number generated by your application. This should avoid the caching problem, but the query is still quite expensive and I don't know if public endpoints will support it.
Try to avoid completely open patterns like ?s ?p ?o and your query will be much more efficient.
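As a rough illustration of the "number generated by your application" step, here is a minimal Python sketch that fills in the OFFSET before the query is sent. The function name build_random_offset_query and the max_offset parameter are made up for this example; you would have to pick max_offset yourself (for instance from a COUNT query), and nothing here actually contacts an endpoint.

```python
import random


def build_random_offset_query(max_offset, limit=10, rng=random):
    """Return a standard SPARQL query that starts at a random offset.

    max_offset is an assumed upper bound on the offset (inclusive);
    the caller has to choose it, e.g. from a prior COUNT query.
    """
    if max_offset < 0 or limit <= 0:
        raise ValueError("max_offset must be >= 0 and limit must be > 0")
    # Pick the offset on the client side, so every request differs
    # and a cached result on the server is less likely to be reused.
    offset = rng.randrange(max_offset + 1)
    return (
        "SELECT * WHERE { ?s ?p ?o }\n"
        f"ORDER BY ?s OFFSET {offset} LIMIT {limit}"
    )


if __name__ == "__main__":
    print(build_random_offset_query(max_offset=1000))
```

Note that this returns 10 consecutive rows starting at a random position, not 10 independently chosen rows; for a more spread-out sample you would issue several queries with LIMIT 1.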
bif:rnd is a Virtuoso-specific extension and will thus only work against Virtuoso SPARQL endpoints.
bif is the prefix for Virtuoso Built-In Functions, which lets any Virtuoso function be called from SPARQL; rnd is a Virtuoso function that returns random numbers.
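For the "Unresolved prefixed name" error on the client side, one workaround that is commonly suggested is to declare the prefix explicitly so that a strict parser such as Jena's accepts the query text. The sketch below assumes that the declaration PREFIX bif: <bif:> is what the Virtuoso endpoint expects; the helper name with_bif_prefix is invented for this example, and the query will still only run on a Virtuoso-backed endpoint.

```python
# Assumption: Virtuoso resolves built-in functions under the IRI scheme "bif:".
BIF_PREFIX = "PREFIX bif: <bif:>"


def with_bif_prefix(query):
    """Prepend an explicit bif: prefix declaration unless one is already there."""
    if "prefix bif:" in query.lower():
        return query
    return f"{BIF_PREFIX}\n{query}"


if __name__ == "__main__":
    original = (
        "SELECT ?s WHERE { ?s ?p ?o . "
        "FILTER ( 1 > bif:rnd (10, ?s, ?p, ?o) ) } LIMIT 10"
    )
    print(with_bif_prefix(original))
```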
I encountered the same problem and none of the solutions here addressed my issue. Here is my solution; it was non-trivial and quite a hack. This works for DBpedia as of now, and may work for other SPARQL endpoints, but it is not guaranteed to work for future releases.
DBpedia uses Virtuoso, which supports an undocumented argument to the RAND function; the argument effectively specifies the range to use for the PRNG. The game is to trick Virtuoso into believing that the input argument cannot be statically evaluated before each result row is computed, forcing it to evaluate RAND() for every binding:
select * {
?s dbo:isPartOf ?o . # Whatever your pattern is
bind(rand(1 + strlen(str(?s))*0) as ?rid)
} order by ?rid
The magic happens in rand(1 + strlen(str(?s))*0), which generates the equivalent of rand() but forces it to run on every match by exploiting the fact that the program cannot predict the value of an expression that involves a variable (in this case, we just compute the length of the IRI as a string). The actual expression is not important, since we multiply it by 0 to ignore it completely, then add 1 to make rand execute normally.
This only works because the developers did not go this far in their static-code-evaluation of expressions. They could have easily written a branch for "multiply by zero", but alas they did not :)