Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SPARQL using subquery with limit

I am developing a java application that uses ARQ to execute SPARQL queries using a Fuseki endpoint over TDB.

The application needs a query that returns the place of birth of each person and other person that was born in the same place.

To start, I wrote this SPARQL query that returns person_ids and the place of birth of each person.

prefix fb: <http://rdf.freebase.com/ns/>
prefix fn: <http://www.w3.org/2005/xpath-functions#>
select ?person_id ?place_of_birth 
where {
    ?person_id fb:type.object.type fb:people.person .
    ?person_id fb:people.person.place_of_birth ?place_of_birth_id .
    ?place_of_birth_id fb:type.object.name ?place_of_birth .
     FILTER (langMatches(lang(?place_of_birth),"en"))
}
LIMIT 10

----------------------------------
| person_id    | place_of_birth  |
==================================
| fb:m.01vtj38 | "El Centro"@en  |
| fb:m.01vsy7t | "Brixton"@en    |
| fb:m.09prqv  | "Pittsburgh"@en |
----------------------------------

After that, I added a subquery (https://jena.apache.org/documentation/query/sub-select.html) adding other person who was born there, but I get more than one person related and I only need one.

prefix fb: <http://rdf.freebase.com/ns/>
prefix fn: <http://www.w3.org/2005/xpath-functions#>
select ?person_id ?place_of_birth ?other_person_id
where {
    ?person_id fb:type.object.type fb:people.person .
    ?person_id fb:people.person.place_of_birth ?place_of_birth_id .
    ?place_of_birth_id fb:type.object.name ?place_of_birth .
    {
       select  ?other_person_id
       where {
       ?place_of_birth_id fb:location.location.people_born_here ?other_person_id .
       }
     }
     FILTER (langMatches(lang(?place_of_birth),"en"))
}
LIMIT 10

---------------------------------------------------
| person_id    | place_of_birth | other_person_id |
===================================================
| fb:m.01vtj38 | "El Centro"@en | fb:m.01vtj38    |
| fb:m.01vtj38 | "El Centro"@en | fb:m.01vsy7t    |
| fb:m.01vtj38 | "El Centro"@en | fb:m.09prqv     |
---------------------------------------------------

I have tried to add a LIMIT 1 subquery but it seems that does not work ( the query is executed but never ends )

prefix fb: <http://rdf.freebase.com/ns/>
prefix fn: <http://www.w3.org/2005/xpath-functions#>
select ?person_id ?place_of_birth ?other_person_id
where {
    ?person_id fb:type.object.type fb:people.person .
    ?person_id fb:people.person.place_of_birth ?place_of_birth_id .
    ?place_of_birth_id fb:type.object.name ?place_of_birth .
    {
       select  ?other_person_id
       where {
       ?place_of_birth_id fb:location.location.people_born_here ?other_person_id .
       }
       LIMIT 1
     }
     FILTER (langMatches(lang(?place_of_birth),"en"))
}
LIMIT 3

Is there a way to return only one result in the subquery, or can I not do that using SPARQL.

like image 748
Diego Avatar asked Nov 21 '14 12:11

Diego


1 Answers

You can use limits with subqueries

You can use limits in subqueries. Here's an example:

select ?x ?y where {
  values ?x { 1 2 3 4 }
  {
    select ?y where {
      values ?y { 5 6 7 8 }
    }
    limit 2
  }
}
limit 5
---------
| x | y |
=========
| 1 | 5 |
| 1 | 6 |
| 2 | 5 |
| 2 | 6 |
| 3 | 5 |
---------

As you can see, you get two values from the subquery (5 and 6), and these are combined with the bindings from the outer query, from which we get five rows in total (because of the limit).

Subqueries are evaluated innermost first

However, keep in mind that subqueries are evaluated from the innermost first, to the outermost. That means that in your query,

select ?person_id ?place_of_birth ?other_person_id
where {
    ?person_id fb:type.object.type fb:people.person .
    ?person_id fb:people.person.place_of_birth ?place_of_birth_id .
    ?place_of_birth_id fb:type.object.name ?place_of_birth .
    {
       select  ?other_person_id
       where {
       ?place_of_birth_id fb:location.location.people_born_here ?other_person_id .
       }
       LIMIT 1
     }
     FILTER (langMatches(lang(?place_of_birth),"en"))
}
LIMIT 3

you are finding one match for

?place_of_birth_id fb:location.location.people_born_here ?other_person_id .

and passing the ?other_person_id binding out into the outer query. The rest of the outer query doesn't use ?other_person_id, though, so it doesn't really have any effect on the results.

What to do instead

If you need only one other person

The application needs a query that returns the place of birth of each person and other person that was born in the same place.

Conceptually, you could look at this as picking a person, finding their place of birth, and sampling one more person from the people born in that place. You can actually write the query like that, too:

select ?person_id ?place_of_birth (sample(?other_person_idx) as ?other_person_id)
where {
    ?person_id fb:type.object.type fb:people.person .
    ?person_id fb:people.person.place_of_birth ?place_of_birth_id .
    ?place_of_birth_id fb:type.object.name ?place_of_birth .
    FILTER (langMatches(lang(?place_of_birth),"en"))
    ?place_of_birth_id fb:location.location.people_born_here ?other_person_idx .
    filter ( ?other_person_idx != ?person_id )
}
group by ?person_id ?place_of_birth

If you need more than one

This is a much trickier problem if you need more than one "other result" for each result. That's the problem in Nested queries in sparql with limits. There's an approach in How to limit SPARQL solution group size? that can be used for this.

like image 80
Joshua Taylor Avatar answered Oct 22 '22 11:10

Joshua Taylor