I am trying to write a SPARQL query that extracts all relevant triples from a triplestore using CONSTRUCT. Essentially, the triplestore contains a bunch of JSON-LD documents that were parsed into triples, so there is a predictable set of predicates and patterns, and my goal is to reconstruct one of these documents by fetching the relevant triples. The documents were JSON objects roughly 7 nested objects deep. The structure is generally known, but any leaf object may have unknown properties that I also want to get back. So one way I can go about this is:
CONSTRUCT WHERE
{
# get top level object
?subject <:knownProperty1> ?v1 .
?subject <:knownProperty2> ?v2 .
?subject <:knownProperty3> ?v3 .
# leaf subobjects should get all their fields included
?v1 ?v1_p ?v1_o .
?v2 ?v2_p ?v2_o .
?v3 ?v3_p ?v3_o .
# v3 has these nested objects.
?v3 <:knownNest1> ?n1 .
?n1 ?n1_p ?n1_o .
# n2 is the next level of nesting
?n1 <:knownNest2> ?n2 .
?n2 ?n2_p ?n2_o .
#... and so on
}
This produces a set of triples that is orders of magnitude larger than the actual document because of duplication. It is correct, but it creates "a graph" for every possible combinatorial match of these values, especially because each level of nesting may have multiple (an array of) subobjects. It gets hairier because many of these known fields are also optional. For example, every graph match (each of which assigns one concrete value per variable) that includes ?subject <:knownProperty1> <:value1> supplies one copy of that triple, so it ends up being included hundreds to thousands of times. In the simple test case I am using to iterate on, there are 106 triples in the input, and fully specifying the allowed structure as shown above results in a CONSTRUCT result set of 5.5 MILLION triples with a query latency (in RAM) of over 60 seconds.
I can handle writing a complex query but I believe this is a code smell given that the basic problem is not that complicated. So my question is:
or any other suggestions about the proper way to try this. Thank you!
I use the following pattern and process to write CONSTRUCT queries like this. Start with a SELECT query that has one UNION block per part of the structure:
SELECT * WHERE
{
{
# get top level object
} UNION {
# leaf subobjects should get all their fields included
} UNION {
# v3 has these nested objects.
} UNION {
# n2 is the next level of nesting
} UNION {
#... and so on
}
}
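As a rough illustration, here is the skeleton with two of its blocks filled in, using the property names from the question. The `:` prefix IRI and the variable names `?top_p` and `?top_o` are assumptions made for this sketch, not something from the original post.

```sparql
PREFIX : <http://example.org/>   # assumed namespace; substitute your own

SELECT * WHERE
{
  {
    # get top level object: one row per known top-level triple
    VALUES ?top_p { :knownProperty1 :knownProperty2 :knownProperty3 }
    ?subject ?top_p ?top_o .
  } UNION {
    # v3 has these nested objects: one row per field of each ?n1
    ?subject :knownProperty3 ?v3 .
    ?v3 :knownNest1 ?n1 .
    ?n1 ?n1_p ?n1_o .
  }
}
```

Each result row comes from exactly one UNION block, and the variables that belong only to the other blocks stay unbound in that row. The row count is therefore the sum of the matches of the individual blocks rather than their product.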
Now you can run the query and verify the output. If everything looks right, replace the 'SELECT *' with your CONSTRUCT template. Imagine that your CONSTRUCT template gets called once for every row of the table produced by your SELECT query.
CONSTRUCT {
# get top level object triples
# .... use the variables from UNION block 1
# leaf subobjects should get all their fields included
# .... use the variables from UNION block 2
# v3 has these nested objects.
# .... use the variables from UNION block 3
# n2 is the next level of nesting
# .... use the variables from UNION block 4
# and so on ....
}
WHERE
{
{
# get top level object
} UNION {
# leaf subobjects should get all their fields included
} UNION {
# v3 has these nested objects.
} UNION {
# n2 is the next level of nesting
} UNION {
#... and so on
}
}
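A minimal sketch of what the filled-in query might look like for the structure in the question. The `:` prefix IRI and the variable names `?top_p`, `?top_o`, `?link`, and `?leaf` are assumptions introduced here for illustration. Unlike the original query, this version does not require all three top-level properties to be present.

```sparql
PREFIX : <http://example.org/>   # assumed namespace; substitute your own

CONSTRUCT {
  # block 1: top level object triples
  ?subject ?top_p ?top_o .
  # block 2: all fields of the leaf subobjects
  ?leaf ?leaf_p ?leaf_o .
  # block 3: all fields of the first nesting level under v3
  ?n1 ?n1_p ?n1_o .
  # block 4: all fields of the next nesting level
  ?n2 ?n2_p ?n2_o .
}
WHERE
{
  {
    # get top level object
    VALUES ?top_p { :knownProperty1 :knownProperty2 :knownProperty3 }
    ?subject ?top_p ?top_o .
  } UNION {
    # leaf subobjects should get all their fields included
    VALUES ?link { :knownProperty1 :knownProperty2 :knownProperty3 }
    ?subject ?link ?leaf .
    ?leaf ?leaf_p ?leaf_o .
  } UNION {
    # v3 has these nested objects.
    ?subject :knownProperty3 ?v3 .
    ?v3 :knownNest1 ?n1 .
    ?n1 ?n1_p ?n1_o .
  } UNION {
    # n2 is the next level of nesting
    ?subject :knownProperty3 ?v3 .
    ?v3 :knownNest1 ?n1 .
    ?n1 :knownNest2 ?n2 .
    ?n2 ?n2_p ?n2_o .
  }
}
```

For a given row, any template triple that mentions an unbound variable is simply skipped, so each row only contributes the triples of the block it came from.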
This approach works well for me.
Cons: this approach sometimes forces you to repeat yourself across the different UNION blocks.
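One possible way to reduce that repetition, assuming the store supports SPARQL 1.1 property paths, is to collapse the chain leading to a nested object into a single sequence path. The sketch below shows only the deepest block from the question; the `:` prefix IRI is an assumption.

```sparql
PREFIX : <http://example.org/>   # assumed namespace; substitute your own

CONSTRUCT {
  ?n2 ?n2_p ?n2_o .
}
WHERE
{
  # walk knownProperty3 -> knownNest1 -> knownNest2 in one pattern
  ?subject :knownProperty3/:knownNest1/:knownNest2 ?n2 .
  ?n2 ?n2_p ?n2_o .
}
```

The trade-off is that a path does not bind the intermediate nodes (`?v3` and `?n1` here), so a block that needs them in its template still has to spell the chain out.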