I have a triple store that contains mail archive data. So let's say I have a lot of persons (foaf:Person) that have sent (ex:hasSent) and received (ex:hasReceived) emails (ex:Email).
Example:
SELECT ?person ?email
WHERE {
?email rdf:type ex:Email.
?person rdf:type foaf:Person;
ex:hasSent ?email.
}
The same works for ex:hasReceived, of course. Now I would like to do some statistics and analytics, i.e. determine how many emails an individual has sent and received. Doing this for only one predicate is a simple aggregation:
SELECT ?person (COUNT(?email) AS ?count)
WHERE {
?email rdf:type ex:Email.
?person rdf:type foaf:Person;
ex:hasSent ?email.
}
GROUP BY ?person
However, I need need the number of received emails as well and I would like to do this without having to issue a separate query. So I tried the following:
SELECT ?person (COUNT(?email1) AS ?sent_emails) (COUNT(?email2) AS ?received_emails)
WHERE {
?person rdf:type foaf:Person.
?sent_email rdf:type ex:Email.
?person ex:hasSent ?sent_email.
?received_email rdf:type ex:Email.
?person ex:hasReceived ?received_email.
}
GROUP BY ?person
This did not seem to be right, as the numbers for the emails sent vs. received were exactly the same. I assume this is because my SPARQL statement results in a cross product of all mails a person has ever sent and received, right?
What do I need to do in order to get the statistics right on a per-individual basis?
COUNT(?email1) isn't counting anything as ?email1 is undefined. Also, there is partial cross product as you mention - DISTINCT will help.
Try (COUNT(DISTINCT ?sent_email) AS ?sent_emails)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With