We are building an iterative algorithm that runs a set of SPARQL queries on each iteration. The algorithm works well, but we're running into a CPU utilization issue. SPARQL engines like Fuseki are not truly multithreaded: they allow multiple simultaneous queries to be executed in multiple threads, but each individual query is single-threaded. From looking at some Fuseki notes, I get the impression that Fuseki is not thread-safe, so this is not a trivial issue.
Since our algorithm is inherently serial in terms of the SPARQL queries, and we are interested in one run at a time, is there some SPARQL engine that can take advantage of, say, 32 cores?
Yes, there are. BigData is an open source/commercial example of this.
My own project, dotNetRDF, also uses multi-threading heavily. In my case I leverage the .NET PLINQ feature to parallelize joins, products, FILTER and BIND operations, though they aren't always amenable to this.
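To give a rough idea of what parallelizing a FILTER inside a single query can look like, here is a minimal sketch in Python. It is not how dotNetRDF or any other engine implements it. The names `parallel_filter` and `solutions` and the dict-per-solution layout are assumptions made up for illustration. It also uses a thread pool only to stay self-contained; in CPython the GIL means threads won't give a real CPU speedup for a pure-Python predicate, so treat it as a picture of the chunk-and-merge pattern rather than a performance recipe.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_filter(bindings, predicate, workers=4):
    """Apply a FILTER-style predicate to solution bindings in parallel chunks.

    Hypothetical helper for illustration only, not an API of any SPARQL engine.
    """
    if not bindings:
        return []
    # Split the solutions into contiguous chunks, roughly one per worker.
    size = max(1, -(-len(bindings) // workers))  # ceiling division
    chunks = [bindings[i:i + size] for i in range(0, len(bindings), size)]

    def filter_chunk(chunk):
        # Each worker evaluates the predicate over its own chunk.
        return [b for b in chunk if predicate(b)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in chunk order, so solution order is preserved.
        filtered = pool.map(filter_chunk, chunks)
        return [b for chunk in filtered for b in chunk]


# Hypothetical solution bindings: each dict maps a variable name to a value.
solutions = [{"s": f"ex:item{i}", "price": i * 10} for i in range(10)]

# Roughly equivalent to FILTER(?price > 50)
expensive = parallel_filter(solutions, lambda b: b["price"] > 50)
print([b["s"] for b in expensive])
```

This works for FILTER because each solution can be tested independently of the others. Joins and products need more care (for example, partitioning both sides on the join variables), which is part of why such operations aren't always amenable to parallelization.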
On the note of Fuseki (disclaimer: I am also involved in the Apache Jena project): as AndyS points out, Fuseki itself is thread-safe. The issue is that the query engine (ARQ) is not designed to parallelize operations. Some ideas about this have been discussed in the past, but IMO it would involve a fairly significant rewrite.
The Urika engine developed and marketed by YarcData is highly multithreaded (up to several thousand simultaneous threads) and runs in very large memory. Probably not suitable for a hobbyist budget though. :)