In TinkerPop 3, how do I perform pagination? I want to fetch the first 10 elements of a query, then the next 10, without having to load them all into memory. For example, the query below returns 1,000,000 records. I want to fetch them 10 at a time without loading all 1,000,000 at once.
g.V().has("key", value).limit(10)
A solution that works through the HttpChannelizer on Gremlin Server would be ideal.
From a functional perspective, a nice-looking bit of Gremlin for paging would be:
gremlin> g.V().hasLabel('person').fold().as('persons','count').
select('persons','count').
by(range(local, 0, 2)).
by(count(local))
==>[persons:[v[1],v[2]],count:4]
gremlin> g.V().hasLabel('person').fold().as('persons','count').
select('persons','count').
by(range(local, 2, 4)).
by(count(local))
==>[persons:[v[4],v[6]],count:4]
This way you get the total count of vertices along with the page of results. Unfortunately, fold() forces you to count all the vertices, which requires iterating them all (i.e. bringing them all into memory).
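To make the cost of fold() concrete, here is a toy model in plain Python (not the TinkerPop or gremlinpython API). The generator `person_stream` is a hypothetical stand-in for `g.V().hasLabel('person')`, and it records how many vertices it has produced. Folding turns the whole stream into one list before `range(local, ...)` and `count(local)` run, so every page request pulls every vertex:

```python
from typing import Iterator, List, Tuple

# Hypothetical stand-in for g.V().hasLabel('person'); ids match the console output above.
PERSON_IDS = ["v[1]", "v[2]", "v[4]", "v[6]"]


def person_stream(pulled: List[int]) -> Iterator[str]:
    """Lazily yield vertices, recording how many have been produced."""
    for vertex in PERSON_IDS:
        pulled[0] += 1
        yield vertex


def fold_page(low: int, high: int) -> Tuple[List[str], int, int]:
    """Model of fold().as(...).select(...).by(range(local, low, high)).by(count(local))."""
    pulled = [0]
    folded = list(person_stream(pulled))  # fold(): the entire stream becomes one in-memory list
    page = folded[low:high]               # range(local, low, high): slice the folded list
    total = len(folded)                   # count(local): size of the folded list
    return page, total, pulled[0]


if __name__ == "__main__":
    for low, high in [(0, 2), (2, 4)]:
        page, total, pulled = fold_page(low, high)
        print(f"persons={page} count={total} vertices_pulled={pulled}")
```

Both pages report `vertices_pulled=4`: the slice is small, but the folded list always holds the full result.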
There really is no way to avoid iterating all of the vertices in this case as long as you execute the query as multiple separate traversals, one per page. For example:
gremlin> g.V().hasLabel('person').range(0,2)
==>v[1]
==>v[2]
gremlin> g.V().hasLabel('person').range(2,4)
==>v[4]
==>v[6]
The first statement is the same as terminating the traversal with limit(2). The second traversal only wants the second two vertices, but because it is a new traversal it does not magically skip iterating the first two. I'm not aware of any TinkerPop graph database implementation that does that efficiently; they all behave this way.
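The same toy-model approach (plain Python, not a TinkerPop API) shows why separate `range()` traversals re-read skipped elements. Each call to the hypothetical `range_page` starts a fresh stream, and `itertools.islice` has to pull and discard the first `low` elements before it can yield the page:

```python
from itertools import islice
from typing import Iterator, List, Tuple

# Hypothetical stand-in for g.V().hasLabel('person'); ids match the console output above.
PERSON_IDS = ["v[1]", "v[2]", "v[4]", "v[6]"]


def person_stream(pulled: List[int]) -> Iterator[str]:
    """Lazily yield vertices, recording how many have been produced."""
    for vertex in PERSON_IDS:
        pulled[0] += 1
        yield vertex


def range_page(low: int, high: int) -> Tuple[List[str], int]:
    """Model of a brand-new traversal g.V().hasLabel('person').range(low, high)."""
    pulled = [0]  # every call starts a fresh traversal, so the counter starts at zero
    # islice must pull (and discard) the first `low` elements before yielding any.
    page = list(islice(person_stream(pulled), low, high))
    return page, pulled[0]


if __name__ == "__main__":
    print(range_page(0, 2))  # first page
    print(range_page(2, 4))  # second page re-reads the first two vertices
```

In this model, page n (counting from 1) of size k pulls n * k elements, so fetching every page of a large result one traversal at a time costs roughly quadratic work overall.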
The only way to fetch ten vertices at a time without having them all in memory is to reuse the same Traversal instance, as in:
gremlin> t = g.V().hasLabel('person');[]
gremlin> t.next(2)
==>v[1]
==>v[2]
gremlin> t.next(2)
==>v[4]
==>v[6]
With that model you only iterate the vertices once and don't bring them all into memory at a single point in time.
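The single-instance pattern can be sketched as a toy Python class. `ToyTraversal` is hypothetical and only mirrors the idea behind `Traversal.next(int)`: one lazy iterator that is consumed in batches, so each call continues where the previous one stopped instead of starting over:

```python
from itertools import islice
from typing import Iterable, Iterator, List


class ToyTraversal:
    """Toy model of one Traversal instance: a single lazy iterator consumed in batches."""

    def __init__(self, source: Iterable[str]) -> None:
        self.pulled = 0  # how many elements have been drawn from the source so far
        self._it: Iterator[str] = self._counting(source)

    def _counting(self, source: Iterable[str]) -> Iterator[str]:
        for item in source:
            self.pulled += 1
            yield item

    def next(self, amount: int) -> List[str]:
        """Return up to `amount` further elements, continuing where the last call stopped."""
        return list(islice(self._it, amount))


if __name__ == "__main__":
    t = ToyTraversal(["v[1]", "v[2]", "v[4]", "v[6]"])
    print(t.next(2), t.pulled)  # first batch
    print(t.next(2), t.pulled)  # second batch; nothing is re-read
```

After both batches only four elements have been pulled in total, and at no point did the model hold more than one batch.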
Some other thoughts on this topic can be found in this blog post.
Why not add order().by() and then apply the range() step to your Gremlin query?
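Ordering makes the pages deterministic, so separate `range()` calls return consistent, non-overlapping slices. Note, though, that in TinkerPop `order()` is a barrier step: unless a provider optimizes it, each page's traversal still collects and sorts the whole result before `range()` applies. A plain-Python toy model of that behavior follows; the `PEOPLE` data is assumed for illustration (it mirrors the person vertices of TinkerPop's "modern" toy graph), and `ordered_page` is hypothetical:

```python
from typing import Iterator, List, Tuple

# Hypothetical (id, name) pairs mirroring the person vertices of TinkerPop's "modern" graph.
PEOPLE = [(1, "marko"), (2, "vadas"), (4, "josh"), (6, "peter")]


def ordered_page(low: int, high: int) -> Tuple[List[int], int]:
    """Model of g.V().hasLabel('person').order().by('name').range(low, high)."""
    pulled = [0]

    def stream() -> Iterator[Tuple[int, str]]:
        for person in PEOPLE:
            pulled[0] += 1
            yield person

    # order() is a barrier: every element is collected before the first one is emitted.
    ordered = sorted(stream(), key=lambda person: person[1])
    page = [vid for vid, _ in ordered[low:high]]  # range(low, high) on the sorted result
    return page, pulled[0]


if __name__ == "__main__":
    print(ordered_page(0, 2))  # josh, marko
    print(ordered_page(2, 4))  # peter, vadas
```

Both pages come back in a stable name order, but each one pulls all four elements.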