
How to perform pagination in Gremlin

In TinkerPop 3, how do I perform pagination? I want to fetch the first 10 elements of a query, then the next 10, without having to load them all into memory. For example, the query below returns 1,000,000 records. I want to fetch them 10 at a time without loading all 1,000,000 at once.

g.V().has("key", value).limit(10)

Edit

A solution that works through HttpChannelizer on Gremlin Server would be ideal.

Mohamed Taher Alrefaie asked Oct 03 '16 08:10


2 Answers

From a functional perspective, a nice-looking bit of Gremlin for paging would be:

gremlin> g.V().hasLabel('person').fold().as('persons','count').
               select('persons','count').
                 by(range(local, 0, 2)).
                 by(count(local))
==>[persons:[v[1],v[2]],count:4]
gremlin> g.V().hasLabel('person').fold().as('persons','count').
               select('persons','count').
                 by(range(local, 2, 4)).
                 by(count(local))
==>[persons:[v[4],v[6]],count:4]

In this way you get the total count of vertices with the result. Unfortunately, the fold() forces you to count all the vertices which will require iterating them all (i.e. bringing them all into memory).

There really is no way to avoid iterating all 1,000,000 vertices in this case as long as you intend to execute your traversal as multiple separate traversals. For example:

gremlin> g.V().hasLabel('person').range(0,2)
==>v[1]
==>v[2]
gremlin> g.V().hasLabel('person').range(2,4)
==>v[4]
==>v[6]

The first statement is the same as if you had terminated the traversal with limit(2). The second traversal, which only wants the next two vertices, does not magically skip iterating the first two, because it is a new traversal. I'm not aware of any TinkerPop graph database implementation that does this efficiently; they all behave this way.

The only way to do ten vertices at a time without having them all in memory is to use the same Traversal instance as in:

gremlin> t = g.V().hasLabel('person');[]
gremlin> t.next(2)
==>v[1]
==>v[2]
gremlin> t.next(2)
==>v[4]
==>v[6]

With that model you only iterate the vertices once and don't bring them all into memory at a single point in time.
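
As a rough sketch of that model, the loop below pages through a single Traversal in the Gremlin Console. It assumes the TinkerGraph "modern" toy graph (TinkerFactory.createModern()) purely for illustration, and pageSize is a hypothetical name that does not appear in the example above.

```groovy
// Assumption: TinkerGraph's "modern" toy graph stands in for the real data.
graph = TinkerFactory.createModern()
g = graph.traversal()

// Build the traversal once; the trailing [] stops the console from iterating it.
t = g.V().hasLabel('person'); []

pageSize = 2  // hypothetical page size
while (t.hasNext()) {
    // next(n) returns a List with up to n results and advances the same traversal
    page = t.next(pageSize)
    println page
}
```

Because the Traversal has to stay alive between calls, this approach fits an embedded graph or a session more naturally than a series of independent, stateless requests.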

Some other thoughts on this topic can be found in this blog post.

stephen mallette answered Nov 14 '22 12:11


Why not add order().by() and apply range() to your Gremlin query?
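
A minimal sketch of that suggestion, assuming the vertices carry a 'name' property to sort on (as in the TinkerGraph "modern" toy graph) and that g is an existing traversal source. The page closure and its parameters are hypothetical helpers for illustration only.

```groovy
// page(n, size) returns the n-th page (zero-based) of 'person' vertices sorted by name.
// Assumption: every 'person' vertex has a 'name' property.
page = { int n, int size ->
    g.V().hasLabel('person').
      order().by('name').
      range(n * size, (n + 1) * size).
      toList()
}

page(0, 2)  // first two persons by name
page(1, 2)  // next two persons by name
```

The explicit order gives every request the same sequence, so pages do not overlap or skip elements. Each call is still a new traversal, though, so the matching vertices are walked and sorted again every time.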

jaypeeig answered Nov 14 '22 13:11