I have a setup where I am using the gremlin-core library to query a remote JanusGraph server. The data size is moderate for now but will grow in the future.
A few days ago, I saw the "Max frame length of 65536 has been exceeded" error on my client. The maxContentLength parameter in my server yaml file was set to the default (65536). I dug into the code and realized that I was sending a large array of vertex ids as a query parameter to fetch vertices. I split the array into batches of 100 vertex ids per request and that resolved the issue.
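Roughly, the batching I applied looks like this (a simplified sketch; fetchInBatches, vertexIds, and g are illustrative names for a helper, the id list, and the remote traversal source, not code from my actual project):

import java.util.ArrayList;
import java.util.List;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Vertex;

// Fetch vertices in chunks of 100 ids so that no single request frame
// grows past the server's maxContentLength.
static List<Vertex> fetchInBatches(GraphTraversalSource g, List<Object> vertexIds) {
    final int batchSize = 100;
    List<Vertex> vertices = new ArrayList<>();
    for (int i = 0; i < vertexIds.size(); i += batchSize) {
        List<Object> batch = vertexIds.subList(i, Math.min(i + batchSize, vertexIds.size()));
        vertices.addAll(g.V(batch.toArray()).toList());
    }
    return vertices;
}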
After some time I started seeing this error again in my client logs. This time there was no query with a large number of parameters being sent to the server. I came across a proposed solution on this topic which said that I need to set the maxContentLength parameter on the client side as well. I did that and the issue was resolved. However, it raised a few questions about the configuration parameters, their values, and their impact on query request/response sizes.
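For context, the client-side change looks roughly like this (a sketch using the TinkerPop Java driver's Cluster builder; the host, port, and size values are illustrative):

import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;

// The driver rejects any frame larger than its own maxContentLength, so
// it must be at least as large as the biggest response the server sends.
Cluster cluster = Cluster.build()
        .addContactPoint("localhost")
        .port(8182)
        .maxContentLength(2 * 1024 * 1024)
        .create();
GraphTraversalSource g = traversal().withRemote(DriverRemoteConnection.using(cluster, "g"));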
The answers to these questions are crucial for me to build a robust server that will not break under the onslaught of data.
Thanks in advance
Anya
The maxContentLength is the number of bytes a single "message" can contain as a request or a response. It serves the same function as similar settings in web servers: it lets you filter out obviously invalid requests. The setting has little to do with database size and more to do with the types of requests you are making and the nature of your results. For requests, I tend to think it atypical for a request to exceed 65k in most situations. Folks who exceed that size are typically trying to do batch loading or are using code-generated scripts (the latter tends to be problematic, but I won't go into details). For responses, 65k may not be enough depending on the nature of your queries. For example, the query:
g.V().valueMap(true)
will return all the vertices in your database as an Iterator<Map>, and Gremlin Server will stream those results back in batches controlled by the resultIterationBatchSize setting (default is 64). So if you have 128 vertices in your database, Gremlin Server will stream back two batches of results behind the scenes. If those two batches are each below maxContentLength in size then there are no problems. If your batches are bigger than that (because you have, say, 1000 properties on each vertex) then you either need to increase maxContentLength or decrease resultIterationBatchSize.
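Both of those knobs live in the Gremlin Server yaml file; an illustrative fragment (the values are examples, not recommendations):

# gremlin-server.yaml (fragment)
maxContentLength: 2097152        # max bytes in a single request/response frame
resultIterationBatchSize: 64     # results per streamed batch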
Also note that the previous query is very different from something like:
g.V().valueMap(true).fold()
because the fold() will realize all the vertices into a list in memory, and then that list must be serialized as a whole. There is only one result (i.e. a List<Map> with 128 vertices) and thus nothing to batch, so it's much more likely that you would exceed the maxContentLength here, and lowering the resultIterationBatchSize wouldn't even help. Your only recourse would be to increase maxContentLength, or to alter the query so that batching can kick in and hopefully break that large chunk of data into pieces that fit within maxContentLength.
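To make that concrete: if you want the whole list on the client anyway, you could drop the fold() and let the driver accumulate the streamed batches for you (a sketch, assuming a remote g as before):

// The server streams batches of resultIterationBatchSize and the driver
// assembles the final list, so no single frame carries the whole result.
List<Map<Object, Object>> maps = g.V().valueMap(true).toList();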
Setting your maxContentLength to 2mb or larger shouldn't be too big a deal. If you need to go higher for requests, then I'd be curious what the reason for that was. If you need to go much higher for responses, then perhaps I'd take a look at my queries and see if there's a better way to limit the data I'm returning, or whether there's a nicer way to get Gremlin Server streaming to work for me.