When a user enters data in a text box, SQL injection becomes possible. To prevent this, there are many ways to put placeholders in the SQL query, which are then replaced by the input in a later step of the code. Similarly, how can we prevent Gremlin injection in C#?
Example: The following is sample code for adding a node to a graph database. The values of the variables name and nodeId are taken from the user via a text box.
StringBuilder sb = new StringBuilder();
sb.Append("g.addV('" + name + "').property('id','"+nodeId+"')");
/*The following simply executes the gremlin query stored in sb*/
IDocumentQuery<dynamic> query = client.CreateGremlinQuery<dynamic>(graph, sb.ToString());
while (query.HasMoreResults)
{
    foreach (dynamic result in await query.ExecuteNextAsync())
    {
        Console.WriteLine($"\t {JsonConvert.SerializeObject(result)}");
    }
}
A malicious user may enter values like:
name: "person" (without quotes)
id: "mary');g.V().drop();g.addV('person').property('id', 'thomas" (without quotes)
This will clear all existing nodes and leave only one node, with the id thomas.
How do I prevent this from happening?
I don't wish to blacklist characters like ";" or ")", as these are permissible input for some data.
Note: Gremlin is a traversal language used in graph databases:
https://tinkerpop.apache.org/gremlin.html
https://docs.microsoft.com/en-us/azure/cosmos-db/gremlin-support
The question was originally about Gremlin injection in cases where the Gremlin traversal is sent to the server (e.g., Gremlin Server) as a query script. My original answer for that scenario can be found below (Gremlin Scripts). However, Gremlin Language Variants have since become the dominant way to execute Gremlin traversals, so I extended my answer to cover them, because the situation is very different from that of plain Gremlin scripts.
Gremlin Language Variants (GLVs) are implementations of Gremlin within different host languages like Python, JavaScript, or C#. This means that instead of sending the traversal as a string to the server like
client.SubmitAsync<object>("g.V().count()");
it can simply be represented as code in the specific language and then executed with a special terminal step (like next() or iterate()):
g.V().Count().Next();
This builds and executes the traversal in C# (it would look essentially the same in other languages, except that the step names would not be in Pascal case). The traversal is converted into Gremlin Bytecode, which is the language-independent representation of a Gremlin traversal. This Bytecode is then serialized to GraphSON and sent to a server for evaluation:
{
"@type" : "g:Bytecode",
"@value" : {
"step" : [ [ "V" ], [ "count" ] ]
}
}
This very simple traversal already shows that GraphSON includes type information, especially since version 2.0 and more so in version 3.0 which is the default version since TinkerPop 3.3.0.
There are two GraphSON types of interest to an attacker: the Bytecode type shown above, which can be used to execute Gremlin traversals like g.V().drop() to manipulate or remove data from the graph, and g:Lambda, which can be used to execute arbitrary code [1]:
{
"@type" : "g:Lambda",
"@value" : {
"script" : "{ it.get() }",
"language" : "gremlin-groovy",
"arguments" : 1
}
}
However, an attacker would need to add either their own Bytecode or a lambda as an argument to a step that is part of an existing traversal. Since a string is simply serialized as a string in GraphSON, no matter whether its content looks like a lambda or Bytecode, it is not possible to inject code into a Gremlin traversal with a GLV this way. The code would simply be treated as a string. The only way this could work is if the attacker were able to provide a Bytecode or Lambda object directly to the step, but I can't think of any scenario that would allow for this.
So, to the best of my knowledge, injecting code into a Gremlin traversal is not possible when a GLV is used. This holds regardless of whether bindings are used.
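For illustration, the vertex insertion from the question could be written with the GLV roughly as follows. This is only a sketch under assumptions: it presumes a Gremlin.Net version that provides AnonymousTraversalSource.Traversal().WithRemote(...), and a Gremlin Server on localhost:8182 that accepts Bytecode requests (not every provider does). The variable values are hypothetical user input.

```csharp
using Gremlin.Net.Driver;
using Gremlin.Net.Driver.Remote;
using static Gremlin.Net.Process.Traversal.AnonymousTraversalSource;

// Assumption: a Gremlin Server on localhost:8182 that accepts Bytecode requests.
using var client = new GremlinClient(new GremlinServer("localhost", 8182));
var g = Traversal().WithRemote(new DriverRemoteConnection(client));

// Hypothetical user input, including the injection attempt from the question.
var name = "person";
var nodeId = "mary');g.V().drop();g.addV('person').property('id', 'thomas";

// Both values are passed as plain string arguments of the addV and property
// steps, so they are serialized as GraphSON strings, not as script text.
g.AddV(name).Property("id", nodeId).Iterate();
```

No query string is assembled here, so there is nothing for the input to break out of; the malicious text should simply end up as the value of the id property.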
The following part was the original answer for scenarios where the traversal is sent as a query string to the server:
Your example will indeed result in something you could call a Gremlin injection. I tested it with Gremlin.Net, but it should work the same way with any Gremlin driver. Here is the test that demonstrates that the injection actually works:
var gremlinServer = new GremlinServer("localhost");
using (var gremlinClient = new GremlinClient(gremlinServer))
{
    var name = "person";
    var nodeId = "mary').next();g.V().drop().iterate();g.V().has('id', 'thomas";
    var query = "g.addV('" + name + "').property('id','" + nodeId + "')";
    await gremlinClient.SubmitAsync<object>(query);
    var count = await gremlinClient.SubmitWithSingleResultAsync<long>(
        "g.V().count().next()");
    Assert.NotEqual(0, count);
}
This test fails because count is 0, which shows that the Gremlin Server executed the g.V().drop().iterate() traversal.
The official TinkerPop documentation recommends using script parameterization instead of including the parameters directly in the query script as we did in the previous example. While it motivates this recommendation with performance improvements, parameterization also helps to prevent injection through malicious user input. To understand its effect here, we have to look at how a request is sent to the Gremlin Server (taken from the Provider Documentation):
{ "requestId":"1d6d02bd-8e56-421d-9438-3bd6d0079ff1",
"op":"eval",
"processor":"",
"args":{"gremlin":"g.traversal().V(x).out()",
"bindings":{"x":1},
"language":"gremlin-groovy"}}
As we can see in this JSON representation of a request message, the arguments of a Gremlin script are sent separately from the script itself, as bindings. (The argument is named x here and has the value 1.)
The important thing here is that the Gremlin Server will only execute the script from the gremlin element and then include the parameters from the bindings element as raw values.
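To make the separation concrete, here is a small sketch that builds a request message of the same shape by hand and prints it. It is for illustration only: in practice the driver builds this message, and the script text and the binding name nodeId are assumptions chosen to match the question. It uses only System.Text.Json, so it runs without a server.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical user input containing the injection attempt.
var nodeId = "mary').next();g.V().drop().iterate();g.V().has('id', 'thomas";

// The script is a constant; only the bindings element carries user data.
var request = new Dictionary<string, object>
{
    ["requestId"] = Guid.NewGuid().ToString(),
    ["op"] = "eval",
    ["processor"] = "",
    ["args"] = new Dictionary<string, object>
    {
        ["gremlin"] = "g.addV('person').property('id', nodeId)",
        ["bindings"] = new Dictionary<string, object> { ["nodeId"] = nodeId },
        ["language"] = "gremlin-groovy"
    }
};

// Print the message; the payload appears only as a JSON string value
// inside "bindings", never inside the "gremlin" script.
var options = new JsonSerializerOptions { WriteIndented = true };
Console.WriteLine(JsonSerializer.Serialize(request, options));
```

However the input is crafted, the gremlin element stays the same, which is why the server never evaluates the user's text as script.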
A simple test to see that using bindings prevents the injection:
var gremlinServer = new GremlinServer("localhost");
using (var gremlinClient = new GremlinClient(gremlinServer))
{
    var name = "person";
    var nodeId = "mary').next();g.V().drop().iterate();g.V().has('id', 'thomas";
    var query = "g.addV('" + name + "').property('id', nodeId)";
    var arguments = new Dictionary<string, object>
    {
        {"nodeId", nodeId}
    };
    await gremlinClient.SubmitAsync<object>(query, arguments);
    var count = await gremlinClient.SubmitWithSingleResultAsync<long>(
        "g.V().count().next()");
    Assert.NotEqual(0, count);
    var existQuery = $"g.V().has('{name}', 'id', nodeId).values('id');";
    var nodeIdInDb = await gremlinClient.SubmitWithSingleResultAsync<string>(existQuery,
        arguments);
    Assert.Equal(nodeId, nodeIdInDb);
}
This test passes, which not only shows that g.V().drop() was not executed (otherwise count would again have the value 0), but also demonstrates in the last three lines that the injected Gremlin script was simply used as the value of the id property.
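Note that the test above still concatenates name into the script, so in real code the label should be passed as a binding as well. A minimal sketch of this, assuming Gremlin.Net's SubmitAsync extension that takes a bindings dictionary (as used in the test); the class VertexWriter, the method AddVertexAsync, and the binding names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Gremlin.Net.Driver;

// Hypothetical helper, not part of Gremlin.Net.
public static class VertexWriter
{
    public static Task AddVertexAsync(IGremlinClient client, string label, string nodeId)
    {
        // The script is a compile-time constant; no user input is concatenated.
        const string script = "g.addV(vertexLabel).property('id', nodeId)";

        // All user-supplied values are sent separately as bindings.
        var bindings = new Dictionary<string, object>
        {
            { "vertexLabel", label },
            { "nodeId", nodeId }
        };

        return client.SubmitAsync<object>(script, bindings);
    }
}
```

It could be called as await VertexWriter.AddVertexAsync(gremlinClient, name, nodeId); since the script never changes, the server may also be able to reuse the compiled script, which is the performance benefit the documentation mentions.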
[1] This arbitrary code execution is actually provider specific. Some providers, such as Amazon Neptune, don't support lambdas at all, and it is also possible to restrict the code that can be executed with a SandboxExtension for the Gremlin Server, e.g., by blacklisting known problematic methods with the SimpleSandboxExtension or by whitelisting only known unproblematic methods with the FileSandboxExtension.