
Normal JSON to GraphSON format

I have two questions:

  1. Where can I find the basic format for a GraphSON file that is guaranteed to load successfully in the Gremlin Console? I'm trying to convert a JSON file (with about 10-20 fields) into a file that Gremlin can query, but I can't find any relevant information about the fields reserved by the GraphSON format, how I should handle the IDs, and so on. I exported the modern graph they provide, and it isn't even valid JSON (multiple JSON root elements) but a list of JSON objects, one per line [1]. I also saw fields like outE and inE. Do I have to create these fields manually?

  2. If I am able to create the JSON, where do I tell the server to load it as the base graph when I start it? In the config file or in the script?

Thanks! Adrian

[1] https://pastebin.com/drwXhg5k

{"id":1,"label":"person","outE":{"created":[{"id":9,"inV":3,"properties":{"weight":0.4}}],"knows":[{"id":7,"inV":2,"properties":{"weight":0.5}},{"id":8,"inV":4,"properties":{"weight":1.0}}]},"properties":{"name":[{"id":0,"value":"marko"}],"age":[{"id":1,"value":29}]}}
{"id":2,"label":"person","inE":{"knows":[{"id":7,"outV":1,"properties":{"weight":0.5}}]},"properties":{"name":[{"id":2,"value":"vadas"}],"age":[{"id":3,"value":27}]}}
{"id":3,"label":"software","inE":{"created":[{"id":9,"outV":1,"properties":{"weight":0.4}},{"id":11,"outV":4,"properties":{"weight":0.4}},{"id":12,"outV":6,"properties":{"weight":0.2}}]},"properties":{"name":[{"id":4,"value":"lop"}],"lang":[{"id":5,"value":"java"}]}}
{"id":4,"label":"person","inE":{"knows":[{"id":8,"outV":1,"properties":{"weight":1.0}}]},"outE":{"created":[{"id":10,"inV":5,"properties":{"weight":1.0}},{"id":11,"inV":3,"properties":{"weight":0.4}}]},"properties":{"name":[{"id":6,"value":"josh"}],"age":[{"id":7,"value":32}]}}
{"id":5,"label":"software","inE":{"created":[{"id":10,"outV":4,"properties":{"weight":1.0}}]},"properties":{"name":[{"id":8,"value":"ripple"}],"lang":[{"id":9,"value":"java"}]}}
{"id":6,"label":"person","outE":{"created":[{"id":12,"inV":3,"properties":{"weight":0.2}}]},"properties":{"name":[{"id":10,"value":"peter"}],"age":[{"id":11,"value":35}]}}
Adrian Pop asked Jul 05 '17 13:07



1 Answer

Where can I find the basic format for a GraphSON file that is guaranteed to load successfully in the Gremlin Console?

There are multiple versions of GraphSON at this point. You can find a reference in the Apache TinkerPop IO Documentation. When you write "successfully loaded by the gremlin console", I assume you mean with the GraphSONReader methods described here. If so, then the format you show above is one form you can use. It is not valid JSON, as you can see, but you can build the writer with the wrapAdjacencyList option set to true (and the reader with unwrapAdjacencyList set to true), and the writer will then produce valid JSON. Here is an example:

gremlin> graph = TinkerFactory.createModern();
==>tinkergraph[vertices:6 edges:6]
gremlin> writer =  graph.io(IoCore.graphson()).writer().wrapAdjacencyList(true).create()
==>org.apache.tinkerpop.gremlin.structure.io.graphson.GraphSONWriter@24a298a6
gremlin> os = new FileOutputStream('wrapped-adjacency-list.json')
==>java.io.FileOutputStream@6d3c232f
gremlin> writer.writeGraph(os, graph)
gremlin> os.close()
gremlin> newGraph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> ins = new FileInputStream('wrapped-adjacency-list.json')
==>java.io.FileInputStream@7435a578
gremlin> reader = graph.io(IoCore.graphson()).reader().unwrapAdjacencyList(true).create()
==>org.apache.tinkerpop.gremlin.structure.io.graphson.GraphSONReader@63da207f
gremlin> reader.readGraph(ins, newGraph)
gremlin> newGraph
==>tinkergraph[vertices:6 edges:6]
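
For comparison, here is a minimal sketch of the same round trip without wrapping, using the writeGraph/readGraph convenience methods on graph.io(). The file name adjacency-list.json is an assumption for illustration, and the explicit imports are only needed outside the Gremlin Console. Variables are left undeclared so the lines can also be pasted into the console.

```groovy
import org.apache.tinkerpop.gremlin.structure.io.IoCore
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerFactory
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph

graph = TinkerFactory.createModern()

// Write the default form: one JSON object per line, one line per vertex.
// 'adjacency-list.json' is a hypothetical file name.
graph.io(IoCore.graphson()).writeGraph('adjacency-list.json')

// Read the same default form back into an empty graph.
// No unwrapAdjacencyList option is involved because nothing was wrapped.
newGraph = TinkerGraph.open()
newGraph.io(IoCore.graphson()).readGraph('adjacency-list.json')

// Count the lines in the file and the vertices that were loaded.
lineCount = new File('adjacency-list.json').readLines().size()
vertexCount = newGraph.traversal().V().count().next()
```

If the wrapped form is not required by some other tool, this default form is the one the reader accepts without extra options.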

By default you do not get valid JSON because the standard format for a GraphSON file needs to be splittable for Hadoop and other distributed processing engines. The writer therefore produces one line per vertex, each in a StarGraph format (the vertex together with its properties and its incident edges).
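
To make the one-line-per-vertex point concrete, here is a small sketch in plain Groovy (no TinkerPop calls) that parses two lines copied from the export in the question. Each line is handled on its own, which is what makes the file splittable. The names lines and summaries are illustrative only.

```groovy
import groovy.json.JsonSlurper

// Two lines copied from the modern-graph export in the question (vertices 2 and 5).
lines = [
  '{"id":2,"label":"person","inE":{"knows":[{"id":7,"outV":1,"properties":{"weight":0.5}}]},"properties":{"name":[{"id":2,"value":"vadas"}],"age":[{"id":3,"value":27}]}}',
  '{"id":5,"label":"software","inE":{"created":[{"id":10,"outV":4,"properties":{"weight":1.0}}]},"properties":{"name":[{"id":8,"value":"ripple"}],"lang":[{"id":9,"value":"java"}]}}'
]

slurper = new JsonSlurper()

// Parse every line independently; no line needs any other line to be valid JSON.
summaries = lines.collect { line ->
    def v = slurper.parseText(line)
    [
        id       : v['id'],
        label    : v['label'],
        name     : v['properties']['name'][0]['value'],
        // inE and outE are optional; a vertex without such edges omits the field.
        inLabels : (v['inE'] ?: [:]).keySet().toList(),
        outLabels: (v['outE'] ?: [:]).keySet().toList()
    ]
}

summaries.each { println it }
```

Each line carries the vertex id, its label, its properties (each with its own id) and the edges grouped by label under inE and outE, so an edge appears once on each of its two endpoint lines.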

If I am able to create the JSON, where do I tell the server to load it as the base graph when I start it? In the config file or in the script?

A script would work, as would the gremlin.tinkergraph.graphLocation and gremlin.tinkergraph.graphFormat configuration options on TinkerGraph.
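
As a rough sketch of the configuration route, the two options can be set programmatically before opening the graph. This assumes the Gremlin Console, where BaseConfiguration and TinkerGraph are imported by default, and data/my-graph.json is a hypothetical path to a GraphSON file in the default one-line-per-vertex form.

```groovy
// Sketch for the Gremlin Console; BaseConfiguration and TinkerGraph are
// assumed to be available through the console's default imports.
conf = new BaseConfiguration()

// Hypothetical path to an existing GraphSON file.
conf.setProperty('gremlin.tinkergraph.graphLocation', 'data/my-graph.json')
conf.setProperty('gremlin.tinkergraph.graphFormat', 'graphson')

// TinkerGraph reads the file at graphLocation when the graph is opened,
// provided the file exists.
graph = TinkerGraph.open(conf)
g = graph.traversal()
g.V().count()
```

The same two keys can presumably be placed in the TinkerGraph properties file that the server configuration points to. Note that with graphLocation set, TinkerGraph also writes the graph back to that file on graph.close().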

Ultimately though, if you have existing JSON and you aren't loading tens of millions of graph elements, it is probably easiest to just parse it and use standard g.addV() and g.addE() methods to build the graph:

gremlin> import groovy.json.*
==>org.apache.tinkerpop.gremlin.structure.*,...
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> jsonSlurper = new JsonSlurper()
==>groovy.json.JsonSlurper@53e3a87a
gremlin> object = jsonSlurper.parseText('[{ "name": "John Doe" }, { "name" : "Jane Doe" }]')
==>[name:John Doe]
==>[name:Jane Doe]
gremlin> object.each { g.addV().property('name', it.name).iterate() }
==>[name:John Doe]
==>[name:Jane Doe]
gremlin> g.V().valueMap()
==>[name:[John Doe]]
==>[name:[Jane Doe]]

Trying to convert that to GraphSON is overly complicated compared to the approach above.
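
The example above only adds vertices. A minimal sketch of the same approach with edges might look like the following. The JSON layout (a people list plus knows pairs that refer to people by name) and the person and knows labels are assumptions made for illustration, not something taken from the question's data.

```groovy
import groovy.json.JsonSlurper
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph

// Hypothetical input: people, plus "knows" pairs that refer to people by name.
json = '''{
  "people": [ { "name": "John Doe" }, { "name": "Jane Doe" } ],
  "knows":  [ { "from": "John Doe", "to": "Jane Doe" } ]
}'''
data = new JsonSlurper().parseText(json)

graph = TinkerGraph.open()
g = graph.traversal()

// Add one vertex per person and remember it by name so edges can find their endpoints.
byName = [:]
data.people.each { p ->
    byName[p.name] = g.addV('person').property('name', p.name).next()
}

// Add one edge per pair: label the source vertex 'a', move to the target vertex,
// then create the edge pointing from 'a' to the current vertex.
data.knows.each { k ->
    g.V(byName[k.from].id()).as('a').
      V(byName[k.to].id()).
      addE('knows').from('a').iterate()
}

// Check the result: who does John Doe know?
known = g.V().has('name', 'John Doe').out('knows').values('name').toList()
edgeCount = g.E().count().next()
```

Keeping a name-to-vertex map avoids a lookup traversal per edge endpoint; for larger inputs an indexed property lookup would be the alternative.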

stephen mallette answered Nov 08 '22 11:11
