
How to use OrientDB ETL to create edges only

I have two CSV files:

The first contains ~500M records in the following format:

id,name
10000023432,Tom User
13943423235,Blah Person

The second contains ~1.5B friend relationships in the following format:

fromId,toId
10000023432,13943423235

I used the OrientDB ETL tool to create vertices from the first CSV file. Now I just need to create edges to establish the friendship connections between them.

I have tried multiple configurations of the ETL JSON file so far, the latest being this one:

{
    "config": {"parallel": true},
    "source": { "file": { "path": "path_to_file" } },
    "extractor": { "csv": {} },
    "transformers": [
        { "vertex": {"class": "Person", "skipDuplicates": true} },
        { "edge": { "class": "FriendsWith",
                    "joinFieldName": "from",
                    "lookup": "Person.id",
                    "unresolvedLinkAction": "SKIP",
                    "targetVertexFields":{
                        "id": "${input.to}"
                    },
                    "direction": "out"
                  }
        },
        { "code": { "language": "Javascript",
                    "code": "print('Current record: ' + record);  record;"}
        }
    ],
    "loader": {
        "orientdb": {
            "dbURL": "remote:<DB connection string>",
            "dbType": "graph",
            "classes": [
                {"name": "FriendsWith", "extends": "E"}
            ], "indexes": [
                {"class":"Person", "fields":["id:long"], "type":"UNIQUE" }
            ]
        }
    }
}

Unfortunately, in addition to creating the edge, this also creates a new vertex with "from" and "to" properties.

When I try removing the vertex transformer, the ETL process throws an error:

Error in Pipeline execution: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
Exception in thread "OrientDB ETL pipeline-0" com.orientechnologies.orient.etl.OETLProcessHaltedException: Halt
        at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:149)
        at com.orientechnologies.orient.etl.OETLProcessor$2.run(OETLProcessor.java:341)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
        at com.orientechnologies.orient.etl.transformer.OEdgeTransformer.executeTransform(OEdgeTransformer.java:107)
        at com.orientechnologies.orient.etl.transformer.OAbstractTransformer.transform(OAbstractTransformer.java:37)
        at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:115)
        ... 2 more

What am I missing here?

lambdapilgrim asked Nov 12 '15 19:11


1 Answer

You can import the edges with these ETL transformers:

"transformers": [
    { "merge": { "joinFieldName": "fromId", "lookup": "Person.id" } },
    { "vertex": {"class": "Person", "skipDuplicates": true} },
    { "edge": { "class": "FriendsWith",
                "joinFieldName": "toId",
                "lookup": "Person.id",
                "direction": "out"
              }
    },
    { "field": { "fieldNames": ["fromId", "toId"], "operation": "remove" } }
]

The "merge" transformer joins the current CSV line with the related Person record (this looks a bit odd, but for some reason it is necessary in order to associate fromId with the source person).

The "field" transformer removes the CSV fields (fromId and toId) that the "merge" step added to the Person record. You can also try the import without the "field" transformer to see the difference.
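For reference, here is a sketch of how the complete ETL file might look once these transformers are dropped into the configuration from the question. Everything outside "transformers" is copied from the question as-is (including the "path_to_file" and "<DB connection string>" placeholders), and the debugging "code" transformer is left out. It assumes the CSV header really is fromId,toId as shown in the question, and I have not run it against a data set of that size, so treat it as a starting point rather than a verified configuration.

```json
{
    "config": {"parallel": true},
    "source": { "file": { "path": "path_to_file" } },
    "extractor": { "csv": {} },
    "transformers": [
        { "merge": { "joinFieldName": "fromId", "lookup": "Person.id" } },
        { "vertex": {"class": "Person", "skipDuplicates": true} },
        { "edge": { "class": "FriendsWith",
                    "joinFieldName": "toId",
                    "lookup": "Person.id",
                    "direction": "out"
                  }
        },
        { "field": { "fieldNames": ["fromId", "toId"], "operation": "remove" } }
    ],
    "loader": {
        "orientdb": {
            "dbURL": "remote:<DB connection string>",
            "dbType": "graph",
            "classes": [
                {"name": "FriendsWith", "extends": "E"}
            ],
            "indexes": [
                {"class":"Person", "fields":["id:long"], "type":"UNIQUE" }
            ]
        }
    }
}
```

Note that the joinFieldName values here match the CSV column names (fromId, toId), whereas the configuration in the question refers to "from" and "to".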

K.Roland answered Oct 10 '22 20:10