I have two CSV files:
First containing ~ 500M records in the following format
id,name
10000023432,Tom User
13943423235,Blah Person
Second containing ~ 1.5B friend relationships in the following format
fromId,toId
10000023432,13943423235
I used OrientDB ETL tool to create vertices from the first CSV file. Now, I just need to create edges to establish friendship connection between them.
I have tried multiple configuration of the ETL json file so far, the latest being this one:
{
"config": {"parallel": true},
"source": { "file": { "path": "path_to_file" } },
"extractor": { "csv": {} },
"transformers": [
{ "vertex": {"class": "Person", "skipDuplicates": true} },
{ "edge": { "class": "FriendsWith",
"joinFieldName": "from",
"lookup": "Person.id",
"unresolvedLinkAction": "SKIP",
"targetVertexFields":{
"id": "${input.to}"
},
"direction": "out"
}
},
{ "code": { "language": "Javascript",
"code": "print('Current record: ' + record); record;"}
}
],
"loader": {
"orientdb": {
"dbURL": "remote:<DB connection string>",
"dbType": "graph",
"classes": [
{"name": "FriendsWith", "extends": "E"}
], "indexes": [
{"class":"Person", "fields":["id:long"], "type":"UNIQUE" }
]
}
}
}
But unfortunately, this also creates the vertex with "from" and "to" property, in addition to creating the edge.
When I try removing the vertex transformer, ETL process throws an error:
Error in Pipeline execution: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d13
6a8' is not supported
Exception in thread "OrientDB ETL pipeline-0" com.orientechnologies.orient.etl.OETLProcessHaltedException: Halt
at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:149)
at com.orientechnologies.orient.etl.OETLProcessor$2.run(OETLProcessor.java:341)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not suppor
ted
at com.orientechnologies.orient.etl.transformer.OEdgeTransformer.executeTransform(OEdgeTransformer.java:107)
at com.orientechnologies.orient.etl.transformer.OAbstractTransformer.transform(OAbstractTransformer.java:37)
at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:115)
... 2 more
What am I missing here?
You can import the edges with these ETL transformers:
"transformers": [
{ "merge": { "joinFieldName": "fromId", "lookup": "Person.id" } },
{ "vertex": {"class": "Person", "skipDuplicates": true} },
{ "edge": { "class": "FriendsWith",
"joinFieldName": "toId",
"lookup": "Person.id",
"direction": "out"
}
},
{ "field": { "fieldNames": ["fromId", "toId"], "operation": "remove" } }
]
The "merge" transformer will join the current csv line with related Person record (this is a bit strange but for some reason this is neccessary to associate fromId with the source person).
The "field" transformer will remove the csv fields added by the merge section. You can try the import without "field" transformer as well to see the difference.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With