ID as int in neo4j bulk import produces error in relationships import

Tags: import, csv, neo4j

I use the neo4j-admin import tool to bulk-import data in CSV format. I declare the ID as an integer in the header [journal:ID:int(Journal-ID)], and importing the nodes works fine. When the import tool gets to the relationships, I get an error saying that the referenced node is missing. It seems the relationship import looks up the ID as a String. I already tried changing the type of the ID in the relationship file as well, but then I get another error. I found no way to specify the ID as int in the relationship file.

Here is a minimal example. Let's say we have two node types with the headers:

journal:ID:int(Journal-ID)

and

documentID:ID(Document-ID),title

and the example files journal.csv:

"123"
"987"

and document.csv:

"PMID:1", "Title"
"PMID:2", "Other Title"

We also have a relation "hasDocument" with the header:

:START_ID(Journal-ID),:END_ID(Document-ID)

and the example file relation.csv:

"123", "PMID:1"

When running the import I get the error:

Error in input data
Caused by:123 (Journal-ID)-[hasDocument]->PMID:1 (Document-ID) referring to missing node 123

I tried to specify the relation header as

:START_ID:int(Journal-ID),:END_ID(Document-ID)

but this also produces an error.

The command to start the import is:

neo4j-admin import --nodes:Document="document-header.csv,documentNodes.csv" --nodes:Journal="journal-header.csv,journalNodes.csv" --relationships:hasDocument="hasDocument-header.csv,relationsHasDocument.csv"

Is there a way to specify the ID in the relationship file as Integer, or is there another solution to this problem?

Andi asked Dec 04 '25 16:12



1 Answer

This doesn't seem to be supported. The documentation doesn't mention it, and the code has no test case for it.

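You could import the data with String IDs and cast the property to an integer after you start the database. The query below assumes the ID is stored in a property named id; adjust it to match your header.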

MATCH (j:Journal)
SET j.id = toInteger(j.id)

If your dataset is large, you can run the update in batches with APOC's apoc.periodic.iterate:

CALL apoc.periodic.iterate(
  "MATCH (j:Journal) RETURN j",
  "SET j.id = toInteger(j.id)",
  {batchSize: 10000}
)
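
For the import step of this workaround, the headers from the question would presumably just drop the :int type, so that the IDs in the node file and in the relationship file are both read as strings and can match. A minimal sketch based on the headers quoted in the question, not tested against a particular Neo4j version; the first line is the journal node header, the second the hasDocument relationship header:

```
journal:ID(Journal-ID)
:START_ID(Journal-ID),:END_ID(Document-ID)
```

The data files and the neo4j-admin import command stay as they are. Note that with this header the ID ends up in a property named journal, so the cast would target j.journal rather than j.id.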
František Hartman answered Dec 06 '25 07:12
