What is a good way to import a directory/file structure into Neo4j from a CSV file?

Tags: import, csv, directory-structure, neo4j, cypher

I am looking to import a large number of filenames into a graph database using Neo4j. The data comes from an external source and is available in a CSV file. I'd like to create a tree structure from the data so that I can easily 'navigate' the structure in queries later on (e.g. find all files underneath a certain directory, all files that occur in multiple directories, etc.).

So, given the example input:

/foo/bar/example.txt
/bar/baz/another.csv
/example.txt
/foo/bar/onemore.txt

I'd like to create the following graph:

( / ) <-[:in]- ( foo ) <-[:in]- ( bar ) <-[:in]- ( example.txt )
                                        <-[:in]- ( onemore.txt )
      <-[:in]- ( bar ) <-[:in]- ( baz ) <-[:in]- ( another.csv )
      <-[:in]- ( example.txt )

(where each node label is actually an attribute, e.g. path:).

I've been able to achieve the desired effect when using a fixed number of directory levels; for example, when each file is three levels deep, I could create a CSV file with 4 columns:

dir_a,dir_b,dir_c,file
foo,bar,baz,example.txt
foo,bar,ban,example.csv
foo,bar,baz,another.txt

And import it using a Cypher query:

LOAD CSV WITH HEADERS FROM "file:///sample.csv" AS row
  MERGE (dir_a:Path {name: row.dir_a})
  MERGE (dir_b:Path {name: row.dir_b}) -[:in]-> (dir_a)
  MERGE (dir_c:Path {name: row.dir_c}) -[:in]-> (dir_b)
  MERGE      (:Path {name: row.file})  -[:in]-> (dir_c)

But I'd like to have a general solution that works for any level of sub-directories (and combinations of levels in one dataset). Note that I am able to pre-process my input if necessary, so I can create any desirable structure in the input CSV file.

I've looked at gists and plugins, but cannot seem to find anything that works. I think/hope that I should be able to do something with the split() function, i.e. use split(row.path, '/') to get a list of path elements, but I do not know how to process this list into a chain of MERGE operations.

asked Jul 28 '16 15:07

Remco van Engelen

1 Answer

Here is a first cut at something more generalized.

The premise is that you can split the fully qualified path into its components, then split the path again on each component to reconstruct the fully qualified path of that individual component. Use this fully qualified path as the key to merge nodes on, and set the component name after the merge. For anything below the root level, find the parent of the component and create the relationship back to it. This will break down if a component name occurs more than once in a fully qualified path, or appears as a substring of an earlier component (for example /foobar/foo).
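
The same decomposition can also be done before the import, since the question allows pre-processing the CSV. What follows is a minimal Python sketch and not part of the original answer: the function names `expand_path`, `expand_all`, and `to_csv` and the output columns `fq_path`, `name`, and `parent` are assumptions chosen to mirror the properties used in this answer. It builds each fully qualified path by joining the leading components instead of splitting on the component name, so a repeated name such as /foo/foo should not cause trouble.

```python
import csv
import io


def expand_path(path):
    """Return (fq_path, name, parent) rows for the root and every component of path."""
    # Drop empty pieces so leading, trailing, or doubled slashes are ignored.
    parts = [p for p in path.strip().split("/") if p]
    rows = [("/", "/", "")]  # the root has no parent
    parent = "/"
    for i, name in enumerate(parts):
        # Join the first i+1 components instead of searching for the name.
        fq_path = "/" + "/".join(parts[: i + 1])
        rows.append((fq_path, name, parent))
        parent = fq_path
    return rows


def expand_all(paths):
    """Expand many paths, keeping each fq_path once, in first-seen order."""
    seen = {}
    for path in paths:
        for row in expand_path(path):
            seen.setdefault(row[0], row)
    return list(seen.values())


def to_csv(rows):
    """Serialize rows with a header line (hypothetical column names)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["fq_path", "name", "parent"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    sample = ["/foo/bar/example.txt", "/bar/baz/another.csv",
              "/example.txt", "/foo/bar/onemore.txt"]
    print(to_csv(expand_all(sample)))
```

Each output row could then be merged on fq_path, with an IN relationship to the node whose fq_path equals parent whenever parent is non-empty.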

First, I started by creating a uniqueness constraint on fq_path:

create constraint on (c:Component) assert c.fq_path is unique;

Here is the load statement.

// Each CSV row holds one absolute path in its first column.
load csv from 'file:///path.csv' as line
with line[0] as line, split(line[0], '/') as path_components
// idx 0 is the empty string before the leading '/'; it stands for the root.
unwind range(0, size(path_components) - 1) as idx
with
  // name of this component
  case
    when idx = 0 then '/'
    else path_components[idx]
  end as component,
  // fully qualified path: everything before the component, plus the component
  case
    when idx = 0 then '/'
    else split(line, path_components[idx])[0] + path_components[idx]
  end as fq_path,
  // fully qualified path of the parent, without its trailing '/'
  case
    when idx = 0 then null
    when idx = 1 then '/'
    else substring(split(line, path_components[idx])[0], 0,
                   size(split(line, path_components[idx])[0]) - 1)
  end as parent,
  // empty list for the root, so the foreach below is skipped
  case
    when idx = 0 then []
    else [1]
  end as find_parent
merge (new_comp:Component {fq_path: fq_path})
set new_comp.name = component
foreach (y in find_parent |
  merge (theparent:Component {fq_path: parent})
  merge (theparent)<-[:IN]-(new_comp)
)
return *

If you want to differentiate between files and folders, here are two queries you can run afterwards to set an additional label on the respective nodes.

Find the files and set them as File

// find the last Components in a tree (no inbound IN)
// and set them as Files
match (c:Component)
where not (c)<-[:IN]-(:Component)
set c:File
return c

Find the folders and set them as Folder

// find all Components with an inbound IN
// and set them as Folders
match (c:Component)
where (c)<-[:IN]-(:Component)
set c:Folder
return c
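
Both labelling queries rely on one rule: a component is a folder if at least one other component points at it with IN, and a file otherwise. The rule can be sketched in Python over a hypothetical list of (child, parent) pairs standing in for the IN relationships; the name `classify` and the sample data are assumptions for illustration only. One consequence worth noting is that a directory with nothing beneath it in the dataset would end up labelled File.

```python
def classify(edges):
    """Split nodes into (folders, files) from (child, parent) pairs."""
    nodes = set()
    parents = set()
    for child, parent in edges:
        nodes.update((child, parent))
        parents.add(parent)
    # Anything that is a parent of something is a folder;
    # every other node has no inbound IN and is treated as a file.
    return parents, nodes - parents


if __name__ == "__main__":
    edges = [
        ("/foo", "/"),
        ("/foo/bar", "/foo"),
        ("/foo/bar/example.txt", "/foo/bar"),
        ("/example.txt", "/"),
    ]
    folders, files = classify(edges)
    print(sorted(folders))
    print(sorted(files))
```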

answered Oct 21 '22 09:10

Dave Bennett
