Getting results on a pandas dataframe from a cypher query on a Neo4j database with py2neo is really straightforward, as:
>>> from pandas import DataFrame
>>> DataFrame(graph.data("MATCH (a:Person) RETURN a.name, a.born LIMIT 4"))
a.born a.name
0 1964 Keanu Reeves
1 1967 Carrie-Anne Moss
2 1961 Laurence Fishburne
3 1960 Hugo Weaving
Now I am trying to create (or better MERGE) a set of nodes and relationships from a pandas dataframe into a Neo4j database with py2neo. Imagine I have a dataframe like:
LABEL1 LABEL2
p1 n1
p2 n1
p3 n2
p4 n2
where Labels are column header and properties as values. I would like to reproduce the following cypher query (for the first row as example), for every rows of my dataframe:
query="""
MATCH (a:Label1 {property:p1))
MERGE (a)-[r:R_TYPE]->(b:Label2 {property:n1))
"""
I know I can tell py2neo just to graph.run(query)
, or even run a LOAD CSV
cypher script in the same way, but I wonder whether I can iterate through the dataframe and apply the above query row by row WITHIN py2neo.
In Neo4j to create relationship between nodes you have to use the CREATE statement like we used to create nodes. lets create relation between two already created nodes.
cypher. execute() - Stack Overflow.
Properties are stored as a linked list of property records, each holding a key and value and pointing to the next property. Each node and relationship references its first property record. The Nodes also reference the first relationship in its relationship chain. Each Relationship references its start and end node.
The Neo4j Graph Data Science Library provides multiple operations to work with relationships and their properties stored in a projected graphs. Relationship properties are either added during the graph projection or when using the mutate mode of our graph algorithms.
You can use DataFrame.iterrows()
to iterate through the DataFrame and execute a query for each row, passing in the values from the row as parameters.
for index, row in df.iterrows():
graph.run('''
MATCH (a:Label1 {property:$label1})
MERGE (a)-[r:R_TYPE]->(b:Label2 {property:$label2})
''', parameters = {'label1': row['label1'], 'label2': row['label2']})
That will execute one transaction per row. We can batch multiple queries into one transaction for better performance.
tx = graph.begin()
for index, row in df.iterrows():
tx.evaluate('''
MATCH (a:Label1 {property:$label1})
MERGE (a)-[r:R_TYPE]->(b:Label2 {property:$label2})
''', parameters = {'label1': row['label1'], 'label2': row['label2']})
tx.commit()
Typically we can batch ~20k database operations in a single transaction.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With