The popular graph database Neo4j can be used within R thanks to the package/driver RNeo4j (https://github.com/nicolewhite/Rneo4j).
The package author, @NicoleWhite, provides several great examples of its usage on GitHub.
Unfortunately for me, the examples given by @NicoleWhite and the documentation are a bit oversimplistic, in that they manually create each graph node and its associated labels and properties, such as:
mugshots = createNode(graph, "Bar", name = "Mugshots", location = "Downtown")
parlor = createNode(graph, "Bar", name = "The Parlor", location = "Hyde Park")
nicole = createNode(graph, name = "Nicole", status = "Student")
addLabel(nicole, "Person")
That's all good and fine when you're dealing with a tiny example dataset, but this approach isn't feasible for something like a large social graph with thousands of users, where each user is a node (such graphs might not utilize every node in every query, but they still need to be input to Neo4j).
I'm trying to figure out how to do this using vectors or dataframes. Is there a solution, perhaps involving an apply statement or a for loop?
This basic attempt:
for (i in 1:length(df$user_id)){
paste(df$user_id[i]) = createNode(graph, "user", name = df$name[i], email = df$email[i])
}
Leads to Error: 400 Bad Request
As a first attempt, you should look at the functionality I just added for the transactional endpoint:
http://nicolewhite.github.io/RNeo4j/docs/transactions.html
library(RNeo4j)
graph = startGraph("http://localhost:7474/db/data/")
clear(graph)
data = data.frame(Origin = c("SFO", "AUS", "MCI"),
FlightNum = c(1, 2, 3),
Destination = c("PDX", "MCI", "LGA"))
query = "
MERGE (origin:Airport {name:{origin_name}})
MERGE (destination:Airport {name:{dest_name}})
CREATE (origin)<-[:ORIGIN]-(:Flight {number:{flight_num}})-[:DESTINATION]->(destination)
"
t = newTransaction(graph)
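for (i in 1:nrow(data)) {
  # Pull the values for row i; use the full column names
  origin_name = data[i, ]$Origin
  dest_name = data[i, ]$Destination
  flight_num = data[i, ]$FlightNum
  # Queue the parameterized query in the open transaction
  appendCypher(t,
               query,
               origin_name = origin_name,
               dest_name = dest_name,
               flight_num = flight_num)
}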
commit(t)
cypher(graph, "MATCH (o:Airport)<-[:ORIGIN]-(f:Flight)-[:DESTINATION]->(d:Airport)
RETURN o.name, f.number, d.name")
Here, I form a Cypher query and then loop through a data frame and pass the values as parameters to the Cypher query. Your attempts right now will be slow, because you're sending a separate HTTP request for each node created. By using the transactional endpoint, you create several things under a single transaction. If your data frame is very large, I would split it up into roughly 1000 rows per transaction.
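To illustrate the "roughly 1000 rows per transaction" suggestion, here is a minimal sketch. The helpers batch_indices and load_in_batches are hypothetical names, not part of RNeo4j. Only newTransaction, appendCypher and commit come from the package. It assumes a data frame with the same Origin, FlightNum and Destination columns and the same query as above.

```r
# Split row indices 1..n into consecutive batches of at most batch_size rows.
# batch_indices is a hypothetical helper, not an RNeo4j function.
batch_indices = function(n, batch_size = 1000) {
  if (n == 0) return(list())
  split(seq_len(n), ceiling(seq_len(n) / batch_size))
}

# Open, fill and commit one transaction per batch.
# Assumes `graph` comes from startGraph() and `query` uses the
# {origin_name}, {dest_name} and {flight_num} parameters shown above.
load_in_batches = function(graph, data, query, batch_size = 1000) {
  for (rows in batch_indices(nrow(data), batch_size)) {
    t = RNeo4j::newTransaction(graph)
    for (i in rows) {
      RNeo4j::appendCypher(t,
                           query,
                           origin_name = as.character(data$Origin[i]),
                           dest_name = as.character(data$Destination[i]),
                           flight_num = data$FlightNum[i])
    }
    RNeo4j::commit(t)
  }
  invisible(NULL)
}
```

With a running server, this would be called as load_in_batches(graph, data, query). The as.character() calls are a precaution in case the string columns were read in as factors.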
As a second attempt, you should consider using LOAD CSV in the neo4j-shell.
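A rough sketch of the LOAD CSV route, under several assumptions: the file name and the file URL are placeholders, the file must be somewhere the Neo4j server can read, and toInteger() is the newer spelling (older Neo4j releases use toInt()). R only writes the CSV and prints the statement; the Cypher itself is meant to be pasted into neo4j-shell.

```r
data = data.frame(Origin = c("SFO", "AUS", "MCI"),
                  FlightNum = c(1, 2, 3),
                  Destination = c("PDX", "MCI", "LGA"),
                  stringsAsFactors = FALSE)

# Placeholder location; Neo4j must be able to read the file from wherever it ends up.
csv_path = file.path(tempdir(), "flights.csv")
write.csv(data, csv_path, row.names = FALSE)

# Cypher for neo4j-shell. Replace the file URL with the real path to the CSV.
# LOAD CSV reads every field as a string, hence the integer conversion.
load_query = "
LOAD CSV WITH HEADERS FROM 'file:///path/to/flights.csv' AS row
MERGE (origin:Airport {name: row.Origin})
MERGE (destination:Airport {name: row.Destination})
CREATE (origin)<-[:ORIGIN]-(:Flight {number: toInteger(row.FlightNum)})-[:DESTINATION]->(destination);
"
cat(load_query)
```

This builds the same graph shape as the transactional example, but the server reads the rows itself instead of receiving them over HTTP.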