
Failed WriteBatch Operation with py2neo

I am trying to find a workaround for the following problem. I have seen it partly described in this SO question, but it was never really answered.

The following code fails, starting with a fresh graph:

from py2neo import neo4j

def add_test_nodes():
    # Add a test node manually
    alice = g.get_or_create_indexed_node("Users", "user_id", 12345, {"user_id":12345})

def do_batch(graph):
    # Begin batch write transaction
    batch = neo4j.WriteBatch(graph)

    # get some updated node properties to add
    new_node_data = {"user_id":12345, "name": "Alice"}

    # batch requests
    a = batch.get_or_create_in_index(neo4j.Node, "Users", "user_id", 12345, {})
    batch.set_properties(a, new_node_data)  #<-- I'm the problem

    # execute batch requests and clear
    batch.run()
    batch.clear()

if __name__ == '__main__':
    # Initialize Graph DB service and create a Users node index
    g = neo4j.GraphDatabaseService()
    users_idx = g.get_or_create_index(neo4j.Node, "Users")

    # run the test functions
    add_test_nodes()
    alice = g.get_or_create_indexed_node("Users", "user_id", 12345)
    print alice

    do_batch(g)

    # get alice back and assert additional properties were added
    alice = g.get_or_create_indexed_node("Users", "user_id", 12345)
    assert "name" in alice

In short, I want to update the properties of existing indexed nodes in a single batch transaction. The failure occurs at the batch.set_properties line because the BatchRequest object returned by the previous line is not interpreted as a valid node. Though not entirely identical, it feels like I am attempting something like the answer posted here.

Some specifics

>>> import py2neo
>>> py2neo.__version__
'1.6.0'
>>> g = py2neo.neo4j.GraphDatabaseService()
>>> g.neo4j_version
(2, 0, 0, u'M06') 

Update

If I split the problem into separate batches, then it can run without error:

def do_batch(graph):
    # Begin batch write transaction
    batch = neo4j.WriteBatch(graph)

    # get some updated node properties to add
    new_node_data = {"user_id":12345, "name": "Alice"}

    # batch request 1
    batch.get_or_create_in_index(neo4j.Node, "Users", "user_id", 12345, {})

    # execute batch request and clear
    # submit() returns a list with one result per request
    alice = batch.submit()[0]
    batch.clear()

    # batch request 2
    batch.set_properties(alice, new_node_data)

    # execute batch request and clear
    batch.run()
    batch.clear()

This works for many nodes as well. Though I do not love the idea of splitting the batch up, this might be the only way at the moment. Anyone have some comments on this?

SunPowered asked Nov 15 '13 21:11


2 Answers

After reading up on the new features of Neo4j 2.0.0-M06, it seems that the older workflow of node and relationship indexes is being superseded. Neo4j's approach to indexing now diverges from the legacy one in two ways: labels and schema indexes.

Labels

Labels can be arbitrarily attached to nodes and can serve as a reference for an index.

Indexes

Indexes can be created in Cypher by referencing a label (here, User) and a node property key (screen_name):

CREATE INDEX ON :User(screen_name)

Cypher MERGE

Furthermore, the indexed get_or_create methods can now be expressed with the new Cypher MERGE clause, which incorporates labels and their indexes quite succinctly:

MERGE (me:User{screen_name:"SunPowered"}) RETURN me

Batch

Queries of this sort can be batched in py2neo by appending a CypherQuery instance to the batch object:

from py2neo import neo4j

graph_db = neo4j.GraphDatabaseService()
cypher_merge_user = neo4j.CypherQuery(graph_db, 
    "MERGE (user:User {screen_name:{name}}) RETURN user")

def get_or_create_user(screen_name):
    """Return the user if exists, create one if not"""
    return cypher_merge_user.execute_one(name=screen_name)

def get_or_create_users(screen_names):
    """Apply the get or create user cypher query to many usernames in a 
    batch transaction"""
    
    batch = neo4j.WriteBatch(graph_db)
    
    for screen_name in screen_names:
        batch.append_cypher(cypher_merge_user, params=dict(name=screen_name))

    return batch.submit()

root = get_or_create_user("Root")
users = get_or_create_users(["alice", "bob", "charlie"])

Limitation

There is a limitation, however: the results of a Cypher query in a batch transaction cannot be referenced later in the same transaction. The original question was about updating the properties of a collection of indexed users in one batch transaction. This is still not possible, as far as I can tell. For example, the following snippet throws an error:

batch = neo4j.WriteBatch(graph_db)
b1 = batch.append_cypher(cypher_merge_user, params=dict(name="Alice"))
batch.set_properties(b1, dict(last_name="Smith"))
resp = batch.submit()

So, although implementing get_or_create over a labelled node with py2neo carries a bit less overhead because the legacy indexes are no longer necessary, the original question still needs two separate batch transactions to complete.
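One possible alternative, not tested here, is to avoid referencing the result at all and fold the update into the Cypher statement itself, so that a single MERGE ... SET statement both finds or creates the node and sets its properties. The sketch below only builds the query text and parameter dictionary in plain Python. The helper name merge_and_set and the parameter names merge_value and set_* are made up for illustration, and whether the resulting statement behaves as intended on 2.0.0-M06 is an assumption.

```python
import re

# Labels and property names cannot be passed as Cypher parameters, so they are
# written into the query text. Restrict them to plain identifiers first.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def merge_and_set(label, key, value, updates):
    """Build a get-or-create-and-update statement (hypothetical helper).

    Returns a (query, params) pair using the {name} parameter syntax
    seen in the MERGE examples above.
    """
    for name in [label, key] + list(updates):
        if not _IDENTIFIER.match(name):
            raise ValueError("unsafe identifier: %r" % (name,))

    params = {"merge_value": value}
    set_clauses = []
    for name in sorted(updates):
        param = "set_" + name
        params[param] = updates[name]
        set_clauses.append("n.%s = {%s}" % (name, param))

    query = "MERGE (n:%s {%s: {merge_value}})" % (label, key)
    if set_clauses:
        query += " SET " + ", ".join(set_clauses)
    query += " RETURN n"
    return query, params


if __name__ == "__main__":
    query, params = merge_and_set(
        "User", "screen_name", "Alice", {"last_name": "Smith"})
    print(query)
    print(params)
```

The resulting pair could then be appended to a batch in the same way as the MERGE query earlier, for example batch.append_cypher(query, params=params), once per user. No request refers to another request's result, so the limitation described above is sidestepped rather than solved.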

SunPowered answered Sep 20 '22 11:09


Your problem seems not to be in batch.set_properties() but rather in the output of batch.get_or_create_in_index(). If you add the node with batch.create(), it works:

db = neo4j.GraphDatabaseService()

batch = neo4j.WriteBatch(db)
# create a node instead of getting it from index
test_node = batch.create({'key': 'value'})
# set new properties on the node
batch.set_properties(test_node, {'key': 'foo'})

batch.submit()

If you look at the properties of the BatchRequest objects returned by batch.create() and batch.get_or_create_in_index(), the URIs differ because the methods use different parts of the Neo4j REST API:

test_node = batch.create({'key': 'value'})
print test_node.uri # node
print test_node.body # {'key': 'value'}
print test_node.method # POST

index_node = batch.get_or_create_in_index(neo4j.Node, "Users", "user_id", 12345, {})
print index_node.uri # index/node/Users?uniqueness=get_or_create
print index_node.body # {u'value': 12345, u'key': 'user_id', u'properties': {}}
print index_node.method # POST

batch.submit()

So I guess batch.set_properties() somehow can't handle the URI of the indexed node? I.e. it doesn't really get the correct URI for the node?
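To make the comparison concrete, here is roughly what the two variants look like as jobs in Neo4j's REST batch format, where a later job can point at an earlier one by writing its id in braces (here {0}). This is a minimal sketch built from plain dictionaries, not py2neo's internals. The helper name build_batch_jobs is made up for illustration. One unverified guess is that the {0} reference is resolved from the first job's Location header, which an index lookup that finds an existing node may not return.

```python
import json


def build_batch_jobs(create_via_index):
    """Return two batch jobs: obtain a node, then set its properties.

    Hypothetical helper for illustration; the URIs and bodies mirror the
    BatchRequest attributes printed above.
    """
    if create_via_index:
        # Same request shape as batch.get_or_create_in_index(...)
        first = {
            "method": "POST",
            "to": "/index/node/Users?uniqueness=get_or_create",
            "body": {"key": "user_id", "value": 12345, "properties": {}},
            "id": 0,
        }
    else:
        # Same request shape as batch.create(...)
        first = {
            "method": "POST",
            "to": "/node",
            "body": {"key": "value"},
            "id": 0,
        }

    # "{0}" asks the server to substitute the node produced by job 0
    second = {
        "method": "PUT",
        "to": "{0}/properties",
        "body": {"user_id": 12345, "name": "Alice"},
        "id": 1,
    }
    return [first, second]


if __name__ == "__main__":
    print(json.dumps(build_batch_jobs(create_via_index=True), indent=2))
```

The second job is identical in both variants. Only the first job changes, which is consistent with the failure coming from how the reference to the indexed node is resolved rather than from set_properties itself.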

Doesn't solve the problem, but could be a pointer for somebody else ;) ?

Martin Preusse answered Sep 22 '22 11:09