
Significant performance difference between neo4j direct access and via OGM

I am evaluating the performance of the Neo4j graph database with a simple benchmark for insert, update, delete and query. Using Neo4j OGM, I see significantly slower execution times (about 2-4 times slower) than with direct access via the Neo4j driver. For example, the delete operation (see code below) takes 500 ms via the driver vs. 1200 ms via OGM for 10K nodes and 11K relationships on my machine. I wonder why this happens, especially because the deletion code below doesn't even use any node entity. I can imagine that OGM has some overhead, but this seems to be too much. Does anyone have an idea why it's slower?

Example node:

public abstract class AbstractBaseNode {

    @GraphId
    @Index(unique = true)
    private Long id;

    public Long getId() {
        return id;
    }
}

@NodeEntity
public class Company extends AbstractBaseNode {

    private String name;

    public Company(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

Example code for delete via driver:

driver = GraphDatabase.driver( "bolt://localhost:7687", AuthTokens.basic( "neo4j", "secret" ) );
session = driver.session();

long start = System.nanoTime();

session.run("MATCH (n) DETACH DELETE n").list();

System.out.println("Deleted all nodes " + ((System.nanoTime() - start) / 1000) + "μs");

Example code for delete via OGM:

public org.neo4j.ogm.config.Configuration neo4jConfiguration() {
    org.neo4j.ogm.config.Configuration config =  new org.neo4j.ogm.config.Configuration();
    config.autoIndexConfiguration().setAutoIndex(AutoIndexMode.DUMP.getName());
    config.driverConfiguration()
            .setDriverClassName("org.neo4j.ogm.drivers.bolt.driver.BoltDriver")
            .setURI("bolt://neo4j:secret@localhost")
            .setConnectionPoolSize(10);

    return config;
}

sessionFactory = new SessionFactory(neo4jConfiguration(), "net.mypackage");
session = sessionFactory.openSession();

long start = System.nanoTime();

session.query("MATCH (n) DETACH DELETE n", Collections.emptyMap()).forEach(x -> {});

System.out.println("Deleted all nodes " + ((System.nanoTime() - start) / 1000) + "μs");
Steffen Harbich asked May 11 '17 09:05




1 Answer

I will start by pointing out that your test samples are poor. When taking timing samples, you want to stress the system so that each run takes a fair amount of time. The tests should also measure what you're actually interested in (are you testing how fast you can create and drop connections? Maximum Cypher throughput? The speed of a single large transaction?). With tests that take barely a second, it is impossible to tell whether a difference in performance comes from the query call or just from startup overhead (despite the name, the session doesn't actually connect until you call query(...)).
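A minimal sketch of a fairer timing loop, assuming nothing beyond the JDK: it runs an untimed setup step before every run (for a delete benchmark, that is where the nodes would be recreated), discards a few warm-up runs so connection setup and JIT compilation are not counted, and reports the median of the remaining runs. The names BenchmarkHarness and medianMicros are hypothetical and not part of the Neo4j driver or the OGM.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the Neo4j driver or OGM.
final class BenchmarkHarness {

    /**
     * Runs setup (untimed) followed by task (timed), warmups + runs times.
     * The first `warmups` timings are discarded; the median of the rest
     * is returned in microseconds.
     */
    static long medianMicros(Runnable setup, Runnable task, int warmups, int runs) {
        if (warmups < 0 || runs <= 0) {
            throw new IllegalArgumentException("warmups must be >= 0 and runs > 0");
        }
        long[] samples = new long[runs];
        for (int i = 0; i < warmups + runs; i++) {
            setup.run();                       // e.g. recreate the test data
            long start = System.nanoTime();
            task.run();                        // e.g. run the delete query
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            if (i >= warmups) {
                samples[i - warmups] = elapsedMicros;
            }
        }
        return median(samples);
    }

    /** Median of the samples; for an even count, the mean of the two middle values. */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2;
    }
}
```

With such a helper, the driver and OGM variants could each be passed in as the task with the same setup step, for example medianMicros(createData, deleteViaOgm, 5, 20), where createData and deleteViaOgm are placeholders for your own code.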

As far as I can tell, both versions perform about the same in a normal setup. The only thing I can think of that would affect this is the OGM doing something that starves other processes of system resources.

UPDATE

UNWIND {rows} as row 
CREATE (n:Company) 
SET n=row.props 
RETURN row.nodeRef as ref, ID(n) as id, row.type as type
with params {rows=[{nodeRef=-1206180304, type=node, props={name=company_1029}}]}

VS

CREATE (a:Company {name: {name}}) // X10,000

The key difference between the driver and the OGM is that the driver does exactly what you tell it to do, which is the most efficient way of doing things, whereas the OGM tries to manage the query logic for you (what to return, how to save things, what to try to save). The OGM version is more reliable because it will automatically try to consolidate nodes with the database (if possible) and will only save things that have actually changed. Since your node class doesn't have a primary key to consolidate on, it will have to create everything.

The OGM Cypher is more versatile, but it also requires more memory use/access. SET n.name="rawr" is 1 db hit per property. SET n={name:"rawr"} is 3 db hits, though (about 1+2*#_of_props, so {name:"rawr", id:2} is 5 db hits). That is why the OGM Cypher is slower.

The OGM does have smart management, however. If you then edit one node and try to save it, with the driver you would have to either save everything or implement your own manager. The OGM will only save the updated node.
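To put rough numbers on the db-hit comparison above, here is a small sketch that simply applies the rule of thumb from this answer (1 hit per property for individual SET n.prop = value, about 1 + 2 * properties for SET n = {map}). The class and method names are hypothetical, and the formula is an estimate only; actual counts depend on the Neo4j version and can be checked by prefixing a query with PROFILE.

```java
// Hypothetical estimator that only encodes the rule of thumb quoted above.
final class DbHitEstimate {

    /** Estimated hits when each property is set individually: 1 per property per node. */
    static long perPropertySet(long nodes, int propsPerNode) {
        return nodes * propsPerNode;
    }

    /** Estimated hits when the whole property map is replaced: about 1 + 2 * props per node. */
    static long mapSet(long nodes, int propsPerNode) {
        return nodes * (1 + 2L * propsPerNode);
    }
}
```

Under this estimate, 10,000 nodes with a single name property come to roughly 10,000 hits for the per-property form and roughly 30,000 for the map form.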

So in short, the Cypher queries the OGM generates are less efficient than what you would write by hand for the driver, but the OGM has smart management built in that can make it faster than a blind driver implementation in real business logic situations (loading/editing large numbers of nodes). Of course, you can implement your own management on top of the driver to be faster, so it's a trade-off between speed and development effort. The more speed you want, the more time you have to put into managing every tiny aspect (and the point of the OGM is that you plug it in and it just works).
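As an illustration of what "implementing your own management" on top of the driver could involve, here is a minimal change-tracking sketch: it snapshots each node's properties when they are loaded and later reports only the ids whose properties differ, so that only those need to be written back. This is an assumption-laden toy, not how the OGM is implemented; ChangeTracker, remember and dirtyIds are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical change tracker for illustration; not the OGM's actual mechanism.
final class ChangeTracker {

    // Copy of each node's properties as they were when last loaded or saved.
    private final Map<Long, Map<String, Object>> snapshots = new HashMap<>();

    /** Records the current properties of a node as its clean state. */
    void remember(long id, Map<String, Object> props) {
        snapshots.put(id, new HashMap<>(props));
    }

    /** Returns, in ascending order, the ids that are new or whose properties changed. */
    List<Long> dirtyIds(Map<Long, Map<String, Object>> current) {
        List<Long> dirty = new ArrayList<>();
        for (Map.Entry<Long, Map<String, Object>> entry : current.entrySet()) {
            Map<String, Object> before = snapshots.get(entry.getKey());
            if (!entry.getValue().equals(before)) {
                dirty.add(entry.getKey());
            }
        }
        Collections.sort(dirty);
        return dirty;
    }
}
```

A save routine built on this would send an update query only for the ids returned by dirtyIds and then call remember again for each of them.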

Tezra answered Nov 08 '22 17:11
