I'm prototyping a user-authorization/data-protection scheme in Neo4j, and I ran into a strange issue with one of my queries. For background, the concept is that a user trying to get from a to b can get to b only if they have the correct access identifier. So, our edges have relationship types that carry the access identifiers. I'm testing this scheme by creating lots of nodes and connecting pairs of them with different accesses. That is, I have lots of sets of:
(a)-[:ACCESS_A]->(b)
with a different access type on each pair. I query for them with:
{some query} with a match (a)-[:ACCESS_A|:ACCESS_B|<...>|:ACCESS_Z]->(b) return b
where the size of the list in the edge match grows with the number of accesses the user has.
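To make the growing type list concrete, here is a minimal sketch of how such a query string could be generated for a given set of accesses. The function name `build_access_query` and the example type names are hypothetical, not taken from my actual code, and the `{some query} with a` prefix is left out, so this only covers the `match` part. It keeps the same `|:` separator style as the query above.

```python
import re

# Relationship type names are interpolated into the query text here,
# so restrict them to plain identifiers before building the string.
_VALID_TYPE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_access_query(access_types, profile=True):
    """Return a Cypher query matching (a)-[:T1|:T2|...]->(b) for the given types.

    Hypothetical helper: only the MATCH/RETURN part of the query is produced.
    """
    if not access_types:
        raise ValueError("at least one relationship type is required")
    for name in access_types:
        if not _VALID_TYPE.match(name):
            raise ValueError(f"unsafe relationship type name: {name!r}")
    # Same `|:` separator style as the query in the question.
    type_list = "|:".join(access_types)
    prefix = "PROFILE " if profile else ""
    return f"{prefix}MATCH (a)-[:{type_list}]->(b) RETURN b"


# Hypothetical 4-character type names, one per access the user holds.
types = [f"A{i:03d}" for i in range(3)]
print(build_access_query(types))
# prints: PROFILE MATCH (a)-[:A000|:A001|:A002]->(b) RETURN b
```

Running the generated query with `PROFILE` for lists of increasing length is how the db-hit counts can be compared from one list size to the next.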
This all works great until the list gets to 201 accesses. At that point, the profile shows the db hits and time taken go WAY up. With 200 relationship types the profile shows 1051 db hits, but with 201 relationship types it shows 31801. That's a 30-fold increase for one more type! Time taken increases in a similar manner. Going from 199 to 200 types only adds about 50 hits, and that's due to an increasing number of nodes hit.
After more work, it looks like the round number 200 is more of a red herring than the actual issue. Previously, my relationship type names were 4 characters long. When I changed them to 9 characters (prepending "EDGE_", as a test), the issue began occurring at 50 types: 50 types gives 36 db hits, while 51 gives 291. That's a smaller jump, but significant when compared to previous increases in the same test.
It appears that there is some relation between the length of the relationship type names and the point where the query falls off, but I'm still investigating.
Things that I've tested and not found to be of interest:
To my knowledge you shouldn't be running into performance issues with only 200 relationship types.
Prior to version 3.0, the number of relationship types was capped at 64k. That limit was removed with version 3.0.