I am using Neo4j as my primary database. It is a nice graph database that gives me very good control over the connections between nodes. However, it seems to be quite weak at full-text search (the search feature on a website), so I am thinking of using Elasticsearch to build the search feature of my application. But there are a few issues with doing this. Let's say we are searching for user posts. In Neo4j, posts could have the following model:
(post)<-[:AUTHOR]-(user)
(post)-[:LIKED_BY]->(otherusers)
(post)-[:COMMENTED_BY]->(otherusers)
(post)-[:HAS_PHOTO]->(photos)
The nice thing about Neo4j (say, when getting the posts on a user's profile) is that you can grab all of this at once, plus the profile picture and user details while you're at it, and whether you have already liked the post. That is a lot of detail in one query (Cypher command). Now, if we want to give the same level of detail in the Elasticsearch output, I can think of the following at the moment:
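A minimal sketch of such a single query, held as a Ruby heredoc so it can be handed to whichever driver you use. The labels (User, Post, Photo), the id and createdAt properties, and the profile_posts_query helper are assumptions for illustration; the relationship directions follow the model above.

```ruby
# Hypothetical query: one round trip for a profile page's posts, following
# (user)-[:AUTHOR]->(post), (post)-[:LIKED_BY]->(user),
# (post)-[:COMMENTED_BY]->(user) and (post)-[:HAS_PHOTO]->(photo).
# Labels and properties are assumptions.
PROFILE_POSTS_CYPHER = <<~CYPHER
  MATCH (author:User {id: $authorId})-[:AUTHOR]->(post:Post)
  WITH author, post
  ORDER BY post.createdAt DESC
  LIMIT $limit
  OPTIONAL MATCH (post)-[:HAS_PHOTO]->(photo:Photo)
  WITH author, post, collect(DISTINCT photo) AS photos
  OPTIONAL MATCH (post)-[:LIKED_BY]->(liker:User)
  WITH author, post, photos, collect(DISTINCT liker) AS likers
  OPTIONAL MATCH (post)-[:COMMENTED_BY]->(commenter:User)
  WITH author, post, photos, likers, count(DISTINCT commenter) AS commentCount
  RETURN post, author, photos,
         size(likers) AS likeCount,
         commentCount,
         any(u IN likers WHERE u.id = $viewerId) AS likedByViewer
  ORDER BY post.createdAt DESC
CYPHER

# Returns the query and the parameter map a Neo4j driver would expect.
def profile_posts_query(author_id:, viewer_id:, limit: 10)
  [PROFILE_POSTS_CYPHER, { "authorId" => author_id, "viewerId" => viewer_id, "limit" => limit }]
end
```

The early LIMIT keeps the OPTIONAL MATCHes from running over every post the user has written; the final ORDER BY is there because the aggregations in between do not guarantee row order.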
Store everything in both Neo4j and Elasticsearch. When text is searched, list the results from Elasticsearch itself. But there are still problems, like finding out whether the user has already liked the post (this might need querying Neo4j again for each post, which does not sound nice).
Store only the post id in Elasticsearch. When populating the search results, grab the information for each post from Neo4j by its id and display the results (10 results -> 10 separate calls, which again sounds very inefficient).
Get the list of ids that Elasticsearch returns, make one call to Neo4j, and grab the results (I don't know how to do this or whether there are performance issues). A Cypher reference would be helpful.
Any solutions apart from these? These sound a little inefficient.
Neo4j does have an upper limit on graph size, but it can support tens of billions of nodes, properties, and relationships in a single graph. No security is provided at the data level, and there is no data encryption. Security auditing is not available in Neo4j.
Elasticsearch is not a database, and we don't want to use it as one. In this use case it is just a search engine over data from Neo4j, and a valuable tool for textual search.
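Treating Elasticsearch purely as a search index suggests sending it only the searchable text plus the Neo4j id, and leaving likes, authors, and photos in Neo4j. Below is a minimal sketch that turns a post's properties into the pair of lines the Elasticsearch bulk API expects for one document. The posts index name, the title/body fields, and the bulk_lines helper are assumptions.

```ruby
require "json"

# Hypothetical: the only post properties worth full-text indexing.
SEARCHABLE_FIELDS = %w[title body].freeze

# Builds the action line and source line that the Elasticsearch _bulk API
# expects for one document. The Neo4j id becomes the Elasticsearch _id, so
# search hits can be mapped straight back to nodes.
def bulk_lines(post_props, neo4j_id, index: "posts")
  action = { index: { _index: index, _id: neo4j_id.to_s } }
  source = post_props.slice(*SEARCHABLE_FIELDS) # drop likes, counts, etc.
  [action.to_json, source.to_json]
end
```

Join all lines with "\n" (the bulk body must end with a newline) and POST them to /_bulk. Keeping the index this slim means a new like or comment never requires reindexing; only edits to the text do.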
Leading telcos like Verizon, Orange, Comcast, and AT&T rely on Neo4j to manage networks, control access, and enable customer 360.
Neo4j has the most popular and active graph database community. Reviewers report that the product is easy to learn and use, and as a well-established project it offers plenty of resources for its users, from training materials to books.
This is a bit of an opinion-based question because it doesn't have a "right" answer, so prepare for the SO no-fun hammer to come crashing down... but I've been thinking that a one-two punch (Elasticsearch, then Neo) is the best way to handle this: index properties in Elasticsearch, perform a full-text search to get candidate IDs, then build a Cypher MATCH that limits results to the returned IDs.
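The Elasticsearch half of that pipeline can be as small as a request that returns ids only, plus a function that reads them back in relevance order. A minimal sketch, assuming each document's _id is the post's Neo4j id as a string and the index has title and body fields; search_body and ids_from_response are hypothetical helper names.

```ruby
# Body for POST /posts/_search: full-text match, no _source, ids only.
def search_body(text, size: 10)
  {
    query: { multi_match: { query: text, fields: %w[title body] } },
    _source: false, # Neo4j supplies the details, so skip the stored document
    size: size
  }
end

# Reads the ids out of a parsed _search response, keeping Elasticsearch's
# relevance order. Assumes _id holds an integer Neo4j id.
def ids_from_response(response)
  response.fetch("hits").fetch("hits").map { |hit| Integer(hit.fetch("_id")) }
end
```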
In Cypher, you can use IN [] to return records whose value matches an element of an array. So you could do:
MATCH (u:Student { age: 30 }) WHERE ID(u) IN [1, 2, 3, 4] RETURN u
The trick of integrating Elasticsearch with Neo, then, is to make it easy to build Cypher queries around ES results. I don't really have tips on doing that, because it will depend on your language and driver.
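Building on that IN pattern, here is a sketch of a single Cypher call for a page of Elasticsearch hits that also answers the "has the viewer already liked it?" question from the original post, plus a re-sorting step, since Cypher does not return rows in the order of the IN list. The labels, the viewerId parameter, and the helper names are assumptions; ID() mirrors the answer above, though Neo4j 5 deprecates it in favour of elementId().

```ruby
# Hypothetical query: details for every post id Elasticsearch returned.
SEARCH_HITS_CYPHER = <<~CYPHER
  MATCH (author:User)-[:AUTHOR]->(post:Post)
  WHERE ID(post) IN $ids
  OPTIONAL MATCH (post)-[:LIKED_BY]->(liker:User)
  WITH author, post, collect(DISTINCT liker) AS likers
  RETURN ID(post) AS id, post, author,
         size(likers) AS likeCount,
         any(u IN likers WHERE ID(u) = $viewerId) AS likedByViewer
CYPHER

def search_hits_query(ids, viewer_id)
  [SEARCH_HITS_CYPHER, { "ids" => ids, "viewerId" => viewer_id }]
end

# Restores Elasticsearch's ranking: each row's "id" is looked up in a
# position map built from the original id list.
def in_search_order(rows, ids)
  rank = ids.each_with_index.to_h
  rows.sort_by { |row| rank.fetch(row["id"]) }
end
```

Posts deleted from Neo4j after being indexed simply drop out of the result instead of causing an error.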
In Neo4j.rb, I'm thinking about trying to automate this so you can do this:
student.lessons(:l).where(name: 'Chris').to_a
...and it'll know that the Lesson model is using Elasticsearch, do the query, and then change the query for the user so it is effectively this:
student.lessons(:l).where('ID(l) IN {elasticsearch_results}').params(elasticsearch_results: [1, 2, 3, 4]).to_a
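The rewrite step described above can be sketched in plain Ruby, independent of any gem. This is not Neo4j.rb's or Searchkick's API: SEARCHABLE, rewrite_where, and the search callable are hypothetical stand-ins. It also uses a $param placeholder, which replaced the older {param} syntax (removed in Neo4j 4.0).

```ruby
# Hypothetical registry: which model properties are answered by Elasticsearch.
SEARCHABLE = { "Lesson" => %w[name] }.freeze

# Splits where-conditions into those Elasticsearch should answer and those
# Cypher keeps, runs the search, and returns an ID-IN clause plus params.
# Returns [clause_or_nil, params, remaining_cypher_conditions].
def rewrite_where(model, identifier, conditions, search)
  text_fields = SEARCHABLE.fetch(model, [])
  es_conds, cypher_conds = conditions
    .partition { |field, _value| text_fields.include?(field.to_s) }
    .map(&:to_h)
  return [nil, {}, cypher_conds] if es_conds.empty?

  ids = search.call(es_conds)
  ["ID(#{identifier}) IN $elasticsearch_results", { elasticsearch_results: ids }, cypher_conds]
end
```

The remaining conditions would still be applied by Cypher, so the search only narrows the candidate set rather than replacing the graph query.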
I've been using Searchkick for full-text search with Neo and it's been going well, so I think this is doable. It's not a solution to your problem, but it's how I'm thinking it through, so maybe it'll give you some ideas.
It's worth noting that Neo does have a form of fuzzy matching through regular expressions using =~; it just doesn't use indexes, so there may be a performance hit. This might not be a problem, though, since you can reduce the number of nodes and properties Cypher has to inspect by adding more constraints to other parts of your query. You should run some benchmarks with your data to figure out whether the added overhead of Elasticsearch and more complicated queries is necessary.
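A sketch of the =~ route, assuming you narrow the match to one author's posts first so the unindexed regex only runs over a small set. Cypher's =~ uses Java regular-expression syntax and must match the whole string, so user input is escaped and wrapped in .*, and the (?is) flags make it case-insensitive and let . cross newlines. The labels, properties, and helper names are assumptions.

```ruby
# Characters that are special in Java regular expressions outside a
# character class; each gets a backslash so user input matches literally.
JAVA_REGEX_META = /[\\.\[\]{}()*+?^$|]/

# Builds a full-string pattern that matches any text containing the term.
def contains_pattern(term)
  "(?is).*" + term.gsub(JAVA_REGEX_META) { |char| "\\#{char}" } + ".*"
end

# Hypothetical query: the author filter cuts down what =~ has to scan.
REGEX_SEARCH_CYPHER = <<~CYPHER
  MATCH (author:User {id: $authorId})-[:AUTHOR]->(post:Post)
  WHERE post.body =~ $pattern
  RETURN post
CYPHER

def regex_search_query(author_id, term)
  [REGEX_SEARCH_CYPHER, { "authorId" => author_id, "pattern" => contains_pattern(term) }]
end
```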
I am also looking into Neo4j and Elasticsearch integration. I have seen the "Neo4j River Plugin for ElasticSearch", but I don't know how to use it. If you find any information about integrating with the "Neo4j River Plugin for ElasticSearch", please let me know; it would be really helpful.