This is actually more of a Lucene question, but it's in the context of a Neo4j database.
I have a database that's divided into 50 or so node types (so "collections" or "tables" in other types of dbs). Each has a subset of properties that need to be indexed; some share the same name, some don't.
When searching, I always want to find nodes of a specific type, never across all nodes.
I can see three ways of organizing this:
1. One index per type; properties map naturally to index fields: index 'foo', 'id'='1234'.
2. A single global index; each field maps to a property name. To distinguish the type, either include it as part of the value ('id'='foo:1234') or check the nodes once they're returned (I expect duplicates to be very rare).
3. A single index; the type is part of the field name: 'foo.id'='1234'.
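A toy sketch (plain Python dicts, not the Lucene or Neo4j API; all names are illustrative) of how each layout shapes the lookup key:

```python
# Toy in-memory "index": maps (field, value) -> set of node ids.
# This only illustrates how each layout shapes the lookup key.

# Option 1: one index per type
per_type = {
    "foo": {("id", "1234"): {42}},
    "bar": {("id", "1234"): {99}},   # same property name, different type
}

# Option 2: one global index, type encoded in the value
global_by_value = {("id", "foo:1234"): {42}, ("id", "bar:1234"): {99}}

# Option 3: one global index, type encoded in the field name
global_by_field = {("foo.id", "1234"): {42}, ("bar.id", "1234"): {99}}

def lookup1(t, prop, val):
    return per_type[t].get((prop, val), set())

def lookup2(t, prop, val):
    return global_by_value.get((prop, f"{t}:{val}"), set())

def lookup3(t, prop, val):
    return global_by_field.get((f"{t}.{prop}", val), set())

# All three answer the same question: "find node(s) of type foo with id 1234"
assert lookup1("foo", "id", "1234") == lookup2("foo", "id", "1234") \
    == lookup3("foo", "id", "1234") == {42}
```

The difference is only *where* the type discriminator lives: in the choice of index, in the value, or in the field name.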
Once created, the database is read-only.
Are there any benefits to one of those, in terms of convenience, size/cache efficiency, or performance?
As I understand it, for the first option Neo4j will create a separate physical index for each type, which seems suboptimal. For the third, most Lucene docs end up with only a small subset of the fields; I'm not sure if that affects anything.
I came across this problem recently when I was building an ActiveRecord connection adapter for Neo4j over REST, to be used in a Rails project. Since ActiveRecord and ActiveRelation both have a tight coupling with SQL syntax, it became difficult to fit everything into NoSQL. Might not be the best solution, but here's how I solved it:
- I created an index named 'model_index' which indexed nodes under two keys, 'type' and 'model'.
- Indexing under the 'type' key currently happens with just one value, 'model'. This was introduced primarily to achieve SHOW TABLES SQL functionality, which can get me a list of all models present in the graph.
- Indexing under the 'model' key takes place with values corresponding to the different model names in my system. This is primarily for achieving DESC <TABLENAME> functionality.
- On CREATE TABLE, a node is created with the table definition attributes stored as node properties. The node is indexed in 'model_index' with 'type:model' and 'model:<model-name>'. This includes the newly created model in the list of 'tables' and also allows one to reach the model node directly via an index lookup on the 'model' key.
- Whenever a new record is created of a certain model ('type', in your case), an outgoing edge labeled 'instances' is created, directed from the model node to the new record: v[123] => [instances] => v[245], where v[123] represents the model node and v[245] represents a record of v[123]'s type.
- To fetch all records of a model, I look up 'model_index' with 'model:<model-name>' to reach the model node and then fetch all adjacent nodes over outgoing edges labeled 'instances'. Filtered lookups can be further achieved by applying filters and other complex traversals.

The above solution prevents 'model_index' from clogging, since it contains only 2x entries (x being the number of models), and achieves an effective record lookup via one index lookup and a single-level traversal.
Although in your case nodes of different types are not adjacent to each other, even if you wanted them to be, you could determine the type of any arbitrary node by simply looking up its adjacent node over an incoming edge labeled 'instances'. Further, I'm considering incorporating SpringDataGraph's pattern of storing a '__type__' property on each instance node to avoid this adjacent-node lookup.
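The pattern above can be sketched with a tiny in-memory graph (plain Python, not Neo4j or Gremlin; the names 'model_index' and 'instances' follow the answer, everything else is illustrative):

```python
# Toy graph: nodes, labeled edges, and the model_index described above.
nodes = {}
edges = []            # (src_id, label, dst_id)
model_index = {}      # key -> {value -> set of node ids}

def index(key, value, node_id):
    model_index.setdefault(key, {}).setdefault(value, set()).add(node_id)

def create_model(node_id, name):              # plays the role of CREATE TABLE
    nodes[node_id] = {"name": name}
    index("type", "model", node_id)           # enables SHOW TABLES
    index("model", name, node_id)             # enables DESC <name>

def create_record(node_id, model_name, props):
    nodes[node_id] = dict(props)
    (model_id,) = model_index["model"][model_name]
    edges.append((model_id, "instances", node_id))

def records_of(model_name):                   # one lookup + one-level traversal
    (model_id,) = model_index["model"][model_name]
    return [dst for src, lbl, dst in edges
            if src == model_id and lbl == "instances"]

create_model(123, "User")
create_record(245, "User", {"id": "1234"})
assert records_of("User") == [245]
assert model_index["type"]["model"] == {123}  # the "SHOW TABLES" list
```

Note how the index only ever holds model nodes; records are reached by traversal, which is what keeps the index small.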
I'm currently translating AREL to Gremlin scripts for almost everything. You can find the source code for my AR adapter at https://github.com/yournextleap/activerecord-neo4j-adapter
Hope this helps. Cheers! :)
A single index will be smaller than several little indexes, because some data, such as the term dictionary, will be shared. However, since a term-dictionary lookup is an O(log n) operation, a lookup in a bigger term dictionary might be slightly slower. (If you merged 50 indexes, this would only require about 6 more comparisons, since 2^6 >= 50; you likely won't notice any difference.)
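The extra-comparisons estimate can be checked directly (assuming the binary-search depth grows by roughly log2 of the number of merged dictionaries):

```python
import math

# Merging ~50 per-type term dictionaries into one deepens a binary
# search by about log2(50) steps.
extra_comparisons = math.ceil(math.log2(50))
assert extra_comparisons == 6        # 2**6 = 64 >= 50 > 32 = 2**5
```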
Another advantage of the smaller overall index is that it fits better in the OS cache, which is likely to make queries run faster.
Instead of your options 2 and 3, I would index two separate fields, 'id' and 'type', and search for ('id':ID AND 'type':TYPE), but I don't know whether that is possible with neo4j.
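The two-field approach can be sketched with a toy inverted index (plain Python, not Lucene; an AND query is just an intersection of the two postings sets):

```python
# Toy inverted index: (field, value) -> set of doc ids.
postings = {
    ("id", "1234"): {1, 7},     # same id appears under two types
    ("type", "foo"): {1, 3},
    ("type", "bar"): {7},
}

def term(field, value):
    return postings.get((field, value), set())

# (id:1234 AND type:foo) -> only the foo node with that id
assert term("id", "1234") & term("type", "foo") == {1}
assert term("id", "1234") & term("type", "bar") == {7}
```

This keeps the type out of both the value and the field name, at the cost of one extra term per query.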