I'm implementing a service where each user must have his own JSON/document database. Beyond letting the user query JSON documents by example, the database must also support ACID transactions involving multiple documents, so I have ruled out Couch/Mongo and other NoSQL databases (I can't use RavenDB since the service must run on Unix systems).
With that in mind I've been trying to figure out a way to implement that on top of a SQL database. Here's what I have come up with so far:
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    doc TEXT
);
CREATE TABLE indexes (
    id INTEGER PRIMARY KEY,
    property TEXT,
    value TEXT,
    document_id INTEGER
);
Each user would have a database with these two tables, and the user would have to declare which fields he needs to query so the system can properly populate the 'indexes' table. So if user 'A' configures his account to enable queries by 'name' and 'age', every time that user inserts a document that has a 'name' or 'age' property, the system would also insert a record into the 'indexes' table, where the 'property' column would contain name/age, 'value' would contain the property value, and 'document_id' would point to the corresponding document.
For example, let's say the user inserts the following doc:
'{"name": "Foo", "age": 43}'
This would result in an insert into the 'documents' table and two more inserts into the 'indexes' table:
INSERT INTO documents (id,doc) VALUES (1, '{"name": "Foo", "age": 43}');
INSERT INTO indexes (property, value, document_id) VALUES ('name', 'Foo', 1);
INSERT INTO indexes (property, value, document_id) VALUES ('age', '43', 1);
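To make the insert path concrete, here is a minimal sketch using Python's built-in sqlite3 module. The helper name insert_document, the indexed_fields argument, and the extra 'city' property are assumptions for illustration only; the point is that the document row and its 'indexes' rows are written in a single transaction.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE documents (id INTEGER PRIMARY KEY, doc TEXT);
CREATE TABLE indexes (
    id INTEGER PRIMARY KEY,
    property TEXT,
    value TEXT,
    document_id INTEGER
);
"""

def insert_document(conn, doc, indexed_fields):
    """Store doc as JSON plus one indexes row per declared field it contains."""
    with conn:  # commits on success, rolls back if any statement raises
        cur = conn.execute(
            "INSERT INTO documents (doc) VALUES (?)", (json.dumps(doc),)
        )
        doc_id = cur.lastrowid
        # Values are stored as text, matching the TEXT 'value' column.
        rows = [
            (field, str(doc[field]), doc_id)
            for field in indexed_fields
            if field in doc
        ]
        conn.executemany(
            "INSERT INTO indexes (property, value, document_id) VALUES (?, ?, ?)",
            rows,
        )
    return doc_id

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# 'city' is a hypothetical property the user did not declare, so it is not indexed.
doc_id = insert_document(
    conn, {"name": "Foo", "age": 43, "city": "Bar"}, ["name", "age"]
)
index_rows = conn.execute(
    "SELECT property, value FROM indexes WHERE document_id = ? ORDER BY property",
    (doc_id,),
).fetchall()
print(index_rows)
```

Because every value is converted to text, numeric comparisons such as age > 40 would need extra handling; this sketch only covers equality lookups.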
Then let's say that user 'A' sent the service the following query:
'{"name": "Foo", "age": 43}' //(the queries are also json documents).
This query would be translated to the following SQL:
SELECT doc FROM documents
WHERE id IN (SELECT document_id FROM indexes
WHERE document_id IN (SELECT document_id FROM indexes
WHERE property = 'name' AND value = 'Foo')
AND property = 'age' AND value = '43')
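The translation step could be sketched roughly as follows, again with Python's sqlite3 module. The function name build_nested_query is hypothetical, and the sample rows are made up for the demonstration. It emits one nested sub-query per property and passes the values as bound parameters instead of concatenating them into the SQL text.

```python
import sqlite3

def build_nested_query(example):
    """Turn a query-by-example dict into nested IN sub-queries, one per property."""
    if not example:
        raise ValueError("query must name at least one property")
    inner_sql, params = None, []
    for prop, value in example.items():
        sql = "SELECT document_id FROM indexes WHERE property = ? AND value = ?"
        if inner_sql is not None:
            sql += f" AND document_id IN ({inner_sql})"
        # The enclosing level's placeholders appear first in the SQL text.
        params = [prop, str(value)] + params
        inner_sql = sql
    return f"SELECT doc FROM documents WHERE id IN ({inner_sql})", params

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (id INTEGER PRIMARY KEY, doc TEXT);
CREATE TABLE indexes (
    id INTEGER PRIMARY KEY, property TEXT, value TEXT, document_id INTEGER
);
INSERT INTO documents (id, doc) VALUES (1, '{"name": "Foo", "age": 43}');
INSERT INTO documents (id, doc) VALUES (2, '{"name": "Foo", "age": 30}');
INSERT INTO indexes (property, value, document_id) VALUES ('name', 'Foo', 1);
INSERT INTO indexes (property, value, document_id) VALUES ('age', '43', 1);
INSERT INTO indexes (property, value, document_id) VALUES ('name', 'Foo', 2);
INSERT INTO indexes (property, value, document_id) VALUES ('age', '30', 2);
""")

sql, params = build_nested_query({"name": "Foo", "age": 43})
results = [row[0] for row in conn.execute(sql, params)]
print(results)
```

Only the first document matches both properties, so it is the only one returned. Note that the nesting depth grows with the number of properties in the query.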
My questions:
Your indexes table is what is known as an Entity-Attribute-Value (EAV) table.
EAV tables are fine for storing information and recalling it when you know the entity. (In your case, finding all the indexes rows when you know the document_id.)
But they are terrible the other way around: supplying Attribute-Value combinations to search for an Entity, which is exactly what you have in your final query. As more and more entities share the same attribute-value combinations (such as name=foo), the query performance degrades.
So, to answer your first two questions:
1. The query, as written, requires n sub-queries when searching for n properties. This will scale very poorly as n grows.
2. As the number of records grows, performance will degrade, especially with millions or billions of records.
In general, if you read about EAV, people strongly recommend shying away from it.
And, worse still, there isn't really a good alternative within SQL. The standard way to optimise a search is with an index, which can easily be modelled as a sorted data-set. But you would then need many indexes:
- An index on (fieldX, fieldY, fieldZ) is great if you search on all three columns.
- But it sucks if you have to search on just fieldZ.
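One way to see this for yourself is to ask the database for its query plan. The sketch below uses SQLite through Python's sqlite3 module; the table name t is an assumption, and other engines (or SQLite after running ANALYZE) may plan differently. On a fresh SQLite database, the three-column lookup is typically reported as an index SEARCH, while the lookup on fieldZ alone is reported as a SCAN over every entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (fieldX TEXT, fieldY TEXT, fieldZ TEXT)")
conn.execute("CREATE INDEX idx_xyz ON t (fieldX, fieldY, fieldZ)")

def plan(sql, params):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

# All three indexed columns are constrained: the index can be searched directly.
full_plan = plan(
    "SELECT * FROM t WHERE fieldX = ? AND fieldY = ? AND fieldZ = ?",
    ("a", "b", "c"),
)

# Only the last column is constrained: the sort order of the index does not help.
last_only_plan = plan("SELECT * FROM t WHERE fieldZ = ?", ("c",))

print(full_plan)
print(last_only_plan)
```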
If you can re-model this with a traditional table, with a fixed number of columns, and have the space to apply every index combination you would ever need, that would be your most performant model.
If you can't fix the number of columns (new properties coming along all the time) and/or you don't have space for all the different combinations of index, you seem to be stuck with EAV. Which will work, but it will not scale very well in terms of 'instantaneous' results.
NOTE: If you do stick with EAV, have you tested this query structure?
SELECT
document_id
FROM
indexes
WHERE
(property = 'name' AND value = 'Foo')
OR (property = 'age' AND value = '43' )
GROUP BY
document_id
HAVING
COUNT(*) = 2
This assumes that (document_id, property, value) is unique. Otherwise one document could have ('name', 'Foo') twice, and so pass the COUNT(*) = 2 check without matching on age.
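Here is a rough sketch of how that query could be generated for any number of properties, with the uniqueness assumption enforced by the database. It uses Python's sqlite3 module; the function name build_grouped_query, the index name, and the sample rows are all hypothetical.

```python
import sqlite3

def build_grouped_query(example):
    """Build the OR / GROUP BY / HAVING COUNT(*) = n form of the search."""
    if not example:
        raise ValueError("query must name at least one property")
    conditions = " OR ".join("(property = ? AND value = ?)" for _ in example)
    params = [x for prop, value in example.items() for x in (prop, str(value))]
    sql = (
        f"SELECT document_id FROM indexes WHERE {conditions} "
        "GROUP BY document_id HAVING COUNT(*) = ?"
    )
    return sql, params + [len(example)]

INSERT_SQL = "INSERT INTO indexes (property, value, document_id) VALUES (?, ?, ?)"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE indexes (
    id INTEGER PRIMARY KEY, property TEXT, value TEXT, document_id INTEGER
);
-- Enforces the uniqueness that the COUNT(*) comparison relies on.
CREATE UNIQUE INDEX idx_unique_triple ON indexes (document_id, property, value);
""")
conn.executemany(
    INSERT_SQL,
    [("name", "Foo", 1), ("age", "43", 1), ("name", "Foo", 2), ("age", "30", 2)],
)

sql, params = build_grouped_query({"name": "Foo", "age": 43})
matches = [row[0] for row in conn.execute(sql, params)]
print(matches)

# A repeated (document_id, property, value) triple is rejected by the unique index.
try:
    conn.execute(INSERT_SQL, ("name", "Foo", 1))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The query stays a single pass over 'indexes' no matter how many properties are searched, and only the bound COUNT(*) value changes with n.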