I am trying to achieve the following:
Select all the records that I own, where ownership is objects I have created OR objects a user I manage has created, where user management can be in a hierarchy of users managing users
Ownership is clearly straight forward and could be handled by a simple id corresponding to the owner. The hierarchy of user management has me a little stumped to perform without heavy lifting through large lists of IDs (you can obviously just find every user that is managed and list every object created by any of those users using an IN clause or similar).
Ideally this all happens in a single query so normal paging and conditions can occur.
I was thinking that there might have been some maths to get it done - having IDs that can somehow be hashed to determine if they are owned by anyone in the chain of command.
Any references for this sort of thing?
Am I missing something obvious?
Using MongoDB if that makes a difference, but happy to think about other databases for inspiration.
UPDATE: have created a MongoDB collection with 1,000,000 records to get some solid data on exactly what constitutes a manageable number of parameters for an IN clause on a query. Will report back when I have some concrete information.
ANALYSIS:
Using ruby-mongo-driver and the ruby benchmark lib.
MongoDB Collection with 1039944 records
Records are defined as:
{
first_name: String,
last_name: String,
email: String,
phone: String,
company: String,
owner: BSON::ObjectId
}
With randomly generated values for all fields.
The Owner field has an index.
Running queries with the following conditions:
conditions = {"owner" => { "$in" => id_list }}
opts = {skip: rand, limit: 100}
Results:
# 10201 ids
# user system total real
# 0: 0.240000 0.000000 0.240000 ( 0.265148)
# 1: 0.240000 0.010000 0.250000 ( 0.265757)
# 2: 0.240000 0.000000 0.240000 ( 0.267149)
# 3: 0.240000 0.000000 0.240000 ( 0.269981)
# 4: 0.240000 0.000000 0.240000 ( 0.270436)
# Find: 0.240000 0.000000 0.240000 ( 0.266709)
# 5201 ids
# user system total real
# 0: 0.120000 0.000000 0.120000 ( 0.133824)
# 1: 0.120000 0.000000 0.120000 ( 0.134787)
# 2: 0.110000 0.000000 0.110000 ( 0.133262)
# 3: 0.110000 0.000000 0.110000 ( 0.136046)
# 4: 0.120000 0.000000 0.120000 ( 0.141220)
# Find: 0.130000 0.000000 0.130000 ( 0.139110)
# 201 ids
# user system total real
# 0: 0.010000 0.000000 0.010000 ( 0.006044)
# 1: 0.000000 0.000000 0.000000 ( 0.004681)
# 2: 0.010000 0.000000 0.010000 ( 0.004578)
# 3: 0.000000 0.000000 0.000000 ( 0.007048)
# 4: 0.010000 0.000000 0.010000 ( 0.008487)
# Find: 0.000000 0.000000 0.000000 ( 0.005990)
# 1 id (NOT using IN)
# user system total real
# 0: 0.000000 0.000000 0.000000 ( 0.002868)
# 1: 0.000000 0.000000 0.000000 ( 0.004937)
# 2: 0.010000 0.000000 0.010000 ( 0.003151)
# 3: 0.000000 0.000000 0.000000 ( 0.002983)
# 4: 0.000000 0.000000 0.000000 ( 0.003313)
# Find: 0.000000 0.000000 0.000000 ( 0.002742)
Even with a list of 10k ids in the query, performance is pretty snappy.
If you are trying to "select" records from MongoDB based on a "column" having a value from a set of possible values that you'd need a join against a user management table to determine, then NoSQL is working against you...
If the list of user IDs is still manageable you can do a where ownerId in (?,?,?,?,?...)
type of query (after having first determined the list):
db.documents.find({owner:{$in: [1234, 2345, 4444, 77777, 99999]}})
The NoSQL way is probably to denormalize things, for example by including not just the ownerId in the document, but the complete path up the management hierarchy:
{ _id: 'the document A',
owner : 1234,
managers: [ 2345, 4444, 77777, 99999 ]
}
Of course, that will need to be updated when the user hierarchy gets shifted around.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With