I have a web application that accesses large amounts of JSON data.
I want to use a key value database for storing JSON data owned/shared by different users of the web application (not users of the database). Each user should only be able to access the records they own or share.
In a relational database, I would add a column Owner to the record table, or manage shared ownerships in a separate table, and check access on the application side (Python). For key value stores, two approaches come to mind.
What if I use keys like USERID_RECORDID and then write code to check the USERID before accessing the record? Is that a good idea? It wouldn't work with records that are shared between users.
I could store one or more USERIDs in the value data and check if the data contains the ID of the user trying to access the record. Performance is probably slower than having the user ID as part of the key, but shared ownerships are possible.
Both of the solutions you described have some limitations.
You point yourself that including the owner ID in the key does not solve the problem of shared data. However, this solution may be acceptable, if you add another key/value pair, containing the IDs of the contents shared with this user (key: userId:shared, value: [id1, id2, id3...]).
Your second proposal, in which you include the list of users who were granted access to a given content, is OK if and only if you application needs to make a query to retrieve the list of users who have access to a particular content. If your need is to list all contents a given user can access, this design will lead you to poor performances, as the K/V store will have to scan all records -and this type of database engine usually don't allow you to create an index to optimise this kind of request.
From a more general point of view, with NoSQL databases and especially Key/Value stores, the model has to be defined according to the requests to be made by the application. It may lead you to duplicate some information. The application has the responsibility of maintaining the consistency of the data.
By example, if you need to get all contents for a given user, whether this user is the owner of the content or these contents were shared with him, I suggest you to create a key for the user, containing the list of content Ids for that user, as I already said. But if your app also needs to get the list of users allowed to access a given content, you should add their IDs in a field of this content. This would result in something like :
key: contentID, value: { ..., [userId1, userID2...]}
When you remove the access to a given content for a user, your app (and not the datastore) have to remove the userId from the content value, and the contentId from the list of contents for this user.
This design may imply for your app to make multiple requests: by example one to get the list of userIDs allowed to access a given content, and one or more to get these user profiles. However, this should not really be a problem as K/V stores usually have very high performances.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With