I have a RESTful system built with NodeJS and Elasticsearch that implements an RBAC authorization policy. An authorization server sits in front of the other APIs and tests each request against the routes authorized for the user's roles (a bearer token authenticates the user).
I like this design because the other APIs don't need to know about the authorization/authentication service. It's also very fast, because it uses an in-memory cache of the policy instead of querying Elasticsearch every time a new request has to be checked.
But now I need to implement ACLs to provide more granular authorization control. From a REST point of view, the policy will be applied at the resource level. Example: "POST:/user/123" is authorized only for user A.
I've surveyed the clients, and 85% will only use allow policies; by default the ACL control will deny everything. So now I have all the information I need to develop this control, but I can't see the best way to implement it.
My first thoughts were:
The most important quality of the system is scalability;
An in-memory cache isn't feasible: I ran simulations with 100k users and 1 million resources (which can be a real scenario) and the amount of memory is huge, so this feature would be costly to cache;
The authentication service can't handle ACLs in this case because it can't filter searches. It doesn't intercept results; it only validates headers and routes against roles.
So, with all these points in mind, what if each document in Elasticsearch had a new field named "acl_allow_method_user", an array of method + user ID entries authorized to use this resource? I'd end up with something like this:
"acl_allow_method_user":["POST:123434"]
I'd also have to create a common package, used by all the APIs, to validate this policy on every interaction with Elasticsearch, but I don't see any problem with that.
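To make the idea concrete, here is a minimal sketch of what such a common package could expose. The helper name withAclFilter and the "title" field are hypothetical, and the sketch assumes "acl_allow_method_user" is mapped as a keyword field so that an exact term match works.

```javascript
// Hypothetical helper for the shared package: wraps any Elasticsearch query
// so that only documents granting `method` to `userId` can match.
// Documents without a matching entry are excluded, which gives deny-by-default.
function withAclFilter(query, method, userId) {
  return {
    bool: {
      must: [query],
      // Filter context: the ACL clause does not affect scoring.
      filter: [{ term: { acl_allow_method_user: `${method}:${userId}` } }],
    },
  };
}

// Example: a search restricted to documents user 123434 may GET.
const searchBody = {
  query: withAclFilter({ match: { title: "report" } }, "GET", "123434"),
};
console.log(JSON.stringify(searchBody, null, 2));
```

Every API would pass its own query through this wrapper before sending it to Elasticsearch, so the ACL check happens inside the search itself rather than on the results.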
For anyone with ACL experience: is this a good design?
Does Elasticsearch have a limit on the size of array fields?
What about performance? Will this approach have an impact?
I would suggest having a separate Elasticsearch index for the ACLs, which should be much smaller than your main document index. This will allow you to tune the ACL index settings appropriately, e.g. (1) use fewer shards than your main document index, (2) set auto_expand_replicas to 0-all in case you'd like to use a terms query (example: load all documents owned by a user), and (3) enforce different retention/GDPR policies.
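As a rough illustration, the body used to create such an index might look like the sketch below. The function name and the userId/docId/opType field names are assumptions for the example; number_of_shards and auto_expand_replicas are the standard index settings mentioned above.

```javascript
// Builds the create-index body for a hypothetical small ACL index.
function aclIndexBody() {
  return {
    settings: {
      index: {
        number_of_shards: 1, // fewer shards than the main document index
        auto_expand_replicas: "0-all", // keep a copy on every data node
      },
    },
    mappings: {
      properties: {
        // keyword fields allow exact-match filtering on each rule attribute
        userId: { type: "keyword" },
        docId: { type: "keyword" },
        opType: { type: "keyword" },
      },
    },
  };
}

console.log(JSON.stringify(aclIndexBody(), null, 2));
```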
The ACL index can then contain a document for each ACL rule, e.g. userId=1,docId=123,opType=POST. Note that this approach will allow you to define ACL rules for other types of principals and resources in the future. Moreover, this can support ACLs that match new documents dynamically, e.g. userId=1,opType=POST,pattern="*" will allow the user with userId=1 to post any document, effectively making them a sysadmin. Decoupling ACLs from the documents/users will allow you to update ACLs without having to update the corresponding documents, which will perform better because Elasticsearch doesn't update in place and instead deletes and re-creates the document. Moreover, you'd be able to replace (PUT) the entire document without worrying about preserving the associated ACLs. However, you may want to clean up ACLs when documents or users are deleted, which can be done during the deletion or as a separate scheduled cleanup process.
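A minimal sketch of how rules of this shape could be evaluated once loaded, assuming each rule is a plain object with userId, opType, and either a docId or pattern "*". The isAllowed function is hypothetical, and only the "*" wildcard is handled here; richer patterns would need their own matching logic.

```javascript
// Returns true only if some rule grants `opType` on `docId` to `userId`.
// With no matching rule the answer is false, i.e. deny by default.
function isAllowed(rules, userId, opType, docId) {
  return rules.some(
    (rule) =>
      rule.userId === userId &&
      rule.opType === opType &&
      (rule.docId === docId || rule.pattern === "*")
  );
}

const exampleRules = [
  { userId: "1", opType: "POST", pattern: "*" }, // user 1 may POST anything
  { userId: "2", opType: "POST", docId: "123" }, // user 2 may POST document 123 only
];

console.log(isAllowed(exampleRules, "1", "POST", "999"));
console.log(isAllowed(exampleRules, "2", "POST", "999"));
```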
Now that the ACLs are separate from the documents themselves, they can be cached in memcached or a Redis cluster without requiring too much memory. In a typical OLTP system only a small subset of users is active at any point in time, so you can configure your LRU cache appropriately to increase the hit rate. It's hard to provide further recommendations without knowing what kind of access patterns are characteristic of your system.
One last point to consider is what generates the ACLs. If some ACLs are generated automatically, e.g. based on some pattern, then maybe you could use this pattern in your system to avoid having an ACL rule per user per document. For example, if some ACLs are generated from a directory service, then you might be able to cache (and periodically refresh) LDAP rules in your ACL management system.
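To show the LRU behaviour in isolation, here is a small in-process sketch. It is only a stand-in: in practice memcached or Redis would hold the entries and apply their own eviction policy. The LruCache class, rulesForUser, and the loadRules callback are all hypothetical names.

```javascript
// Tiny LRU cache built on Map, which iterates keys in insertion order.
class LruCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert so this key becomes the most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldestKey = this.map.keys().next().value;
      this.map.delete(oldestKey);
    }
  }
}

// Cache-aside lookup: only inactive users fall through to the ACL index.
// `loadRules` is an assumed async function that queries Elasticsearch.
async function rulesForUser(cache, userId, loadRules) {
  let rules = cache.get(userId);
  if (rules === undefined) {
    rules = await loadRules(userId);
    cache.set(userId, rules);
  }
  return rules;
}
```

Sizing maxEntries to the number of concurrently active users, rather than to all users, is what keeps the memory footprint small.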