How to handle massive storage of records in database for user authorization purposes?

I am using Ruby on Rails 3.2.2 and MySQL. I would like to know whether it is advisable to store, in the database table backing one class, a record for every combination of instances of two other classes.

That is, I have User and Article models. To store all user-article authorization objects, I would like to implement an ArticleUserAuthorization model so that, given N users and M articles, there are N*M ArticleUserAuthorization records.

That way, I can declare and use ActiveRecord::Associations as follows:

class Article < ActiveRecord::Base
  has_many :user_authorizations, :class_name => 'ArticleUserAuthorization'
  has_many :users, :through => :user_authorizations
end

class User < ActiveRecord::Base
  has_many :article_authorizations, :class_name => 'ArticleUserAuthorization'
  has_many :articles, :through => :article_authorizations
end

However, storing all combinations will result in a very large database table containing billions of rows! Furthermore, ideally, I am planning to create all authorization records when a User or an Article object is created (that is, to create all the previously mentioned "combinations" at once or, better, in "delayed" batches; either way, this process creates billions more rows) and to do the reverse when destroying one (by deleting billions of rows). I am also planning to read and update those rows all at once when a User or Article object is updated.
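For a sense of scale, here is a back-of-the-envelope sketch in plain Ruby. The user and article counts are hypothetical figures chosen only for illustration, and `full_matrix_rows` is a made-up helper name, not part of Rails.

```ruby
# Number of authorization rows when every (user, article) pair gets a record.
def full_matrix_rows(user_count, article_count)
  user_count * article_count
end

# Hypothetical sizes, chosen only to illustrate the growth.
users    = 100_000
articles = 1_000_000

total_rows           = full_matrix_rows(users, articles)
rows_per_new_user    = articles # one row per existing article
rows_per_new_article = users    # one row per existing user

puts total_rows
puts rows_per_new_user
puts rows_per_new_article
```

With these assumed sizes the table holds 100 billion rows, and creating a single user means writing a million rows.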

So, my questions are:

  • Is this approach advisable? For example, what kind of performance problems may occur? Or is it bad practice to manage databases with very large tables?
  • How should I proceed in my case (maybe by rethinking entirely how to handle user authorizations)?

Note: I would use this approach because, in order to retrieve only "authorized objects" when retrieving User or Article objects, I think I need "atomic" user authorization rules (that is, one user authorization record for each user and article object), since the system is not based on user groups like "admin", "registered" and so on. So I thought that having an ArticleUserAuthorization table would avoid running authorization methods on each retrieved object (note: those methods involve some MySQL querying that could worsen performance; see my previous question for a sample "authorization" method implementation). Instead, I could "simply" join the ArticleUserAuthorization table to retrieve only "user authorized" objects.

Backo asked Jun 20 '12 06:06


1 Answer

The fact of the matter is that if you want article-level permissions per user, then you need a way to relate Users to the Articles they can access. This necessitates a minimum of N*A records (where N is the number of users and A is the number of uniquely permissioned articles).

The 3NF approach to this would be, as you suggested, to have a UsersArticles set... which would be a very large table (as you noted).

Consider that this table would be accessed very frequently. This seems to me like one of the situations in which a slightly denormalized approach (or even NoSQL) is more appropriate.

Consider the model that Twitter uses for their user follower tables:

Jeff Atwood on the subject

And High Scalability Blog

A sample from those pieces is a lesson learned at Twitter: querying followers from a normalized table puts tremendous stress on the Users table. Their solution was to denormalize followers so that a user's followers are stored with that user's individual settings.

Denormalize a lot. Single handedly saved them. For example, they store all a user IDs friend IDs together, which prevented a lot of costly joins.

  • Avoid complex joins.
  • Avoid scanning large sets of data.

I imagine a similar approach could be used to serve article permissions and avoid a tremendously stressed UsersArticles single table.
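As a rough illustration of that idea, here is a minimal plain-Ruby sketch that keeps each user's authorized article IDs together on the user, so filtering needs no join table. Everything in it is an assumption made for illustration: `UserRecord`, `ArticleRecord` and `authorized_articles` are hypothetical in-memory stand-ins, not the actual models or anything Twitter published. In a real application the ID list might live in a serialized column or a key-value store.

```ruby
require 'set'

# Hypothetical in-memory stand-ins for the User and Article records.
UserRecord    = Struct.new(:id, :authorized_article_ids)
ArticleRecord = Struct.new(:id, :title)

# Keep only the articles whose IDs are stored on the user (no join table).
def authorized_articles(user, articles)
  articles.select { |article| user.authorized_article_ids.include?(article.id) }
end

articles = [
  ArticleRecord.new(1, 'First'),
  ArticleRecord.new(2, 'Second'),
  ArticleRecord.new(3, 'Third')
]

# The user's permissions are stored together as one set of article IDs.
user = UserRecord.new(10, Set.new([1, 3]))

visible = authorized_articles(user, articles)
puts visible.map(&:title).inspect
```

The trade-off is that granting or revoking access means rewriting the user's stored ID set, and listing every user who can see a given article is no longer a cheap lookup.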

Matthew answered Sep 19 '22 00:09