Database Design: track a vast number of attributes for each user. So much so, that I will likely run out of columns (row storage space)

I'd appreciate some opinions on a concern I have.

I have a [User] table in my database, with the basic stuff you'd expect, like username, password, etc...

This application requires that I track a vast number of attributes for each user. So much so, that I will likely run out of columns (row storage space).

I'm tempted to add a UserProperties table with UserID, PropertyKey and PropertyValue columns. This approach fits well with the requirements.
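As a rough sketch, the key/value layout could look like the code below. It makes a few assumptions. SQLite stands in for the real database. SQLite's `WITHOUT ROWID` option is used as a loose analogue of a clustered index on `(UserID, PropertyKey)`. The helpers `set_property` and `get_properties` are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical EAV layout for the UserProperties table described above.
# WITHOUT ROWID makes SQLite store rows ordered by the primary key,
# roughly analogous to a clustered index on (UserID, PropertyKey).
SCHEMA = """
CREATE TABLE User (
    UserID   INTEGER PRIMARY KEY,
    Username TEXT NOT NULL UNIQUE
);
CREATE TABLE UserProperties (
    UserID        INTEGER NOT NULL REFERENCES User(UserID),
    PropertyKey   TEXT    NOT NULL,
    PropertyValue TEXT,
    PRIMARY KEY (UserID, PropertyKey)
) WITHOUT ROWID;
"""


def set_property(conn, user_id, key, value):
    # Upsert: the primary key allows one row per (user, key) pair.
    conn.execute(
        "INSERT OR REPLACE INTO UserProperties (UserID, PropertyKey, PropertyValue) "
        "VALUES (?, ?, ?)",
        (user_id, key, value),
    )


def get_properties(conn, user_id):
    # All properties for one user come from a single primary-key range.
    rows = conn.execute(
        "SELECT PropertyKey, PropertyValue FROM UserProperties WHERE UserID = ?",
        (user_id,),
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO User (UserID, Username) VALUES (1, 'michael')")
set_property(conn, 1, "FavoriteColor", "blue")
set_property(conn, 1, "ShoeSize", "10")
set_property(conn, 1, "FavoriteColor", "green")  # overwrites the earlier value
print(get_properties(conn, 1))
```

Every value lands in a single `TEXT` column. That is what makes the layout flexible, and it is also the source of the typing problems raised in the answers.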

My concern is that if each user has say 100 properties, when the database has a million users in it, we'll have 100,000,000 property rows.

I would think that with a clustered index on UserID, access will still be screaming fast, and that you are really storing about the same amount of data as you would with the mega-columns approach.
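Below is a quick check of both the row-count arithmetic and the access path. It is only a sketch. SQLite stands in for the actual database, and a `WITHOUT ROWID` table approximates the clustered index. The sample loads just 100 users, so it shows the query plan, not real-world timing at 100 million rows.

```python
import sqlite3

USERS = 1_000_000
PROPS_PER_USER = 100
total_rows = USERS * PROPS_PER_USER
print(total_rows)  # the 100,000,000 property rows estimated in the question

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE UserProperties (
        UserID        INTEGER NOT NULL,
        PropertyKey   TEXT    NOT NULL,
        PropertyValue TEXT,
        PRIMARY KEY (UserID, PropertyKey)
    ) WITHOUT ROWID
    """
)
# Small sample only: 100 users x 100 hypothetical properties each.
conn.executemany(
    "INSERT INTO UserProperties VALUES (?, ?, ?)",
    ((u, f"Prop{p:03d}", str(p)) for u in range(1, 101) for p in range(PROPS_PER_USER)),
)

# Ask the planner how it would fetch one user's properties.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT PropertyKey, PropertyValue FROM UserProperties WHERE UserID = ?",
    (42,),
).fetchall()
detail = " ".join(str(row[-1]) for row in plan)  # last column is the plan text
print(detail)
```

The plan should report a primary-key search instead of a full scan. That matches the intuition that per-user lookups stay cheap. As the update below points out, the harder problem is searching across users.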

Any ideas or thoughts on performance concerns? Ideas for a better DB design?

UPDATE:

I have been toying around with the possibilities, and one thing keeps bothering me. I need to query on some of these attributes pretty frequently, and worse yet, these queries could involve finding all users who match criteria on as many as 10 of these attributes at the same time.
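To show why multi-attribute searches are awkward in the key/value layout, here is a sketch. Each criterion matches a *different* row, so the query has to gather candidate rows and then keep only users that matched all of them. The technique shown is a `GROUP BY ... HAVING COUNT(*)` query; self-joining once per attribute is a common alternative. The property names and values are hypothetical, and only equality criteria are handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE UserProperties (
        UserID        INTEGER NOT NULL,
        PropertyKey   TEXT    NOT NULL,
        PropertyValue TEXT,
        PRIMARY KEY (UserID, PropertyKey)
    ) WITHOUT ROWID;
    -- Secondary index so lookups by (key, value) don't scan every row.
    CREATE INDEX IX_Props_KeyValue ON UserProperties (PropertyKey, PropertyValue);
    """
)
conn.executemany(
    "INSERT INTO UserProperties VALUES (?, ?, ?)",
    [
        (1, "Country", "CA"), (1, "Plan", "Pro"), (1, "Newsletter", "yes"),
        (2, "Country", "CA"), (2, "Plan", "Free"), (2, "Newsletter", "yes"),
        (3, "Country", "US"), (3, "Plan", "Pro"), (3, "Newsletter", "yes"),
    ],
)


def find_users(conn, criteria):
    """Return UserIDs whose properties equal every (key, value) in criteria.

    Relies on the (UserID, PropertyKey) primary key: each key can match at
    most once per user, so a match count equal to len(criteria) means "all".
    """
    if not criteria:
        raise ValueError("criteria must not be empty")
    clause = " OR ".join("(PropertyKey = ? AND PropertyValue = ?)" for _ in criteria)
    params = [x for pair in criteria.items() for x in pair]
    sql = (
        "SELECT UserID FROM UserProperties "
        f"WHERE {clause} "
        "GROUP BY UserID HAVING COUNT(*) = ? ORDER BY UserID"
    )
    return [r[0] for r in conn.execute(sql, params + [len(criteria)])]


print(find_users(conn, {"Country": "CA", "Newsletter": "yes"}))
print(find_users(conn, {"Country": "CA", "Plan": "Pro", "Newsletter": "yes"}))
```

With ten criteria, the query has to collect and group every row matching any of the ten pairs, and range predicates on numeric properties would also need casts. Both points support the concern raised in the update.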

As a result, I am now leaning towards the mega-column approach, but possibly splitting the data off into one (or more) separate tables, forming a one-to-one relationship keyed on the UserID.
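For contrast, here is a sketch of the one-to-one split described above. It uses a hypothetical `UserProfile` table that shares `User`'s key, with SQLite standing in for the real database. The same kind of multi-attribute search becomes an ordinary `WHERE` clause over typed columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE User (
        UserID   INTEGER PRIMARY KEY,
        Username TEXT NOT NULL
    );
    -- Hypothetical one-to-one extension table keyed on the same UserID.
    CREATE TABLE UserProfile (
        UserID     INTEGER PRIMARY KEY REFERENCES User(UserID),
        Country    TEXT,
        Tier       TEXT,
        Newsletter INTEGER,
        ShoeSize   INTEGER
    );
    CREATE INDEX IX_Profile_Country ON UserProfile (Country);
    """
)
conn.executemany("INSERT INTO User VALUES (?, ?)", [(1, "ann"), (2, "bob"), (3, "cy")])
conn.executemany(
    "INSERT INTO UserProfile VALUES (?, ?, ?, ?, ?)",
    [(1, "CA", "Pro", 1, 9), (2, "CA", "Free", 1, 11), (3, "US", "Pro", 1, 10)],
)

# Typed columns: ShoeSize >= 10 compares numerically, and each extra
# criterion is just another AND term rather than another join or group.
sql = """
SELECT u.Username
FROM User AS u
JOIN UserProfile AS p ON p.UserID = u.UserID
WHERE p.Country = ? AND p.Newsletter = 1 AND p.ShoeSize >= ?
ORDER BY u.UserID
"""
matches = [r[0] for r in conn.execute(sql, ("CA", 10))]
print(matches)
```
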

I'm using LinqToSql, and while I think tables with this many columns are inelegant, considering all the challenges and trade-offs I think this is probably the right choice. I am still eager to hear other opinions, though.

Michael asked Apr 05 '09 02:04


2 Answers

What you're describing is an Entity-Attribute-Value database, which is often used for exactly the situation you describe: sparse data tied to a single entity.

An E-A-V table is easy to search. The problem isn't finding rows, it's finding related rows.

Having different tables for different entities provides domain modeling, but they also provide a weak form of metadata. In E-A-V there are no such abstractions. (The Java analogy to E-A-V would be declaring that all functions' formal arguments were of type Object -- so you'd get no type-checking.)
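The "no type-checking" point in the Java analogy can be made concrete. The sketch below uses SQLite as a stand-in and a hypothetical `Age` property. When every value is stored as text, a "greater than 10" filter compares strings, so `'9'` sorts after `'10'`. A real `INTEGER` column compares numerically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE UserProperties (
        UserID        INTEGER NOT NULL,
        PropertyKey   TEXT    NOT NULL,
        PropertyValue TEXT,
        PRIMARY KEY (UserID, PropertyKey)
    ) WITHOUT ROWID;
    -- Same data as a typed column, for comparison.
    CREATE TABLE UserTyped (UserID INTEGER PRIMARY KEY, Age INTEGER);
    """
)
for uid, age in [(1, 9), (2, 10), (3, 42)]:
    conn.execute("INSERT INTO UserProperties VALUES (?, 'Age', ?)", (uid, str(age)))
    conn.execute("INSERT INTO UserTyped VALUES (?, ?)", (uid, age))

# EAV: PropertyValue has TEXT affinity, so '9' > '10' is true (string order).
eav = conn.execute(
    "SELECT UserID FROM UserProperties "
    "WHERE PropertyKey = 'Age' AND PropertyValue > ? ORDER BY UserID",
    ("10",),
).fetchall()

# Typed column: numeric comparison, as intended.
typed = conn.execute(
    "SELECT UserID FROM UserTyped WHERE Age > ? ORDER BY UserID", (10,)
).fetchall()

print(eav)    # user 1 (age 9) wrongly included
print(typed)  # only user 3
```

Nothing in the key/value table would stop a value like `'forty'` from being stored either. Each consumer has to know and enforce the type of every key itself.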

We can easily look up the property keys, but nothing groups these property keys.

Wikipedia has a very good article on E-A-V, but read it now -- it's mostly the work of one author, and is slated for "improvement".

tpdi answered Oct 03 '22 01:10


I recommend that you consider the approach known as vertical partitioning. This means that you keep defining tables keyed on UserID; you could call them User1, User2, etc. Start a new table when you hit the maximum row size for your database. The benefit of this approach is that the values are still true database attributes, which will wind up saving time when working with this data, e.g. in data binding.
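Here is a minimal sketch of the partitioning idea in SQLite. It assumes two hypothetical partitions, `User1` and `User2`, with made-up columns. A view, here called `UserAll` (also hypothetical), reassembles them so that callers still see one row with ordinary named columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical partitions: each holds a slice of the columns, keyed on UserID.
    CREATE TABLE User1 (
        UserID   INTEGER PRIMARY KEY,
        Username TEXT NOT NULL,
        Email    TEXT
    );
    CREATE TABLE User2 (
        UserID    INTEGER PRIMARY KEY REFERENCES User1(UserID),
        Country   TEXT,
        BirthYear INTEGER
    );
    -- The view stitches the partitions back into one logical row.
    CREATE VIEW UserAll AS
    SELECT u1.UserID    AS UserID,
           u1.Username  AS Username,
           u1.Email     AS Email,
           u2.Country   AS Country,
           u2.BirthYear AS BirthYear
    FROM User1 AS u1
    LEFT JOIN User2 AS u2 ON u2.UserID = u1.UserID;
    """
)
conn.execute("INSERT INTO User1 VALUES (1, 'michael', 'm@example.com')")
conn.execute("INSERT INTO User2 VALUES (1, 'CA', 1980)")

# sqlite3.Row gives name-based access, similar to what binding code expects.
conn.row_factory = sqlite3.Row
row = conn.execute("SELECT * FROM UserAll WHERE UserID = ?", (1,)).fetchone()
user = dict(row)
print(user)
```

The `LEFT JOIN` keeps users who have no `User2` row yet. Whether a view like this is the right integration point for LinqToSql is left open here.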

The key question to answer is: are these really attributes? Do they represent the structure of information that you must collect about the user? If so, the best way to model them is to make them columns. The only reason you must resort to vertical partitioning is the row size limit of the database.

If, on the other hand, a flexible attribute system is called for, then by all means go with the property key/property value system. For example, if users were allowed to define their own attributes dynamically, then you'd definitely want the key/value system. However, I would say key/value is not the best way if you understand the structure of your data and have legitimately identified hundreds of attributes for users.

As a side note, I must say that you should question entities with large numbers of attributes. They may be valid, but it's also quite likely that you're missing some entities at the conceptual level. In other words, maybe not all of these attributes relate to the user per se; some may belong to another entity that is related to users.
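Here is one hypothetical illustration of a "missing entity". Suppose the attribute list contains groups like HomeStreet, HomeCity, WorkStreet and WorkCity. Those groups are really an address entity related to the user. The sketch below (SQLite, made-up names) models that entity with its own typed columns. Unlike a generic key/value table, every row of `UserAddress` has the same meaning and structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE User (UserID INTEGER PRIMARY KEY, Username TEXT NOT NULL);
    -- Hypothetical: instead of HomeStreet, HomeCity, WorkStreet, WorkCity, ...
    -- as columns on User, the address becomes its own related entity.
    CREATE TABLE UserAddress (
        UserID      INTEGER NOT NULL REFERENCES User(UserID),
        AddressType TEXT    NOT NULL,  -- e.g. 'home', 'work'
        Street      TEXT    NOT NULL,
        City        TEXT    NOT NULL,
        PRIMARY KEY (UserID, AddressType)
    );
    """
)
conn.execute("INSERT INTO User VALUES (1, 'michael')")
conn.executemany(
    "INSERT INTO UserAddress VALUES (?, ?, ?, ?)",
    [(1, "home", "1 Elm St", "Springfield"), (1, "work", "9 Main St", "Shelbyville")],
)
addresses = conn.execute(
    "SELECT AddressType, City FROM UserAddress WHERE UserID = ? ORDER BY AddressType",
    (1,),
).fetchall()
print(addresses)
```

Each such group pulled out of `User` removes several columns from the wide table, and it may reduce or remove the need for vertical partitioning.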

Paul Keister answered Oct 02 '22 23:10