How to store bidirectional relationships in an RDBMS like MySQL?

Tags: mysql, relational-database, database-design, relationship

Suppose I want to store relationships among the users of my application, similar to Facebook, for example.

That means if A is a friend (or has some other relation) of B, then B is also a friend of A. I am currently planning to store these relationships in a relations table as follows:

  UID      FriendID
 ------    --------
 user1      user2
 user1      user3
 user2      user1

However I am facing two options here:

1. The typical case, where I store both user1 -> user2 and user2 -> user1. This takes more space, but (at least in my head) requires just one pass over the rows to display the friends of a particular user.
2. The other option is to store either user1 -> user2 OR user2 -> user1, and whenever I want to find all the friends of user1, query both columns of the table. This takes half the space but (again, at least in my head) twice the time.

First of all, is my reasoning appropriate? If yes, then are there any bottlenecks that I am forgetting (in terms of scaling / throughput or anything)?

Basically, are there any trade-offs between the two other than the ones listed here? Also, is one preferred over the other in industry?


asked May 29 '12 22:05

Ankit

1 Answer

Here is how these two approaches will be physically represented in the database:

[Image: physical representation of the two approaches, https://i.stack.imgur.com/fgztS.png]

Let us analyze both approaches...

Approach 1 (both directions stored in the table):

PRO: Simpler queries.
CON: Data can be corrupted by inserting/updating/deleting only one direction.
MINOR PRO: Doesn't require additional constraints to ensure a friendship cannot be duplicated.
Further analysis needed:
1. TIE: One index covers both directions, so you don't need a secondary index.
2. TIE: Storage requirements.
3. TIE: Performance.
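As a rough illustration of approach 1, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL, so that it runs without a server. The table name `friendship`, the column name `friend_id`, and the helpers `add_friendship` and `friends_of` are assumptions made for this example, not part of the question. It shows the simple single-column query and also how a one-directional insert slips through unnoticed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE friendship (
        uid       TEXT NOT NULL,
        friend_id TEXT NOT NULL,
        PRIMARY KEY (uid, friend_id)
    )
""")

def add_friendship(conn, a, b):
    """Insert both directions in one transaction so they succeed or fail together."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO friendship (uid, friend_id) VALUES (?, ?)",
            [(a, b), (b, a)],
        )

def friends_of(conn, user):
    """Single-column lookup: the 'simpler queries' advantage of approach 1."""
    rows = conn.execute(
        "SELECT friend_id FROM friendship WHERE uid = ? ORDER BY friend_id",
        (user,),
    )
    return [r[0] for r in rows]

add_friendship(conn, "user1", "user2")
add_friendship(conn, "user1", "user3")

# The CON in action: nothing in the schema stops a one-directional insert.
with conn:
    conn.execute("INSERT INTO friendship VALUES ('user4', 'user1')")

print(friends_of(conn, "user1"))  # does not include user4
print(friends_of(conn, "user4"))  # includes user1
```

Wrapping both inserts in one transaction helps, but it only protects code paths that go through the helper. The schema itself still accepts the asymmetric row.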

Approach 2 (only one direction stored in the table):

CON: More complicated queries.
PRO: Can't corrupt the data by forgetting to handle the opposite direction, since there is no opposite direction.
MINOR CON: Requires CHECK(UID < FriendID), so that the same friendship can never be represented in two different ways and the key on (UID, FriendID) can do its job.
Further analysis needed:
1. TIE: Two indexes are necessary to cover both directions of querying (composite index on {UID, FriendID} and composite index on {FriendID, UID}).
2. TIE: Storage requirements.
3. TIE: Performance.
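A comparable sketch of approach 2, again using Python's sqlite3 as a stand-in for MySQL. The names `friendship`, `friend_id`, `friendship_reverse`, `add_friendship` and `friends_of` are assumptions for illustration. One caveat worth checking on your own server: MySQL enforces CHECK constraints only from version 8.0.16 onward, and earlier versions parse them but ignore them, so on an older server the ordering rule would have to be enforced some other way (for example, in a trigger or in application code).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friendship (
        uid       TEXT NOT NULL,
        friend_id TEXT NOT NULL,
        PRIMARY KEY (uid, friend_id),
        CHECK (uid < friend_id)          -- one canonical form per friendship
    );
    -- Second index so lookups by friend_id do not scan the whole table.
    CREATE INDEX friendship_reverse ON friendship (friend_id, uid);
""")

def add_friendship(conn, a, b):
    """Store the pair once, smaller id first, to satisfy the CHECK."""
    lo, hi = sorted((a, b))
    with conn:
        conn.execute(
            "INSERT INTO friendship (uid, friend_id) VALUES (?, ?)", (lo, hi)
        )

def friends_of(conn, user):
    """The 'more complicated query': look on both sides of the pair."""
    rows = conn.execute(
        """
        SELECT friend_id AS friend FROM friendship WHERE uid = ?
        UNION ALL
        SELECT uid AS friend FROM friendship WHERE friend_id = ?
        ORDER BY friend
        """,
        (user, user),
    )
    return [r[0] for r in rows]

add_friendship(conn, "user2", "user1")   # stored as (user1, user2)
add_friendship(conn, "user1", "user3")

# A raw insert in the "wrong" order is rejected by the CHECK constraint.
try:
    with conn:
        conn.execute("INSERT INTO friendship VALUES ('user2', 'user1')")
    reversed_row_accepted = True
except sqlite3.IntegrityError:
    reversed_row_accepted = False

print(friends_of(conn, "user1"), friends_of(conn, "user2"), reversed_row_accepted)
```

UNION ALL is safe here because the CHECK guarantees a pair can appear on only one side, so the two branches can never return the same friend twice.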

Point 1 is of special interest. MySQL/InnoDB always clusters data, and secondary indexes can be expensive in clustered tables (see "Disadvantages of clustering" in this article), so it might seem as if the secondary index in approach 2 would eat up all the advantages of having fewer rows. However, the secondary index contains exactly the same fields as the primary key (only in the opposite order), so there is no storage overhead in this particular case. There is also no pointer to a table heap (since there is no table heap), so it's probably even cheaper storage-wise than a normal heap-based index. And assuming the query is covered by the index, there won't be the double lookup normally associated with a secondary index in a clustered table either. So this is basically a tie (neither approach 1 nor approach 2 has a significant advantage).
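One way to see the "covered by the index" point is to ask the database for its query plan. The sketch below is only an analogy: it uses SQLite (through Python's sqlite3) with a WITHOUT ROWID table, which stores rows in the primary-key B-tree much as InnoDB clusters on the primary key. The table, index, and helper names are assumptions, and the exact plan wording varies between SQLite versions. In MySQL the corresponding check would be running EXPLAIN and looking for "Using index" in the Extra column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friendship (
        uid       TEXT NOT NULL,
        friend_id TEXT NOT NULL,
        PRIMARY KEY (uid, friend_id),
        CHECK (uid < friend_id)
    ) WITHOUT ROWID;  -- rows live in the primary-key B-tree, no separate heap
    CREATE INDEX friendship_reverse ON friendship (friend_id, uid);
""")

def plan(conn, sql, params):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

# Lookup by uid: answered from the primary-key B-tree.
forward = plan(conn, "SELECT friend_id FROM friendship WHERE uid = ?", ("user1",))

# Lookup by friend_id: answered from the reverse index alone, because that
# index already contains every column the query needs.
reverse = plan(conn, "SELECT uid FROM friendship WHERE friend_id = ?", ("user1",))

print(forward)
print(reverse)
```

If the reverse lookup is reported as a search on `friendship_reverse` with no extra table access, the second half of the friends query costs about the same as the first.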

Point 2 is related to point 1: it doesn't matter whether we have one B-Tree of N values or two B-Trees with N/2 values each. So this is also a tie: both approaches will use up approximately the same amount of storage.

The same reasoning applies to point 3: whether we search one larger B-Tree or two smaller ones doesn't make much of a difference, so this is also a tie.
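The storage and performance ties can be checked with some back-of-the-envelope arithmetic. The sketch below is a simplified model, not a measurement: the number of friendships and the B-tree fanout are made-up assumptions, and `btree_levels` is a hypothetical helper that ignores page fill factors and key sizes.

```python
def btree_levels(entries, fanout):
    """Smallest number of levels whose capacity (fanout ** levels) holds `entries`."""
    levels, capacity = 1, fanout
    while capacity < entries:
        capacity *= fanout
        levels += 1
    return levels

FRIENDSHIPS = 30_000_000   # assumed number of distinct friendships
FANOUT = 100               # assumed number of keys per B-tree page

# Approach 1: every friendship is stored twice in a single B-tree.
approach1_entries = 2 * FRIENDSHIPS
approach1_levels = btree_levels(approach1_entries, FANOUT)

# Approach 2: the primary key and the reverse index each hold every
# friendship once, so there are two B-trees of FRIENDSHIPS entries.
approach2_entries = FRIENDSHIPS + FRIENDSHIPS
approach2_levels = btree_levels(FRIENDSHIPS, FANOUT)

print(approach1_entries, approach2_entries)  # same total number of index entries
print(approach1_levels, approach2_levels)    # same depth under these assumptions
```

Under these assumptions both designs hold the same number of index entries, and halving a tree's size does not remove a level. Approach 2 does descend two trees per query instead of one, but the upper levels of a B-tree are usually cached, which is consistent with calling it roughly a tie.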

So, for the sake of robustness, and despite the somewhat uglier queries and the need for an additional CHECK, I'd go with approach 2.


answered Nov 11 '22 16:11

Branko Dimitrijevic
