Suppose I want to store relationships among the users of my application, similar to Facebook, per se.
That means if A is a friend(or some relation) of B, then B is also a friend of A. To store this relationships I am currently planning to store them in a table for relations as follows
UID FriendID
------ --------
user1 user2
user1 user3
user2 user1
However I am facing two options here:
user1 -> user2
and user2->user1
. This will take more space, but (at least in my head) require just one pass over the rows to display the friends of a particular user.user1->user2
OR user2->user1
and whenever I want to find all the friends of user1
, I will query on both columns of table to find a user's friends. It will take half the space but (again at least in my head) twice the amount of time.First of all, is my reasoning appropriate? If yes, then are there any bottlenecks that I am forgetting (in terms of scaling / throughput or anything)?
Basically, are there any trade-offs between the two, other than the ones listed here. Also, in industry is one preferred over the other?
A bidirectional relationship has both an owning side and an inverse side. A unidirectional relationship has only an owning side. The owning side of a relationship determines how the Persistence runtime makes updates to the relationship in the database.
A bidirectional relationship is valid in both directions. Intersects is an example of a bidirectional relationship.
Data bus is used to transfer data from one unit to another unit of the computer system. Microprocessor can read data from the memory or write data to the memory. So, the data bus is bidirectional. Was this answer helpful?
Here is how these two approaches will be physically represented in the database:
Let us analyze both approaches...
Approach 1 (both directions stored in the table):
Approach 2 (only one direction stored in the table):
CHECK(UID < FriendID)
, so a same friendship can never be represented in two different ways, and the key on (UID, FriendID)
can do its job.{UID, FriendID}
and composite index on {FriendID, UID}
).The point 1 is of special interest. MySQL/InnoDB always clusters data, and secondary indexes can be expensive in clustered tables (see "Disadvantages of clustering" in this article), so it might seem as if the secondary index in approach 2 would eat-up all the advantages of fewer rows. However, the secondary index contains the exact same fields as the primary (only in the opposite order) so there is no storage overhead in this particular case. There is also no pointer to table heap (since there is no table heap), so it's probably even cheaper storage-wise that a normal heap-based index. And assuming the query is covered with the index, there won't be a double-lookup normally associated with a secondary index in a clustered table either. So, this is basically a tie (neither approach 1 nor approach 2 has significant advantage).
The point 2 is related to the point 1: it doesn't matter whether we will have a B-Tree of N values or two B-Trees, each with N/2 values. So this is also a tie: both approaches will use-up approximately same amount of storage.
The same reasoning applies to point 3: whether we search one larger B-Tree or 2 smaller ones, doesn't make much of a difference, so this is also a tie.
So, for the robustness, and despite somewhat uglier queries and a need for additional CHECK
, I'd go with the approach 2.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With