I have worked with various applications and encountered this situation many times. Until now I have not figured out what is the best approach.
Here's the scenario:
GeneralDetails
table:
| DocumentID | DateCreated | Owner |
| 1 | 07/07/07 | Naruto |
| 2 | 08/08/08 | Goku |
| 3 | 09/09/09 | Taguro |
ItemDetails
table
| DocumentID | Item | Quantity |
| 1 | Marbles | 20 |
| 1 | Cards | 56 |
| 2 | Yo-yo | 1 |
| 2 | Chess board | 3 |
| 2 | GI Joe | 12 |
| 3 | Rubber Duck | 1 |
As you can see, the tables have a one-to-many relationship. Now, in order to retrieve all the documents and their respective items, I always do either of the two:
Method 1 - Many round trips (pseudo-code):
Documents = GetFromDB("select DocumentID, Owner " +
"from GeneralDetails")
For Each Document in Documents
{
Display(Document["CreatedBy"])
DocumentItems = GetFromDB("select Item, Quantity " +
"from ItemDetails " +
"where DocumentID = " + Document["DocumentID"] + "")
For Each DocumentItem in DocumentItems
{
Display(DocumentItem["Item"] + " " + DocumentItem["Quantity"])
}
}
Method 2 - Much irrelevant data (pseudo-code):
DocumentsAndItems = GetFromDB("select g.DocumentID, g.Owner, i.Item, i.Quantity " +
"from GeneralDetails as g " +
"inner join ItemDetails as i " +
"on g.DocumentID = i.DocumentID")
//Display...
I used the first method when I was in college for desktop applications, the performance was not bad so I realized it was okay.
Until one day, I saw an article "Make the web faster", it says that many round trips to the database is bad; so ever since then I have used the second method.
On the second method, I avoided round trips by using inner join to retrieve the first and the second table at once, but it produces unecessary or redundant data. See the result set.
| DocumentID | Owner | Item | Quantity |
| 1 | Naruto | Marbles | 20 |
| 1 | Naruto | Cards | 56 |
| 2 | Goku | Yo-yo | 1 |
| 2 | Goku | Chess board | 3 |
| 2 | Goku | GI Joe | 12 |
| 3 | Taguro | Rubber Duck | 1 |
The result set has redundant DocumentID
and Owner
. It looks like an unnormalized database.
Now, the question is, how do I avoid round trips and at the same time avoid redundant data?
The inner join is better because the database has more possibilities to optimize.
In general you can not create a query like this which does not produce redundant results. For that, the relational model is too restrictive. I would just live with it: the database is responsible to optimize these cases.
If you really encounter performance problems (mainly because of a network bottleneck) you could write a stored procedure, that makes the query and denormalizes it. In your example you create create a result like:
| DocumentID | Owner | Items | Quantity |
| 1 | Naruto | Marbles, Cards | 20, 56 |
| 2 | Goku | Yo-yo, Chess board, GI Joe, Rubber Duck | 1, 3, 12, 1 |
But this of course does not conform to the first normal form - so you will need to parse it on the client. If you use a database with XML support (like Oracle or MS SQL Server) you could even create an XML file on the server and send this to the client.
But whatever you do, remember: premature optimization is the root of all evil. Don't do this kind of stuff before you are not 100% sure, that you are really facing a problem you can solve like this.
The method used by ActiveRecord and other ORMs is to select the first table, batch together the IDs and then use those IDs in an IN clause for the second select.
SELECT * FROM ItemDetails WHERE DocumentId IN ( [Comma Separated List of IDs here] )
Advantages:
Disadvantages:
Generally speaking, the first method is referred to as the "N+1 query problem" and the solutions are referred to as "eager loading". I tend to see your "Method 2" as preferable as the latency to the database generally trumps the size of the redundant data over the data transfer rate, but YRMV. As with almost everything in software, it's a tradeoff.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With