Following join is supposed to retrieve user info along with their messages for users with a certain status: <pre class="prettyprint"><code>SELECT * FROM user, message WHERE message.user_id=user.id AND user.status=1 </code></pre> The problem is that all rows about a certain user in the result set contain redundant columns that repeat the same data about that user (those fields retrieved from user table), only fields from the message table contain non-redundant information. Something like this: <pre class="prettyprint"><code>user.id username email message.id subject 1 jane jane@gmail.com 120 Notification 1 jane jane@gmail.com 122 Re:Hello 1 jane jane@gmail.com 125 Quotation 2 john john@yahoo.com 127 Hi jane 2 john john@yahoo.com 128 Fix thiss 2 john john@yahoo.com 129 Ok 3 jim jim@msn.com 140 Re:Re:Quotation </code></pre> As you can see many data are redundant and we do not want to first find the users and then go about their messages in loop like structures or something like that. Loops that cause micro-queries should be avoided at all costs. I am not concerned about the output of my program, that is well handled in the UI. I think perhaps the network traffic produced by returning the result of this query could be substantially reduced if somehow I can manage to eliminate the repetition of user data in all rows pertaining to that user.

There are several things you should know. The first is that the default SQL JOIN construct is essentially a set cross product, restricted by the WHERE clause. This means it's multiplicative - you get duplicate results out which you then prune down. You also have to be careful in the presence of NULL fields. The second is that there is a 'DISTINCT' keyword. When you prefix a column in the selection with this, you'll get at most one instance of a certain value for that column in the results. So, as per your query, 'SELECT DISTINCT user.id FROM' will eliminate the redundancies on the server side. The third is that the correct way to solve this problem is likely not using the <code>*</code> operator. I suggest: <pre class="prettyprint"><code>SELECT user.id,username,email,subject FROM message m,user WHERE m.user_id=user.id AND user.status=1 </code></pre> This uses the simple, easy-to-understand implicit-join syntax and should be valid SQL on whatever server. I can vouch for it working with MySQL, at least. It also aliases the 'message' table to 'm' as shorthand. As you surmise, this will reduce the traffic from the SQL server to your database. edit: if you want to eliminate the "redundant" email information, you can't - you must make two distinct queries. SQL results are tables and must be rectangular, with all known values filled. There's no 'ditto' entry. edit 2: You only have to make two queries. For instance: <pre class="prettyprint"><code>SELECT subject FROM message WHERE message.id IN (SELECT user.id FROM user WHERE status=1) </code></pre> This is one query that contains a nested query, so it's really making two database hits. But it doesn't have any programmatic loops.

How do I avoid redundant data fields in the result set when using JOINs?

Tags:

sql

join

redundancy

Following join is supposed to retrieve user info along with their messages for users with a certain status:

SELECT * FROM user, message WHERE message.user_id=user.id AND user.status=1

The problem is that all rows about a certain user in the result set contain redundant columns that repeat the same data about that user (those fields retrieved from user table), only fields from the message table contain non-redundant information. Something like this:

user.id  username email            message.id  subject
1        jane     [email protected]   120         Notification 
1        jane     [email protected]   122         Re:Hello 
1        jane     [email protected]   125         Quotation
2        john     [email protected]   127         Hi jane
2        john     [email protected]   128         Fix thiss 
2        john     [email protected]   129         Ok
3        jim      [email protected]      140         Re:Re:Quotation

As you can see many data are redundant and we do not want to first find the users and then go about their messages in loop like structures or something like that. Loops that cause micro-queries should be avoided at all costs.

I am not concerned about the output of my program, that is well handled in the UI. I think perhaps the network traffic produced by returning the result of this query could be substantially reduced if somehow I can manage to eliminate the repetition of user data in all rows pertaining to that user.

768

asked Jul 05 '10 04:07

Ashkan Kh. Nazary

1 Answers

There are several things you should know.

The first is that the default SQL JOIN construct is essentially a set cross product, restricted by the WHERE clause. This means it's multiplicative - you get duplicate results out which you then prune down. You also have to be careful in the presence of NULL fields.

The second is that there is a 'DISTINCT' keyword. When you prefix a column in the selection with this, you'll get at most one instance of a certain value for that column in the results. So, as per your query, 'SELECT DISTINCT user.id FROM' will eliminate the redundancies on the server side.

The third is that the correct way to solve this problem is likely not using the * operator. I suggest:

SELECT user.id,username,email,subject FROM message m,user WHERE m.user_id=user.id AND user.status=1

This uses the simple, easy-to-understand implicit-join syntax and should be valid SQL on whatever server. I can vouch for it working with MySQL, at least. It also aliases the 'message' table to 'm' as shorthand.

As you surmise, this will reduce the traffic from the SQL server to your database.

edit: if you want to eliminate the "redundant" email information, you can't - you must make two distinct queries. SQL results are tables and must be rectangular, with all known values filled. There's no 'ditto' entry.

edit 2: You only have to make two queries. For instance:

SELECT subject FROM message WHERE message.id IN (SELECT user.id FROM user WHERE status=1)

This is one query that contains a nested query, so it's really making two database hits. But it doesn't have any programmatic loops.

125

answered Sep 20 '22 02:09

Borealid

Related questions
                            
                                Counting records in C# using LINQ
                            
                                Is there a way to make Identity Specification show up in the table designer next to the Allow Nulls column?
                            
                                IntegrityError: foreign key violation upon delete
                            
                                Aggregate SQL column values by time period
                            
                                SQL left join with multiple rows into one row
                            
                                How to avoid using duplicate savepoint names in nested transactions in nested stored procs?
                            
                                For SQL, when did it start to be desirable to always use the words "Inner Join" instead of implicitly joinly?
                            
                                Select return dynamic columns
                            
                                Convert historic dates from utc to local time
                            
                                How not to display columns which are NULL in a view
                            
                                Very different execution times of SQL query in C# and SQL Server Management Studio
                            
                                Lookup table for oracle decodes?
                            
                                Whats wrong with this Data Explorer SQL query?
                            
                                Select exactly one row for each employee using unordered field as criteria
                            
                                What's the most efficient way to select the last n rows in a table without changing the table's structure?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With