Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I avoid redundant data fields in the result set when using JOINs?

Following join is supposed to retrieve user info along with their messages for users with a certain status:

SELECT * FROM user, message WHERE message.user_id=user.id AND user.status=1

The problem is that all rows about a certain user in the result set contain redundant columns that repeat the same data about that user (those fields retrieved from user table), only fields from the message table contain non-redundant information. Something like this:

user.id  username email            message.id  subject
1        jane     [email protected]   120         Notification 
1        jane     [email protected]   122         Re:Hello 
1        jane     [email protected]   125         Quotation
2        john     [email protected]   127         Hi jane
2        john     [email protected]   128         Fix thiss 
2        john     [email protected]   129         Ok
3        jim      [email protected]      140         Re:Re:Quotation

As you can see many data are redundant and we do not want to first find the users and then go about their messages in loop like structures or something like that. Loops that cause micro-queries should be avoided at all costs.

I am not concerned about the output of my program, that is well handled in the UI. I think perhaps the network traffic produced by returning the result of this query could be substantially reduced if somehow I can manage to eliminate the repetition of user data in all rows pertaining to that user.

like image 768
Ashkan Kh. Nazary Avatar asked Jul 05 '10 04:07

Ashkan Kh. Nazary


People also ask

How do you avoid duplicates in join?

How do I prevent duplicate rows from joining multiple tables? Solution. Select column values in a specific order within rows to make rows with duplicate sets of values identical. Then you can use SELECT DISTINCT to remove duplicates.

How can data redundancy can be avoided?

Fortunately, data replication can help prevent data redundancy by storing the same data in multiple locations. With data replication, companies can ensure consistency and receive the information they need at any time.

Which technique is used to avoid redundancy in database design?

Normalization is an approach to database design used in relational databases to avoid redundancy.

Which of the following join will not give you duplicate results?

Explanation:OUTER JOIN is the only join which shows the unmatched rows.


1 Answers

There are several things you should know.

The first is that the default SQL JOIN construct is essentially a set cross product, restricted by the WHERE clause. This means it's multiplicative - you get duplicate results out which you then prune down. You also have to be careful in the presence of NULL fields.

The second is that there is a 'DISTINCT' keyword. When you prefix a column in the selection with this, you'll get at most one instance of a certain value for that column in the results. So, as per your query, 'SELECT DISTINCT user.id FROM' will eliminate the redundancies on the server side.

The third is that the correct way to solve this problem is likely not using the * operator. I suggest:

SELECT user.id,username,email,subject FROM message m,user WHERE m.user_id=user.id AND user.status=1

This uses the simple, easy-to-understand implicit-join syntax and should be valid SQL on whatever server. I can vouch for it working with MySQL, at least. It also aliases the 'message' table to 'm' as shorthand.

As you surmise, this will reduce the traffic from the SQL server to your database.

edit: if you want to eliminate the "redundant" email information, you can't - you must make two distinct queries. SQL results are tables and must be rectangular, with all known values filled. There's no 'ditto' entry.

edit 2: You only have to make two queries. For instance:

SELECT subject FROM message WHERE message.id IN (SELECT user.id FROM user WHERE status=1)

This is one query that contains a nested query, so it's really making two database hits. But it doesn't have any programmatic loops.

like image 125
Borealid Avatar answered Sep 20 '22 02:09

Borealid