 

Question about joins and tables with millions of rows

I have to create 2 tables:

Magazine (10 million rows with these columns: id, title, genres, printing, price)

Author (180 million rows with these columns: id, name, magazine_id)

Every author can write for ONLY ONE magazine, and every magazine has several authors.

So if I want to know all authors of Motors magazines, I have to use this query:

SELECT * FROM Author, Magazine WHERE ( Author.magazine_id = Magazine.id ) AND ( genres = 'Motors' )

The same applies to the printing and price columns.

To avoid these joins between tables with millions of rows, I thought of using these tables:

Magazine (10 million rows with these columns: id, title, genres, printing, price)

Author (180 million rows with these columns: id, name, magazine_id, genres, printing, price)

and this query:

SELECT * FROM Author WHERE  genres = 'Motors' 

Is this a good approach?

I want to make it run faster.

I can use PostgreSQL or MySQL.

xRobot asked May 01 '10 19:05



2 Answers

No, I don't think duplicating the information as you describe is a good design for a relational database.

If you change the genre or price of a given magazine, you would have to remember to change it in all the author rows where the information is duplicated. And if you forget sometimes, you end up with anomalies in your data. How can you know which one is correct?

This is one of the benefits of relational database normalization, to represent information with minimal redundancy, so you don't get anomalies.
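To make the anomaly concrete, here is a minimal sketch. It assumes an in-memory SQLite database (through Python's sqlite3 module) as a stand-in for PostgreSQL or MySQL, uses the denormalized Author table from the question, and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL/MySQL; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Magazine (id INTEGER PRIMARY KEY, title TEXT, genres TEXT,
                           printing INTEGER, price REAL);
    -- Denormalized variant from the question: magazine attributes copied into Author.
    CREATE TABLE Author (id INTEGER PRIMARY KEY, name TEXT, magazine_id INTEGER,
                         genres TEXT, printing INTEGER, price REAL);
    INSERT INTO Magazine VALUES (1, 'Fast Wheels', 'Motors', 50000, 4.50);
    INSERT INTO Author VALUES (1, 'Ann', 1, 'Motors', 50000, 4.50);
    INSERT INTO Author VALUES (2, 'Bob', 1, 'Motors', 50000, 4.50);
""")

# The magazine changes genre, but the update reaches only one of the copies.
conn.execute("UPDATE Magazine SET genres = 'Travel' WHERE id = 1")
conn.execute("UPDATE Author SET genres = 'Travel' WHERE id = 1")

# Authors whose copied genre no longer matches their magazine.
stale_authors = conn.execute("""
    SELECT a.name
    FROM Author AS a
    JOIN Magazine AS m ON m.id = a.magazine_id
    WHERE a.genres <> m.genres
""").fetchall()
print(stale_authors)
```

Note that finding the stale rows needs the very join the duplication was meant to avoid.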

To make it run faster, which is I think what you're trying to do, you should learn how to use indexes, especially covering indexes.
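As a rough sketch of what that could look like on the normalized tables: the example below assumes SQLite (through Python's sqlite3 module) as a stand-in for PostgreSQL or MySQL, and the index names and sample rows are hypothetical. One index supports the filter on genres; the other contains both the join key and the selected column, so it can cover the query.

```python
import sqlite3

# SQLite in memory is a stand-in for PostgreSQL/MySQL;
# the rows and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Magazine (id INTEGER PRIMARY KEY, title TEXT, genres TEXT,
                           printing INTEGER, price REAL);
    CREATE TABLE Author (id INTEGER PRIMARY KEY, name TEXT,
                         magazine_id INTEGER REFERENCES Magazine(id));
    INSERT INTO Magazine VALUES (1, 'Fast Wheels', 'Motors', 50000, 4.50),
                                (2, 'Green Thumb', 'Gardening', 20000, 3.00);
    INSERT INTO Author VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 2);

    -- Lets the filter on genres find matching magazines without a full scan.
    CREATE INDEX idx_magazine_genres ON Magazine (genres);
    -- Holds both the join key and the selected column, so the query
    -- can be answered from the index without reading the Author rows.
    CREATE INDEX idx_author_magazine_name ON Author (magazine_id, name);
""")

# Select only the column that is needed, so the second index can cover it.
query = """
    SELECT a.name
    FROM Magazine AS m
    JOIN Author AS a ON a.magazine_id = m.id
    WHERE m.genres = 'Motors'
    ORDER BY a.name
"""
motors_authors = [row[0] for row in conn.execute(query)]
print(motors_authors)

# Show which indexes the planner chose (SQLite-specific syntax).
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
```

PostgreSQL and MySQL both use plain EXPLAIN instead of EXPLAIN QUERY PLAN, and the plan they pick on tables with millions of rows may differ from what SQLite does with three rows, so check the plan on real data.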

Bill Karwin answered Nov 03 '22 07:11



If you only need the authors of a magazine (and no information about the magazine itself), you can use EXISTS. Some say EXISTS is faster than a JOIN because it stops searching after the first hit. In that case you would use:

SELECT *
FROM Author
WHERE EXISTS (SELECT 1 FROM Magazine WHERE genres = 'Motors' AND Author.magazine_id = Magazine.id)

Also, as mentioned before, specifying the columns you need instead of using SELECT * would speed things up.
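A minimal sketch of the EXISTS form next to the equivalent JOIN, both with explicit columns and both correlating on Author.magazine_id as in the question's schema. It assumes SQLite (through Python's sqlite3 module) as a stand-in for PostgreSQL or MySQL, with made-up sample rows. It only shows that the two forms return the same rows; which one is faster depends on the database and its planner, so measure on your own data.

```python
import sqlite3

# SQLite in memory is a stand-in for PostgreSQL/MySQL; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Magazine (id INTEGER PRIMARY KEY, title TEXT, genres TEXT,
                           printing INTEGER, price REAL);
    CREATE TABLE Author (id INTEGER PRIMARY KEY, name TEXT,
                         magazine_id INTEGER REFERENCES Magazine(id));
    INSERT INTO Magazine VALUES (1, 'Fast Wheels', 'Motors', 50000, 4.50),
                                (2, 'Green Thumb', 'Gardening', 20000, 3.00);
    INSERT INTO Author VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 2);
""")

# EXISTS form: returns Author columns only, correlated on the foreign key.
exists_rows = conn.execute("""
    SELECT Author.id, Author.name
    FROM Author
    WHERE EXISTS (SELECT 1
                  FROM Magazine
                  WHERE Magazine.genres = 'Motors'
                    AND Magazine.id = Author.magazine_id)
    ORDER BY Author.id
""").fetchall()

# JOIN form with the same explicit column list.
join_rows = conn.execute("""
    SELECT Author.id, Author.name
    FROM Author
    JOIN Magazine ON Magazine.id = Author.magazine_id
    WHERE Magazine.genres = 'Motors'
    ORDER BY Author.id
""").fetchall()

print(exists_rows)
print(join_rows)
```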

Lars Nyström answered Nov 03 '22 08:11
