Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best way to get distinct values from large table

I have a db table with about 10 or so columns, two of which are month and year. The table has about 250k rows now, and we expect it to grow by about 100-150k records a month. A lot of queries involve the month and year column (ex, all records from march 2010), and so we frequently need to get the available month and year combinations (ie do we have records for april 2010?).

A coworker thinks that we should have a separate table from our main one that only contains the months and years we have data for. We only add records to our main table once a month, so it would just be a small update on the end of our scripts to add the new entry to this second table. This second table would be queried whenever we need to find the available month/year entries on the first table. This solution feels kludgy to me and a violation of DRY.

What do you think is the correct way of solving this problem? Is there a better way than having two tables?

like image 243
derivation Avatar asked Apr 21 '10 18:04

derivation


People also ask

Which is the most effective option to find unique records in a table?

The easiest way to identify unique and distinct values in Excel is by using the IF function together with COUNTIF.

How can I make SELECT distinct faster?

You probably don't want to hear this, but the best option to speed up SELECT DISTINCT is to avoid DISTINCT to begin with. In many cases (not all!) it can be avoided with better database-design or better queries. Sometimes, GROUP BY is faster, because it takes a different code path.


2 Answers

Using a simple index on the columns required (Year and Month) should greatly improve either a DISTINCT, or GROUP BY Query.

I would not go with a secondary table as this adds extra over head to maintaining the secondary table (inserts/updates deletes will require that you validate the secondary table)

EDIT:

You might even want to consider using Improving Performance with SQL Server 2005 Indexed Views

like image 146
Adriaan Stander Avatar answered Dec 25 '22 03:12

Adriaan Stander


Make sure to have an Clustered Index on those columns. and partition your table on these date columns an place the datafiles on different disk drives I Believe keeping your index fragmentation low is your best shot.

I also Believe having a physical view with the desired select is not a good idea, because it adds Insert/Update overhead. on average there's 3,5 insert's per minute. or about 17 seconds between each insert (on average please correct me if I'm wrong)

The question is are you selecting more often than every 17 seconds? That's the key thought. Hope it helped.

like image 26
Gabriel Guimarães Avatar answered Dec 25 '22 01:12

Gabriel Guimarães