60 million entries, select entries from a certain month. How to optimize database?

Question

I have a database with 60 million entries.

Every entry contains:

ID
DataSourceID
Some Data
DateTime

I need to select entries from certain month. Each month contains approximately 2 million entries.
```
 select * 
   from Entries 
  where time between "2010-04-01 00:00:00" and "2010-05-01 00:00:00"
```
(query takes approximately 1.5 minutes)
I'd also like to select data from certain month from a given DataSourceID. (takes approximately 20 seconds)

There are about 50-100 different DataSourceIDs.

Is there a way to make this faster? What are my options? How to optimize this database/query?

EDIT: There's approx. 60-100 inserts PER second!

OMG Ponies · Accepted Answer

To get entries in a particular month, for a particular year, faster - you will need to index the time column:

CREATE INDEX idx_time ON ENTRIES(time) USING BTREE;

Additionally, use:

SELECT e.* 
  FROM ENTRIES e
 WHERE e.time BETWEEN '2010-04-01' AND DATE_SUB('2010-05-01' INTERVAL 1 SECOND)

...because BETWEEN is inclusive, so you'd get anything dated "2010-05-01 00:00:00" with the query you posted.

I'd also like to select data from certain month from a given DataSourceID

You can either add a separate index for the datasourceid column:

CREATE INDEX idx_time ON ENTRIES(datasourceid) USING BTREE;

...or setup a covering index to include both columns:

CREATE INDEX idx_time ON ENTRIES(time, datasourceid) USING BTREE;

A covering index requires that the leftmost columns have to be used in the query for the index to be used. In this example, having time first will work for both situations you mentioned -- datasourceid doesn't have to be used for the index to be of use. But, you have to test your queries by viewing the EXPLAIN output to really know what works best for your data & the queries being performed on that data.

That said, indexes will slow down INSERT, UPDATE and DELETE statements. And an index doesn't provide a lot of value if the column data is has few distinct values - IE: a boolean column is a bad choice to index, because the cardinality is low.

60 million entries, select entries from a certain month. How to optimize database?

Tags:

sql

mysql

query-optimization

JBeurer

1 Answers

I'd also like to select data from certain month from a given DataSourceID

OMG Ponies

Recent Activity

Donate For Us

60 million entries, select entries from a certain month. How to optimize database?

Tags:

sql

mysql

query-optimization

JBeurer

1 Answers

I'd also like to select data from certain month from a given DataSourceID

OMG Ponies

Related questions

Recent Activity

Donate For Us