Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A simple MySQL query taking forever (more than 20 minutes!)

Tags:

sql

mysql

Maybe you can help me. I need to query 3 tables in order to get data for a financial stock.

The idea is to go to the instruments table, find the index for each instrument and then bring all the prices for that particular instrument together with the indicators that are on a separate table.

Tables stockdata and indicators are both almost 50.000 records. instruments just 30.

This is the query that is not working:

SELECT
  indicators.ddate,
  instruments.name,
  indicators.sma_14,
  indicators.sma_5,
  stockdata.close
FROM
 indicators
 INNER JOIN instruments ON (indicators.instrument_id=instruments.id)
 INNER JOIN stockdata ON (instruments.name=stockdata.name)

Here is the EXPLAIN result

+----+-------------+-------------+-------+-----------------------------+---------------------+---------+------+-------------+
| id | select_type | table       | type  | possible_keys               | key                 | key_len | rows | Extra       |
+----+-------------+-------------+-------+-----------------------------+---------------------+---------+------+-------------+
| 1  | SIMPLE      | instruments | index | PRIMARY,instruments_index01 | instruments_index01 |      61 |   25 | Using index |
| 1  | SIMPLE      | indicators  | ref   | indicators_index01          | indicators_index01  |       5 |  973 | Using where |
| 1  | SIMPLE      | stockdata   | ref   | stockdata_index01           | stockdata_index01   |      31 | 1499 | Using where |
+----+-------------+-------------+-------+-----------------------------+---------------------+---------+------+-------------+

I really appreciate any help you can provide!

This is the schema for the parts of the tables that are involved in my question:

TABLE `indicators` (
  `id`             int AUTO_INCREMENT NOT NULL,<br>
  `instrument_id`  int,
  `date`           date,
  `sma_5`          float(10,3),
  `sma_14`         float(10,3),
  `ema_14`         float(10,3),
  /* Keys */
  PRIMARY KEY (`id`)
)

TABLE `instruments` (
  `id`         int AUTO_INCREMENT NOT NULL,
  `name`       char(20),
  `country`    char(50),
  `newsquery`  char(100),
  /* Keys */
  PRIMARY KEY (`id`)
)

TABLE `stockdata` (
  `id`        int AUTO_INCREMENT NOT NULL,
  `name`      char(10),
  `date`      date,
  `open`      float,
  `high`      float,
  `low`       float,
  `close`     float,
  `volume`    int,
  `adjclose`  float,
  /* Keys */
  PRIMARY KEY (`id`)
)
like image 603
JordanBelf Avatar asked Jul 27 '10 03:07

JordanBelf


1 Answers

You are joining the indicators table to the instruments table, and the indicators.instrument_id column is not indexed.

You are also joining the instruments table to the stockdata table using the instruments.name and stockdata.name columns, both of which are type CHAR. Joining using CHAR or VARCHAR is usually significantly slower than joining using INT columns:

Using CHAR keys for joins, how much is the overhead?

To make matters worse, your CHAR columns are different sizes (char(20) and char(10) respectively), and they are not indexed. This really makes things difficult for MySQL! See How MySQL Uses Indexes for more information.

Ideally, you should alter your table structure so that the join can be performed using indexed INT fields. Something like this:

CREATE TABLE `instruments` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` char(20) DEFAULT NULL,
  `country` char(50) DEFAULT NULL,
  `newsquery` char(100) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

CREATE TABLE `indicators` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `instrument_id` int(11) DEFAULT NULL,
  `date` date DEFAULT NULL,
  `sma_5` float(10,3) DEFAULT NULL,
  `sma_14` float(10,3) DEFAULT NULL,
  `ema_14` float(10,3) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `fk_instrument_indicators` (`instrument_id`),
  CONSTRAINT `fk_instrument_indicators` FOREIGN KEY (`instrument_id`) REFERENCES `instruments` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB;

 CREATE TABLE `stockdata` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `instrument_id` int(11) NOT NULL,
  `name` char(20) DEFAULT NULL,
  `date` date DEFAULT NULL,
  `open` float DEFAULT NULL,
  `high` float DEFAULT NULL,
  `low` float DEFAULT NULL,
  `close` float DEFAULT NULL,
  `volume` int(11) DEFAULT NULL,
  `adjclose` float DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `fk_instrument_stockdata` (`instrument_id`),
  CONSTRAINT `fk_instrument_stockdata` FOREIGN KEY (`instrument_id`) REFERENCES `instruments` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB;

Then use the indexed fields in your join:

SELECT
  indicators.date,
  instruments.name,
  indicators.sma_14,
  indicators.sma_5,
  stockdata.close
FROM
 indicators
 INNER JOIN instruments ON (indicators.instrument_id=instruments.id)
 INNER JOIN stockdata ON (instruments.id=stockdata.instrument_id)

By using indexed INT columns, your joins will be much faster. Using InnoDB constraints will help ensure data integrity.

If there is a reason why you must join on the name column, make both the same size, and index them.

like image 124
Mike Avatar answered Sep 23 '22 23:09

Mike