Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Querying Large Dataset on Join (15+ million rows)

Tags:

mysql

mariadb

I am trying to join two tables, products and products_markets. While products is under a million records, product_markets is closer to 20 million records. The data has been changed so there might be a typo or two in the schema create tables:

CREATE TABLE `products_markets` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `product_id` int(10) unsigned NOT NULL,
  `country_code_id` int(10) unsigned NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_index` (`product_id`,`country_code_id`)
) ENGINE=InnoDB AUTO_INCREMENT=21052102 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

CREATE TABLE `products` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `manufacturer_id` int(10) unsigned NOT NULL,
  `department_id` int(10) unsigned NOT NULL,
  `code` varchar(100) COLLATE utf8mb4_unicode_ci NOT NULL,
  `popularity` int(11) DEFAULT NULL,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `value` bigint(20) unsigned NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `products_code_unique` (`code`),
  KEY `products_department_id_foreign` (`department_id`),
  KEY `products_manufacturer_id_foreign` (`manufacturer_id`),
  CONSTRAINT `products_department_id_foreign`
       FOREIGN KEY (`department_id`) REFERENCES `departments` (`id`),
  CONSTRAINT `products_manufacturer_id_foreign`
       FOREIGN KEY (`manufacturer_id`) REFERENCES `manufacturers` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=731563 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

I am trying to return 50 records of the most popular products available in a specific country and I am running into times around ~50 seconds which seems higher than expected.

I've tried a few different queries with no success:

select  `products_markets`.`product_id`
    from  products_markets
    left join  
        ( SELECT  products.id, products.popularity
            from  products
        ) p  ON p.id = products_markets.product_id
    where products_markets.country_code_id = 121
    order by  `popularity` desc, `p`.`id` asc
    limit  50 

and

select  `products`.*
    from  `products`
    where  products.id in (
        SELECT  product_id
            from  products_markets
            where  products_markets.country_code_id = 121
                          )
    group by  `products`.`name`, `products`.`manufacturer_id`
    order by  `popularity` desc, `products`.`id` asc
    limit  50 

this query's explain is:

id  select_type  table              type possible_keys key           key_len refs             rows              extra
1   PRIMARY      products           ALL  PRIMARY       NULL          NULL    NULL             623848            Using temporary; Using filesort
1   PRIMARY      products_markets   ref  unique_index  unique_index  4       main.products.id 14                Using where; Using index; FirstMatch(products)

One option I am entertaining is splitting up products_markets into individual tables for each country to lessen the query. I've tried adding more memory to the server without much success. Can anyone identify anything glaringly wrong with the database design/query?

What other options are available to make this query a fraction of its current ~50 seconds?

like image 649
Alex Harris Avatar asked Dec 07 '25 06:12

Alex Harris


1 Answers

Get rid of id in products_markets and add

PRIMARY KEY(country_code_id, product_id)

Then get rid of the UNIQUE key unless it is needed for some other query.

This will shrink the disk footprint of that large table significantly, thereby potentially speeding up all queries touching it.

And it will help with Hamaza's suggested reformulation.

like image 184
Rick James Avatar answered Dec 09 '25 18:12

Rick James