
SQL: "NOT IN" subquery optimization or alternatives

Tags: mysql, subquery

I have two database tables: "places" and "translations". To translate place names, I select the records from "places" that don't yet have a translation in the specified language:

SELECT `id`, `name`
FROM `places`
WHERE `id` NOT IN (SELECT `place_id` FROM `translations` WHERE `lang` = 'en')

This worked fine with 7,000 place records, but started failing once the number of translations reached 5,000. Since then, the query runs for about 10 seconds and then returns the error:

2006 - MySQL server has gone away

As I understand it, the main problem is that the subquery returns too many results, but how can I solve this if I need to select all the places that are not translated yet?

My plan B is to add a new boolean field called "translated" to the "places" table and reset it to "false" each time I change the language, which would remove the need for a subquery. However, maybe I could just modify my current SQL statement and avoid adding an extra field?

krn asked Sep 05 '10 10:09





1 Answer

The obvious alternative:

SELECT
  `id`, `name`
FROM
  `places`
WHERE 
  NOT EXISTS (
    SELECT 1 FROM `translations` WHERE `place_id` = `places`.`id` AND `lang` = 'en'
  )
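
For comparison, the same "places without an English translation" result can also be written as an anti-join. This is only a sketch, assuming the linking column is `place_id` as in the question; the aliases `p` and `t` are introduced here for illustration. Whether it runs faster than NOT EXISTS depends on the MySQL version and the available indexes, so it is worth checking both with EXPLAIN.

```sql
-- Anti-join: keep only the places that have no matching 'en' translation row.
SELECT p.`id`, p.`name`
FROM `places` AS p
LEFT JOIN `translations` AS t
       ON t.`place_id` = p.`id`
      AND t.`lang` = 'en'        -- the language filter must sit in ON, not WHERE
WHERE t.`place_id` IS NULL;      -- NULL here means no translation row was joined
```

Moving the `lang` condition into the WHERE clause would turn the LEFT JOIN into an inner join and return no untranslated places at all.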

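There should be a composite index over (translations.place_id, translations.lang), ideally a clustered one (composite means: a single index over multiple fields; clustered means: the index governs how the table is sorted). In MySQL's InnoDB engine the clustered index is the primary key (when one is defined), so this means either making (place_id, lang) the primary key or adding an ordinary secondary index over those two columns.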
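
A minimal sketch of adding such an index as a plain secondary index, assuming the linking column is `place_id` as in the question. The index name `idx_translations_place_lang` is hypothetical. The EXPLAIN afterwards is one way to check whether the subquery actually uses it.

```sql
-- Composite secondary index; the index name is made up for this example.
ALTER TABLE `translations`
  ADD INDEX `idx_translations_place_lang` (`place_id`, `lang`);

-- Inspect the plan: the row for `translations` should reference the new index.
EXPLAIN
SELECT `id`, `name`
FROM `places`
WHERE NOT EXISTS (
  SELECT 1
  FROM `translations`
  WHERE `translations`.`place_id` = `places`.`id`
    AND `translations`.`lang` = 'en'
);
```

If each place can have at most one translation per language, declaring the pair as UNIQUE (or as the primary key) instead would also enforce that rule, but that is a schema decision the question does not settle.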

Tomalak answered Oct 02 '22 21:10
