I'm running a MYSQL query in two steps. First, I get a list of ids with one query, and then I retrieve the data for those ids using a second query along the lines of SELECT * FROM data WHERE id in (id1, id2 ...)
. I know it sounds hacky, but I've done it this way as the queries are very complicated; the first involves lots of geometry and triggernometry, the second one lots of different joins. I'm sure they could be written in a single query, but my MYSQL isn't good enough to pull it off.
This approach works, but it doesn't feel right; plus I'm concerned it won't scale. At the moment I am testing on a database of 10,000 records, with 400 ids in the "IN" clause ( i.e. IN (id1, id2 ... id400)
) and performance is fine. But what if there are say 1,000,000 records?
Where are the performance bottlenecks (speed, memory, etc) for this kind of query? Any ideas for how to refactor this kind of query for be awesome too. (for example, if it is worth swotting up on stored procedures).
Software MySQL Performance Tuning. SQL performance tuning is the process of maximizing query speeds on a relational database. The task usually involves multiple tools and techniques.
Make Sure You Are Using The Latest MySQL Version But if it is, check on the latest version of MySQL and upgrade to the latest (most recent NoSQL databases). Newer versions come with performance improvements by default, saving you the need to find further solutions to common performance tuning in MySQL issues.
Queries can become slow for various reasons ranging from improper index usage to bugs in the storage engine itself. However, in most cases, queries become slow because developers or MySQL database administrators neglect to monitor them and keep an eye on their performance.
Starting from a certain number of records, the IN
predicate over a SELECT
becomes faster than that over a list of constants.
See this article in my blog for performance comparison:
If the column used in the query in the IN
clause is indexed, like this:
SELECT * FROM table1 WHERE unindexed_column IN ( SELECT indexed_column FROM table2 )
, then this query is just optimized to an EXISTS
(which uses but a one entry for each record from table1
)
Unfortunately, MySQL
is not capable of doing HASH SEMI JOIN
or MERGE SEMI JOIN
which are yet more efficient (especially if both columns are indexed).
Why do you extract the ids first? You should probably just join the tables. If you use the ids for something else, you can insert them in a temp table before and use this table for the join.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With