I have a table of over 5 million rows. When i perform a select query it is taking around 20 seconds.
SELECT CompUID,Weburl FROM `CompanyTable` WHERE (Alias1='match1' AND Alias2='match2' )OR Alias3='match3' OR Alias4='match4'
Here is the table structure:
CREATE TABLE `CompanyMaster` (
`CompUID` int(11) NOT NULL AUTO_INCREMENT,
`Weburl` varchar(150) DEFAULT NULL,
`CompanyName` varchar(200) DEFAULT NULL,
`Alias1` varchar(150) DEFAULT NULL,
`Alias2` varchar(150) DEFAULT NULL,
`Alias3` varchar(150) DEFAULT NULL,
`Alias4` varchar(150) DEFAULT NULL,
`Created` datetime DEFAULT NULL,
`LastModified` datetime DEFAULT NULL,
PRIMARY KEY (`CompUID`),
KEY `Alias` (`Alias1`,`Alias2`,`Alias3`,`Alias4`)
) ENGINE=InnoDB AUTO_INCREMENT=5457968 DEFAULT CHARSET=latin1
Here is the EXPLAIN from that query:
--------+------------------------------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+------+---------+------+---------+----------------------+
| 1 | SIMPLE | CompanyTable | ALL | Alias | NULL | NULL | NULL | 5255929 | Using where |
+----+-------------+----------+-------+---------------+------+---------+------+---------+----------------------+
I used the composite index Alias
(Alias1
,Alias2
,Alias3
,Alias4
).
But i believe it's not the best one. Please suggest me the right indexing for this select query lookup.
For the query engine to use a column in a composite index the columns to the left must be satisfied first. That is, columns must be used as restrictions that reduce candidate rows as read left-to-right.
The OR alias3 (or alias4) clauses violate this rule as it says "I don't care what the left parts (alias1 or alias2 (or alias3)) were, because I am not dependent upon them".
A full table scan is then required to see if there are any alias3 (or alias4) values that meet the conditions.
The potentially useful indices in this condition would be:
Actual stats and plan selection require further investigation - but at least now the query planner has something to work with.
That being said - and I'm not sure what the role of an "alias" is - it may make sense to normalize the table. The following does change the semantics slightly as it drops the "alias position" (which could be added back in) and should be verified for semantic correctness.
CREATE TABLE `CompanyMaster` (
`CompUID` int(11) NOT NULL AUTO_INCREMENT
,`CompanyName` varchar(200) DEFAULT NULL
,PRIMARY KEY (`CompUID`)
)
-- (This establishes a unique alias-per-company, which may be incorrect.)
CREATE TABLE `CompaniesAliases` (
`CompUID` int(11) NOT NULL
,`Alias` varchar(150) NOT NULL
-- Both CompUID and Alias appear in 'first' positions:
-- CompUID for Join, Alias for filter
,PRIMARY KEY (`CompUID`, `Alias`)
,KEY (`Alias`)
-- Alternative, which may change plan selection by eliminating options:
-- ,PRIMARY KEY (`Alias`, `CompUID`) -- and no single KEY/index on Alias or CompUID
,FOREIGN KEY(CompUID) REFERENCES CompanyMaster(CompUID)
)
It can then be queried approximately similar to the original, being different in that it does not care which "alias" matches which value:
-- AND constructed by joins (could also use GROUP BY .. HAVING COUNT)
SELECT c.CompUID FROM `CompanyTable` c
JOIN `CompaniesAliases` ac1
ON ac1.CompUID = c.CompUID AND Alias = 'match1'
JOIN `CompaniesAliases` ac2
ON ac2.CompUID = c.CompUID AND Alias = 'match2'
-- OR constructed by union(s)
UNION
SELECT c.CompUID FROM `CompanyTable` c
JOIN `CompaniesAliases` ac1
ON ac1.CompUID = c.CompUID AND (Alias = 'match3' OR Alias = 'match4')
I would expect such a query to be implemented efficiently in SQL Server - YMMV with MySQL.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With