In a recent question, StevieG has showed me how to resolve my problem with a pivot table. The new problem is I have to check some conditions over the pivoted table. Let's take the final query:
SELECT
c.id,
GROUP_CONCAT(if(d.name = 'p1', d.value, NULL)) AS 'p1',
GROUP_CONCAT(if(d.name = 'p2', d.value, NULL)) AS 'p2',
GROUP_CONCAT(if(d.name = 'p3', d.value, NULL)) AS 'p3',
GROUP_CONCAT(if(d.name = 'p4', d.value, NULL)) AS 'p4',
GROUP_CONCAT(if(d.name = 'p5', d.value, NULL)) AS 'p5',
GROUP_CONCAT(if(d.name = 'p6', d.value, NULL)) AS 'p6'
FROM container c
JOIN data d ON c.id = d.container
GROUP BY c.id
Obviously I cannot add a WHERE clause (for instance if I want to check if p5>30). I've found two ways to overcome this problem. First is by converting this in a derived table:
SELECT * FROM (
SELECT
c.id,
GROUP_CONCAT(if(d.name = 'p1', d.value, NULL)) AS 'p1',
GROUP_CONCAT(if(d.name = 'p2', d.value, NULL)) AS 'p2',
GROUP_CONCAT(if(d.name = 'p3', d.value, NULL)) AS 'p3',
GROUP_CONCAT(if(d.name = 'p4', d.value, NULL)) AS 'p4',
GROUP_CONCAT(if(d.name = 'p5', d.value, NULL)) AS 'p5',
GROUP_CONCAT(if(d.name = 'p6', d.value, NULL)) AS 'p6'
FROM container c
JOIN data d ON c.id = d.container
GROUP BY c.id
) WHERE p5>30
And the other way I found is add a HAVING clause:
SELECT
c.id,
GROUP_CONCAT(if(d.name = 'p1', d.value, NULL)) AS 'p1',
GROUP_CONCAT(if(d.name = 'p2', d.value, NULL)) AS 'p2',
GROUP_CONCAT(if(d.name = 'p3', d.value, NULL)) AS 'p3',
GROUP_CONCAT(if(d.name = 'p4', d.value, NULL)) AS 'p4',
GROUP_CONCAT(if(d.name = 'p5', d.value, NULL)) AS 'p5',
GROUP_CONCAT(if(d.name = 'p6', d.value, NULL)) AS 'p6'
FROM container c
JOIN data d ON c.id = d.container
GROUP BY c.id
HAVING p5>30
The problem is about performance. I'm using a test database with 50.000 entries, but production may go up to 1 milion. The first sentence (the one without the checking p5>30) takes 0'60 seconds to execute 1000 sentences in my development computer (no cache), but the second one and the third one takes more than 5 minutes to do the same.
I understand there is an implicit derived table generation without data indexes, but what are my options to optimize this?
Since data(container, name) is unique, you shouldn't need to use a GROUP_CONCAT.
What about this:
SELECT
c.id,
d_p1.value AS 'p1',
d_p2.value AS 'p2',
d_p3.value AS 'p3',
d_p4.value AS 'p4',
d_p5.value AS 'p5'
FROM container AS c
LEFT JOIN data AS d_p1 ON (d_p1.container = c.id AND d_p1.name = 'p1')
LEFT JOIN data AS d_p2 ON (d_p2.container = c.id AND d_p2.name = 'p2')
LEFT JOIN data AS d_p3 ON (d_p3.container = c.id AND d_p3.name = 'p3')
LEFT JOIN data AS d_p4 ON (d_p4.container = c.id AND d_p4.name = 'p4')
LEFT JOIN data AS d_p5 ON (d_p5.container = c.id AND d_p5.name = 'p5')
WHERE d_p5.value > 30
If there is an index on data(container, name), your query should run in seconds.
If data.name is longer than a few characters long (say 5 characters), you should probably use a surrogate (integer) key instead of data.name.
I would be close to Yak's attempt, but if you are looking ONLY for entries where the "p5.value" is greater than zero, I would restructure to only get those that HAVE a P5 as a "pre-query". if you have 100,000 records, and only 20,000 have a "P5.value" greater than your range 30, get only those first... THEN join the rest... Also, ensure you have an index on the "data" table by the "name,value" as an index... In addition, ensure an index on the "container,name"
The first pre-query will already "concatinate" the P5 values that qualified for the one container, THEN get the others as a result of the joins
select STRAIGHT_JOIN
PreQuery.QualifiedContainer ID,
coalesce( d_p1.Value, ' ' ) p1,
coalesce( d_p2.Value, ' ' ) p2,
coalesce( d_p3.Value, ' ' ) p3,
coalesce( d_p4.Value, ' ' ) p4,
PreQuery.P5Value p5,
coalesce( d_p5.Value, ' ' ) p6
from
( select
JustP5.Container as QualifiedContainer,
JustP5.Value as P5Value
from
Container JustP5
where
JustP5.Name = 'p5'
AND JustP5.Value > 30
group by
JustP5.Container ) as PreQuery
LEFT JOIN data AS d_p1
ON PreQuery.QualifiedContainer = d_p1.container
AND d_p1.name = 'p1'
LEFT JOIN data AS d_p2
ON PreQuery.QualifiedContainer = d_p2.container
AND d_p2.name = 'p2'
LEFT JOIN data AS d_p3
ON PreQuery.QualifiedContainer = d_p3.container
AND d_p3.name = 'p3'
LEFT JOIN data AS d_p4
ON PreQuery.QualifiedContainer = d_p4.container
AND d_p4.name = 'p4'
LEFT JOIN data AS d_p6
ON PreQuery.QualifiedContainer = d_p6.container
AND d_p6.name = 'p6'
No "Group By" is needed as I think, based on the other question you referenced... as you would only have once instance of a given "name/value" pair for a given container... if I am incorrect, then all I would do is change the COALESCE() to GROUP_CONCAT() and add the GROUP BY PreQuery.QualifiedContainer
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With