How to optimize conditional checking over a pivoted table?

Question

In a recent question, StevieG has showed me how to resolve my problem with a pivot table. The new problem is I have to check some conditions over the pivoted table. Let's take the final query:

SELECT 
  c.id, 
  GROUP_CONCAT(if(d.name = 'p1', d.value, NULL)) AS 'p1', 
  GROUP_CONCAT(if(d.name = 'p2', d.value, NULL)) AS 'p2', 
  GROUP_CONCAT(if(d.name = 'p3', d.value, NULL)) AS 'p3', 
  GROUP_CONCAT(if(d.name = 'p4', d.value, NULL)) AS 'p4', 
  GROUP_CONCAT(if(d.name = 'p5', d.value, NULL)) AS 'p5', 
  GROUP_CONCAT(if(d.name = 'p6', d.value, NULL)) AS 'p6'
FROM container c
JOIN data d ON c.id = d.container
GROUP BY c.id

Obviously I cannot add a WHERE clause (for instance if I want to check if p5>30). I've found two ways to overcome this problem. First is by converting this in a derived table:

SELECT * FROM (
    SELECT 
      c.id, 
      GROUP_CONCAT(if(d.name = 'p1', d.value, NULL)) AS 'p1', 
      GROUP_CONCAT(if(d.name = 'p2', d.value, NULL)) AS 'p2', 
      GROUP_CONCAT(if(d.name = 'p3', d.value, NULL)) AS 'p3', 
      GROUP_CONCAT(if(d.name = 'p4', d.value, NULL)) AS 'p4', 
      GROUP_CONCAT(if(d.name = 'p5', d.value, NULL)) AS 'p5', 
      GROUP_CONCAT(if(d.name = 'p6', d.value, NULL)) AS 'p6'
    FROM container c
    JOIN data d ON c.id = d.container
    GROUP BY c.id
) WHERE p5>30

And the other way I found is add a HAVING clause:

SELECT 
  c.id, 
  GROUP_CONCAT(if(d.name = 'p1', d.value, NULL)) AS 'p1', 
  GROUP_CONCAT(if(d.name = 'p2', d.value, NULL)) AS 'p2', 
  GROUP_CONCAT(if(d.name = 'p3', d.value, NULL)) AS 'p3', 
  GROUP_CONCAT(if(d.name = 'p4', d.value, NULL)) AS 'p4', 
  GROUP_CONCAT(if(d.name = 'p5', d.value, NULL)) AS 'p5', 
  GROUP_CONCAT(if(d.name = 'p6', d.value, NULL)) AS 'p6'
FROM container c
JOIN data d ON c.id = d.container
GROUP BY c.id
HAVING p5>30

The problem is about performance. I'm using a test database with 50.000 entries, but production may go up to 1 milion. The first sentence (the one without the checking p5>30) takes 0'60 seconds to execute 1000 sentences in my development computer (no cache), but the second one and the third one takes more than 5 minutes to do the same.

I understand there is an implicit derived table generation without data indexes, but what are my options to optimize this?

RandomSeed · Accepted Answer

Since data(container, name) is unique, you shouldn't need to use a GROUP_CONCAT. What about this:

SELECT 
  c.id, 
  d_p1.value AS 'p1', 
  d_p2.value AS 'p2', 
  d_p3.value AS 'p3', 
  d_p4.value AS 'p4', 
  d_p5.value AS 'p5'
FROM container AS c
LEFT JOIN data AS d_p1 ON (d_p1.container = c.id AND d_p1.name = 'p1')
LEFT JOIN data AS d_p2 ON (d_p2.container = c.id AND d_p2.name = 'p2')
LEFT JOIN data AS d_p3 ON (d_p3.container = c.id AND d_p3.name = 'p3')
LEFT JOIN data AS d_p4 ON (d_p4.container = c.id AND d_p4.name = 'p4')
LEFT JOIN data AS d_p5 ON (d_p5.container = c.id AND d_p5.name = 'p5')
WHERE d_p5.value > 30

If there is an index on data(container, name), your query should run in seconds.

If data.name is longer than a few characters long (say 5 characters), you should probably use a surrogate (integer) key instead of data.name.

DRapp · Answer

I would be close to Yak's attempt, but if you are looking ONLY for entries where the "p5.value" is greater than zero, I would restructure to only get those that HAVE a P5 as a "pre-query". if you have 100,000 records, and only 20,000 have a "P5.value" greater than your range 30, get only those first... THEN join the rest... Also, ensure you have an index on the "data" table by the "name,value" as an index... In addition, ensure an index on the "container,name"

The first pre-query will already "concatinate" the P5 values that qualified for the one container, THEN get the others as a result of the joins

select STRAIGHT_JOIN
      PreQuery.QualifiedContainer ID,
      coalesce( d_p1.Value, ' ' ) p1,
      coalesce( d_p2.Value, ' ' ) p2,
      coalesce( d_p3.Value, ' ' ) p3,
      coalesce( d_p4.Value, ' ' ) p4,
      PreQuery.P5Value  p5,
      coalesce( d_p5.Value, ' ' ) p6
   from
      ( select 
              JustP5.Container as QualifiedContainer,
              JustP5.Value as P5Value
           from
              Container JustP5
           where
                  JustP5.Name = 'p5'
              AND JustP5.Value > 30 
           group by
              JustP5.Container ) as PreQuery

         LEFT JOIN data AS d_p1 
            ON PreQuery.QualifiedContainer = d_p1.container
           AND d_p1.name = 'p1'

         LEFT JOIN data AS d_p2
            ON PreQuery.QualifiedContainer = d_p2.container
           AND d_p2.name = 'p2'

         LEFT JOIN data AS d_p3
            ON PreQuery.QualifiedContainer = d_p3.container
           AND d_p3.name = 'p3'

         LEFT JOIN data AS d_p4
            ON PreQuery.QualifiedContainer = d_p4.container
           AND d_p4.name = 'p4'

         LEFT JOIN data AS d_p6
            ON PreQuery.QualifiedContainer = d_p6.container
           AND d_p6.name = 'p6'

No "Group By" is needed as I think, based on the other question you referenced... as you would only have once instance of a given "name/value" pair for a given container... if I am incorrect, then all I would do is change the COALESCE() to GROUP_CONCAT() and add the GROUP BY PreQuery.QualifiedContainer

How to optimize conditional checking over a pivoted table?

Tags:

mysql

query-optimization

Ivan

2 Answers

RandomSeed

DRapp

Recent Activity

Donate For Us

How to optimize conditional checking over a pivoted table?

Tags:

mysql

query-optimization

Ivan

2 Answers

RandomSeed

DRapp

Related questions

Recent Activity

Donate For Us