Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL performance of VIEW for tables combined with UNION ALL

Let's say I have 2 tables in MySQL:

create table `persons` (
    `id` bigint unsigned not null auto_increment,

    `first_name` varchar(64),
    `surname` varchar(64),

    primary key(`id`)
);

create table `companies` (
    `id` bigint unsigned not null auto_increment,

    `name` varchar(128),

    primary key(`id`)
);

Now, very often I need to treat them the same, that's why following query:

select person.id as `id`, concat(person.first_name, ' ', person.surname) as `name`, 'person' as `person_type`
from persons
union all
select company.id as `id`, company.name as `name`, 'company' as `person_type`
from companies

starts to appear in other queries quite often: as part of joins or subselects. For now, I simply inject this query into joins or subselects like:

select *
from some_table row
     left outer join (>>> query from above goes here <<<) as `persons`
     on row.person_id = persons.id and row.person_type = persons.person_type

But, today I had to use discussed union query into another query multiple times i.e. join it twice.

Since I never had experience with views and heard that they have many disadvantages, my question is:

Is it normal practice to create a view for discussed union query and use it in my joins , subselects etc? In terms of performance - will it be worse, equal or better comparing to just inserting it into joins, subselects etc? Are there any drawbacks of having a view in this case?

Thanks in advance for any help!

like image 609
Yuriy Nakonechnyy Avatar asked Mar 20 '14 20:03

Yuriy Nakonechnyy


2 Answers

I concur with all of the points in Bill Karwin's excellent answer.

Q: Is it normal practice to create a view for discussed union query and use it in my joins, subselects etc?

A: With MySQL the more normal practices is to avoid using "CREATE VIEW" statement.

Q: In terms of performance - will it be worse, equal or better comparing to just inserting it into joins, subselects etc?

A: Referencing a view object will have the identical performance to an equivalent inline view.

(There might be a teensy-tiny bit more work to lookup the view object, checking privileges, and then replace the view reference with the stored SQL, vs. sending a statement that is just a teeny-tiny bit longer. But any of those differences are insignificant.)

Q: Are there any drawbacks of having a view in this case?

A: The biggest drawback is in how MySQL processes a view, whether it's stored or inline. MySQL will always run the view query and materialize the results from that query as a temporary MyISAM table. But there's no difference there whether the view definition is stored, or whether it's included inline. (Other RDBMSs process views much differently than MySQL).

One big drawback of a view is that predicates from the outer query NEVER get pushed down into the view query. Every time you reference that view, even with a query for a single id value, MySQL is going to run the view query and create a temporary MyISAM table (with no indexes on it), and THEN MySQL will run the outer query against that temporary MyISAM table.

So, in terms of performance, think of a reference to a view on par with "CREATE TEMPORARY TABLE t (cols) ENGINE=MyISAM" and "INSERT INTO t (cols) SELECT ...".

MySQL actually refers to an inline view as a "derived table", and that name makes a lot of sense, when we understand what MySQL is doing with it.


My personal preference is to not use the "CREATE VIEW" statement. The biggest drawback (as I see it) is that it "hides" SQL that is being executed. For the future reader, the reference to the view looks like a table. And then, when he goes to write a SQL statement, he's going to reference the view like it was a table, so very convenient. Then he decides he's going to join that table to itself, with another reference to it. (For the second reference, MySQL also runs that query again, and creates yet another temporary (and unindexed) MyISAM table. And now there's a JOIN operation on that. And then a predicate "WHERE view.column = 'foo'" gets added on the outer query.

It ends up "hiding" the most obvious performance improvement, sliding that predicate into the view query.

And then, someone comes along and decides they are going to create new view, which references the old view. He only needs a subset of rows, and can't modify the existing view because that might break something, so he creates a new view... CREATE VIEW myview FROM publicview p WHERE p.col = 'foo'.

And, now, a reference to myview is going to first run the publicview query, create a temporary MyISAM table, then the myview query gets run against that, creating another temporary MyISAM table, which the outer query is going to run against.

Basically, the convenience of the view has the potential for unintentional performance problems. With the view definition available on the database for anyone to use, someone is going to use it, even where it's not the most appropriate solution.

At least with an inline view, the person writing the SQL statement is more aware of the actual SQL being executed, and having all that SQL laid out gives an opportunity for tweaking it for performance.

My two cents.

TAMING BEASTLY SQL

I find that applying regular formatting rules (that my tools automatically do) can bend monstrous SQL into something I can read and work with.

SELECT row.col1
     , row.col2
     , person.*
  FROM some_table row
  LEFT
  JOIN ( SELECT 'person'  AS `person_type`
              , p.id      AS `id`
              , CONCAT(p.first_name,' ',p.surname) AS `name`
           FROM person p
          UNION ALL
         SELECT 'company' AS `person_type`
              , c.id      AS `id`
              , c.name    AS `name`
           FROM company c
       ) person
    ON person.id = row.person_id
   AND person.person_type = row.person_type

I'd be equally likely to avoid the inline view at all, and use conditional expressions in the SELECT list, though this does get more unwieldy for lots of columns.

SELECT row.col1
     , row.col2
     , row.person_type AS ref_person_type
     , row.person_id   AS ref_person_id
     , CASE
       WHEN row.person_type = 'person'  THEN p.id 
       WHEN row.person_type = 'company' THEN c.id
       END AS `person_id`
     , CASE
       WHEN row.person_type = 'person'  THEN CONCAT(p.first_name,' ',p.surname)
       WHEN row.person_type = 'company' THEN c.name
       END AS `name`
  FROM some_table row
  LEFT
  JOIN person p
    ON row.person_type = 'person'
   AND p.id = row.person_id
  LEFT
  JOIN company c
    ON row.person_type = 'company'
   AND c.id = row.person_id
like image 174
spencer7593 Avatar answered Sep 16 '22 22:09

spencer7593


A view makes your SQL shorter. That's all.

It's a common misconception for MySQL users that views store anything. They don't (at least not in MySQL). They're more like an alias or a macro. Querying the view is most often just like running the query in the "expanded" form. Querying a view twice in one query (as in the join example you mentioned) doesn't take any advantage of the view -- it will run the query twice.

In fact, view can cause worse performance, depending on the query and how you use them, because they may need to store the result in a temporary table every time you query them.

See http://dev.mysql.com/doc/refman/5.6/en/view-algorithms.html for more details on when a view uses the temptable algorithm.

On the other hand, UNION queries also create temporary tables as they accumulate their results. So you're stuck with the cost of a temp table anyway.

like image 23
Bill Karwin Avatar answered Sep 16 '22 22:09

Bill Karwin