Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

"GROUP BY" on MariaDB behaves differently from MySQL

I have been told many times that same queries MariaDB will work just the same like how it is on MySQL... until I meet this problem.

Recently, I am trying to clone an application from MySQL(InnoDB) to MariaDB(XtraDB). Although MariaDB runs MySQL queries without the need of changing anything, I was surprised to discover that the same queries actually behave quite differently on both platforms particularly in ORDER BY and GROUP BY.

For an example:

    MyTable
    =======
    +----+----------+---------------------+-----------+
    | id | parentId | creationDate        | name      |
    +----+----------+---------------------+-----------+
    | 1  | 2357     | 2017-01-01 06:03:40 | Anna      |
    +----+----------+---------------------+-----------+
    | 2  | 5480     | 2017-01-02 07:13:20 | Becky     |
    +----+----------+---------------------+-----------+
    | 3  | 2357     | 2017-01-03 08:20:12 | Christina |
    +----+----------+---------------------+-----------+
    | 4  | 2357     | 2017-01-03 08:20:15 | Dorothy   |
    +----+----------+---------------------+-----------+
    | 5  | 5480     | 2017-01-04 09:25:45 | Emma      |
    +----+----------+---------------------+-----------+
    | 6  | 1168     | 2017-01-05 10:30:10 | Fiona     |
    +----+----------+---------------------+-----------+
    | 7  | 5480     | 2017-01-05 10:33:23 | Gigi      |
    +----+----------+---------------------+-----------+
    | 8  | 1168     | 2017-01-06 12:46:34 | Heidi     |
    +----+----------+---------------------+-----------+
    | 9  | 1168     | 2017-01-06 12:46:34 | Irene     |
    +----+----------+---------------------+-----------+
    | 10 | 2357     | 2017-01-07 14:58:37 | Jane      |
    +----+----------+---------------------+-----------+
    | 11 | 2357     | 2017-01-07 14:58:37 | Katy      |
    +----+----------+---------------------+-----------+

Basically what I want to get from a query is the latest records from each GROUPing (i.e. parentId). By latest, I mean MAX(creationDate) and MAX(id)

So, for the above example, since there are only three different parentId values, I am hoping to get:

    +----+----------+---------------------+-----------+
    | id | parentId | creationDate        | name      |
    +----+----------+---------------------+-----------+
    | 11 | 2357     | 2017-01-07 14:58:37 | Katy      |
    +----+----------+---------------------+-----------+
    | 9  | 1168     | 2017-01-06 12:46:34 | Irene     |
    +----+----------+---------------------+-----------+
    | 7  | 5480     | 2017-01-05 10:33:23 | Gigi      |
    +----+----------+---------------------+-----------+

Originally the application has queries similar to this fashion:

SELECT * FROM
  ( SELECT * FROM `MyTable` WHERE `parentId` IN (...)
    ORDER BY `creationDate` DESC, `id` DESC ) AS `t` 
  GROUP BY `parentId`;

On MySQL, this works, since the inner query will order and then the outer query gets the first of each GROUP from the result of the inner query. The outer query basically obeys ordering of the inner query.

But on MariaDB, the outer query will ignore the ordering of the inner query result. I get this on MariaDB instead:

    +----+----------+---------------------+-----------+
    | id | parentId | creationDate        | name      |
    +----+----------+---------------------+-----------+
    | 1  | 2357     | 2017-01-01 06:03:40 | Anna      |
    +----+----------+---------------------+-----------+
    | 2  | 5480     | 2017-01-02 07:13:20 | Becky     |
    +----+----------+---------------------+-----------+
    | 6  | 1168     | 2017-01-05 10:30:10 | Fiona     |
    +----+----------+---------------------+-----------+

To achieve the same behaviour on MariaDB, I have come up with something like this. (Not sure if this is accurate though.)

SELECT `t1`.* FROM `MyTable` `t1` LEFT JOIN `MyTable` `t2` ON (
        `t1`.`parentId` = `t2`.`parentId`
    AND `t2`.`parentId` IN (...)
    AND `t1`.`creationDate` <= `t2`.`creationDate`
    AND `t1`.`id` < `t2`.`id`)
  ) WHERE `t2`.`id` IS NULL;

Now the problem is... If I am going to rewrite the queries, I have to rewrite hundreds of them... and they are some how a little bit different from each other.

I wonder if anyone here have any ideas that would allow me to make the least changes possible.

Thank you all in advance.

like image 790
user2526586 Avatar asked Feb 18 '17 07:02

user2526586


People also ask

What are two differences between MariaDB and MySQL?

MariaDB vs MySQL Differences Even though MariaDB is a fork of MySQL, these two database management systems are still quite different: MariaDB is fully GPL licensed while MySQL takes a dual-license approach. Each handle thread pools in a different way. MariaDB supports a lot of different storage engines.

Is MariaDB syntax different from MySQL?

Because MariaDB is a fork from MySQL, the syntax is similar, but MariaDB has several other features. Basic SQL syntax is the same, but the way MariaDB stores data or handles functions is different . Each new version of MariaDB also has added features.

Does MariaDB use the same commands as MySQL?

Being relational database management systems, MySQL and MariaDB both support SQL. As we have already mentioned, MariaDB is a fork of MySQL, so, in fact, there are not many syntax differences between MySQL and MariaDB. For instance, such MySQL and MariaDB commands as SELECT, INSERT, and UPDATE are identical.

Is MariaDB as good as MySQL?

When it comes to performing queries or replication, MariaDB is faster than MySQL. So if you need a high-performance relational database solution, MariaDB is a good choice. In addition, MariaDB also easily supports a high concurrent number of connections without much performance degradation.


2 Answers

Yeah, this is a link-only answer. But the links are to the MariaDB site.

Here is another discussion of the 'incompatibility': https://mariadb.com/kb/en/mariadb/group-by-trick-has-been-optimized-away/

Technically, speaking, MySQL implemented an extension to the the Ansi standard. Much later, it decided to remove it, so I think you will find that MySQL has migrated toward MariaDB.

Here is list of "fast" ways to do group-wise max, which is probably what you are trying to do: https://mariadb.com/kb/en/mariadb/groupwise-max-in-mariadb/

like image 69
Rick James Avatar answered Oct 17 '22 14:10

Rick James


Your first query would probably work in MySQL but its behavior is not documented: you are grouping by groupid but you are selecting non-aggregated columns with * and the value of any of those non-aggregated columns is undefined - if the value you get is the first value encountered it's just a "matter of luck".

It is true that, even if it cannot be considered correct, on MySQL I have never seen this "trick" fail (and here on stackoverflow there are plenty of upvoted answers suggesting you to use this trick), but MariaDB uses a different optimization engine and you cannot rely on MySQL undocumented behavior.

Your second query needs a little adjustment:

and (
  `t1`.`creationDate` < `t2`.`creationDate`
  or (
    `t1`.`creationDate` = `t2`.`creationDate`
     and `t1`.`id` < `t2`.`id`
  )
)

because first you are ordering by creation date, then if more than one record share the same creation date you are getting the one with the highest id.

There are other ways to write the same query, e.g.

select * from mytable
where id in (
  select max(m.id)
  from mytable m inner join (
    select parentID, max(creationDate) as max_cd
    from mytable
    group by ParentID
  ) t on m.parentID = t.parentID and m.creationDate = t.max_cd
  group by m.parentID, m.creationDate
)

but every query needs to be rewritten separately.

Edit

Your example is a little more complicated because you are ordering by both creationDate and id. Let me explain better. First thing to do, for every parentID you have to get the last creationDate:

select parentID, max(creationDate) as max_cd
from MyTable
group by parentID

then for every max creationDate you have to get the highest id:

select t.parentID, t.max_cd, max(t.id) as max_id
from
  MyTable t inner join (  
    select parentID, max(creationDate) as max_cd
    from MyTable
    group by parentID
  ) t1 on t.parentID = t1.parentID and t.creationDate = t1.max_cd
group t.parentID, t.max_cd

then you have to get all records where the id are returned by this query. In this particular context a LEFT JOIN with the table itself should be easier to write and more performant.

like image 32
fthiella Avatar answered Oct 17 '22 15:10

fthiella