Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a simpler way to find MODE(S) of some values in MySQL

Tags:

mysql

mode

MODE is the value that occurs the MOST times in the data, there can be ONE MODE or MANY MODES

here's some values in two tables (sqlFiddle)

create table t100(id int auto_increment primary key, value int);
create table t200(id int auto_increment primary key, value int);

insert into t100(value) values (1),
                               (2),(2),(2),
                               (3),(3),
                               (4);
insert into t200(value) values (1),
                               (2),(2),(2),
                               (3),(3),
                               (4),(4),(4);

right now, to get the MODE(S) returned as comma separated list, I run the below query for table t100

     SELECT GROUP_CONCAT(value) as modes,occurs
     FROM
        (SELECT value,occurs FROM 
           (SELECT value,count(*) as occurs
            FROM
            T100
            GROUP BY value)T1,
        (SELECT max(occurs) as maxoccurs FROM 
            (SELECT value,count(*) as occurs
             FROM
             T100
             GROUP BY value)T2
        )T3
        WHERE T1.occurs = T3.maxoccurs)T4
      GROUP BY occurs;

and the below query for table t200 (same query just with table name changed) I have 2 tables in this example because to show that it works for cases where there's 1 MODE and where there are multiple MODES.

     SELECT GROUP_CONCAT(value) as modes,occurs
     FROM
        (SELECT value,occurs FROM 
           (SELECT value,count(*) as occurs
            FROM
            T200
            GROUP BY value)T1,
        (SELECT max(occurs) as maxoccurs FROM 
            (SELECT value,count(*) as occurs
             FROM
             T200
             GROUP BY value)T2
        )T3
        WHERE T1.occurs = T3.maxoccurs)T4
      GROUP BY occurs;

My question is "Is there a simpler way?"

I was thinking like using HAVING count(*) = max(count(*)) or something similar to get rid of the extra join but couldn't get HAVING to return the result i wanted.

UPDATED: as suggested by @zneak, I can simplify T3 like below:

     SELECT GROUP_CONCAT(value) as modes,occurs
     FROM
        (SELECT value,occurs FROM 
           (SELECT value,count(*) as occurs
            FROM
            T200
            GROUP BY value)T1,
        (SELECT count(*) as maxoccurs
             FROM
             T200
             GROUP BY value
             ORDER BY count(*) DESC
             LIMIT 1
        )T3
        WHERE T1.occurs = T3.maxoccurs)T4
      GROUP BY occurs;

Now is there a way to get ride of T3 altogether? I tried this but it returns no rows for some reason

  SELECT value,occurs FROM  
    (SELECT value,count(*) as occurs
     FROM t200
     GROUP BY `value`)T1
  HAVING occurs=max(occurs)  

basically I am wondering if there's a way to do it such that I only need to specify t100 or t200 once.

UPDATED: i found a way to specify t100 or t200 only once by adding a variable to set my own maxoccurs like below

  SELECT GROUP_CONCAT(CASE WHEN occurs=@maxoccurs THEN value ELSE NULL END) as modes 
  FROM 
    (SELECT value,occurs,@maxoccurs:=GREATEST(@maxoccurs,occurs) as maxoccurs
     FROM (SELECT value,count(*) as occurs
           FROM t200
           GROUP BY `value`)T1,(SELECT @maxoccurs:=0)mo
     )T2
like image 479
Tin Tran Avatar asked Mar 21 '23 06:03

Tin Tran


1 Answers

You are very close with the last query. The following finds one mode:

SELECT value, occurs
FROM (SELECT value,count(*) as occurs
      FROM t200
      GROUP BY `value`
      LIMIT 1
     ) T1

I think your question was about multiple modes, though:

SELECT value, occurs
FROM (SELECT value, count(*) as occurs
      FROM t200
      GROUP BY `value`
     ) T1
WHERE occurs = (select max(occurs)
                from (select `value`, count(*) as occurs
                      from t200
                      group by `value`
                     ) t
               );

EDIT:

This is much easier in almost any other database. MySQL supports neither with nor window/analytic functions.

Your query (shown below) does not do what you think it is doing:

  SELECT value, occurs  
  FROM (SELECT value, count(*) as occurs
        FROM t200
        GROUP BY `value`
       ) T1
  HAVING occurs = max(occurs) ; 

The final having clause refers to the variable occurs but does use max(occurs). Because of the use of max(occurs) this is an aggregation query that returns one row, summarizing all rows from the subquery.

The variable occurs is not using for grouping. So, what value does MySQL use? It uses an arbitrary value from one of the rows in the subquery. This arbitrary value might match, or it might not. But, the value only comes from one row. There is no iteration over it.

like image 176
Gordon Linoff Avatar answered Apr 26 '23 12:04

Gordon Linoff