
How can I return the numerical boxplot data of all results using a single MySQL query?

[tbl_votes]
- id <!-- unique id of the vote -->
- item_id <!-- vote belongs to item <id> -->
- vote <!-- number 1-10 -->

Of course, we can solve this by fetching:

  • the smallest observation (so)
  • the lower quartile (lq)
  • the median (me)
  • the upper quartile (uq)
  • and the largest observation (lo)

...one by one using multiple queries, but I am wondering whether it can be done with a single query.

In Oracle I can use COUNT OVER and RATIO_TO_REPORT, but these are not supported in MySQL.

For those who don't know what a boxplot is: http://en.wikipedia.org/wiki/Box_plot

Any help would be appreciated.

Wouter Dorgelo asked Dec 26 '11 20:12


1 Answer

I've found a solution in PostgreSQL using PL/Python.

However, I'm leaving the question open in case someone else comes up with a solution in MySQL.

CREATE TYPE boxplot_values AS (
  min       numeric,
  q1        numeric,
  median    numeric,
  q3        numeric,
  max       numeric
);

CREATE OR REPLACE FUNCTION _final_boxplot(strarr numeric[])
   RETURNS boxplot_values AS
$$
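    # Depending on the PostgreSQL version, PL/Python receives the array
    # either as a string such as "{1,2,3}" or as a Python list.
    if isinstance(strarr, str):
        a = [float(v) for v in strarr.strip("{}").split(",") if v]
    else:
        a = list(strarr)

    a.sort()
    i = len(a)
    # Nearest-rank picks by integer index; // keeps the indices integers.
    return ( a[0], a[i//4], a[i//2], a[i*3//4], a[-1] )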
$$
LANGUAGE 'plpythonu' IMMUTABLE;

CREATE AGGREGATE boxplot(numeric) (
  SFUNC=array_append,
  STYPE=numeric[],
  FINALFUNC=_final_boxplot,
  INITCOND='{}'
);
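
For readers who want to see what the aggregate does without a database, here is a minimal plain-Python sketch of the same mechanics. It assumes the aggregate starts from an empty array (INITCOND), appends one value per row (SFUNC), and applies the final function once per group (FINALFUNC). The names array_append, final_boxplot and boxplot below are local stand-ins, and the sample votes are made up for illustration.

```python
from functools import reduce


def array_append(state, value):
    # Stand-in for the SFUNC: return the state with one more value appended.
    return state + [value]


def final_boxplot(values):
    # Stand-in for the FINALFUNC: sort, then pick elements by integer index.
    a = sorted(values)
    n = len(a)
    return (a[0], a[n // 4], a[n // 2], a[n * 3 // 4], a[-1])


def boxplot(values):
    # INITCOND '{}' corresponds to starting the fold from an empty list.
    state = reduce(array_append, values, [])
    return final_boxplot(state)


# Hypothetical votes for a single item (not taken from the question).
votes = [7, 3, 9, 1, 5, 8, 2, 10]
print(boxplot(votes))  # (1, 3, 7, 9, 10)
```

Note that this index rule always returns an element that actually occurs in the data. For an even number of values the "median" is therefore the upper of the two middle values (7 here) rather than their average (6), so the results can differ slightly from interpolated quartiles.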

Example:

SELECT customer_id as cid, (boxplot(price)).*
FROM orders
GROUP BY customer_id;

   cid |   min   |   q1    | median  |   q3    |   max
-------+---------+---------+---------+---------+---------
  1001 | 7.40209 | 7.80031 |  7.9551 | 7.99059 | 7.99903
  1002 | 3.44229 | 4.38172 | 4.72498 | 5.25214 | 5.98736

Source: http://www.christian-rossow.de/articles/PostgreSQL_boxplot_median_quartiles_aggregate_function.php

Wouter Dorgelo answered Sep 25 '22 19:09