
Trimmed mean calculation in MySQL

I want to write a function that calculates a simple trimmed mean in MySQL. The function will (obviously) be an aggregate function. I am new to writing functions in MySQL, so I could do with some help.

The algorithm of the trimmed mean will be as follows (pseudocode):

CREATE AGGREGATE FUNCTION trimmed_mean(elements DOUBLE[], trim_size INTEGER)
RETURNS DOUBLE
BEGIN
   -- determine number of elements
   -- ensure that number of elements is greater than 2 * trim_size else return error
   -- order elements in ASC order
   -- chop off smallest trim_size elements and largest trim_size elements
   -- calculate arithmetic average of the remaining elements
   -- return arithmetic average
END

Can anyone help with how to write the function above correctly, for use with MySQL?

Homunculus Reticulli asked Nov 05 '22 07:11


2 Answers

That's no small task: to get a true aggregate function you need to write it in C/C++ as a user-defined function (UDF)...

  • http://dev.mysql.com/doc/refman/5.0/en/adding-udf.html


An option within MySQL itself is to write a view or scalar function that aggregates the data how you want, but from a specific table. This obviously restricts the function to a single source table, which may not be ideal.

A way around this could be to have a table dedicated to this function...

  • start a transaction
  • clear the table
  • insert your sample data
  • query the view/function

(Or something similar)
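The steps above could be sketched roughly as follows. This is only one possible reading of the idea, not a tested production routine: the scratch table `trim_input` and the stored function `trimmed_mean` are hypothetical names, and the sketch assumes MySQL 5.5.6 or later (for `SIGNAL` and for `LIMIT` with routine variables) and the `mysql` command-line client (for `DELIMITER`).

```sql
-- Hypothetical scratch table dedicated to the function.
CREATE TABLE trim_input (
  value DOUBLE NOT NULL
);

DELIMITER //

-- Scalar function that "aggregates" whatever is currently in trim_input.
CREATE FUNCTION trimmed_mean(trim_size INT)
RETURNS DOUBLE
READS SQL DATA
BEGIN
  DECLARE n INT;
  DECLARE keep_count INT;
  DECLARE result DOUBLE;

  -- Determine the number of elements.
  SELECT COUNT(*) INTO n FROM trim_input;

  -- Require more than 2 * trim_size elements, otherwise raise an error.
  IF trim_size IS NULL OR trim_size < 0 OR n <= 2 * trim_size THEN
    SIGNAL SQLSTATE '45000'
      SET MESSAGE_TEXT = 'trimmed_mean: need more than 2 * trim_size rows';
  END IF;

  SET keep_count = n - 2 * trim_size;

  -- Sort ascending, skip the smallest trim_size rows, keep the middle
  -- keep_count rows (which drops the largest trim_size), then average.
  SELECT AVG(value) INTO result
  FROM (
    SELECT value
    FROM trim_input
    ORDER BY value
    LIMIT trim_size, keep_count
  ) AS kept;

  RETURN result;
END//

DELIMITER ;

-- Usage: load the sample inside a transaction, then query the function.
START TRANSACTION;
DELETE FROM trim_input;  -- DELETE rather than TRUNCATE, which would commit implicitly
INSERT INTO trim_input (value) VALUES
  (10), (2), (3), (5), (4), (7), (1), (9), (3), (5), (9);
SELECT trimmed_mean(3) AS trimmed_avg;  -- averages the middle values 3, 4, 5, 5, 7
COMMIT;
```

Because the function always reads from the one scratch table, each call handles a single set of values.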

This precludes GROUP BY variations, unless you use dynamic SQL or pass parameters to your function for specific grouping patterns.

It's all less than ideal, sorry.

MatBailie answered Nov 10 '22 03:11


Have a look at this example (for MySQL):

Create test table:

CREATE TABLE test_table (
  id INT(11) NOT NULL AUTO_INCREMENT,
  value INT(11) DEFAULT NULL,
  PRIMARY KEY (id)
);

INSERT INTO test_table(value) VALUES 
  (10), (2), (3), (5), (4), (7), (1), (9), (3), (5), (9);

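Let's calculate the trimmed average value (edited variant):

SET @trim_size = 3;

-- Number the rows in ascending order of value (@pos ends up holding the row
-- count), then average only the rows whose position lies outside the first
-- and the last @trim_size positions.
SELECT AVG(value) avg FROM (
  SELECT value, @pos:=@pos + 1 pos
  FROM (SELECT * FROM test_table ORDER BY value) t1, (SELECT @pos:=0) t2
  ) t
WHERE pos > @trim_size AND pos <= @pos - @trim_size;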

+--------+
| avg    |
+--------+
| 4.8000 |
+--------+
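
On MySQL 8.0 or later, the same trimming can probably be expressed with window functions instead of user variables. That avoids depending on when @pos is evaluated, which the MySQL manual describes as undefined for user variables assigned inside a statement. A minimal sketch, assuming MySQL 8.0+ and the test_table above; the aliases `ranked`, `pos`, `n` and `trimmed_avg` are only illustrative:

```sql
SET @trim_size = 3;

SELECT AVG(value) AS trimmed_avg
FROM (
  SELECT value,
         ROW_NUMBER() OVER (ORDER BY value) AS pos,  -- 1-based rank, ascending
         COUNT(*) OVER () AS n                       -- total number of rows
  FROM test_table
) AS ranked
-- Drop the first @trim_size and the last @trim_size positions.
WHERE pos > @trim_size AND pos <= n - @trim_size;
```

For the sample data this keeps the five middle values 3, 4, 5, 5, 7. Adding the same PARTITION BY clause to both window functions, plus a matching GROUP BY in the outer query, should give a trimmed mean per group.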
Devart answered Nov 10 '22 04:11