Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mysql create freqency distribution

I have a simple table BIRDCOUNT below, showing how many birds were counted on any given day:

+----------+
| NUMBIRDS |
+----------+
| 123      |
| 573      |
| 3        |
| 234      |
+----------+

I would like to create a frequency distribution graph, showing how many times a number of birds were counted. So I need MySQL to create something like:

+------------+-------------+
| BIRD_COUNT | TIMES_SEEN  |
+------------+-------------+
| 0-99       | 17          |
| 100-299    | 23          |
| 200-399    | 12          |
| 300-499    | 122         |
| 400-599    | 3           |
+------------+-------------+

If the bird count ranges were fixed this would be easy. However, I never know the min/max of how many birds were seen. So I need a select statement that:

  1. Creates an output similar to above, always creating 10 ranges of counts.
  2. (more advanced) Creates output similar to above, always creating N ranges of counts.

I don't know if #2 is possible in a single select but can anyone solve #1?

like image 712
TSG Avatar asked Feb 24 '13 19:02

TSG


People also ask

How do you create a frequency column?

To create a frequency column for categorical variable in an R data frame, we can use the transform function by defining the length of categorical variable using ave function. The output will have the duplicated frequencies as one value in the categorical column is likely to be repeated.


2 Answers

SELECT
    FLOOR( birds.bird_count / stat.diff ) * stat.diff as range_start, 
    (FLOOR( birds.bird_count / stat.diff ) +1) * stat.diff -1 as range_end, 
    count( birds.bird_count ) as times_seen
FROM birds_table birds, 
    (SELECT 
        ROUND((MAX( bird_count ) - MIN( bird_count ))/10) AS diff
    FROM birds_table
    ) AS stat
GROUP BY FLOOR( birds.bird_count / stat.diff )

Here You have answer for both of Your questions ;] with difference that start and end of range are in separate columns instead of concatenated but if You need it in one column I guess You can do it from here. To change number of ranges just edit number 10 You can find in sub-query.

like image 136
Gustek Avatar answered Oct 08 '22 13:10

Gustek


When creating something like this, GROUP BY, is your friend. The basic idea is to put each value into a bucket, and then count the number of elements in each bucket. To create a bucket, you define a function that takes the value and compute a unique value for the bucket.

Something like this:

SELECT
  @low := TRUNCATE(bird_count/100, 0) * 100 as Low,
  TRUNCATE(@low + 99, 0) as High,
  COUNT(*) AS Count
FROM birds_seen
GROUP BY Low;

In this case, you define a function that take the bird count, and compute the lower range of the bucket. You then group all the values on the lower range, which will place, for example, 123 and 145 into the bucket labelled "100", and 234 and 246 into the bucket labelled "200".

Now, each value is placed in a bucket, and you can group the values by the bucket label, and count the number of elements in each bucket.

like image 24
Mats Kindahl Avatar answered Oct 08 '22 14:10

Mats Kindahl