I have a simple table BIRDCOUNT below, showing how many birds were counted on any given day:
+----------+
| NUMBIRDS |
+----------+
|      123 |
|      573 |
|        3 |
|      234 |
+----------+
I would like to create a frequency distribution graph, showing how many times a number of birds were counted. So I need MySQL to create something like:
+------------+------------+
| BIRD_COUNT | TIMES_SEEN |
+------------+------------+
| 0-99       |         17 |
| 100-299    |         23 |
| 200-399    |         12 |
| 300-499    |        122 |
| 400-599    |          3 |
+------------+------------+
If the bird count ranges were fixed this would be easy. However, I never know the min/max of how many birds were seen. So I need a select statement that:
I don't know if #2 is possible in a single select but can anyone solve #1?
SELECT
  FLOOR(birds.bird_count / stat.diff) * stat.diff AS range_start,
  (FLOOR(birds.bird_count / stat.diff) + 1) * stat.diff - 1 AS range_end,
  COUNT(birds.bird_count) AS times_seen
FROM birds_table AS birds
CROSS JOIN (
  -- bucket width: the spread of the data split into about 10 ranges;
  -- GREATEST(..., 1) keeps the width from rounding down to 0
  SELECT GREATEST(ROUND((MAX(bird_count) - MIN(bird_count)) / 10), 1) AS diff
  FROM birds_table
) AS stat
GROUP BY range_start, range_end
ORDER BY range_start;
This answers both of your questions, with one difference: the start and end of each range are in separate columns instead of being concatenated. If you need them in one column, you can combine them with CONCAT. To change the number of ranges, edit the 10 in the subquery.
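If a single label column is wanted, one possible sketch wraps the same two expressions in CONCAT. It reuses the birds_table and bird_count names from the query above. The range_label alias and the GREATEST guard against a zero bucket width are my own additions, so treat them as assumptions rather than part of the original answer.

```sql
SELECT
  FLOOR(birds.bird_count / stat.diff) * stat.diff AS range_start,
  CONCAT(
    FLOOR(birds.bird_count / stat.diff) * stat.diff,
    '-',
    (FLOOR(birds.bird_count / stat.diff) + 1) * stat.diff - 1
  ) AS range_label,             -- hypothetical alias, a label of the form start-end
  COUNT(*) AS times_seen
FROM birds_table AS birds
CROSS JOIN (
  -- bucket width: spread of the data split into about 10 ranges
  -- GREATEST(..., 1) is an added guard so the width is never 0
  SELECT GREATEST(ROUND((MAX(bird_count) - MIN(bird_count)) / 10), 1) AS diff
  FROM birds_table
) AS stat
GROUP BY range_start, range_label
ORDER BY range_start;           -- sort numerically, not by the text label
```

The numeric range_start column is kept next to the label because sorting on the text label alone would put '1000-1099' before '200-299'.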
When creating something like this, GROUP BY is your friend. The basic idea is to put each value into a bucket and then count the number of elements in each bucket. To create the buckets, you define a function that takes a value and computes a label identifying its bucket; every value in the same bucket gets the same label.
Something like this:
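SELECT
  -- lower bound of the 100-wide bucket the value falls into
  TRUNCATE(bird_count / 100, 0) * 100 AS Low,
  -- upper bound, computed directly instead of through a user variable,
  -- because reading a variable assigned in the same SELECT is unreliable
  TRUNCATE(bird_count / 100, 0) * 100 + 99 AS High,
  COUNT(*) AS Count
FROM birds_seen
GROUP BY Low, High;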
In this case, you define a function that takes the bird count and computes the lower bound of its bucket. You then group all the values on that lower bound, which will place, for example, 123 and 145 into the bucket labelled "100", and 234 and 246 into the bucket labelled "200".
Now, each value is placed in a bucket, and you can group the values by the bucket label, and count the number of elements in each bucket.
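To see the bucketing on concrete numbers, here is a small self-contained sketch that runs against the four sample values from the question instead of a real table. The inline derived table named sample and the lowercase aliases are only there for illustration.

```sql
SELECT
  TRUNCATE(numbirds / 100, 0) * 100      AS low,   -- bucket label
  TRUNCATE(numbirds / 100, 0) * 100 + 99 AS high,
  COUNT(*)                               AS times_seen
FROM (
  -- the four sample rows from the question, inlined for the example
  SELECT 123 AS numbirds
  UNION ALL SELECT 573
  UNION ALL SELECT 3
  UNION ALL SELECT 234
) AS sample
GROUP BY low, high
ORDER BY low;
```

With this input each value lands in its own bucket (0-99, 100-199, 200-299 and 500-599), so every row has a count of 1. Note that buckets with no values, such as 300-399, do not appear in the result at all; if the graph needs them as zero rows, they have to be supplied separately, for example by joining against a list of bucket labels.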