
Getting data for histogram plot

Is there a way to specify bin sizes in MySQL? Right now, I am trying the following SQL query:

select total, count(total) from faults GROUP BY total; 

The data being generated is good enough, but there are just too many rows. What I need is a way to group the data into predefined bins. I can do this from a scripting language, but is there a way to do it directly in SQL?

Example:

+-------+--------------+
| total | count(total) |
+-------+--------------+
|    30 |            1 |
|    31 |            2 |
|    33 |            1 |
|    34 |            3 |
|    35 |            2 |
|    36 |            6 |
|    37 |            3 |
|    38 |            2 |
|    41 |            1 |
|    42 |            5 |
|    43 |            1 |
|    44 |            7 |
|    45 |            4 |
|    46 |            3 |
|    47 |            2 |
|    49 |            3 |
|    50 |            2 |
|    51 |            3 |
|    52 |            4 |
|    53 |            2 |
|    54 |            1 |
|    55 |            3 |
|    56 |            4 |
|    57 |            4 |
|    58 |            2 |
|    59 |            2 |
|    60 |            4 |
|    61 |            1 |
|    63 |            2 |
|    64 |            5 |
|    65 |            2 |
|    66 |            3 |
|    67 |            5 |
|    68 |            5 |
+-------+--------------+

What I am looking for:

+------------+---------------+
| total      | count(total)  |
+------------+---------------+
|    30 - 40 |            23 |
|    40 - 50 |            15 |
|    50 - 60 |            51 |
|    60 - 70 |            45 |
+------------+---------------+

I guess this cannot be achieved in a straightforward manner, but a reference to any related stored procedure would be fine as well.
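A minimal sketch of fixed-width binning done entirely in SQL, run through Python's built-in sqlite3 module so it is self-contained. It assumes a `faults` table with an integer `total` column, filled with the example rows from the first table, and a bin width of 10; the helper name `binned` is hypothetical. SQLite's `/` truncates integer operands, so `(total / 10) * 10` is the lower edge of each bin. In MySQL, where `/` returns a decimal, the equivalent would be `FLOOR(total / 10) * 10`, and the label would be built with `CONCAT()` rather than `||`.

```python
import sqlite3

# (total, count) pairs copied from the first example table in the question.
SAMPLE = [
    (30, 1), (31, 2), (33, 1), (34, 3), (35, 2), (36, 6), (37, 3), (38, 2),
    (41, 1), (42, 5), (43, 1), (44, 7), (45, 4), (46, 3), (47, 2), (49, 3),
    (50, 2), (51, 3), (52, 4), (53, 2), (54, 1), (55, 3), (56, 4), (57, 4),
    (58, 2), (59, 2), (60, 4), (61, 1), (63, 2), (64, 5), (65, 2), (66, 3),
    (67, 5), (68, 5),
]

# SQLite integer "/" truncates, so (total / :w) * :w is the lower edge of the
# half-open bin [lo, lo + :w). Grouping by the numeric edge keeps bins in order.
QUERY = """
SELECT (total / :w) * :w AS lo,
       ((total / :w) * :w) || ' - ' || ((total / :w) * :w + :w) AS label,
       COUNT(*) AS n
FROM faults
GROUP BY lo
ORDER BY lo
"""


def binned(conn, width):
    """Hypothetical helper: return (label, count) pairs, one per non-empty bin."""
    return [(label, n) for _lo, label, n in conn.execute(QUERY, {"w": width})]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faults (total INTEGER)")
# Expand every (total, count) pair into `count` individual rows.
conn.executemany(
    "INSERT INTO faults (total) VALUES (?)",
    [(total,) for total, count in SAMPLE for _ in range(count)],
)

rows = binned(conn, 10)  # assumed bin width of 10
for label, n in rows:
    print(f"{label:>9} {n:>5}")
```

For the example rows above this yields 20 values in 30 - 40, 26 in 40 - 50, and 27 each in 50 - 60 and 60 - 70. Bins with no rows simply do not appear.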

Legend asked Nov 19 '09 17:11





1 Answer

This is a post about a super quick-and-dirty way to create a histogram in MySQL for numeric values.

There are multiple other ways to create histograms that are better and more flexible, using CASE statements and other types of complex logic. This method wins me over time and time again because it's so easy to modify for each use case, and so short. This is how you do it:

SELECT ROUND(numeric_value, -2)    AS bucket,
       COUNT(*)                    AS COUNT,
       RPAD('', LN(COUNT(*)), '*') AS bar
FROM   my_table
GROUP  BY bucket;

Just change numeric_value to whatever your column is, change the rounding increment, and that's it. I've made the bars logarithmic, so that they don't grow too much when you have large values.
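To make the bucket and bar columns concrete, here is a small Python sketch that mimics what the query computes for a list of integers; it is not MySQL itself. It assumes integer inputs, emulates `ROUND(x, -2)` with `decimal`'s ROUND_HALF_UP (which, like MySQL's ROUND on exact values, rounds ties away from zero), and takes the bar length as `LN(count)` rounded to the nearest integer, which is consistent with the bars in the sample output. The helpers `mysql_round` and `histogram` are hypothetical names.

```python
import math
from collections import Counter
from decimal import Decimal, ROUND_HALF_UP


def mysql_round(value: int, digits: int) -> int:
    """Hypothetical emulation of MySQL ROUND(value, digits) for an integer, digits <= 0.

    Ties are rounded half away from zero, as MySQL does for exact values
    (e.g. ROUND(150, -2) = 200, ROUND(-50, -2) = -100).
    """
    step = Decimal(10) ** -digits  # digits=-2 -> step of 100
    scaled = (Decimal(value) / step).quantize(Decimal(1), rounding=ROUND_HALF_UP)
    return int(scaled * step)


def histogram(values, digits=-2):
    """Return (bucket, count, bar) rows, like the query's GROUP BY bucket."""
    counts = Counter(mysql_round(v, digits) for v in values)
    rows = []
    for bucket in sorted(counts):
        n = counts[bucket]
        # Bar length is ln(count) rounded to the nearest integer, so a
        # count of 1 gives an empty bar and 5310766 gives 15 stars.
        rows.append((bucket, n, "*" * round(math.log(n))))
    return rows


for row in histogram([5, 20, 49, 50, 120, 149, 150, 151, -50, -49, *[300] * 20]):
    print(row)
```

Passing `digits=-1` corresponds to changing the rounding increment in the query from 100 to 10.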

numeric_value should be offset inside the ROUND call, based on the rounding increment, to ensure the first bucket contains as many elements as the following buckets.

For example, with ROUND(numeric_value, -1), values of numeric_value in the range [0,4] (5 elements) are placed in the first bucket, while [5,14] (10 elements) go into the second and [15,24] into the third, unless numeric_value is offset appropriately via ROUND(numeric_value - 5, -1).
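The effect is easy to check by counting how many of the integers 0 to 24 land in each bucket under three schemes. This Python sketch emulates MySQL's `ROUND` for exact (integer) values, which rounds ties away from zero; that emulation and the helper names `mysql_round` and `bucket_sizes` are assumptions for illustration, not MySQL code. The third scheme, `FLOOR(numeric_value / 10) * 10`, is a swapped-in alternative that is not part of the original answer.

```python
from collections import Counter
from decimal import Decimal, ROUND_HALF_UP


def mysql_round(value: int, digits: int) -> int:
    """Hypothetical emulation of MySQL ROUND() on integers (ties away from zero)."""
    step = Decimal(10) ** -digits
    scaled = (Decimal(value) / step).quantize(Decimal(1), rounding=ROUND_HALF_UP)
    return int(scaled * step)


def bucket_sizes(values, key):
    """Map each bucket label produced by `key` to how many values it receives."""
    return dict(sorted(Counter(key(v) for v in values).items()))


values = range(25)  # the integers 0..24

plain = bucket_sizes(values, lambda v: mysql_round(v, -1))       # ROUND(v, -1)
offset = bucket_sizes(values, lambda v: mysql_round(v - 5, -1))  # ROUND(v - 5, -1)
floored = bucket_sizes(values, lambda v: (v // 10) * 10)         # FLOOR(v / 10) * 10

print("ROUND(v, -1):    ", plain)
print("ROUND(v - 5, -1):", offset)
print("FLOOR(v/10)*10:  ", floored)
```

Here `ROUND(v, -1)` yields bucket sizes 5, 10, 10. The offset version gives the bucket labelled 10 a full 10 values, but a value of exactly 0 becomes -5, which rounds away from zero into a separate -10 bucket. Floor-based bucketing gives 10, 10, 5 (the last bucket is only partly filled by this input) without any offset.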

Here is an example of such a query on some random data. It looks pretty sweet, and is good enough for a quick evaluation of the data.

+--------+----------+-----------------+
| bucket | count    | bar             |
+--------+----------+-----------------+
|   -500 |        1 |                 |
|   -400 |        2 | *               |
|   -300 |        2 | *               |
|   -200 |        9 | **              |
|   -100 |       52 | ****            |
|      0 |  5310766 | *************** |
|    100 |    20779 | **********      |
|    200 |     1865 | ********        |
|    300 |      527 | ******          |
|    400 |      170 | *****           |
|    500 |       79 | ****            |
|    600 |       63 | ****            |
|    700 |       35 | ****            |
|    800 |       14 | ***             |
|    900 |       15 | ***             |
|   1000 |        6 | **              |
|   1100 |        7 | **              |
|   1200 |        8 | **              |
|   1300 |        5 | **              |
|   1400 |        2 | *               |
|   1500 |        4 | *               |
+--------+----------+-----------------+

Some notes: ranges that have no matching rows will not appear in the output at all, so you will never see a zero in the count column. Also, I'm using the ROUND function here; you can just as easily replace it with TRUNCATE if that makes more sense to you.
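If empty ranges should show up with a zero count, one common workaround (not from the original answer) is to generate the list of bins and LEFT JOIN the data onto it, counting only matched rows. Below is a sketch run in SQLite through Python's sqlite3 module; the table `my_table`, its integer column `numeric_value`, the bin width of 100, the 0 to 500 range, and the helper `zero_filled_histogram` are all assumptions. The bins here are half-open ranges [lo, lo + 100), i.e. floor-style buckets rather than the answer's ROUND-centred ones. MySQL 8.0 also supports `WITH RECURSIVE`, though it uses its own placeholder syntax for the bound values.

```python
import sqlite3

# Generate bins start, start+width, ... below stop, then LEFT JOIN the data so
# that bins with no matching rows still appear. COUNT(column) ignores NULLs,
# so an unmatched bin counts as 0.
QUERY = """
WITH RECURSIVE bins(lo) AS (
    SELECT :start
    UNION ALL
    SELECT lo + :width FROM bins WHERE lo + :width < :stop
)
SELECT bins.lo AS bucket, COUNT(t.numeric_value) AS n
FROM bins
LEFT JOIN my_table AS t
       ON t.numeric_value >= bins.lo AND t.numeric_value < bins.lo + :width
GROUP BY bins.lo
ORDER BY bins.lo
"""


def zero_filled_histogram(conn, start, stop, width):
    """Hypothetical helper: count values per bin [lo, lo + width), empty bins included."""
    params = {"start": start, "stop": stop, "width": width}
    return conn.execute(QUERY, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (numeric_value INTEGER)")
conn.executemany(
    "INSERT INTO my_table (numeric_value) VALUES (?)",
    [(v,) for v in (10, 20, 250, 410, 499)],  # made-up illustrative values
)

rows = zero_filled_histogram(conn, 0, 500, 100)
print(rows)
```

On the TRUNCATE alternative: TRUNCATE(x, -2) cuts toward zero, so every value from -99 to 99 lands in bucket 0, which is then twice as wide as the others whenever the data includes negative numbers.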

I found it here http://blog.shlomoid.com/2011/08/how-to-quickly-create-histogram-in.html

Jaro answered Sep 21 '22 14:09
