
Generating a histogram from column values in a database

Let's say I have a database column 'grade' like this:

|grade|
|    1|
|    2|
|    1|
|    3|
|    4|
|    5|

Is there a non-trivial way in SQL to generate a histogram like this?

|2,1,1,1,1,0| 

where 2 means the grade 1 occurs twice, the 1s mean grades {2..5} occur once and 0 means grade 6 does not occur at all.

I don't mind if the histogram is one row per count.

If it matters, the database is SQL Server, accessed by a Perl CGI script through unixODBC/FreeTDS.

EDIT: Thanks for your quick replies! It is okay if non-existing values (like grade 6 in the example above) do not occur as long as I can make out which histogram value belongs to which grade.

Thorsten79 asked Jan 27 '09 21:01




2 Answers

SELECT grade, COUNT(grade) AS grade_count
FROM TableName
GROUP BY grade
ORDER BY grade

I haven't verified it, but it should work. It will not, however, show a count for grade 6, since that value is not present in the table at all.
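If the zero count for a missing grade is wanted after all, one common approach is to LEFT JOIN a list of all expected grades against the data. The following is a minimal sketch, not a tested SQL Server solution: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table names `grades` and `all_grades` are hypothetical. The SELECT statement itself uses only standard SQL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical table holding the sample data from the question.
conn.execute("CREATE TABLE grades (grade INTEGER)")
conn.executemany(
    "INSERT INTO grades (grade) VALUES (?)",
    [(1,), (2,), (1,), (3,), (4,), (5,)],
)

# Hypothetical lookup table listing every grade the histogram should cover,
# including grades that have no rows in the data.
conn.execute("CREATE TABLE all_grades (grade INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO all_grades (grade) VALUES (?)",
    [(g,) for g in range(1, 7)],
)

# COUNT(g.grade) counts only matched rows, so a grade with no data yields 0.
histogram = conn.execute(
    """
    SELECT a.grade, COUNT(g.grade) AS grade_count
    FROM all_grades AS a
    LEFT JOIN grades AS g ON g.grade = a.grade
    GROUP BY a.grade
    ORDER BY a.grade
    """
).fetchall()

print(histogram)  # [(1, 2), (2, 1), (3, 1), (4, 1), (5, 1), (6, 0)]
```

Counting `g.grade` rather than `*` is what makes this work: `COUNT(*)` would report 1 for grade 6, because the outer join still produces one row for it.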

Ilya Volodin answered Oct 08 '22 19:10


If there are a lot of data points, you can also group ranges together like this:

SELECT FLOOR(Grade/5.00)*5 AS Grade,
       COUNT(*) AS [Grade Count]
FROM TableName
GROUP BY FLOOR(Grade/5.00)*5
ORDER BY 1

Additionally, if you wanted to label the full range, you can get the floor and ceiling ahead of time with a CTE.

WITH GradeRanges AS (
  SELECT FLOOR(Grade/5.00)*5     AS GradeFloor,
         FLOOR(Grade/5.00)*5 + 4 AS GradeCeiling
  FROM TableName
)
SELECT GradeFloor,
       CONCAT(GradeFloor, ' to ', GradeCeiling) AS GradeRange,
       COUNT(*) AS [Grade Count]
FROM GradeRanges
GROUP BY GradeFloor, CONCAT(GradeFloor, ' to ', GradeCeiling)
ORDER BY GradeFloor

Note: Some SQL engines let you GROUP BY an ordinal column position, but SQL Server does not (it accepts ordinals only in ORDER BY). Any non-aggregated expression in the SELECT list must also appear in the GROUP BY clause, which is why the range expression is repeated there.
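To make the bucketing arithmetic concrete, here is a small worked sketch with bins of width 5. It runs against SQLite through Python's built-in sqlite3 module, not SQL Server, so two substitutions are assumptions of the sketch: integer division `(score / 5) * 5` replaces `FLOOR(x/5.00)*5`, and the `||` operator replaces `CONCAT`. The table name `scores` and the sample values are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical table and sample values, chosen to span several bins.
conn.execute("CREATE TABLE scores (score INTEGER)")
conn.executemany(
    "INSERT INTO scores (score) VALUES (?)",
    [(3,), (7,), (12,), (14,), (18,)],
)

# Integer division truncates, so (score / 5) * 5 is the lower bin edge:
# 3 -> 0, 7 -> 5, 12 -> 10, 14 -> 10, 18 -> 15.
rows = conn.execute(
    """
    WITH ranges AS (
        SELECT (score / 5) * 5     AS range_floor,
               (score / 5) * 5 + 4 AS range_ceiling
        FROM scores
    )
    SELECT range_floor,
           range_floor || ' to ' || range_ceiling AS range_label,
           COUNT(*) AS range_count
    FROM ranges
    GROUP BY range_floor, range_label
    ORDER BY range_floor
    """
).fetchall()

for row in rows:
    print(row)
# (0, '0 to 4', 1)
# (5, '5 to 9', 1)
# (10, '10 to 14', 2)
# (15, '15 to 19', 1)
```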

Option 2: You could use CASE expressions to selectively count values into arbitrary bins and then unpivot them to get a row-by-row count of the included values.
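A rough sketch of what Option 2 might look like, with the usual caveats: it runs on SQLite through Python's built-in sqlite3 module as a stand-in for SQL Server, and the table name `grades` and the bin names `low`, `mid`, and `high` are invented for illustration. SQLite has no UNPIVOT operator, so the single result row is turned into one row per bin in Python here. On SQL Server that step could be done in SQL with UNPIVOT instead.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical table holding the sample data from the question.
conn.execute("CREATE TABLE grades (grade INTEGER)")
conn.executemany(
    "INSERT INTO grades (grade) VALUES (?)",
    [(1,), (2,), (1,), (3,), (4,), (5,)],
)

# Conditional aggregation: each SUM(CASE ...) counts the rows in one bin.
# The bin boundaries are arbitrary and need not have equal width.
cur = conn.execute(
    """
    SELECT
        SUM(CASE WHEN grade BETWEEN 1 AND 2 THEN 1 ELSE 0 END) AS low,
        SUM(CASE WHEN grade BETWEEN 3 AND 4 THEN 1 ELSE 0 END) AS mid,
        SUM(CASE WHEN grade >= 5 THEN 1 ELSE 0 END) AS high
    FROM grades
    """
)
row = cur.fetchone()

# "Unpivot" the single wide row into (bin_name, count) pairs,
# taking the bin names from the column aliases.
bins = [(col[0], count) for col, count in zip(cur.description, row)]

print(bins)  # [('low', 3), ('mid', 2), ('high', 1)]
```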

KyleMit answered Oct 08 '22 21:10