Let's say I have a database column 'grade' like this:
|grade|
|    1|
|    2|
|    1|
|    3|
|    4|
|    5|
Is there a non-trivial way in SQL to generate a histogram like this?
|2,1,1,1,1,0|
where 2 means the grade 1 occurs twice, the 1s mean grades {2..5} occur once and 0 means grade 6 does not occur at all.
I don't mind if the histogram is one row per count.
If it matters, the database is SQL Server, accessed by a Perl CGI script through unixODBC/FreeTDS.
EDIT: Thanks for your quick replies! It is okay if non-existing values (like grade 6 in the example above) do not occur as long as I can make out which histogram value belongs to which grade.
A histogram is an approximate representation of the distribution of numerical data: it shows the number of data points that fall within each specified range of values (typically called "bins" or "buckets"), much as you might sort coins into buckets. Generating a histogram is a great way to understand the distribution of data.
In a database, a histogram is also a special type of column statistic that provides more detailed information about the data distribution in a table column. Based on the number of distinct values (NDV) and the distribution of the data, the database chooses the type of histogram to create.
SELECT grade, COUNT(*) AS grade_count FROM TableName GROUP BY grade ORDER BY grade
Haven't verified it, but it should work. It will not, however, show a count for grade 6, since that value is not present in the table at all...
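If the zero for grade 6 is wanted after all, one possible approach is to LEFT JOIN the data against an explicit list of grades. This is only a sketch: it assumes the grades run from 1 to 6, that the table is called TableName (a placeholder, as in the other answers), and that the server is SQL Server 2008 or later, which accepts a VALUES list in the FROM clause.

```sql
-- Assumed: valid grades are 1..6 and the table is named TableName.
SELECT g.grade,
       COUNT(t.grade) AS grade_count   -- counts only matched rows, so unmatched grades give 0
FROM (VALUES (1), (2), (3), (4), (5), (6)) AS g(grade)
LEFT JOIN TableName AS t
       ON t.grade = g.grade
GROUP BY g.grade
ORDER BY g.grade;
```

COUNT(t.grade) is used instead of COUNT(*) because COUNT(*) would count the NULL-extended row that the LEFT JOIN produces for a missing grade and report 1 instead of 0.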
If there are a lot of data points, you can also group ranges together like this:
SELECT FLOOR(grade/5.00)*5 AS Grade,
       COUNT(*) AS [Grade Count]
FROM TableName
GROUP BY FLOOR(grade/5.00)*5
ORDER BY 1
Additionally, if you wanted to label the full range, you can get the floor and ceiling ahead of time with a CTE.
WITH GradeRanges AS (
    -- Lower and upper bound of the 5-wide bin each grade falls into
    SELECT FLOOR(grade/5.00)*5     AS GradeFloor,
           FLOOR(grade/5.00)*5 + 4 AS GradeCeiling
    FROM TableName
)
SELECT GradeFloor,
       CONCAT(GradeFloor, ' to ', GradeCeiling) AS GradeRange,
       COUNT(*) AS [Grade Count]
FROM GradeRanges
GROUP BY GradeFloor, CONCAT(GradeFloor, ' to ', GradeCeiling)
ORDER BY GradeFloor
Note: Some SQL engines let you GROUP BY an ordinal column index, but MS SQL does not. If you want a non-aggregated expression in the SELECT list, you need to group by it as well, hence the range expression is copied into the GROUP BY clause.
Option 2: You could use CASE expressions to selectively count values into arbitrary bins and then unpivot them to get a row-by-row count of the included values.
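A rough sketch of what Option 2 might look like in T-SQL follows. The bin boundaries and labels here are made up for illustration, and TableName is again a placeholder; adjust both to the real data.

```sql
-- Assumed: three illustrative bins; the bracketed labels are arbitrary.
SELECT bin, grade_count
FROM (
    -- One row with one conditional count per bin
    SELECT COUNT(CASE WHEN grade BETWEEN 1 AND 2 THEN 1 END) AS [1 to 2],
           COUNT(CASE WHEN grade BETWEEN 3 AND 4 THEN 1 END) AS [3 to 4],
           COUNT(CASE WHEN grade >= 5 THEN 1 END)            AS [5 and up]
    FROM TableName
) AS counts
-- Turn the bin columns into one row per bin
UNPIVOT (grade_count FOR bin IN ([1 to 2], [3 to 4], [5 and up])) AS u
ORDER BY bin;
```

Because each CASE has no ELSE branch, rows outside a bin yield NULL and are skipped by COUNT. An empty bin still comes back with a count of 0, since UNPIVOT only drops NULL values and COUNT never returns NULL.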