SUM OVER PARTITION BY

Tags:

sql

sql-server

tsql

What am I missing?

This query returns the same data over and over again. The total is correct, but I am expecting one row per brand, and instead I get each value repeated about 40 times. Any ideas?

SELECT BrandId
      ,SUM(ICount) OVER (PARTITION BY BrandId)
FROM Table
WHERE DateId = 20130618

I get this:

BrandId  ICount
2        421762
2        421762
2        421762
2        421762
2        421762
2        421762
2        421762
1        133346
1        133346
1        133346
1        133346
1        133346
1        133346
1        133346

What am I missing?

I can't remove the PARTITION BY, because the entire query looks like this:

SELECT BrandId
      ,SUM(ICount) OVER (PARTITION BY BrandId)
      ,TotalICount = SUM(ICount) OVER ()
      ,SUM(ICount) OVER () / SUM(ICount) OVER (PARTITION BY BrandId) AS Percentage
FROM Table
WHERE DateId = 20130618

Which returns this:

BrandId  (No column name)  TotalICount  Percentage
2        421762            32239892     76
2        421762            32239892     76
2        421762            32239892     76
2        421762            32239892     76
2        421762            32239892     76
2        421762            32239892     76

I would expect output something like this, without having to use DISTINCT:

BrandId  (No column name)  TotalICount  Percentage
2        421762            32239892     76
9        1238442           32239892     26
10       1467473           32239892     21

asked Jul 25 '13 20:07

nitefrog

2 Answers

I think it's important to explain why a GROUP BY is needed here, and why summing with an OVER() clause gives you repeated rows of data when you are expecting one row per BrandId.

Take this example: you need to total the sale price of the order lines in each order category between two dates, but you also need to retain the individual order data in your final results. A plain SUM() on the SalePrice column can't give you both, because it requires a GROUP BY, and grouping collapses the detail: you can no longer keep the individual order lines in the select list.

Many times we see a #temp table, @table variable, or CTE filled with the grouped sums of our data so we can join back to it later and get a column of the sums we need. This can add processing time and extra lines of code. Instead, use OVER (PARTITION BY ...) like this:

SELECT
  OrderLine,
  OrderDateTime,
  SalePrice,
  OrderCategory,
  SUM(SalePrice) OVER (PARTITION BY OrderCategory) AS SaleTotalPerCategory
FROM tblSales
WHERE OrderDateTime BETWEEN @StartDate AND @EndDate

Notice that we are not grouping, and the individual order line columns are still selected. The PARTITION BY in the last column returns the category's sales total on every row of data in that category. What the last column essentially says is: we want the sum of the sale price (SUM(SalePrice)) over a partition of the results, split by a specified category (OVER (PARTITION BY OrderCategory)).
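To make the per-row behaviour concrete, here is a minimal sketch that runs the same kind of query through Python's built-in sqlite3 module rather than SQL Server. That swap is an assumption made so the example runs anywhere (it needs SQLite 3.25 or later for window functions). The table and column names follow the query above, the date filter is left out for brevity, and the three rows are made up purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblSales (OrderLine INTEGER, OrderCategory TEXT, SalePrice INTEGER)"
)
# Made-up sample rows (hypothetical data, whole-number prices for simplicity).
conn.executemany(
    "INSERT INTO tblSales VALUES (?, ?, ?)",
    [(1, "Books", 10), (2, "Books", 15), (3, "Toys", 7)],
)

# No GROUP BY: every order line survives, and each one carries
# the total for its own category in the last column.
rows = conn.execute(
    """
    SELECT OrderLine,
           OrderCategory,
           SalePrice,
           SUM(SalePrice) OVER (PARTITION BY OrderCategory) AS SaleTotalPerCategory
    FROM tblSales
    ORDER BY OrderLine
    """
).fetchall()

for row in rows:
    print(row)
```

Both "Books" lines come back with the same category total, which is exactly the repetition the question describes.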

If we remove the other columns from our select statement, and leave our final SUM() column, like this:

SELECT
  SUM(SalePrice) OVER (PARTITION BY OrderCategory) AS SaleTotalPerCategory
FROM tblSales
WHERE OrderDateTime BETWEEN @StartDate AND @EndDate

The results will still repeat this sum for each row in our original result set. The reason is that a windowed aggregate does not group: it never collapses rows, it only attaches a value to each one. If you don't need to retain individual line data, simply use SUM() without OVER() and group your data appropriately. Again, if you need an additional column with specific totals, you can use the OVER (PARTITION BY ...) method described above without additional selects to join back to.
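The difference in row counts can be checked with a small sketch, again using Python's sqlite3 module as a stand-in for SQL Server (an assumption, requiring SQLite 3.25+) and the same made-up tblSales rows as illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblSales (OrderLine INTEGER, OrderCategory TEXT, SalePrice INTEGER)"
)
# Made-up sample rows (hypothetical data).
conn.executemany(
    "INSERT INTO tblSales VALUES (?, ?, ?)",
    [(1, "Books", 10), (2, "Books", 15), (3, "Toys", 7)],
)

# Windowed SUM only: still one output row per input row.
windowed = conn.execute(
    "SELECT SUM(SalePrice) OVER (PARTITION BY OrderCategory) FROM tblSales"
).fetchall()

# Plain SUM with GROUP BY: one output row per category.
grouped = conn.execute(
    """
    SELECT OrderCategory, SUM(SalePrice)
    FROM tblSales
    GROUP BY OrderCategory
    ORDER BY OrderCategory
    """
).fetchall()

print(len(windowed), "rows from the windowed query")
print(len(grouped), "rows from the grouped query")
```

The windowed query returns as many rows as the table holds, while the grouped query returns one row per category.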

The above is purely to explain WHY the asker is getting repeated lines of the same number, and to help understand what this clause provides. This method can be used in many ways, and I highly encourage further reading from the documentation here:

Over Clause

answered Sep 20 '22 03:09

E10

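You could use DISTINCT, or just remove the PARTITION BY portions and use GROUP BY. Window functions are evaluated after grouping, so the grand total then has to sum the per-brand sums with SUM(SUM(ICount)) OVER (); SQL Server rejects a bare SUM(ICount) OVER () once GROUP BY is present, because ICount is neither grouped nor aggregated at that point:

SELECT BrandId
      ,SUM(ICount)
      ,TotalICount = SUM(SUM(ICount)) OVER ()   -- grand total over the grouped rows
      ,Percentage  = SUM(SUM(ICount)) OVER () * 1.0 / SUM(ICount)
FROM Table
WHERE DateId = 20130618
GROUP BY BrandId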

I'm not sure why you are dividing the total by the count per BrandId. If that's a mistake and you want each brand's share of the total, reverse the two terms:

SELECT BrandId
      ,SUM(ICount)
      ,TotalICount = SUM(SUM(ICount)) OVER ()   -- windowed sum of the per-brand sums
      ,Percentage  = SUM(ICount) * 1.0 / SUM(SUM(ICount)) OVER ()   -- a fraction; multiply by 100 for a percentage
FROM Table
WHERE DateId = 20130618
GROUP BY BrandId
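As a rough check of the grouped approach, here is a sketch run through Python's sqlite3 module instead of SQL Server (an assumption, needing SQLite 3.25+). The table is called Counts here, a hypothetical name chosen because Table is a reserved word, the rows are invented, and the aliases use AS because SQLite does not accept the T-SQL `Alias = expression` form. The grand total is written as SUM(SUM(ICount)) OVER (), since the window function runs over the already grouped rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Counts" is a hypothetical stand-in for the asker's table.
conn.execute("CREATE TABLE Counts (BrandId INTEGER, DateId INTEGER, ICount INTEGER)")
conn.executemany(
    "INSERT INTO Counts VALUES (?, ?, ?)",
    [
        (1, 20130618, 10),
        (1, 20130618, 30),
        (2, 20130618, 60),
        (2, 20130617, 999),  # different date, excluded by the WHERE clause
    ],
)

rows = conn.execute(
    """
    SELECT BrandId,
           SUM(ICount)                                    AS BrandICount,
           SUM(SUM(ICount)) OVER ()                       AS TotalICount,
           SUM(ICount) * 100.0 / SUM(SUM(ICount)) OVER () AS Percentage
    FROM Counts
    WHERE DateId = 20130618
    GROUP BY BrandId
    ORDER BY BrandId
    """
).fetchall()

for row in rows:
    print(row)
```

Each brand appears once, with its own count, the overall total, and its share of that total. Multiplying by 100.0 plays the same role as the `*1.0` above: it forces non-integer division, which is what truncated the asker's original Percentage column to whole numbers.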

answered Sep 22 '22 03:09

Hart CO
