How can I SUM distinct records in a Postgres database where there are duplicate records?

Tags:

postgresql

Imagine a table that looks like this:

[Image: table with duplicate data]

The SQL that produced this data was just SELECT *. The first column is "row_id", the second is "id" (the order ID), and the third is "total" (the revenue).

I'm not sure why there are duplicate rows in the database, but SUM(total) counts the second entry for an order even though the order ID is the same. That makes my numbers larger than what I get if I select distinct(id), total, export the result to Excel, and sum the values manually.

So my question is: how can I SUM over just the distinct order IDs, so that I get the same revenue as if I had exported every distinct order ID row to Excel?

Thanks in advance!


asked Apr 10 '16 00:04

Katie F

2 Answers

Easy: divide by the count. This gives the per-order total as long as every duplicate row for an id carries the same total:

select id, sum(total) / count(id)
from orders
group by id

See live demo.

This also handles any level of duplication, e.g. triplicates.
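Since the question asks for a single revenue figure, the per-order query can be wrapped in an outer sum. This is only a sketch: it assumes the table is named orders as in the query above and that every duplicate row of an id carries the same total. The aliases order_total, per_order and revenue are made up for illustration.

```sql
-- Grand total over de-duplicated orders.
-- Assumption: all copies of an id have an identical total, so
-- sum(total) / count(id) recovers the total of one copy.
select sum(order_total) as revenue
from (
  select id, sum(total) / count(id) as order_total
  from orders
  group by id
) as per_order;
```

If the copies of an id could disagree on total, sum(total) / count(id) returns their average rather than any one row's value, so it is worth checking the data first.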


answered Sep 24 '22 00:09

Bohemian

You can try something like this (with your example):

Table

create table test (
  row_id int,
  id int,
  total decimal(15,2)
);

insert into test values 
(6395, 1509, 112), (22986, 1509, 112), 
(1393, 3284, 40.37), (24360, 3284, 40.37);

Query

with distinct_records as (
  select distinct id, total from test
)

select a.id, b.actual_total, array_agg(a.row_id) as row_ids
from test a
inner join (select id, sum(total) as actual_total from distinct_records group by id) b
  on a.id = b.id
group by a.id, b.actual_total

Result

|   id | actual_total |    row_ids |
|------|--------------|------------|
| 1509 |          112 | 6395,22986 |
| 3284 |        40.37 | 1393,24360 |

Explanation

We do not know why orders and totals appear more than once with different row_id values. So we use a common table expression (CTE), introduced by the with ... clause, to get the distinct id and total pairs.

Below the CTE, we total this distinct data per id and join the result back to the original table on id. array_agg then collects the row_ids of each id into a single value, so every order appears on one row.
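The same CTE can also produce the single revenue figure the question asks for. The sketch below reuses the test table defined above; the aliases copies, distinct_totals and total_revenue are hypothetical names chosen for illustration. The first query is a sanity check, because select distinct id, total only collapses duplicates whose totals match exactly: an id with two different totals would still be counted twice.

```sql
-- 1) Sanity check: list ids whose duplicate rows disagree on total.
--    No rows returned means de-duplicating on (id, total) is safe.
select id,
       count(*)              as copies,
       count(distinct total) as distinct_totals
from test
group by id
having count(distinct total) > 1;

-- 2) Grand total over one (id, total) pair per order.
with distinct_records as (
  select distinct id, total from test
)
select sum(total) as total_revenue
from distinct_records;
```

With the four sample rows inserted above, the first query returns no rows and the second returns 152.37 (112 + 40.37).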

SQLFiddle example

http://sqlfiddle.com/#!15/72639/3

answered Sep 22 '22 00:09

zedfoxus
