I have used multiple columns in Partition By statement in SQL but duplicate rows are returned back. I only want distinct rows being returned back.
This is what I have coded in Partition By:
SELECT DATE, STATUS, TITLE, ROW_NUMBER() OVER (PARTITION BY DATE, STATUS, TITLE ORDER BY QUANTITY ASC) AS Row_Num
FROM TABLE
This is the output I get currently: (Where there are duplicate rows being returned - Please Refer to Row 6 to 8)
This is the output I want to achieve: (no duplicate row being returned - Please Refer to Row 6 to 8)
Question: How can I place multiple columns in 1 Partition By and Ensure No Duplicate Row is Returned?
Appreciate if someone can provide me help on this, thanks a lot!!
Multi-column partitioning allows us to specify more than one column as a partition key. Currently multi-column partitioning is possible only for range and hash type. Range partitioning was introduced in PostgreSQL10 and hash partitioning was added in PostgreSQL 11.
The best way to delete duplicate rows by multiple columns is the simplest one: Add an UNIQUE index: ALTER IGNORE TABLE your_table ADD UNIQUE (field1,field2,field3); The IGNORE above makes sure that only the first found row is kept, the rest discarded.
Try this, It worked for me
SELECT * FROM (
SELECT
[Code],
[Name],
[CategoryCode],
[CreatedDate],
[ModifiedDate],
[CreatedBy],
[ModifiedBy],
[IsActive],
ROW_NUMBER() OVER(PARTITION BY [Code],[Name],[CategoryCode] ORDER BY ID DESC) rownumber
FROM MasterTable
) a
WHERE rownumber = 1
If your table columns contains duplicate data and If you directly apply row_ number() and create PARTITION on column, there is chance to have result in duplicated row and with row number value.
To remove duplicate row, you need one more INNER query in from clause which eliminates duplicate rows and then it will give output to it's foremost outer FROM clause where you can apply PARTITION and ROW_NUMBER ().
As like below example:
SELECT DATE, STATUS, TITLE, ROW_NUMBER() OVER (PARTITION BY DATE, STATUS, TITLE ORDER BY QUANTITY ASC) AS Row_Num
FROM (
SELECT DISTINCT <column names>...
) AS tbl
I'd create a cte and do an inner join. It's not efficient but it's convenient
with table as (
SELECT DATE, STATUS, TITLE, ROW_NUMBER()
OVER (PARTITION BY DATE, STATUS, TITLE ORDER BY QUANTITY ASC) AS Row_Num
FROM TABLE)
select *
from table t
join select(
max(Row_Num) as Row_Num
,DATE
,STATUS
,TITLE
from table
group by date, status, title) t2
on t2.Row_Num = t.Row_Num and t2
and t2.date = t.date
and t2.title = t.title
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With