How do I filter the top 1% and lower 1% of data in each group in SQL

I have a data set that includes PRICE, SUBTYPE, and other columns. I want to remove outliers before I use the data set: within each SUBTYPE, rows where the price is ridiculously high or low should be dropped.

For each SUBTYPE, look at the range of PRICE values and filter out the extremes, keeping only the rows that fall between: PRICErange * .01 |KEEP| PRICErange * .99

The query below was provided to me by Martin Smith on Stack Overflow. I have edited this question, so let's start from here.

;WITH CTE
     AS (SELECT *,
                ROW_NUMBER() OVER (PARTITION BY SUBTYPE ORDER BY PRICE) AS RN,
                COUNT(*) OVER(PARTITION BY SUBTYPE) AS Cnt
         FROM    all_resale)
SELECT *
FROM   CTE
WHERE (CASE WHEN Cnt > 1 THEN 100.0 * (RN -1)/(Cnt -1) END) BETWEEN 1 AND 99

I'm not sure this is what I need to do. I don't know how many rows will be removed off the ends.

asked Jun 14 '13 12:06

Brandon Smith

1 Answer

You don't specify exactly how you define the 1 percent and how ties should be handled.

One way is below:

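;WITH CTE
     AS (SELECT *,
                -- Position of the row within its SUBTYPE, cheapest first
                ROW_NUMBER() OVER (PARTITION BY SUBTYPE ORDER BY PRICE) AS RN,
                -- Total number of rows in the SUBTYPE
                COUNT(*) OVER(PARTITION BY SUBTYPE) AS Cnt
         FROM    all_resale)
SELECT *
FROM   CTE
-- Scale RN to 0..100 and keep 1..99; single-row SUBTYPEs yield NULL and are dropped
WHERE (CASE WHEN Cnt > 1 THEN 100.0 * (RN -1)/(Cnt -1) END) BETWEEN 1 AND 99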

That assumes the highest-priced item is at 100%, the lowest-priced one at 0%, and all others are scaled evenly in between, taking no account of ties. If you need to take account of ties, look into RANK rather than ROW_NUMBER.
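To get a feel for how many rows this filter trims from each end, the percentage formula can be replayed outside the database. The following is a minimal Python sketch, not part of the original answer. `kept_row_numbers` is a hypothetical helper that mirrors the WHERE clause for a single SUBTYPE with `cnt` rows, assuming rows are numbered 1..cnt by ROW_NUMBER.

```python
def kept_row_numbers(cnt):
    """Return the ROW_NUMBER values (1-based) that survive the filter
    for one SUBTYPE containing cnt rows."""
    if cnt <= 1:
        # The CASE expression yields NULL here, so the row is filtered out.
        return []
    return [rn for rn in range(1, cnt + 1)
            if 1 <= 100.0 * (rn - 1) / (cnt - 1) <= 99]


# Show how many rows are kept and removed for a few group sizes.
for cnt in (1, 2, 5, 101, 1000):
    kept = kept_row_numbers(cnt)
    print(cnt, "rows:", len(kept), "kept,", cnt - len(kept), "removed")
```

Under this model a SUBTYPE with 1000 rows loses 10 rows from each end, while a SUBTYPE with only 1 or 2 rows is filtered out entirely, which may matter for sparsely populated subtypes.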

NB: If all of the subtypes have a relatively large number of rows, you could use NTILE(100) instead, but it does not distribute rows between buckets well when the number of rows is small relative to the number of buckets.
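The uneven bucket sizes can be illustrated with a small Python model of NTILE. This is only a sketch: `ntile` and `kept_after_ntile_filter` are hypothetical helpers, and the filter "keep buckets 2 through 99" is an assumption about how NTILE(100) would be used here, since the answer does not spell it out. The model follows the documented NTILE behaviour that bucket sizes differ by at most one row, with the larger buckets coming first.

```python
def ntile(row_count, buckets=100):
    """Bucket number for each row (in ORDER BY order), modelling NTILE:
    sizes differ by at most one, larger buckets first."""
    base, extra = divmod(row_count, buckets)
    result = []
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        result.extend([bucket] * size)
    return result


def kept_after_ntile_filter(row_count):
    """Count rows whose NTILE(100) bucket is strictly between 1 and 100."""
    return sum(1 for b in ntile(row_count) if 1 < b < 100)


# Compare a small, an awkward, and a large group size.
for n in (50, 150, 1000):
    kept = kept_after_ntile_filter(n)
    print(n, "rows:", kept, "kept,", n - kept, "removed")
```

With 150 rows the bottom bucket holds two rows and the top bucket one, so the trim is lopsided; with 50 rows bucket 100 never exists and nothing is removed from the top.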

answered Sep 28 '22 16:09

Martin Smith

Tags: sql, sql-server-2008
