How to select the most frequent value in a column for each id group?

Tags: sql, select, postgresql, subquery

I have a table in SQL that looks like this:

user_id | data1
0       | 6
0       | 6
0       | 6
0       | 1
0       | 1
0       | 2
1       | 5
1       | 5
1       | 3
1       | 3
1       | 3
1       | 7

I want to write a query that returns two columns: the user id, and the most frequently occurring value for that id. In my example, for user_id 0, the most frequent value is 6, and for user_id 1, the most frequent value is 3. I would like the result to look like this:

user_id | most_frequent_value
0       | 6
1       | 3

I am using the query below to get the most frequent value, but it runs against the whole table and returns the most common value for the whole table instead of for each id. What would I need to add to my query to get it to return the most frequent value for each id? I am thinking I need to use a subquery, but am unsure of how to structure it.

SELECT user_id, data1 AS most_frequent_value
FROM my_table
GROUP BY user_id, data1
ORDER BY COUNT(*) DESC LIMIT 1


asked Dec 14 '16 15:12

cjh193

2 Answers

You can use a window function to rank the data1 values within each user_id by how often they occur, then keep only the top-ranked row for each user_id.

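WITH cte AS (
    -- Count each (user_id, data1) pair, then number the pairs within each
    -- user_id from most to least frequent.
    SELECT
        user_id,
        data1,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY COUNT(data1) DESC) AS rn
    FROM my_table
    GROUP BY
        user_id,
        data1
)
-- rn = 1 is the most frequent data1 value for each user_id.
SELECT
    user_id,
    data1 AS most_frequent_value
FROM cte
WHERE rn = 1;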
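
To check this approach against the sample data from the question, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for PostgreSQL (an assumption; window functions need SQLite 3.25 or newer). The helper name most_frequent_per_user is hypothetical. The counts are computed in a subquery first, so this is a slight restructuring of the query above rather than a verbatim copy.

```python
import sqlite3

# Sample rows copied from the question: (user_id, data1).
ROWS = [
    (0, 6), (0, 6), (0, 6), (0, 1), (0, 1), (0, 2),
    (1, 5), (1, 5), (1, 3), (1, 3), (1, 3), (1, 7),
]

QUERY = """
WITH counted AS (
    -- One row per (user_id, data1) pair with its frequency.
    SELECT user_id, data1, COUNT(*) AS cnt
    FROM my_table
    GROUP BY user_id, data1
),
ranked AS (
    -- Number the pairs within each user_id, most frequent first.
    SELECT user_id, data1,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY cnt DESC) AS rn
    FROM counted
)
SELECT user_id, data1 AS most_frequent_value
FROM ranked
WHERE rn = 1
ORDER BY user_id
"""


def most_frequent_per_user(rows):
    """Load rows into an in-memory table and run the ranking query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE my_table (user_id INTEGER, data1 INTEGER)")
        conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(most_frequent_per_user(ROWS))
```

Note that ROW_NUMBER() picks an arbitrary winner when two values are tied for the highest count within a user_id. If ties matter, adding a second sort key (for example data1) to the window's ORDER BY makes the choice deterministic.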


answered Oct 04 '22 20:10

SQLChao

With a suitable ORDER BY, DISTINCT ON (user_id) does the same job, because it keeps only the first row from each set of rows that share the same user_id. DISTINCT ON is a PostgreSQL-specific extension.

SELECT DISTINCT ON (user_id) user_id, most_frequent_value
FROM (
    SELECT user_id, data1 AS most_frequent_value, COUNT(*) AS _count
    FROM my_table
    GROUP BY user_id, data1
) a
ORDER BY user_id, _count DESC;
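
To make the "first row per user_id after sorting" behaviour concrete, here is a rough plain-Python analogue of what the query does. It is only an illustration of the semantics, not how PostgreSQL executes it, and the function name distinct_on_user is hypothetical.

```python
from collections import Counter
from itertools import groupby

# Sample rows copied from the question: (user_id, data1).
ROWS = [
    (0, 6), (0, 6), (0, 6), (0, 1), (0, 1), (0, 2),
    (1, 5), (1, 5), (1, 3), (1, 3), (1, 3), (1, 7),
]


def distinct_on_user(rows):
    """Mimic GROUP BY + ORDER BY + DISTINCT ON (user_id) in plain Python."""
    # Inner query: GROUP BY user_id, data1 with COUNT(*) AS _count.
    counts = Counter(rows)
    grouped = [(user_id, data1, n) for (user_id, data1), n in counts.items()]

    # ORDER BY user_id, _count DESC.
    grouped.sort(key=lambda row: (row[0], -row[2]))

    # DISTINCT ON (user_id): keep only the first row of each user_id run.
    return [
        (user_id, next(group)[1])
        for user_id, group in groupby(grouped, key=lambda row: row[0])
    ]


if __name__ == "__main__":
    print(distinct_on_user(ROWS))
```

When two values are tied for the highest count, PostgreSQL does not guarantee which one DISTINCT ON keeps unless the ORDER BY includes a further expression (for example data1) to break the tie.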

answered Oct 04 '22 20:10

JosMac
