SQL - SELECT MAX() and accompanying field

Tags: sql, database, join

I have a problem that would be easy to solve with multiple tables, but I only have a single table to work with.

Consider the following database table:

UserID UserName EmailAddress         Source
3K3S9  Ben      ben@myisp.com        user
SF13F  Harry    lharry_x@hotbail.com 3rd_party
SF13F  Harry    reside@domain.com    user
76DSA  Lisa     cake@insider.com     user
OL39F  Nick     stick@whatever.com   3rd_party
8F66S  Stan     myman@lol.com        user

I need to select all fields, but show each user only once, along with one of their email addresses (the "biggest" one as determined by the MAX() function). This is the result I am after:

UserID UserName EmailAddress         Source
3K3S9  Ben      ben@myisp.com        user
SF13F  Harry    lharry_x@hotbail.com 3rd_party
76DSA  Lisa     cake@insider.com     user
OL39F  Nick     stick@whatever.com   3rd_party
8F66S  Stan     myman@lol.com        user

As you can see, "Harry" is shown only once, with his "highest" email address and the corresponding "source".

Currently we group on UserID and UserName and use MAX() for EmailAddress and Source, but the maxima of those two fields don't always match up; they need to come from the same record.

I have also tried joining the table with itself, but I only managed to get the correct email address, not the corresponding "source" for that address.

Any help would be appreciated as I have spent way too long trying to solve this already :)

asked Jun 18 '09 23:06

Nippysaurus

2 Answers

If you're on SQL Server 2005 or higher:

SELECT  UserID, UserName, EmailAddress, Source
FROM    (SELECT UserID, UserName, EmailAddress, Source,
                ROW_NUMBER() OVER (PARTITION BY UserID
                                   ORDER BY EmailAddress DESC) 
                    AS RowNumber
         FROM   MyTable) AS a
WHERE   a.RowNumber = 1

Of course, there are ways to do the same task without the (SQL-standard) ranking functions such as ROW_NUMBER, which SQL Server has supported only since 2005. These include nested dependent queries and self left joins with an ON condition that includes a '>' comparison plus a WHERE ... IS NULL trick. However, the ranking functions make for code that's readable and (in theory) well optimizable by the SQL Server engine.
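As a rough illustration of the self left join with '>' and WHERE ... IS NULL mentioned above, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) rather than SQL Server, purely so that it can be run anywhere; the helper name greatest_per_user is hypothetical, and the rows are modelled on the question's table. With plain string comparison, 'reside@domain.com' sorts after 'lharry_x@hotbail.com', so that is the row this sketch returns for Harry.

```python
import sqlite3

# Sample rows modelled on the question's table; MyTable is the name used in the answer.
ROWS = [
    ("3K3S9", "Ben", "ben@myisp.com", "user"),
    ("SF13F", "Harry", "lharry_x@hotbail.com", "3rd_party"),
    ("SF13F", "Harry", "reside@domain.com", "user"),
    ("76DSA", "Lisa", "cake@insider.com", "user"),
    ("OL39F", "Nick", "stick@whatever.com", "3rd_party"),
    ("8F66S", "Stan", "myman@lol.com", "user"),
]

# Anti-join: pair each row (a) with any row (b) for the same user that has a
# larger address. Rows of (a) with no such partner come back with b.* NULL,
# and those are exactly the per-user maxima.
ANTI_JOIN = """
SELECT a.UserID, a.UserName, a.EmailAddress, a.Source
FROM   MyTable AS a
LEFT JOIN MyTable AS b
       ON  b.UserID = a.UserID
       AND b.EmailAddress > a.EmailAddress
WHERE  b.UserID IS NULL
ORDER BY a.UserName
"""


def greatest_per_user(rows):
    """Load rows into an in-memory SQLite table and run the anti-join (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable "
        "(UserID TEXT, UserName TEXT, EmailAddress TEXT, Source TEXT)"
    )
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(ANTI_JOIN).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in greatest_per_user(ROWS):
        print(row)
```

Because the whole row comes from alias a, the Source value always belongs to the same record as the selected EmailAddress. If a user had the same maximum address on two rows, both rows would be returned.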

Edit: this article is a nice tutorial on ranking, but it uses RANK in the examples instead of ROW_NUMBER (or another ranking function, DENSE_RANK). The distinction matters when there are "ties" among grouped rows in the same partition according to the ordering criteria. This post does a good job explaining the difference.
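To make the point about ties concrete, the sketch below compares the three functions on a user who has the same address recorded from two sources. This is an illustration under assumptions, not part of the original answer: it uses SQLite 3.25 or later (the first version with window functions) through Python's sqlite3 module instead of SQL Server, the duplicated 'reside@domain.com' row is invented for the demonstration, and the helper name rank_comparison is hypothetical.

```python
import sqlite3

# Hypothetical data: Harry's largest address appears twice (a tie).
TIED_ROWS = [
    ("SF13F", "Harry", "reside@domain.com", "user"),
    ("SF13F", "Harry", "reside@domain.com", "3rd_party"),
    ("SF13F", "Harry", "lharry_x@hotbail.com", "3rd_party"),
]

# ROW_NUMBER gets Source as a tie-breaker so its numbering is deterministic;
# RANK and DENSE_RANK order by EmailAddress only, so tied rows share a rank.
RANK_QUERY = """
SELECT EmailAddress, Source,
       ROW_NUMBER() OVER (PARTITION BY UserID
                          ORDER BY EmailAddress DESC, Source) AS row_num,
       RANK()       OVER (PARTITION BY UserID
                          ORDER BY EmailAddress DESC)         AS rnk,
       DENSE_RANK() OVER (PARTITION BY UserID
                          ORDER BY EmailAddress DESC)         AS dense_rnk
FROM   MyTable
ORDER BY row_num
"""


def rank_comparison(rows):
    """Return (EmailAddress, Source, row_num, rnk, dense_rnk) per row (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
    conn.execute(
        "CREATE TABLE MyTable "
        "(UserID TEXT, UserName TEXT, EmailAddress TEXT, Source TEXT)"
    )
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(RANK_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in rank_comparison(TIED_ROWS):
        print(row)
```

Filtering on row_num = 1 keeps exactly one row per user, whereas filtering on rnk = 1 or dense_rnk = 1 would keep both tied rows. After the tie, RANK skips to 3 while DENSE_RANK continues with 2.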

answered Sep 28 '22 05:09

Alex Martelli

-- MyTable stands in for the real table name (TABLE itself is a reserved word)
SELECT DISTINCT *
FROM   MyTable t1
WHERE  EmailAddress =
       (SELECT MAX(EmailAddress)
        FROM   MyTable t2
        WHERE  t1.UserID = t2.UserID)

answered Sep 28 '22 05:09

tekBlues
