Eliminating duplicate values based on only one column of the table

My query:

    SELECT sites.siteName, sites.siteIP, history.date
    FROM sites
    INNER JOIN history ON sites.siteName = history.siteName
    ORDER BY siteName, date

First part of the output:

[Image: first part of the query output]

How can I remove the duplicates in the siteName column? I want to keep only the most recent row for each site, based on the date column.

In the example output above, I need the rows 1, 3, 6, 10
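To make the duplication concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table contents (sites `alpha` and `beta`, their IPs and dates) are hypothetical sample data, not taken from the question. Column references in ORDER BY are table-qualified here, since siteName exists in both tables.

```python
import sqlite3

# Hypothetical schema and sample data mirroring the question's tables.
# ISO-8601 date strings sort chronologically when compared as text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (siteName TEXT, siteIP TEXT);
    CREATE TABLE history (siteName TEXT, date TEXT);
    INSERT INTO sites VALUES ('alpha', '10.0.0.1'), ('beta', '10.0.0.2');
    INSERT INTO history VALUES
        ('alpha', '2013-07-01'), ('alpha', '2013-07-05'),
        ('beta', '2013-06-30'), ('beta', '2013-07-02'), ('beta', '2013-07-04');
""")

# The question's join: each history row yields one output row,
# so a siteName repeats once per matching history row.
rows = conn.execute("""
    SELECT sites.siteName, sites.siteIP, history.date
    FROM sites
    INNER JOIN history ON sites.siteName = history.siteName
    ORDER BY sites.siteName, history.date
""").fetchall()

for row in rows:
    print(row)
```

With this data the join returns five rows for only two distinct sites; the wanted result is the last row of each site's group.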

asked Jul 06 '13 23:07 by Ned


2 Answers

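This is where the window function row_number() comes in handy. It numbers each site's history rows from newest to oldest, so keeping only seqnum = 1 keeps the latest row per site (in MySQL, window functions require version 8.0 or later):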

    SELECT s.siteName, s.siteIP, h.date
    FROM sites s
    INNER JOIN (select h.*,
                       row_number() over (partition by siteName order by date desc) as seqnum
                from history h
               ) h
        ON s.siteName = h.siteName and seqnum = 1
    ORDER BY s.siteName, h.date
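The row_number() approach can be tried in SQLite (3.25 or later supports window functions) through Python's sqlite3 module. This is a minimal sketch assuming the same hypothetical sample data as before; it is not tied to any particular MySQL setup.

```python
import sqlite3

# Hypothetical sample data; ISO dates compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (siteName TEXT, siteIP TEXT);
    CREATE TABLE history (siteName TEXT, date TEXT);
    INSERT INTO sites VALUES ('alpha', '10.0.0.1'), ('beta', '10.0.0.2');
    INSERT INTO history VALUES
        ('alpha', '2013-07-01'), ('alpha', '2013-07-05'),
        ('beta', '2013-06-30'), ('beta', '2013-07-02'), ('beta', '2013-07-04');
""")

# Number each site's history rows newest-first, then keep only row 1.
rows = conn.execute("""
    SELECT s.siteName, s.siteIP, h.date
    FROM sites s
    INNER JOIN (
        SELECT h.*,
               ROW_NUMBER() OVER (PARTITION BY siteName ORDER BY date DESC) AS seqnum
        FROM history h
    ) h ON s.siteName = h.siteName AND h.seqnum = 1
    ORDER BY s.siteName, h.date
""").fetchall()

for row in rows:
    print(row)
```

If two history rows of a site share the same latest date, row_number() picks one of them arbitrarily; adding a tiebreaker column to the window's ORDER BY makes the choice deterministic.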
answered Oct 23 '22 04:10 by Gordon Linoff


From your example it seems reasonable to assume that the siteIP column is determined by the siteName column (that is, each site has only one siteIP). If this is indeed the case, then there is a simple solution using group by:

    select sites.siteName,
           sites.siteIP,
           max(history.date)
    from sites
    inner join history on sites.siteName = history.siteName
    group by sites.siteName, sites.siteIP
    order by sites.siteName;
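Under the one-IP-per-site assumption, the GROUP BY query collapses each site to a single row. A minimal sketch in SQLite via Python's sqlite3, again with hypothetical sample data:

```python
import sqlite3

# Hypothetical sample data: exactly one siteIP per siteName.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (siteName TEXT, siteIP TEXT);
    CREATE TABLE history (siteName TEXT, date TEXT);
    INSERT INTO sites VALUES ('alpha', '10.0.0.1'), ('beta', '10.0.0.2');
    INSERT INTO history VALUES
        ('alpha', '2013-07-01'), ('alpha', '2013-07-05'),
        ('beta', '2013-06-30'), ('beta', '2013-07-02'), ('beta', '2013-07-04');
""")

# One group per (siteName, siteIP); MAX picks the latest ISO date string.
rows = conn.execute("""
    SELECT sites.siteName, sites.siteIP, MAX(history.date)
    FROM sites
    INNER JOIN history ON sites.siteName = history.siteName
    GROUP BY sites.siteName, sites.siteIP
    ORDER BY sites.siteName
""").fetchall()

for row in rows:
    print(row)
```

Unlike the window-function version, this only returns columns that are either grouped or aggregated, which is why it depends on siteIP being determined by siteName.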

However, if my assumption is not correct (that is, a site can have multiple siteIP values), then it is not clear from your question which siteIP you want the query to return in the second column. If any one of them will do, the following query works:

    select sites.siteName,
           min(sites.siteIP),
           max(history.date)
    from sites
    inner join history on sites.siteName = history.siteName
    group by sites.siteName
    order by sites.siteName;
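To see why the second query is needed, here is a sketch in SQLite via Python's sqlite3 in which the hypothetical site `alpha` has two siteIP values. Grouping by siteName and siteIP returns one row per IP, while grouping by siteName alone with min(siteIP) returns one row per site. The extra ORDER BY on siteIP in the first query is only there to make the sketch's output deterministic.

```python
import sqlite3

# Hypothetical sample data: 'alpha' has two IPs, breaking the one-IP assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (siteName TEXT, siteIP TEXT);
    CREATE TABLE history (siteName TEXT, date TEXT);
    INSERT INTO sites VALUES
        ('alpha', '10.0.0.1'), ('alpha', '10.0.0.9'), ('beta', '10.0.0.2');
    INSERT INTO history VALUES
        ('alpha', '2013-07-01'), ('alpha', '2013-07-05'),
        ('beta', '2013-06-30'), ('beta', '2013-07-02'), ('beta', '2013-07-04');
""")

# Grouping by IP as well: 'alpha' appears once per IP.
by_ip = conn.execute("""
    SELECT sites.siteName, sites.siteIP, MAX(history.date)
    FROM sites
    INNER JOIN history ON sites.siteName = history.siteName
    GROUP BY sites.siteName, sites.siteIP
    ORDER BY sites.siteName, sites.siteIP
""").fetchall()

# Grouping by name only: MIN picks one IP (compared as text), MAX the latest date.
by_name = conn.execute("""
    SELECT sites.siteName, MIN(sites.siteIP), MAX(history.date)
    FROM sites
    INNER JOIN history ON sites.siteName = history.siteName
    GROUP BY sites.siteName
    ORDER BY sites.siteName
""").fetchall()

print(by_ip)
print(by_name)
```

Because MIN compares IPs as strings, '10.0.0.10' would sort before '10.0.0.9'; that is harmless when any one IP is acceptable, but it is not a numeric ordering.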
answered Oct 23 '22 04:10 by Mikhail Makarov