Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Left Join first match only

I have a query against a large number of big tables (rows and columns) with a number of joins, however one of tables has some duplicate rows of data causing issues for my query. Since this is a read only realtime feed from another department I can't fix that data, however I am trying to prevent issues in my query from it.

Given that, I need to add this crap data as a left join to my good query. The data set looks like:

IDNo    FirstName   LastName    ... ------------------------------------------- uqx     bob     smith abc     john        willis ABC     john        willis aBc     john        willis WTF     jeff        bridges sss     bill        doe ere     sally       abby wtf     jeff        bridges ... 

(about 2 dozen columns, and 100K rows)

My first instinct was to perform a distinct gave me about 80K rows:

SELECT DISTINCT P.IDNo FROM people P 

But when I try the following, I get all the rows back:

SELECT DISTINCT P.* FROM people P 

OR

SELECT      DISTINCT(P.IDNo) AS IDNoUnq      ,P.FirstName     ,P.LastName     ...etc.     FROM people P 

I then thought I would do a FIRST() aggregate function on all the columns, however that feels wrong too. Syntactically am I doing something wrong here?

Update: Just wanted to note: These records are duplicates based on a non-key / non-indexed field of ID listed above. The ID is a text field which although has the same value, it is a different case than the other data causing the issue.

like image 513
Dave Avatar asked Nov 06 '13 23:11

Dave


People also ask

Is Left join the default join?

Bookmark this question. Show activity on this post. LEFT JOIN is not equal to JOIN . The default JOIN is INNER .

Is Left join one to many?

SQL LEFT JOIN examples Each location belongs to one and only one country while each country can have zero or more locations. The relationship between the countries and locations tables is one-to-many.


2 Answers

distinct is not a function. It always operates on all columns of the select list.

Your problem is a typical "greatest N per group" problem which can easily be solved using a window function:

select ... from (   select IDNo,          FirstName,          LastName,          ....,          row_number() over (partition by lower(idno) order by firstname) as rn    from people  ) t where rn = 1; 

Using the order by clause you can select which of the duplicates you want to pick.

The above can be used in a left join, see below:

select ... from x   left join (     select IDNo,            FirstName,            LastName,            ....,            row_number() over (partition by lower(idno) order by firstname) as rn      from people    ) p on p.idno = x.idno and p.rn = 1 where ... 
like image 182
a_horse_with_no_name Avatar answered Sep 16 '22 11:09

a_horse_with_no_name


Add an identity column (PeopleID) and then use a correlated subquery to return the first value for each value.

SELECT * FROM People p WHERE PeopleID = (     SELECT MIN(PeopleID)      FROM People      WHERE IDNo = p.IDNo ) 
like image 28
T8RB Avatar answered Sep 19 '22 11:09

T8RB