Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hive DISTINCT() for all columns?

Tags:

sql

hive

I wanted clarification as to how DISTINCT works, using a toy example below. Suppose you had a table like so, with 2 columns and only 2 rows of data:

SELECT * 
FROM table1;

colA   colB
A      B
A      C

Suppose I ran a SELECT DISTINCT query:

SELECT DISTINCT colA, colB 
FROM table1;

Which of the following results would be returned for the query above?

Outcome possibility 1:

A      B
A      C

The thinking for this possibility is that while the values are not distinct on colA, the entire returned row is unique, or distinct, when both columns are considered. I'm uncertain because of the effect of the comma between colA and colB in SELECT DISTINCT colA, colB and whether it serves to limit the DISTINCT to colA.

Outcome possibility 2:

A      B

Outcome possibility 3:

A      C
like image 434
user2205916 Avatar asked Mar 14 '17 18:03

user2205916


1 Answers

Perhaps this will help. This query:

SELECT DISTINCT colA, colB 
FROM table1;

is functionally equivalent to:

SELECT colA, colB 
FROM table1
GROUP BY colA, colB;

It is going to return all pairs of colA/colB that appear in the data.

like image 166
Gordon Linoff Avatar answered Oct 02 '22 23:10

Gordon Linoff