Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Distinct on specific column in Hive

I am running Hive 071 I have a table, with mulitple rows, with the same column value e.g.

| x | y |


| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 2 |
| 3 | 2 |
| 3 | 1 |

I want to have the x column unique, and remove rows that have the same x val e.g.

| x | y |


| 1 | 2 |
| 2 | 2 |
| 3 | 2 |

or

| x | y |


| 1 | 4 |
| 2 | 2 |
| 3 | 1 |

are both good as distinct works only on the whole rs in hive, I couldn't find a way to do it

help please Tx

like image 305
Tomer Avatar asked Sep 13 '11 12:09

Tomer


1 Answers

Some options:

1) This will give you the max value of y for each value of x

select x, max(y) from table1 group by x

Equally you could use avg() or min()

2) OR, you could collect all the values of y in a list:

select x, collect_set(y) from table1 group by x

This will give you:

x|y
1|2,3,4
2|2
3|1,2
like image 69
Matthew Rathbone Avatar answered Sep 27 '22 16:09

Matthew Rathbone