I am running Hive 071. I have a table with multiple rows that share the same column value, e.g.
| x | y |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 2 |
| 3 | 2 |
| 3 | 1 |
I want the x column to be unique, removing the extra rows that have the same x value, e.g.
| x | y |
| 1 | 2 |
| 2 | 2 |
| 3 | 2 |
or
| x | y |
| 1 | 4 |
| 2 | 2 |
| 3 | 1 |
Either result is fine. Since DISTINCT in Hive works only on the whole result set, I couldn't find a way to do it.
Any help would be appreciated. Thanks.
Some options:
1) This will give you the max value of y for each value of x:
select x, max(y) from table1 group by x
Equally, you could use avg() or min().
2) OR, you could collect all the values of y in a list:
select x, collect_set(y) from table1 group by x
This will give you:
x|y
1|2,3,4
2|2
3|1,2
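As a rough sketch of option 1 applied to the sample data in the question, the two queries below use max() and min(). The table name table1 comes from the answer above; the alias y is only an assumption added here so the result column keeps its original name. The row order of a GROUP BY result is not guaranteed, so the values in the comments are listed per x rather than as exact output.

```sql
-- Keep the largest y for each x.
-- With the sample rows: x=1 -> 4, x=2 -> 2, x=3 -> 2
SELECT x, MAX(y) AS y
FROM table1
GROUP BY x;

-- Keep the smallest y for each x.
-- With the sample rows: x=1 -> 2, x=2 -> 2, x=3 -> 1
SELECT x, MIN(y) AS y
FROM table1
GROUP BY x;
```

Either query returns exactly one row per distinct x, which is what the question asks for when any surviving y value is acceptable.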