I have a table in the following format:
IDX IDY Time Text
idx1 idy1 t1 text1
idx1 idy2 t2 text2
idx1 idy2 t3 text3
idx1 idy1 t4 text4
idx2 idy3 t5 text5
idx2 idy3 t6 text6
idx2 idy1 t7 text7
idx2 idy3 t8 text8
What I'd like to see is something like this:
idx1 text1
idx1 text2, text3
idx1 text4
idx2 text5, text6
idx2 text7
idx2 text8
So in the final phase, I can get to:
text1
text2, text3
text4
==SEPERATOR==
text5, text6
text7
text8
How can I perform this in Hive or Presto? Thanks.
Hive
This is the base query, you can take it from here if you like
select IDX
,IDY
,min(time) as from_time
,max(time) as to_time
,concat_ws(',',collect_list (Text)) as text
from (select *
,row_number () over
(
partition by IDX
order by Time
) as rn
,row_number () over
(
partition by IDX,IDY
order by Time
) as rn_IDY
from mytable
) t
group by IDX,IDY
,rn - rn_IDY
order by IDX,from_time
+------+------+-----------+---------+-------------+
| idx | idy | from_time | to_time | text |
+------+------+-----------+---------+-------------+
| idx1 | idy1 | t1 | t1 | text1 |
| idx1 | idy2 | t2 | t3 | text2,text3 |
| idx1 | idy1 | t4 | t4 | text4 |
| idx2 | idy3 | t5 | t6 | text5,text6 |
| idx2 | idy1 | t7 | t7 | text7 |
| idx2 | idy3 | t8 | t8 | text8 |
+------+------+-----------+---------+-------------+
Presto
select array_join(array_agg (Text),',') as text
from (select *
,row_number () over
(
partition by IDX
order by Time
) as rn
,row_number () over
(
partition by IDX,IDY
order by Time
) as rn_IDY
from mytable
) t
group by IDX,IDY
,rn - rn_IDY
order by IDX,min(time)
;
+-------------+
| text |
+-------------+
| text1 |
| text2,text3 |
| text4 |
| text5,text6 |
| text7 |
| text8 |
+-------------+
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With