Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Impala: all DISTINCT aggregate functions need to have the same set of parameters

Tags:

sql

hdfs

impala

I got the following error in my Impala query:

select 
   upload_key, 
   max(my_timestamp) as upload_time, 
   max(color_key) as max_color_fk, 
   count(distinct color_key) as color_count, 
   count(distinct id) as toy_count 
from upload_table 
group by upload_key

and got the error:

AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT color_key); deviating function: count(DISTINCT id)

I am not sure why I got this error. What I did was for each group (grouped by upload_key), I tried to compute how many distinct id as well as how many distinct color_key.
Does any one have any idea

like image 978
Edamame Avatar asked Jan 06 '23 11:01

Edamame


1 Answers

The error message indicates that DISTINCT is only allowed on one column [combination], but you try two, color_key& id. A workaround would be two Selects and then a join:

select
   t1.upload_key,
   t1.upload_time,
   t1.max_color_fk,
   t1.color_count,
   t2.toy_count
from
 (
   select 
      upload_key, 
      max(my_timestamp) as upload_time, 
      max(color_key) as max_color_fk, 
      count(distinct color_key) as color_count
   from upload_table 
   group by upload_key
 ) as t1
join
 (
   select 
      upload_key
      count(distinct id) as toy_count 
   from upload_table 
   group by upload_key
 ) as t2
on t1. upload_key = t2.upload_key
like image 162
dnoeth Avatar answered Jan 08 '23 10:01

dnoeth