Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL/Hive count distinct column

Tags:

sql

hive

How do I do this in Hive?

    columnA       columnB    columnC
     100.10      50.60       30
     100.10      50.60       30
     100.10      50.60       20
     100.10      70.80       40

Output should be:

  columnA   columnB    No_of_distinct_colC
  100.10    50.60       2
  100.10    70.80       1

Query that I think is correct:

SELECT columnA,columnB,COUNT(distinct column C)
from table_name
group by columnA,columnB

Is this correct? SQL is fine too.

UPDATE: How do I find the standard deviation of columnC? Need this asap.

like image 314
user2441441 Avatar asked Aug 06 '13 23:08

user2441441


People also ask

How do I get unique values of a column in Hive?

DISTINCT keyword is used in SELECT statement in HIVE to fetch only unique rows. The row does not mean entire row in the table but it means "row" as per column listed in the SELECT statement. If the SELECT has 3 columns listed then SELECT DISTINCT will fetch unique row for those 3 column values only.

How do I count distinct columns?

The COUNT DISTINCT function returns the number of unique values in the column or expression, as the following example shows. SELECT COUNT (DISTINCT item_num) FROM items; If the COUNT DISTINCT function encounters NULL values, it ignores them unless every value in the specified column is NULL.

Does distinct work in Hive?

DISTINCT specifies removal of duplicate rows from the result set. Note, Hive supports SELECT DISTINCT * starting in release 1.1. 0 (HIVE-9194). ALL and DISTINCT can also be used in a UNION clause – see Union Syntax for more information.

How do you count in Hive?

Simple Examples. In order to count the number of rows in a table: SELECT COUNT(*) FROM table2; Note that for versions of Hive which don't include HIVE-287, you'll need to use COUNT(1) in place of COUNT(*).


1 Answers

Yes, it is almost correct. But you have one simple mistake. Your column name is wrong inside COUNT.

SELECT columnA,columnB,COUNT(DISTINCT columnC) No_of_distinct_colC
from table_name
group by columnA,columnB
like image 168
AgentSQL Avatar answered Oct 02 '22 18:10

AgentSQL