Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Syntax to count the number of UNIQUE matching values

I'm working with a dataset that looks like this:

    | Host      | Risk           | Name      |
    | 10.1.1.1  | Critical       | ValueA    |
    | 10.1.1.1  | Critical       | ValueA    |
    | 10.1.1.1  | Critical       | ValueA    |
    | 10.1.1.1  | Critical       | ValueA    |
    | 10.1.1.1  | Critical       | ValueA    |
    | 10.1.1.1  | Critical       | ValueA    |
    | 10.1.1.1  | High           | ValueB    |
    | 10.1.1.1  | High           | ValueB    |
    | 10.1.1.1  | High           | ValueB    |
    | 10.1.1.1  | Critical       | ValueC    |
    | 10.1.1.1  | Critical       | ValueC    |
    | 10.1.1.1  | Critical       | ValueC    |
    | 10.1.1.1  | Critical       | ValueC    |

I'm trying to figure out a query that will produce a summary that looks like this:

    | Host      | Critical     | High      |
    | 10.1.1.1  | 2            | 1         |

Critical has a "2" underneath because there are only 2 DISTINCT values of the Name field ("ValueA" and "ValueC"). High has a "1" underneath because there is only 1 distinct Name value ("ValueB"). The tricky part for me is that I'm not trying to count the number of rows, but just the distinct values that match. In case you're wondering, the data is duplicated because there are other columns that contain unique values, but they're irrelevant to this query.

The closest I was able to get on my own is as follows, but that only produced the "Critical" column, and I can't figure out how to add the logic to get the "High" column:

    select Host, COUNT(DISTINCT Name) as Critical
    from [table]
    WHERE Risk = 'Critical'
    group by 1;

Any advice would be much appreciated. I tried a "COUNTIF" function, but kept getting the error "Unrecognized function countif," which seemed odd since "COUNTIF" is listed in the BigQuery documentation (https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#countif). Also tried to make CASE work, but didn't make a lot of progress.

Thanks!

like image 472
Josh Rogan Avatar asked Jan 02 '23 21:01

Josh Rogan


1 Answers

Below is for BigQuery Standard SQL

#standardSQL
WITH `project.dataset.table` AS (
  SELECT '10.1.1.1' Host, 'Critical' Risk, 'ValueA' Name UNION ALL
  SELECT '10.1.1.1', 'Critical', 'ValueA' UNION ALL
  SELECT '10.1.1.1', 'Critical', 'ValueA' UNION ALL
  SELECT '10.1.1.1', 'Critical', 'ValueA' UNION ALL
  SELECT '10.1.1.1', 'Critical', 'ValueA' UNION ALL
  SELECT '10.1.1.1', 'Critical', 'ValueA' UNION ALL
  SELECT '10.1.1.1', 'High', 'ValueB' UNION ALL
  SELECT '10.1.1.1', 'High', 'ValueB' UNION ALL
  SELECT '10.1.1.1', 'High', 'ValueB' UNION ALL
  SELECT '10.1.1.1', 'Critical', 'ValueC' UNION ALL
  SELECT '10.1.1.1', 'Critical', 'ValueC' UNION ALL
  SELECT '10.1.1.1', 'Critical', 'ValueC' UNION ALL
  SELECT '10.1.1.1', 'Critical', 'ValueC' 
)
SELECT 
  Host, 
  COUNT(DISTINCT IF(Risk='Critical', Name, NULL)) Critical,
  COUNT(DISTINCT IF(Risk='High', Name, NULL)) High
FROM `project.dataset.table`
GROUP BY Host   

with result

Row Host        Critical    High     
1   10.1.1.1    2           1    
like image 130
Mikhail Berlyant Avatar answered Jan 05 '23 03:01

Mikhail Berlyant