Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to build "Star Rating" report in BigQuery (or sparklines, or color gradients)

Suppose I have the followng sample input:

WITH Ratings AS (
    (SELECT 'A' name, 2 score) UNION ALL
    (SELECT 'B' name, 0 score) UNION ALL
    (SELECT 'C' name, 5 score) UNION ALL
    (SELECT 'D' name, 1 score))

Where score is number between 0 and 5. How can I produce a report showing names and corresponding number of stars ?

like image 603
Mosha Pasumansky Avatar asked Feb 23 '18 16:02

Mosha Pasumansky


3 Answers

We can build star rating as a string using two Unicode characters:

★ - Unicode code point 9733 
☆ - Unicode code point 9734

We can use CODE_POINTS_TO_STRING function to build the stars, and REPEAT function to produce the right number of stars

Combined together the solution for sample input will be:

WITH Ratings AS (
(SELECT 'A' name, 2 score) UNION ALL
(SELECT 'B' name, 0 score) UNION ALL
(SELECT 'C' name, 5 score) UNION ALL
(SELECT 'D' name, 1 score))

SELECT 
  name, 
  CONCAT(
    REPEAT(CODE_POINTS_TO_STRING([9733]), score),
    REPEAT(CODE_POINTS_TO_STRING([9734]), 5-score)) score
FROM Ratings

It will produce the following result:

name    score
A       ★★☆☆☆
B       ☆☆☆☆☆
C       ★★★★★
D       ★☆☆☆☆
like image 88
Mosha Pasumansky Avatar answered Nov 23 '22 19:11

Mosha Pasumansky


My entry does a color gradient, because sparklines only look good with certain fonts - and that's not a font that the BigQuery web UI uses.

During a day, when is Stack Overflow the most active per tag:

#standardSQL
CREATE TEMP FUNCTION barchart(v ARRAY<FLOAT64>, mm STRUCT<min FLOAT64, max FLOAT64>) AS ((
    SELECT STRING_AGG(SUBSTR('🏿🏾🏽🏼🏻', 1+CAST(ROUND(y) AS INT64), 1), '') 
    FROM (SELECT IFNULL(SAFE_DIVIDE((e-mm.min),(mm.max-mm.min))*4, 0) y FROM UNNEST(v) e))); 
CREATE TEMP FUNCTION vbar(v ARRAY<FLOAT64>) AS ( 
  barchart(v, (SELECT AS STRUCT MIN(a), MAX(a) FROM UNNEST(v) a)) 
);


WITH top_tags AS (
 (SELECT x.value FROM (SELECT APPROX_TOP_COUNT(tag, 24) x FROM `bigquery-public-data.stackoverflow.posts_questions`, UNNEST(SPLIT(tags,'|')) tag WHERE EXTRACT(YEAR FROM creation_date)>=2016), UNNEST(x) x)
)

SELECT tag, vbar(ARRAY_AGG(1.0*hhh.count ORDER BY hhh.value)) gradient, SUM(hhh.count)  c
FROM (
  SELECT tag, APPROX_TOP_COUNT(EXTRACT(HOUR FROM creation_date), 24) h_h
  FROM `bigquery-public-data.stackoverflow.posts_questions`, UNNEST(SPLIT(tags,'|')) tag
  WHERE tag IN (SELECT * FROM top_tags) AND EXTRACT(YEAR FROM creation_date)>=2016
  GROUP BY 1
), UNNEST(h_h) hhh
GROUP BY tag
ORDER BY STRPOS(gradient, '🏼')



Row gradient                                                c       tag  
1   🏿🏿🏿🏿🏾🏽🏼🏼🏼🏻🏻🏻🏻🏼🏼🏼🏼🏽🏽🏽🏽🏾🏾🏿    317538  android  
2   🏿🏿🏿🏿🏾🏽🏼🏼🏼🏻🏻🏻🏻🏻🏻🏻🏼🏼🏽🏽🏽🏾🏾🏿    59445   asp.net  
3   🏿🏿🏿🏿🏾🏽🏼🏼🏼🏻🏻🏻🏼🏼🏼🏼🏽🏽🏽🏽🏾🏾🏾🏿    159134  ios  
4   🏿🏿🏿🏿🏾🏽🏼🏼🏼🏻🏻🏻🏻🏻🏻🏼🏼🏽🏽🏽🏽🏾🏾🏿    111988  angularjs    
5   🏿🏿🏿🏿🏾🏾🏽🏼🏼🏻🏻🏻🏻🏻🏻🏼🏼🏼🏽🏽🏽🏽🏾🏿    212843  jquery   
6   🏿🏿🏿🏾🏾🏾🏽🏼🏼🏻🏻🏻🏻🏻🏻🏻🏼🏼🏼🏽🏽🏽🏾🏿    138143  mysql    
7   🏿🏿🏿🏿🏿🏾🏽🏼🏼🏻🏻🏻🏼🏻🏻🏻🏻🏼🏼🏼🏼🏽🏾🏾    107586  swift    
8   🏿🏿🏿🏿🏾🏾🏽🏼🏼🏻🏻🏻🏼🏻🏼🏼🏼🏽🏽🏽🏽🏾🏾🏿    318294  php  
9   🏿🏿🏿🏿🏾🏾🏽🏼🏼🏻🏻🏻🏻🏻🏻🏻🏼🏼🏼🏽🏽🏽🏾🏾    84723   json     
10  🏿🏿🏿🏿🏿🏾🏽🏼🏼🏻🏻🏻🏻🏻🏻🏻🏼🏼🏼🏼🏽🏽🏾🏾    233100  html     
11  🏿🏿🏿🏿🏿🏾🏽🏼🏼🏻🏻🏻🏻🏻🏻🏻🏼🏼🏼🏽🏽🏽🏾🏿    390245  java     
12  🏿🏿🏿🏿🏿🏾🏽🏽🏼🏻🏻🏼🏻🏻🏻🏻🏼🏽🏽🏽🏽🏽🏾🏿    83787   angular  
13  🏿🏿🏿🏿🏾🏾🏽🏽🏼🏼🏼🏼🏼🏻🏻🏻🏼🏼🏽🏽🏽🏽🏾🏿    70150   sql-server   
14  🏿🏿🏿🏿🏿🏾🏽🏽🏼🏻🏻🏻🏻🏻🏻🏻🏼🏼🏼🏼🏽🏽🏾🏾    534663  javascript   
15  🏿🏿🏿🏿🏿🏾🏽🏽🏼🏻🏻🏼🏼🏻🏻🏻🏼🏼🏽🏽🏽🏾🏾🏿    291541  c#   
16  🏿🏿🏿🏿🏿🏿🏾🏾🏽🏼🏼🏽🏼🏼🏻🏻🏻🏻🏻🏼🏼🏽🏽🏾    65668   c    
17  🏿🏿🏿🏿🏿🏾🏽🏽🏽🏼🏼🏼🏼🏻🏻🏻🏼🏼🏼🏼🏽🏽🏾🏿    111792  sql  
18  🏿🏿🏿🏿🏿🏾🏾🏽🏽🏼🏻🏼🏼🏻🏻🏻🏻🏼🏼🏼🏼🏽🏾🏾    158999  css  
19  🏿🏿🏿🏿🏿🏿🏾🏽🏽🏼🏼🏼🏼🏻🏻🏻🏻🏼🏼🏼🏼🏽🏽🏾    88146   arrays   
20  🏿🏿🏿🏿🏿🏿🏾🏾🏽🏼🏼🏽🏼🏼🏻🏻🏻🏼🏼🏼🏼🏼🏽🏾    61840   ruby-on-rails    
21  🏿🏿🏿🏿🏿🏿🏾🏾🏽🏼🏼🏼🏼🏻🏻🏻🏼🏼🏼🏼🏼🏽🏾🏾    136265  c++  
22  🏿🏿🏿🏿🏿🏾🏽🏽🏽🏻🏻🏼🏼🏻🏻🏻🏻🏼🏼🏼🏽🏽🏾🏾    104218  node.js  
23  🏿🏿🏿🏿🏿🏿🏿🏾🏾🏽🏽🏽🏼🏼🏻🏻🏻🏼🏼🏼🏼🏽🏾🏾    360396  python   
24  🏿🏿🏿🏿🏿🏿🏿🏾🏾🏽🏽🏽🏽🏼🏻🏻🏻🏼🏼🏼🏼🏽🏾🏾    98690   r   

enter image description here

And a more compact shaded gradient, but with only 3 values:

#standardSQL
CREATE TEMP FUNCTION barchart(v ARRAY<FLOAT64>, mm STRUCT<min FLOAT64, max FLOAT64>) AS ((
    SELECT STRING_AGG(SUBSTR('▓▒░', 1+CAST(ROUND(y) AS INT64), 1), '') 
    FROM (SELECT IFNULL(SAFE_DIVIDE((e-mm.min),(mm.max-mm.min))*2, 0) y FROM UNNEST(v) e))); 
CREATE TEMP FUNCTION vbar(v ARRAY<FLOAT64>) AS ( 
  barchart(v, (SELECT AS STRUCT MIN(a), MAX(a) FROM UNNEST(v) a)) 
);



WITH top_countries AS (
 (SELECT x.value FROM (SELECT APPROX_TOP_COUNT(country_code, 12) x FROM `ghtorrent-bq.ght_2017_09_01.users`), UNNEST(x) x)
)

SELECT vbar(ARRAY_AGG(1.0*hhh.count ORDER BY hhh.value)) gradient, SUM(hhh.count) c, country_code
FROM (
  SELECT country_code, APPROX_TOP_COUNT(EXTRACT(HOUR FROM a.created_at), 24) h_h
  FROM `githubarchive.year.2017` a
  JOIN `ghtorrent-bq.ght_2017_09_01.users` b
  ON a.actor.login=b.login
  WHERE country_code IN (SELECT * FROM top_countries) 
  AND actor.login NOT IN (SELECT value FROM (SELECT APPROX_TOP_COUNT(actor.login, 1000) x FROM `githubarchive.year.2017` WHERE type='WatchEvent'), UNNEST(x))
  AND a.type='WatchEvent'
  GROUP BY 1
), UNNEST(h_h) hhh
GROUP BY country_code 
ORDER BY STRPOS(gradient, '░')

Row gradient                    c       country_code     
1   ░░░░░░░▒▒▒▒▒▒▒▒▓▓▓▓▓▓▒▒░    204023  au   
2   ▒░░░░░░░░░▒▒▒▒▒▒▒▓▓▓▓▓▓▒    293589  jp   
3   ▓▒░░▒▒░░░░▒▒▒▒▒▒▒▓▓▓▓▓▓▓    2125724 cn   
4   ▓▓▓▒▒░░░░░░░░▒▒▒▒▒▒▒▒▓▓▓    447092  in   
5   ▓▓▓▓▓▓▒▒░░░░░░░░▒▒▒▒▒▒▒▓    381510  ru   
6   ▓▓▓▓▓▓▒▒░░░░░░░░▒▒▒▒▒▒▒▒    545906  de   
7   ▓▓▓▓▓▓▓▒░░░▒░░░░▒▒▒▒▒▒▒▒    395949  fr   
8   ▓▓▓▓▓▓▓▒▒░░░░░░░░▒▒▒▒▒▒▒    491068  gb   
9   ▒▒▒▒▓▓▓▓▓▓▓▒░░░▒░░░░░▒▒▒    419608  br   
10  ▒▒▒▒▒▒▒▓▓▓▓▓▓▒▒░░░░░░░░▒    2443381 us   
11  ▒▒▒▒▒▒▒▓▓▓▓▓▓▒▒░░░░░░░▒▒    294793  ca   

And a short code for sparklines - works great with Data Studio:

#standardSQL
CREATE TEMP FUNCTION barchart(v ARRAY<FLOAT64>, mm STRUCT<min FLOAT64, max FLOAT64>) AS ((
    SELECT STRING_AGG(SUBSTR('▁▂▃▄▅▆▇█', 1+CAST(ROUND(y) AS INT64), 1), '') 
    FROM (SELECT IFNULL(SAFE_DIVIDE((e-mm.min),(mm.max-mm.min))*7, 0) y FROM UNNEST(v) e))); 
CREATE TEMP FUNCTION vbar(v ARRAY<FLOAT64>) AS ( 
  barchart(v, (SELECT AS STRUCT MIN(a), MAX(a) FROM UNNEST(v) a)) 
);
like image 41
Felipe Hoffa Avatar answered Nov 23 '22 19:11

Felipe Hoffa


More craziness here 😊
Totally useless - but fun to play with

Applying all different options presented in this post for image processing and drawing (using profile pictures of those contribute into this post) + some new





1st and 2nd result (for Felipe's picture) produced using Felipe's Color Gradient approach with different scaling options

3rd result - using Felipe's Shaded Gradient approach

4th result - using Mikhail's(mine)/Mosha's Spark-line approach

Finally 5th and 6th results - using ASCII characters sets representing ASCII Shades of Gray - respectively:
Short set - " .:-=+*#%@"
Full (long) set - "$@B%8&WM#*oahkbdpqwmZO0QLCJUYXzcvunxrjft/\|()1{}[]?-_+~<>i!lI;:,"^``'. "

Code is trivial and literally same as in respective answers - the only difference is that data used in above exercises is image's pixels data that is simply acquired using HTML canvas getImageData() Method - obviously outside of BigQuery - with just simple html page

Options for getting crazy here and having fun playing with image transformation / processing - limitless! but probably useless outside of just learning scope 😜

like image 42
Mikhail Berlyant Avatar answered Nov 23 '22 19:11

Mikhail Berlyant