Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to UNNest multiple arrays in BigQuery?

I have this json that is stored in a BigQuery table in 3 fields token, questions, answers

token:STRING, questions:STRING, answers:STRING

Questions and answers are STRING because they are dynamic fields.

token field has single value.

questions field has dictionary object with "fields" being list object and has 3 questions.

answers field is a list object with answers to the 3 questions and id will be used for matching a question to an answer. Below is the JSON download from bigquery

token          questions                                      answers
18e6d8e445     {"fields": [{"id": "L39FyvUohKDV", "properties": {}, "ref": "d8834652-3acf-4541-8354-1e3dcd716667", "title": "What did you think about the changes?", "type": "short_text"}, {"id": "krs82KgxHwGb", "properties": {}, "ref": "5b6e6796-635b-4595-9404-e81617d4540b", "title": "How useful is this feature turning out to be for you?", "type": "opinion_scale"}, {"id": "lBzHtCuzHFM4", "properties": {}, "ref": "b76be913-19b9-4b8a-b2ac-3fb645a65a5c", "title": "Your email address", "type": "email"}], "id": "SdzXVn", "title": "Google Shopping 5/4/18"}       [{"field": {"id": "L39FyvUohKDV", "type": "short_text"}, "text": "t", "type": "text"}, {"field": {"id": "krs82KgxHwGb", "type": "opinion_scale"}, "number": 10, "type": "number"}, {"email": "[email protected]", "field": {"id": "lBzHtCuzHFM4", "type": "email"}, "type": "email"}]
949b2c57e3     {"fields": [{"id": "krs82KgxHwGb", "properties": {}, "ref": "5b6e6796-635b-4595-9404-e81617d4540b", "title": "How useful is this feature turning out to be for you?", "type": "opinion_scale"}, {"id": "lBzHtCuzHFM4", "properties": {}, "ref": "b76be913-19b9-4b8a-b2ac-3fb645a65a5c", "title": "Your email address", "type": "email"}, {"id": "L39FyvUohKDV", "properties": {}, "ref": "d8834652-3acf-4541-8354-1e3dcd716667", "title": "What did you think about the changes?", "type": "short_text"}], "id": "SdzXVn", "title": "Google Shopping 5/4/18"}       [{"field": {"id": "krs82KgxHwGb", "type": "opinion_scale"}, "number": 10, "type": "number"}, {"email": "[email protected]", "field": {"id": "lBzHtCuzHFM4", "type": "email"}, "type": "email"}, {"field": {"id": "L39FyvUohKDV", "type": "short_text"}, "text": "they were awesome", "type": "text"}]             
146c49cdd6     {"fields": [{"id": "CxhfK22a3XWE", "properties": {}, "ref": "d8834652-3acf-4541-8354-1e3dcd716667", "title": "What did you think about the changes?", "type": "short_text"}, {"id": "oUZxPRaKjmFr", "properties": {}, "ref": "5b6e6796-635b-4595-9404-e81617d4540b", "title": "How useful is this feature turning out to be for you?", "type": "opinion_scale"}, {"id": "zUIP73oXpLD6", "properties": {}, "ref": "b76be913-19b9-4b8a-b2ac-3fb645a65a5c", "title": "Your email address", "type": "email"}], "id": "kaiAsx", "title": "a - b"}                        [{"field": {"id": "CxhfK22a3XWE", "type": "short_text"}, "text": "nice", "type": "text"}, {"field": {"id": "oUZxPRaKjmFr", "type": "opinion_scale"}, "number": 2, "type": "number"}, {"email": "[email protected]", "field": {"id": "zUIP73oXpLD6", "type": "email"}, "type": "email"}]        

@mikhail-berlyant provided this query below which gets me pretty close to what I am expecting. The only problem I am having is that I am unable to get the answer.

SELECT distinct token, id, title AS question,
JSON_EXTRACT_SCALAR(CONCAT('{',a,'}'), '$.type') answer_type
--REPLACE(REGEXP_EXTRACT(b, r'"type":".+?"\s*,\s*".+?":(.+)'), '"', '') answer
FROM `v1-dev-main.typeform.responses`,
UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(definition, '$.fields'), r'"title":"(.+?)"')) title WITH OFFSET pos1,
UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(definition, '$.fields'), r'"id":"(.+?)"')) id WITH OFFSET pos2,
UNNEST(REGEXP_EXTRACT_ALL(answers, r'"field": {(.+?)}')) a WITH OFFSET pos3
--UNNEST(REGEXP_EXTRACT_ALL(answers, r'{(.+?),\s*"field":{.+?}')) b WITH OFFSET pos4
WHERE pos1 = pos2 
--AND pos3 = pos4 
AND id = JSON_EXTRACT_SCALAR(CONCAT('{',a,'}'), '$.id')   

Here is the result of above query

token                       id             question       answer_type
146c43c81cd5780839d3cdd6    zUIP73oXpLD6    Your email address  email
146c493c1cd5780839d3cdd6    oUZxPRaKjmFr    How useful is this feature turning out to be for you?   opinion_scale
146c493c05d5780839d3cdd6    CxhfK22a3XWE    What did you think about the changes?   short_text
18e6d8e33df44a1aa451b445    lBzHtCuzHFM4    Your email address  email
18e6d8e33df44a1aa451b445    L39FyvUohKDV    What did you think about the changes?   short_text
18e6d0fa014bfa1aa451b445    krs82KgxHwGb    How useful is this feature turning out to be for you?   opinion_scale
a63b20df691c9a949b2c57e3    krs82KgxHwGb    How useful is this feature turning out to be for you?   opinion_scale
a63b20df691c9a949b2c57e3    lBzHtCuzHFM4    Your email address  email
a63b258ce0339a949b2c57e3    L39FyvUohKDV    What did you think about the changes?   short_text

Now, I am just missing the answer.

like image 271
sam Avatar asked Jan 02 '23 07:01

sam


1 Answers

Below example is for BigQuery Standard SQL and makes some assumption about your data in terms of how those json strings are formatted - so it most will likely require some tuning for regexp's. But it works with below dummy data

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 12345 token, 
'''{"fields": [
    {"id":"1","title":"Question 1?"},
    {"id":"2","title":"Questions 2?"},
    {"id":"3","title":"Question 3?"}
  ]}''' questions,
'''[  
  {"type":"text", "text":"answer 1", "field":{"id":"1", "type":"short_text"}},
  {"type":"number", "number":42, "field":{"id":"2", "type":"opinion_scale"}},
  {"type":"email", "email":"[email protected]", "field":{"id":"3", "type":"email"}}
  ]''' answers 
)
SELECT token, id, title AS question,
  JSON_EXTRACT_SCALAR(CONCAT('{',a,'}'), '$.type') answer_type,
  REPLACE(REGEXP_EXTRACT(b, r'"type":".+?"\s*,\s*".+?":(.+)'), '"', '') answer
FROM `project.dataset.table`,
UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(questions, '$.fields'), r'"title":"(.+?)"')) title WITH OFFSET pos1,
UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(questions, '$.fields'), r'"id":"(.+?)"')) id WITH OFFSET pos2,
UNNEST(REGEXP_EXTRACT_ALL(answers, r'"field":{(.+?)}')) a WITH OFFSET pos3,
UNNEST(REGEXP_EXTRACT_ALL(answers, r'{(.+?),\s*"field":{.+?}')) b WITH OFFSET pos4
WHERE pos1 = pos2 
AND pos3 = pos4 
AND id = JSON_EXTRACT_SCALAR(CONCAT('{',a,'}'), '$.id')    

with results as

Row token   id  question        answer_type     answer   
1   12345   1   Question 1?     short_text      answer 1     
2   12345   2   Questions 2?    opinion_scale   42   
3   12345   3   Question 3?     email           [email protected]   

Update based on below comments

#standardSQL
WITH `project.dataset.table` AS (
  SELECT "12345" token, '{"fields": [{"id":"1","title":"Question 1?"},{"id":"2","title":"Questions 2?"},{"id":"3","title":"Question 3?"}]}' questions,'[  {"type":"text", "text":"answer 1", "field":{"id":"1", "type":"short_text"}},{"type":"number", "number":42, "field":{"id":"2", "type":"opinion_scale"}},{"type":"email", "email":"[email protected]", "field":{"id":"3", "type":"email"}}]' answers UNION ALL
  SELECT "18e6d8e33df440fa014bfa1aa451b445", '{"fields": [{"id": "L39FyvUohKDV", "properties": {}, "ref": "d8834652-3acf-4541-8354-1e3dcd716667", "title": "What did you think about the changes?", "type": "short_text"}, {"id": "krs82KgxHwGb", "properties": {}, "ref": "5b6e6796-635b-4595-9404-e81617d4540b", "title": "How useful is this feature turning out to be for you?", "type": "opinion_scale"}, {"id": "lBzHtCuzHFM4", "properties": {}, "ref": "b76be913-19b9-4b8a-b2ac-3fb645a65a5c", "title": "Your email address", "type": "email"}], "id": "SdzXVn", "title": "Google Shopping 5/4/18"}', '[{"field": {"id": "L39FyvUohKDV", "type": "short_text"}, "text": "t", "type": "text"}, {"field": {"id": "krs82KgxHwGb", "type": "opinion_scale"}, "number": 10, "type": "number"}, {"email": "[email protected]", "field": {"id": "lBzHtCuzHFM4", "type": "email"}, "type": "email"}]"' UNION ALL
  SELECT "a63b258ce03360df691c9a949b2c57e3", '{"fields": [{"id": "krs82KgxHwGb", "properties": {}, "ref": "5b6e6796-635b-4595-9404-e81617d4540b", "title": "How useful is this feature turning out to be for you?", "type": "opinion_scale"}, {"id": "lBzHtCuzHFM4", "properties": {}, "ref": "b76be913-19b9-4b8a-b2ac-3fb645a65a5c", "title": "Your email address", "type": "email"}, {"id": "L39FyvUohKDV", "properties": {}, "ref": "d8834652-3acf-4541-8354-1e3dcd716667", "title": "What did you think about the changes?", "type": "short_text"}], "id": "SdzXVn", "title": "Google Shopping 5/4/18"}', '[{"field": {"id": "krs82KgxHwGb", "type": "opinion_scale"}, "number": 10, "type": "number"}, {"email": "[email protected]", "field": {"id": "lBzHtCuzHFM4", "type": "email"}, "type": "email"}, {"field": {"id": "L39FyvUohKDV", "type": "short_text"}, "text": "they were awesome", "type": "text"}]"' UNION ALL
  SELECT "146c493c051a0a481cd5780839d3cdd6", '{"fields": [{"id": "CxhfK22a3XWE", "properties": {}, "ref": "d8834652-3acf-4541-8354-1e3dcd716667", "title": "What did you think about the changes?", "type": "short_text"}, {"id": "oUZxPRaKjmFr", "properties": {}, "ref": "5b6e6796-635b-4595-9404-e81617d4540b", "title": "How useful is this feature turning out to be for you?", "type": "opinion_scale"}, {"id": "zUIP73oXpLD6", "properties": {}, "ref": "b76be913-19b9-4b8a-b2ac-3fb645a65a5c", "title": "Your email address", "type": "email"}], "id": "kaiAsx", "title": "a - b"}', '[{"field": {"id": "CxhfK22a3XWE", "type": "short_text"}, "text": "nice", "type": "text"}, {"field": {"id": "oUZxPRaKjmFr", "type": "opinion_scale"}, "number": 2, "type": "number"}, {"email": "[email protected]", "field": {"id": "zUIP73oXpLD6", "type": "email"}, "type": "email"}]"'
)
SELECT token, id, title AS question,
  JSON_EXTRACT_SCALAR(CONCAT('{',a,'}'), '$.type') answer_type,
  COALESCE(JSON_EXTRACT_SCALAR(b, '$.text'),JSON_EXTRACT_SCALAR(b, '$.number'),JSON_EXTRACT_SCALAR(b, '$.email')) AS answer
FROM `project.dataset.table`,
UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(questions, '$.fields'), r'"title":\s*"(.+?)"')) title WITH OFFSET pos1,
UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(questions, '$.fields'), r'"id":\s*"(.+?)"')) id WITH OFFSET pos2,
UNNEST(REGEXP_EXTRACT_ALL(answers, r'"field":\s*{(.+?)}')) a WITH OFFSET pos3,
UNNEST(REGEXP_EXTRACT_ALL(REGEXP_REPLACE(answers, r'"field":\s*{.+?}', '"field": ""'), r'{.+?}')) b WITH OFFSET pos4
WHERE pos1 = pos2 
AND pos3 = pos4 
AND id = JSON_EXTRACT_SCALAR(CONCAT('{',a,'}'), '$.id') 

Output is

Row token                               id              question                                                answer_type     answer   
1   12345                               1               Question 1?                                             short_text      answer 1     
2   12345                               2               Questions 2?                                            opinion_scale   42   
3   12345                               3               Question 3?                                             email           [email protected]   
4   18e6d8e33df440fa014bfa1aa451b445    L39FyvUohKDV    What did you think about the changes?                   short_text      t    
5   18e6d8e33df440fa014bfa1aa451b445    krs82KgxHwGb    How useful is this feature turning out to be for you?   opinion_scale   10   
6   18e6d8e33df440fa014bfa1aa451b445    lBzHtCuzHFM4    Your email address                                      email           [email protected]  
7   a63b258ce03360df691c9a949b2c57e3    krs82KgxHwGb    How useful is this feature turning out to be for you?   opinion_scale   10   
8   a63b258ce03360df691c9a949b2c57e3    lBzHtCuzHFM4    Your email address                                      email           [email protected]     
9   a63b258ce03360df691c9a949b2c57e3    L39FyvUohKDV    What did you think about the changes?                   short_text      they were awesome    
10  146c493c051a0a481cd5780839d3cdd6    CxhfK22a3XWE    What did you think about the changes?                   short_text      nice     
11  146c493c051a0a481cd5780839d3cdd6    oUZxPRaKjmFr    How useful is this feature turning out to be for you?   opinion_scale   2    
12  146c493c051a0a481cd5780839d3cdd6    zUIP73oXpLD6    Your email address                                      email           [email protected]  
like image 130
Mikhail Berlyant Avatar answered Jan 04 '23 19:01

Mikhail Berlyant