Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cast DATETIME to STRING working on ARRAY of STRUCT on BigQuery Standard SQL

in my table I have an attribute called 'messages' with this exact data type:

ARRAY<STRUCT<created_time  DATETIME ,`from` STRUCT<id STRING, 
name STRING,email STRING>, id STRING, message STRING>>

and I have defined a UDF named my_func()

Because UDF function in Big Query don't support the type DATETIME I need to cast the attribute created_time.

So I tried this:

safe_cast ( messages as ARRAY<STRUCT<created_time  STRING ,
'from` STRUCT<id STRING, name STRING, email STRING>,
id STRING, message STRING>>) as messages_casted 

and I get this error

Casting between arrays with incompatible element types is not
supported: Invalid cast from...

There is an error in the way I cast an array struct?

There is some way to use UDF with this data structure or the only way is to flatten the array and do the cast?

My goal is to take the array in the JS execution environment in order to make the aggregation with JS code.

like image 448
nicola del verme Avatar asked Jan 01 '26 00:01

nicola del verme


2 Answers

When working with JavaScript UDFs, you don't need to specify complex input data types explicitly. Instead, you can use the TO_JSON_STRING function. In your case, you can have the UDF take messages as a STRING, then parse it inside the UDF. You would call my_func(TO_JSON_STRING(messages)), for example.

Here is an example from the documentation:

CREATE TEMP FUNCTION SumFieldsNamedFoo(json_row STRING)
  RETURNS FLOAT64
  LANGUAGE js AS """
function SumFoo(obj) {
  var sum = 0;
  for (var field in obj) {
    if (obj.hasOwnProperty(field) && obj[field] != null) {
      if (typeof obj[field] == "object") {
        sum += SumFoo(obj[field]);
      } else if (field == "foo") {
        sum += obj[field];
      }
    }
  }
  return sum;
}
var row = JSON.parse(json_row);
return SumFoo(row);
""";

WITH Input AS (
  SELECT STRUCT(1 AS foo, 2 AS bar, STRUCT('foo' AS x, 3.14 AS foo) AS baz) AS s, 10 AS foo UNION ALL
  SELECT NULL, 4 AS foo UNION ALL
  SELECT STRUCT(NULL, 2 AS bar, STRUCT('fizz' AS x, 1.59 AS foo) AS baz) AS s, NULL AS foo
)
SELECT
  TO_JSON_STRING(t) AS json_row,
  SumFieldsNamedFoo(TO_JSON_STRING(t)) AS foo_sum
FROM Input AS t;
+---------------------------------------------------------------------+---------+
| json_row                                                            | foo_sum |
+---------------------------------------------------------------------+---------+
| {"s":{"foo":1,"bar":2,"baz":{"x":"foo","foo":3.14}},"foo":10}       | 14.14   |
| {"s":null,"foo":4}                                                  | 4       |
| {"s":{"foo":null,"bar":2,"baz":{"x":"fizz","foo":1.59}},"foo":null} | 1.59    |
+---------------------------------------------------------------------+---------+
like image 74
Elliott Brossard Avatar answered Jan 04 '26 03:01

Elliott Brossard


Because UDF function in Big Query don't support the type DATETIME I need to cast the attribute created_time.

Below is for BigQuery Standard SQL and is a very simple way of Casting specific element of ARRAY leaving everything else as is.
So in below example - it CASTs created_time from DATETIME to STRING (you can use any compatible type you need in your case though)

#standardSQL
SELECT messages,
  ARRAY(SELECT AS STRUCT * REPLACE (SAFE_CAST(created_time AS STRING) AS created_time) 
    FROM UNNEST(messages) message
  ) casted_messages
FROM `project.dataset.table`  

If you run it against your data - you will see original and casted messages - all elements should be same (value/type) with exception of (as expected) created_time which will be of casted type (STRING in this particular case) or NULL if not compatible

You can test / play with above using dummy data as below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT [STRUCT<created_time  DATETIME, 
            `from` STRUCT<id STRING, name STRING, email STRING>, 
            id STRING, 
            message STRING>
      (DATETIME '2018-01-01 13:00:00', ('1', 'mike', 'zzz@ccc'), 'a', 'abc'),
      (DATETIME '2018-01-02 14:00:00', ('2', 'john', 'yyy@bbb'), 'b', 'xyz')
    ] messages
)
SELECT messages,
  ARRAY(SELECT AS STRUCT * REPLACE (SAFE_CAST(created_time AS STRING) AS created_time) 
    FROM UNNEST(messages) message
  ) casted_messages
FROM `project.dataset.table`
like image 20
Mikhail Berlyant Avatar answered Jan 04 '26 01:01

Mikhail Berlyant



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!