Is BigQuery ANY_VALUE deterministic?
I have a query that produces ~200,000 rows of results, but if I filter out duplicate entries after the query, they reduce to about 500. To solve that problem in the query itself, I added a GROUP BY and wrapped all the attributes with `ANY_VALUE(tN.fieldX) as tN_fieldX`. When I sort the output, save it as .csv, and run the query several times, the resulting file has the same md5sum every time.
Does this mean that ANY_VALUE is actually solving my problem of duplicate entries? I expected it to give different values on each run, since it is non-deterministic in BigQuery.
ANY_VALUE is non-deterministic in general, but when you apply it inside a GROUP BY where every row in a group holds the same value, it becomes effectively deterministic: it picks an arbitrary value from a group of identical values, so the result is the same whichever row it picks. So yes, it helps solve the problem of duplicates in cases like yours.
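The reasoning above can be checked outside BigQuery with a small Python simulation. This is only a sketch under stated assumptions: `any_value` and `group_by_any_value` are hypothetical helpers, the sample rows are made up, and `random.Random` merely stands in for "some arbitrary pick" (BigQuery does not promise a random choice, only an unspecified one).

```python
import random
from collections import defaultdict


def any_value(values, rng):
    # Hypothetical stand-in for ANY_VALUE: return an arbitrary member of the group.
    return rng.choice(values)


def group_by_any_value(rows, key, rng):
    # Emulates: SELECT key, ANY_VALUE(col) AS col, ... FROM rows GROUP BY key
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    result = []
    for k in sorted(groups):
        members = groups[k]
        out = {key: k}
        for col in members[0]:
            if col != key:
                # Each column is picked independently, as with separate ANY_VALUE calls.
                out[col] = any_value([m[col] for m in members], rng)
        result.append(out)
    return result


# Made-up data: every row in a group is an exact duplicate.
duplicated = [
    {"id": 1, "field1": "a", "field2": 10},
    {"id": 1, "field1": "a", "field2": 10},
    {"id": 2, "field1": "b", "field2": 20},
    {"id": 2, "field1": "b", "field2": 20},
]

# Made-up data: rows in the same group disagree on field1.
mixed = [
    {"id": 1, "field1": "a", "field2": 10},
    {"id": 1, "field1": "z", "field2": 10},
]

# Repeat the "query" with different arbitrary picks.
runs_duplicated = [group_by_any_value(duplicated, "id", random.Random(s)) for s in range(50)]
runs_mixed = [group_by_any_value(mixed, "id", random.Random(s)) for s in range(50)]

# With exact duplicates, every run yields the same rows.
stable = all(run == runs_duplicated[0] for run in runs_duplicated)
# With differing rows, the picked field1 may change between runs.
distinct_mixed = {run[0]["field1"] for run in runs_mixed}

print(stable)
print(sorted(distinct_mixed))
```

With exact duplicates, `stable` is `True` regardless of which row is picked, which matches the identical md5sum seen across runs. If rows in a group differ, the picked values can vary between runs, so a stable checksum is only expected when the grouped rows really are duplicates.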