I am trying to figure out whether it is possible to export HyperLogLog sketches from BigQuery and merge them externally for cardinality estimation. Is there an open-source library that can readily parse BigQuery sketches?
If not, is there any publicly available information about the format of BigQuery's HyperLogLog sketches? Specifically, which hashing algorithm is used, what metadata is contained, and how are the sketches structured?
The details of the sketch format and hashing for the HLL_COUNT family of functions are not public at this time.
Could you file a feature request on the public issue tracker with more details (e.g. which tools, languages, or libraries you would prefer to interoperate with for cardinality estimation)?
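For general context only, here is a rough sketch of what "merging sketches" means in the textbook HyperLogLog algorithm: two sketches built with the same precision and hash function are combined by taking the register-wise maximum. Everything in it (the `HyperLogLog` class name, the SHA-1 based hash, the precision `p=12`) is an assumption made for illustration. It says nothing about BigQuery's internal format and cannot read sketches exported from HLL_COUNT functions.

```python
import hashlib
import math


class HyperLogLog:
    """Textbook HyperLogLog; NOT BigQuery's format (illustrative only)."""

    def __init__(self, p=12):
        self.p = p                      # precision: number of index bits (assumed value)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # Assumed hash: first 64 bits of SHA-1 over the item's string form.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # top p bits pick the register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1 bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Merging is only meaningful for sketches with identical parameters.
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)            # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


# Two overlapping sets of user IDs, sketched independently and then merged.
a, b = HyperLogLog(), HyperLogLog()
for i in range(6000):
    a.add(i)
for i in range(4000, 10000):
    b.add(i)
merged = a.merge(b)
print(round(merged.estimate()))  # approximate count of the 10,000 distinct IDs
```

The point of the example is that merging depends on both sides agreeing on the hash function, precision, and register layout, which is exactly the information that is not published for BigQuery sketches.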