How does BigQuery ML deal with NULL numeric features?

Question

With categorical features, running ML.WEIGHTS on the trained model shows that BigQuery ML automatically creates a "_null_filler" dummy variable, which makes sense.

For numeric features, are missing values imputed with the mean, or with something else? And is either behavior mentioned anywhere in the official documentation?

Amir Hormati · Accepted Answer

Imputation is the statistical process of replacing missing data with substituted values. During training, a missing value occurs when BigQuery encounters a null value in the dataset. During prediction, a missing value can occur when BigQuery encounters either a null value or a previously unseen value. The following describes how BigQuery ML handles missing data in each case.

For numerical types (which BigQuery ML standardizes automatically), null values are replaced with the mean value calculated for that feature column over the original input dataset, for both training and prediction.
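To make the numeric case concrete, here is a minimal plain-Python sketch of mean imputation followed by standardization. It only mimics the behavior described above and is not BigQuery ML's implementation: the function names are hypothetical, and using the sample standard deviation is an assumption. One consequence is visible in the output: because a NULL is filled with the same training mean used for standardization, it always standardizes to exactly 0.

```python
import statistics
from typing import List, Optional, Tuple


def fit_numeric(train: List[Optional[float]]) -> Tuple[float, float]:
    """Learn the mean and standard deviation from the non-NULL training values."""
    observed = [x for x in train if x is not None]
    mean = statistics.mean(observed)
    # Assumption: sample standard deviation; BigQuery ML's exact formula is not stated here.
    std = statistics.stdev(observed) if len(observed) > 1 else 0.0
    return mean, std


def transform_numeric(values: List[Optional[float]], mean: float, std: float) -> List[float]:
    """Replace NULLs with the training mean, then standardize with training statistics."""
    out = []
    for x in values:
        filled = mean if x is None else x
        # Guard against a constant column (std == 0) to avoid division by zero.
        out.append((filled - mean) / std if std > 0 else 0.0)
    return out


train = [1.0, 2.0, None, 3.0]
mean, std = fit_numeric(train)
print(mean, std)                                   # 2.0 1.0
print(transform_numeric(train, mean, std))         # [-1.0, 0.0, 0.0, 1.0]
print(transform_numeric([None, 4.0], mean, std))   # [0.0, 2.0]  (prediction reuses training stats)
```

The same `mean` and `std` computed at training time are reused at prediction, which matches the answer's statement that the mean comes from the original input dataset in both phases.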

For one-hot encoded columns, an additional category is added to which all null values map, for both training and prediction. Unseen values are de facto assigned a weight of 0 at prediction.
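The one-hot case can be sketched the same way. This is a hypothetical plain-Python illustration, not BigQuery ML's code: the helper names are made up, `_null_filler` is borrowed from the label the question reports seeing in ML.WEIGHTS output, and always adding that slot (and sorting the vocabulary) are assumptions. It shows why an unseen value effectively gets weight 0: its encoding is all zeros, so none of the learned weights contribute.

```python
from typing import Dict, List, Optional

# Label the question reports seeing in ML.WEIGHTS output for the NULL category.
NULL_CATEGORY = "_null_filler"


def fit_vocab(train: List[Optional[str]]) -> List[str]:
    """Build a one-hot vocabulary: observed categories plus one slot for NULL."""
    # Assumption: the NULL slot is always added; sorting is only for determinism.
    seen = sorted({v for v in train if v is not None})
    return seen + [NULL_CATEGORY]


def one_hot(value: Optional[str], vocab: List[str]) -> List[int]:
    """Encode a value; NULL maps to the extra category, unseen values to all zeros."""
    key = NULL_CATEGORY if value is None else value
    return [1 if key == cat else 0 for cat in vocab]


def linear_contribution(value: Optional[str], vocab: List[str], weights: Dict[str, float]) -> float:
    """This feature's contribution to a linear model's score (dot product with weights)."""
    return sum(bit * weights[cat] for bit, cat in zip(one_hot(value, vocab), vocab))


vocab = fit_vocab(["red", "blue", None, "red"])
print(vocab)                      # ['blue', 'red', '_null_filler']
print(one_hot(None, vocab))       # [0, 0, 1]
print(one_hot("green", vocab))    # [0, 0, 0]  (unseen at training time)

weights = {"blue": 0.5, "red": -0.25, NULL_CATEGORY: 0.1}  # hypothetical learned weights
print(linear_contribution("red", vocab, weights))    # -0.25
print(linear_contribution(None, vocab, weights))     # 0.1
print(linear_contribution("green", vocab, weights))  # 0.0
```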

We're missing this information in our public documents. We're working on adding that right now. Thanks for bringing this up.

Tags:

google-bigquery

taksqth
