
Handling unassigned (null) values of features in regression (machine learning)?

I want to do linear regression analysis. I have multiple features. Some features have unassigned (null) values for some items in the data, because those values were missing from the data source. To make this clearer, here is an example: [image of the example data]

As you can see, some items are missing values for some features. For now, I have just set them to 'Null', but how should I handle these values when doing linear regression analysis of the data? I do not want these unassigned values to incorrectly affect the regression model. Unfortunately, I cannot get rid of the items that have unassigned feature values. I plan to use Python for the regression.

Erba Aitbayev asked Dec 03 '15 00:12



1 Answer

You have three options: ignore those rows (you've already said you can't, and it's not a good idea given the quantity of missing values), use an algorithm that proactively discounts those items, or impute the missing data (that's the technical term for filling in an educated guess).
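
To see why dropping rows is costly, you can measure how many rows would survive it. This is a minimal sketch that assumes each row is a dict with `None` marking a missing value; the helper name `complete_case_share` and the toy values are made up for illustration and are not taken from the question's data.

```python
def complete_case_share(rows):
    """Return the fraction of rows that have no missing (None) value."""
    if not rows:
        return 0.0
    complete = sum(
        1 for row in rows if all(value is not None for value in row.values())
    )
    return complete / len(rows)


# Toy rows for illustration only; None marks a missing value.
rows = [
    {"area": 50.0, "district": "North"},
    {"area": None, "district": "South"},
    {"area": 70.0, "district": None},
    {"area": 60.0, "district": "North"},
]

share = complete_case_share(rows)
print(f"{share:.0%} of rows would remain after dropping incomplete ones")
```

If that share is small, dropping rows throws away most of the data, which is the case for imputing instead.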

There's a limited amount of help we can give, because you haven't given us the semantics you want for missing data. You can impute some of the missing values by using your favourite "closest match" algorithm against the data you do have. For instance, you may well be able to infer a good guess for area from the other data.
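
One possible reading of "closest match" is nearest-neighbour imputation: copy the missing value from the most similar row that has it. The sketch below is only an assumption about what fits your data. The column names `rooms` and `price` are hypothetical (only area is mentioned above), the values are toy numbers, and the distance is plain Euclidean distance on unscaled features.

```python
import math


def impute_nearest(rows, target, features):
    """Fill a missing `target` from the closest row (by `features`) that has it.

    Rows that are also missing one of the `features` are left unchanged.
    The input rows are not modified; a new list of dicts is returned.
    """
    donors = [
        r for r in rows
        if r[target] is not None and all(r[f] is not None for f in features)
    ]
    filled = []
    for row in rows:
        new_row = dict(row)
        can_match = all(row[f] is not None for f in features)
        if row[target] is None and donors and can_match:
            point = [row[f] for f in features]
            nearest = min(
                donors,
                key=lambda d: math.dist([d[f] for f in features], point),
            )
            new_row[target] = nearest[target]
        filled.append(new_row)
    return filled


# Toy rows for illustration only; "rooms" and "price" are hypothetical columns.
rows = [
    {"rooms": 1, "price": 100.0, "area": 30.0},
    {"rooms": 3, "price": 300.0, "area": 90.0},
    {"rooms": 1, "price": 110.0, "area": None},
]

result = impute_nearest(rows, "area", ["rooms", "price"])
print(result[2]["area"])
```

With unscaled features the one with the largest range (here `price`) dominates the distance, so you would normally standardize the features first. scikit-learn's `KNNImputer` offers a ready-made variant that averages over several neighbours.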

For your non-linear, discrete items (e.g. District), you may well want to keep NULL as a separate district. If you have few enough missing entries, you'll be able to get a decent model anyway.
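
Keeping NULL as its own district could look like the sketch below, which one-hot encodes the column and gives missing entries a column of their own. The helper name `one_hot_with_missing`, the label "NULL" and the district names are illustrative assumptions, not values from your data.

```python
def one_hot_with_missing(values, missing_label="NULL"):
    """One-hot encode a categorical column, treating None as its own category.

    Returns the sorted category names and one 0/1 row per input value.
    """
    labels = [missing_label if v is None else v for v in values]
    categories = sorted(set(labels))
    rows = [[1 if label == cat else 0 for cat in categories] for label in labels]
    return categories, rows


# Toy district names for illustration only; None marks a missing value.
district = ["North", None, "South", "North"]

categories, encoded = one_hot_with_missing(district)
print(categories)
for row in encoded:
    print(row)
```

If the regression includes an intercept, drop one of the indicator columns before fitting, because the full set always sums to 1 and is perfectly collinear with the intercept.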

A simple imputation is to replace each NULL with the mean value for the feature, but this works only for features that have a meaningful mean (numeric ones, so not District).
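
Mean imputation for a single numeric feature can be sketched in a few lines. This assumes missing values are stored as `None`; the helper name `impute_mean` and the numbers are made up for illustration.

```python
from statistics import mean


def impute_mean(values):
    """Return a copy of `values` with each None replaced by the observed mean."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Toy numbers for illustration only; None marks a missing value.
area = [50.0, None, 70.0, 60.0, None]

area_filled = impute_mean(area)
print(area_filled)
```

If your data is in pandas or scikit-learn, `Series.fillna(series.mean())` or `SimpleImputer(strategy="mean")` should do the same job. Keep in mind that filling many entries with the mean shrinks the feature's variance, so it is best suited to features with few missing values.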

Overall, I suggest that you search for appropriate references on "impute missing data". Since we're not sure of your needs, we can't help much with this, and doing so is outside the scope of SO.

Prune answered Oct 18 '22 03:10
