I'm trying to keep rows in a dataset that contain missing data.
When one-hot encoding a column (or multiple columns) with sklearn, is it possible to write a rule so that if currentItem == null or currentItem == 0, the output array is set to all 0s?
e.g.
A A B
-> [[1, 0], [1, 0], [0,1]]
B B A
-> [[0, 1], [0, 1], [1,0]]
null B A
-> [[0, 0], [0, 1], [1,0]]
one-hot encoding:
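import numpy as np
from keras.utils import to_categorical  # assumed source of to_categorical; the original import was not shown
from sklearn.preprocessing import LabelEncoder
# Load the CSV and take the second column
dataset = np.loadtxt("someFile.csv", delimiter=",")
B = dataset[:,1]
# Map each label to an integer, then expand the integers into one-hot rows
encoder = LabelEncoder()
encoder.fit(B)
encoded_B = encoder.transform(B)
Y = to_categorical(encoded_B)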
EDIT - Example dataset, where A-E are inputs and X and Y are outputs:
A B C D E X Y
7 6 3 3 2 11 4
5 6 0 0 7 15 7
3 3 9 null 7 12 7
7 null 7 null 7 12 13
null 7 4 6 12 13 4
null 5 7 6 null 14 7
2 6 0 0 2 13 3
7 null 7 null 2 13 7
Another disadvantage of one-hot encoding is that it produces multicollinearity among the encoded variables, which can hurt some models, linear ones in particular. In addition, you may wish to transform the values back to categorical form so that they can be displayed in your application.
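On the point about transforming values back, here is a minimal sketch using scikit-learn's OneHotEncoder with handle_unknown="ignore". It assumes missing entries have already been replaced by a placeholder string ("missing" here, a hypothetical name) that the encoder never saw during fit. Unseen values then encode to an all-zero row, and inverse_transform maps an all-zero row back to None.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Fit on the known categories only, so anything else counts as "unknown".
known = np.array([["A"], ["B"]], dtype=object)
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(known)

# "missing" is a hypothetical placeholder for null/0 entries.
X = np.array([["A"], ["B"], ["missing"]], dtype=object)

# Unknown values become all-zero rows instead of raising an error.
onehot = encoder.transform(X).toarray()

# All-zero rows are decoded back to None.
decoded = encoder.inverse_transform(onehot)
```

This is one possible way to get the all-zero behaviour with sklearn alone; it is not taken from the original answer.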
For numerical columns, you can try replacing the missing values with the mean or median of the column. For categorical data, which I assume is your case, you can instead replace the missing values in each column with the most frequently occurring value in that column.
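A minimal pandas sketch of both imputation strategies follows. The column names and values are made up for illustration. Note that imputing assigns missing rows to a real category, which is a different outcome from encoding them as all zeros.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one numeric and one categorical column with gaps.
df = pd.DataFrame({
    "num": [7.0, 5.0, np.nan, 3.0],
    "cat": ["A", np.nan, "A", "B"],
})

# Numeric column: fill gaps with the median (use .mean() for the mean).
df["num"] = df["num"].fillna(df["num"].median())

# Categorical column: fill gaps with the most frequent value (the mode).
df["cat"] = df["cat"].fillna(df["cat"].mode().iloc[0])

print(df)
```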
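Problems faced with one-hot encoding: the dummy variable trap leads to another problem known as multicollinearity. Multicollinearity occurs when there is a dependency between the independent features. When every row has exactly one category set, the one-hot columns of a variable sum to 1, so each column can be derived from the others.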
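A common way to sidestep the dummy variable trap is to drop one column per encoded variable. A small sketch with pd.get_dummies and its drop_first option, using made-up data:

```python
import pandas as pd

# Hypothetical categorical column with three levels and no missing values.
s = pd.Series(["A", "B", "C", "A"])

# Full encoding: one column per category (A, B, C).
full = pd.get_dummies(s, dtype=int)

# Reduced encoding: the first category (A) is dropped and becomes the baseline.
reduced = pd.get_dummies(s, drop_first=True, dtype=int)

print(full)
print(reduced)
```

One caveat for the question asked here: with drop_first=True the baseline category is itself encoded as all zeros, so it can no longer be told apart from a missing value that is also encoded as all zeros.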
If you have pandas, this is pretty simple.
import numpy as np
import pandas as pd
s = pd.Series(['A', 'A', 0, 'B', 0, 'A', np.nan])
s
0 A
1 A
2 0
3 B
4 0
5 A
6 NaN
dtype: object
Use replace to convert 0 to NaN:
s = s.replace({0 : np.nan, '0' : np.nan})
s
0 A
1 A
2 NaN
3 B
4 NaN
5 A
6 NaN
dtype: object
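Now, call pd.get_dummies, which ignores NaN values, so those rows come out as all zeros. (In pandas 2.0 and later the result is boolean by default; pass dtype=int to get the 0/1 output shown here.)
pd.get_dummies(s, dtype=int)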
A B
0 1 0
1 1 0
2 0 0
3 0 1
4 0 0
5 1 0
6 0 0
The solution is the same for a dataframe.
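As a sketch of the dataframe case, the frame below reuses the three example rows from the question (A A B, B B A, null B A) plus one extra hypothetical row containing 0. The column names c1 to c3 are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rows from the question's example, plus a hypothetical row with a 0 entry.
df = pd.DataFrame(
    [["A", "A", "B"],
     ["B", "B", "A"],
     [None, "B", "A"],
     [0, "A", "B"]],
    columns=["c1", "c2", "c3"],
)

# Treat 0 (numeric or string) as missing, as for the Series.
df = df.replace({0: np.nan, "0": np.nan})

# Each column is encoded independently; missing entries give all-zero groups.
encoded = pd.get_dummies(df, dtype=int)
print(encoded)
```

The third row comes out as [0, 0, 0, 1, 1, 0], which corresponds to the [[0, 0], [0, 1], [1, 0]] requested in the question.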