I have a dataset with some missing data. I would like to maintain the missingness within the data while performing pd.get_dummies().
Here is an example dataset:
Table 1.
someCol
A
B
NA
C
D
I would expect pd.get_dummies(df, dummy_na=True) to transform the data into something like this:
Table 2.
someCol_A someCol_B someCol_NA someCol_C someCol_D
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
But, what I would like is this:
Table 3.
someCol_A someCol_B someCol_C someCol_D
1 0 0 0
0 1 0 0
NA NA NA NA
0 0 1 0
0 0 0 1
Notice that the 3rd row has the NA in place of all of the row values broken out from the original column.
How can I achieve the results of Table 3?
A bit of a hack, but you could get the dummies for only the non-null rows, and then re-insert the missing values in their proper place by reindexing the resulting dummies with the index of the original dataframe:
pd.get_dummies(df.dropna()).reindex(df.index)
   someCol_A  someCol_B  someCol_C  someCol_D
0        1.0        0.0        0.0        0.0
1        0.0        1.0        0.0        0.0
2        NaN        NaN        NaN        NaN
3        0.0        0.0        1.0        0.0
4        0.0        0.0        0.0        1.0
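For reference, here is a self-contained sketch of the same idea that rebuilds the example frame from Table 1. The `dtype=float` argument is my own addition and not part of the one-liner above: recent pandas versions return boolean dummy columns by default, so without it the result may show True/False mixed with NaN rather than 1.0/0.0. Treat it as one possible way to get numeric output, not a required step.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from Table 1 (NA represented as np.nan).
df = pd.DataFrame({"someCol": ["A", "B", np.nan, "C", "D"]})

# Encode only the non-null rows, then reindex against the original index.
# Rows that were dropped come back as all-NaN rows in their original position.
# dtype=float is an assumption here, to keep the dummies numeric.
dummies = pd.get_dummies(df.dropna(), dtype=float).reindex(df.index)

print(dummies)
```

Note that this drops any row containing a missing value in any column, so with several columns a row would become entirely NaN even if only one of its values was missing.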