How to perform pd.get_dummies() on a dataframe while simultaneously keeping NA values in place instead of creating an NA column?

Question

I have a dataset with some missing data. I would like to maintain the missingness within the data while performing pd.get_dummies().

Here is an example dataset:

Table 1.

someCol
   A
   B
   NA
   C
   D

I would expect pd.get_dummies(df, dummy_na=True) to transform the data into something like this:

Table 2.

someCol_A  someCol_B  someCol_NA  someCol_C  someCol_D
    1         0           0           0          0    
    0         1           0           0          0    
    0         0           1           0          0    
    0         0           0           1          0    
    0         0           0           0          1

But what I would like is this:

Table 3.

someCol_A  someCol_B   someCol_C  someCol_D
    1         0           0          0    
    0         1           0          0    
    NA        NA          NA         NA    
    0         0           1          0    
    0         0           0          1

Notice that the third row has NA in every dummy column that was broken out from the original column.

How can I achieve the results of Table 3?

sacuL · Accepted Answer

A bit of a hack, but you could compute the dummies for only the non-null rows and then put the missing rows back in their proper place by reindexing the result with the index of the original dataframe:

pd.get_dummies(df.dropna()).reindex(df.index)

   someCol_A  someCol_B  someCol_C  someCol_D
0        1.0        0.0        0.0        0.0
1        0.0        1.0        0.0        0.0
2        NaN        NaN        NaN        NaN
3        0.0        0.0        1.0        0.0
4        0.0        0.0        0.0        1.0
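
For anyone who wants to try this end to end, here is a minimal sketch built on the example column from the question. The names `dummies` and `masked` are just illustrative. One assumption to flag: on pandas 2.0 and later, get_dummies returns boolean columns by default, so `dtype=float` is passed here to get the 1.0/0.0/NaN output shown above. The second variant (encode everything, then blank out the rows that were missing) is an alternative I am adding for comparison, not part of the original one-liner.

```python
import numpy as np
import pandas as pd

# Example data from the question: one categorical column with a missing value.
df = pd.DataFrame({"someCol": ["A", "B", np.nan, "C", "D"]})

# Approach from the answer: encode only the non-missing rows, then reindex
# against the original index so the dropped row returns as an all-NaN row.
# dtype=float is an assumption added here to keep numeric output on pandas >= 2.0.
dummies = pd.get_dummies(df.dropna(), dtype=float).reindex(df.index)

# Alternative sketch: encode every row (the missing row becomes all zeros),
# then overwrite the rows whose source value was missing with NaN.
masked = pd.get_dummies(df, dtype=float)
masked.loc[df["someCol"].isna(), :] = np.nan

print(dummies)
```

Both variants should give the same frame for a single column. With several categorical columns, note that df.dropna() drops a row if any column is missing, so the second variant (applied per column) may be closer to what you want.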

Tags:

python

pandas

data-science

Zakariah Siyaji
