
Melting incomplete data in python without dropping all NaN

I'm trying to melt 3 columns into one categorical column using pd.melt. Currently, the dataframe looks something like this.

    id1   Plane  Car   Boat
0   123   None   None  None
1   124   Plane  None  None
2   125   None   None  Boat
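
For reproducibility, the frame above can be built like this (a minimal sketch; I'm assuming plain object columns holding Python None):

import pandas as pd

# Sample data matching the table above, with None as the placeholder
df = pd.DataFrame({
    'id1':   [123, 124, 125],
    'Plane': [None, 'Plane', None],
    'Car':   [None, None, None],
    'Boat':  [None, None, 'Boat'],
})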

At some point I replace None with NaN, but I'm not sure whether that's necessary before melting. My goal is to have one category column that lists the type of vehicle, keeping None only when all three columns are empty.

    id1   Type
0   123   None   
1   124   Plane  
2   125   Boat   

The code I came up with was this:

df = pd.melt(df, id_vars=['id1'], var_name='Type')

The issue I have is that it triples the observations in my dataframe. I could filter out rows where Type = None, but that drops data such as id1 = 123 where all three of the original columns were None.

    id1   Type
0   123   None   
1   123   None  
2   123   None  
3   124   Plane
4   124   None   
5   124   None  
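
Roughly, that filtering approach looks like this (a sketch; I'm assuming pandas' default value column produced by melt):

# melt keeps both the original column name ('Type') and the cell contents ('value')
melted = pd.melt(df, id_vars=['id1'], var_name='Type')

# Dropping the empty rows also drops id1 = 123 completely,
# because all three of its original columns were None
filtered = melted[melted['value'].notna()]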

Is there an efficient way to do this with melt? Or do I need to loop through the data and write to a new dataframe with conditionals?

asked Oct 25 '25 by TylerLOL


1 Answer

You can do it this way, using reindex to get back those missing id values.

import numpy as np

# Replace the 'None' placeholders with NaN, stack to drop the empty cells,
# then reindex to bring back ids (like 123) whose columns were all empty.
df1 = df.replace('None', np.nan).set_index('id1')
df1.stack().reset_index(level=1, drop=True).reindex(df1.index)

Output:

id1
123      NaN
124    Plane
125     Boat
dtype: object
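
If you want it back in the two-column shape from the question, a small follow-up (assuming you're happy with NaN for the all-empty ids) could be:

out = (df1.stack()
          .reset_index(level=1, drop=True)  # drop the original column names
          .reindex(df1.index)               # restore ids whose columns were all empty
          .rename('Type')
          .reset_index())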
answered Oct 27 '25 by Scott Boston