How to combine numeric columns in pandas dataframe with NaN?

I have a dataframe with this format:

ID measurement_1 measurement_2
0      3              NaN
1      NaN            5
2      NaN            7 
3      NaN            NaN

I want to combine to:

ID measurement measurement_type
0      3              1
1      5              2
2      7              2

In each row, at most one of measurement_1 and measurement_2 holds a value; the other is NaN. In some rows both columns are NaN.

I want to take the actual value from whichever column has it, add a column for the measurement type (depending on which column the value came from), and remove the rows that have NaN in both columns.

Is there an easy way of doing this?

Thanks!

Agustin asked Jul 28 '20 10:07


3 Answers

Maybe combine_first could help?

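import numpy as np

# Take measurement_1 where present, otherwise fall back to measurement_2
df["measurement"] = df["measurement_1"].combine_first(df["measurement_2"])
df["measurement_type"] = np.where(df["measurement_1"].notnull(), 1, 2)
# Drop the source columns, then the rows where both measurements were NaN
df = df.drop(columns=["measurement_1", "measurement_2"]).dropna(subset=["measurement"])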

    ID  measurement measurement_type
0   0   3           1
1   1   5           2
2   2   7           2
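
For anyone who wants to run this end to end, here is a minimal sketch. It assumes ID is an ordinary column and rebuilds the question's sample frame by hand; the name `combined` is only illustrative. It also casts the result back to integers, which is safe here because the remaining values are whole numbers.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: ID is a regular column)
df = pd.DataFrame({
    "ID": [0, 1, 2, 3],
    "measurement_1": [3, np.nan, np.nan, np.nan],
    "measurement_2": [np.nan, 5, 7, np.nan],
})

combined = (
    df.assign(
        # Prefer measurement_1, fall back to measurement_2
        measurement=df["measurement_1"].combine_first(df["measurement_2"]),
        # 1 if the value came from measurement_1, otherwise 2
        measurement_type=np.where(df["measurement_1"].notna(), 1, 2),
    )
    .dropna(subset=["measurement"])  # rows with both NaN have no measurement
    .drop(columns=["measurement_1", "measurement_2"])
    .astype({"measurement": "int64"})  # NaN rows are gone, so integers are safe
)
print(combined)
```

The row filter has to come after the type column is computed, or at least use the combined column, because `np.where` alone would label the all-NaN row as type 2.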

help-ukraine-now answered Oct 14 '22 09:10


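You could use pandas melt:

(
    df.melt("ID", var_name="measurement_type", value_name="measurement")
    .dropna()  # keep only the rows that actually hold a value
    .assign(measurement_type=lambda x: x.measurement_type.str[-1])  # "measurement_1" -> "1"
    .iloc[:, [0, -1, 1]]  # reorder to ID, measurement, measurement_type
    .astype("int8")  # int8 only holds -128..127; use a wider integer type for larger values
)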

or pd.wide_to_long:

(
    pd.wide_to_long(df, stubnames="measurement", i="ID", 
                    j="measurement_type", sep="_")
    .dropna()
    .reset_index()
    .astype("int8")
    .iloc[:, [0, -1, 1]]
)



    ID  measurement measurement_type
0   0          3        1
1   1          5        2
2   2          7        2
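
As a self-contained variant of the melt approach, the sketch below rebuilds the sample frame from the question (assuming ID is a regular column) and makes three small changes that are my own choices rather than part of the answer above: it takes the type from the text after the last underscore, so suffixes with more than one digit would still work; it uses int64 instead of int8; and it sorts by ID, because melt stacks the value columns one after another. The name `long_df` is illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: ID is a regular column)
df = pd.DataFrame({
    "ID": [0, 1, 2, 3],
    "measurement_1": [3, np.nan, np.nan, np.nan],
    "measurement_2": [np.nan, 5, 7, np.nan],
})

long_df = (
    df.melt(id_vars="ID", var_name="measurement_type", value_name="measurement")
    .dropna(subset=["measurement"])  # drop the empty slots left by melt
    .assign(
        # "measurement_2" -> 2: keep the text after the last underscore
        measurement_type=lambda x: x["measurement_type"]
        .str.rsplit("_", n=1)
        .str[-1]
        .astype("int64")
    )
    .astype({"measurement": "int64"})
    .sort_values("ID")  # melt stacks column by column, so restore ID order
    .reset_index(drop=True)
    .loc[:, ["ID", "measurement", "measurement_type"]]
)
print(long_df)
```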

sammywemmy answered Oct 14 '22 07:10


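Use dropna(thresh=2) to keep only the rows with at least two non-NaN values; since ID is always present, this drops exactly the rows where both measurements are NaN. Then use df.assign to fill measurement_1 from measurement_2 with fillna(), and np.where on measurement_2 to derive the type. The lambdas make assign work on the already filtered frame, so the array from np.where has the same length as the rows that remain.

df = (df.dropna(thresh=2)  # keep rows that have ID plus at least one measurement
        .assign(measurement=lambda d: d.measurement_1.fillna(d.measurement_2),
                measurement_type=lambda d: np.where(d.measurement_2.isna(), 1, 2))
        .drop(columns=['measurement_1', 'measurement_2']))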

    ID  measurement  measurement_type
0   0              3              1
1   1              5              2
2   2              7              2

wwnde answered Oct 14 '22 07:10