I have a dataframe with this format:
ID  measurement_1  measurement_2
0   3              NaN
1   NaN            5
2   NaN            7
3   NaN            NaN
I want to combine to:
ID  measurement  measurement_type
0   3            1
1   5            2
2   7            2
For each row, there will be a value in either the measurement_1 or the measurement_2 column, but never in both; the other column will be NaN.
In some rows both columns will be NaN.
I want to add a column for the measurement type (depending on which column has the value), collect the actual value from the two columns into a single column, and remove the rows that have NaN in both columns.
Is there an easy way of doing this?
Thanks!
You can concatenate two or more text/string columns in a pandas DataFrame simply by using the + operator. Note that when you apply the + operator to numeric columns, it performs addition instead of concatenation.
This is what Pandas documentation gives: na_values : scalar, str, list-like, or dict, optional Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.
Maybe combine_first could help?
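import numpy as np
# Take measurement_1 where present, otherwise fall back to measurement_2
df["measurement"] = df["measurement_1"].combine_first(df["measurement_2"])
# 1 if the value came from measurement_1, otherwise 2
df["measurement_type"] = np.where(df["measurement_1"].notnull(), 1, 2)
# Drop the source columns, then the rows where both measurements were NaN
df.drop(columns=["measurement_1", "measurement_2"]).dropna(subset=["measurement"])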
   ID  measurement  measurement_type
0   0            3                 1
1   1            5                 2
2   2            7                 2
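For anyone who wants to run this end to end, here is a minimal self-contained sketch of the combine_first approach on the sample data from the question. The frame construction (with ID as a regular column) and the final integer cast are assumptions added for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumption: ID is a regular column).
df = pd.DataFrame({
    "ID": [0, 1, 2, 3],
    "measurement_1": [3, np.nan, np.nan, np.nan],
    "measurement_2": [np.nan, 5, 7, np.nan],
})

# Take measurement_1 where present, otherwise fall back to measurement_2.
df["measurement"] = df["measurement_1"].combine_first(df["measurement_2"])
# Label each row by which source column held the value.
df["measurement_type"] = np.where(df["measurement_1"].notna(), 1, 2)

result = (
    df.drop(columns=["measurement_1", "measurement_2"])
    .dropna(subset=["measurement"])      # removes rows where both columns were NaN
    .astype({"measurement": "int64"})    # optional: the NaNs made this column float
)
print(result)
```

The dropna step matters because np.where would otherwise label the all-NaN row as type 2.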
You could use pandas melt:
(
    df.melt("ID", var_name="measurement_type", value_name="measurement")
    .dropna()
    .assign(measurement_type=lambda x: x.measurement_type.str[-1])
    .iloc[:, [0, -1, 1]]
    .astype("int8")
)
or pd.wide_to_long:
import pandas as pd
(
    pd.wide_to_long(df, stubnames="measurement", i="ID",
                    j="measurement_type", sep="_")
    .dropna()
    .reset_index()
    .astype("int8")
    .iloc[:, [0, -1, 1]]
)
   ID  measurement  measurement_type
0   0            3                 1
1   1            5                 2
2   2            7                 2
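As a quick check, the sketch below runs both reshaping approaches on the sample data from the question and prints the two results side by side. The frame construction, the use of column labels instead of positions for reordering, and the int64 cast are assumptions made for readability; the reshaping calls themselves are the ones shown above.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumption: ID is a regular column).
df = pd.DataFrame({
    "ID": [0, 1, 2, 3],
    "measurement_1": [3, np.nan, np.nan, np.nan],
    "measurement_2": [np.nan, 5, 7, np.nan],
})
cols = ["ID", "measurement", "measurement_type"]

# melt: one row per (ID, source column); the type is the last character of the column name.
melted = (
    df.melt("ID", var_name="measurement_type", value_name="measurement")
    .dropna()
    .assign(measurement_type=lambda x: x["measurement_type"].str[-1])
    .loc[:, cols]
    .astype("int64")
    .reset_index(drop=True)
)

# wide_to_long: the numeric suffix after "_" becomes the measurement_type level.
long = (
    pd.wide_to_long(df, stubnames="measurement", i="ID",
                    j="measurement_type", sep="_")
    .dropna()
    .reset_index()
    .loc[:, cols]
    .astype("int64")
)

print(melted)
print(long)
```

Taking the last character of the column name only works while there are fewer than ten measurement columns; wide_to_long parses the whole numeric suffix, so it does not have that limit.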
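Set a threshold and drop any row that has more than one NaN (thresh=2 keeps rows with at least two non-NaN values, here ID plus one measurement). Then use df.assign to fill measurement_1 from measurement_2 with fillna(), and apply np.where to measurement_2 to get the type. Passing lambdas to assign makes both run on the filtered rows, so the np.where result has the same length as the frame it is assigned to:
df = df.dropna(thresh=2).assign(
    measurement=lambda d: d.measurement_1.fillna(d.measurement_2),
    measurement_type=lambda d: np.where(d.measurement_2.isna(), 1, 2),
).drop(columns=["measurement_1", "measurement_2"])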
   ID  measurement  measurement_type
0   0            3                 1
1   1            5                 2
2   2            7                 2
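Here is a self-contained sketch of the dropna(thresh=...) approach on the sample data from the question. The frame construction is an assumption for illustration. The callables passed to assign are evaluated on the already-filtered frame, so the array returned by np.where has one entry per remaining row.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumption: ID is a regular column).
df = pd.DataFrame({
    "ID": [0, 1, 2, 3],
    "measurement_1": [3, np.nan, np.nan, np.nan],
    "measurement_2": [np.nan, 5, 7, np.nan],
})

result_thresh = (
    # Keep rows with at least 2 non-NaN cells: ID plus one measurement.
    df.dropna(thresh=2)
    .assign(
        # Both callables receive the filtered frame, not the original df.
        measurement=lambda d: d["measurement_1"].fillna(d["measurement_2"]),
        measurement_type=lambda d: np.where(d["measurement_2"].isna(), 1, 2),
    )
    .drop(columns=["measurement_1", "measurement_2"])
)
print(result_thresh)
```

Note that thresh=2 depends on ID being a regular, non-null column; if ID were the index, thresh=1 would be the equivalent setting.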