I have a dataframe with this format:
ID  measurement_1  measurement_2
0   3              NaN
1   NaN            5
2   NaN            7
3   NaN            NaN
I want to combine these into:
ID  measurement  measurement_type
0   3            1
1   5            2
2   7            2
For each row there will be a value in either measurement_1 or measurement_2, not in both; the other column will be NaN.
In some rows both columns will be NaN.
I want to add a column for the measurement type (depending on which column has the value) and take the actual value out of both columns, and remove the rows that have NaN in both columns.
Is there an easy way of doing this?
Thanks!
Maybe combine_first could help?
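import numpy as np

# Take measurement_1 where present, otherwise fall back to measurement_2.
df["measurement"] = df["measurement_1"].combine_first(df["measurement_2"])
df["measurement_type"] = np.where(df["measurement_1"].notnull(), 1, 2)
# Drop the source columns, then the rows where both measurements were NaN.
df.drop(columns=["measurement_1", "measurement_2"]).dropna(subset=["measurement"])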
   ID  measurement  measurement_type
0   0            3                 1
1   1            5                 2
2   2            7                 2
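The snippet above assumes df already exists. A minimal self-contained sketch of the same combine_first idea follows; the sample frame is reconstructed from the question, and the final cast to int64 is my own assumption to match the integer output shown, not something the answer specifies.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the question (values only; dtypes are assumed).
df = pd.DataFrame({
    "ID": [0, 1, 2, 3],
    "measurement_1": [3, np.nan, np.nan, np.nan],
    "measurement_2": [np.nan, 5, 7, np.nan],
})

result = (
    df.assign(
        # Prefer measurement_1; fall back to measurement_2 where it is NaN.
        measurement=df["measurement_1"].combine_first(df["measurement_2"]),
        # 1 if the value came from measurement_1, otherwise 2.
        measurement_type=np.where(df["measurement_1"].notna(), 1, 2),
    )
    # Rows where both source columns were NaN still have a NaN measurement.
    .dropna(subset=["measurement"])
    .drop(columns=["measurement_1", "measurement_2"])
    # The NaNs forced a float column; cast back once they are gone.
    .astype({"measurement": "int64"})
)
print(result)
```

Dropping the all-NaN rows after computing measurement_type keeps the np.where array the same length as the frame it is assigned to.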
You could use pandas melt:
(
    df.melt("ID", var_name="measurement_type", value_name="measurement")
    .dropna()
    .assign(measurement_type=lambda x: x.measurement_type.str[-1])
    .iloc[:, [0, -1, 1]]
    .astype("int8")
)
or pd.wide_to_long:
(
    pd.wide_to_long(df, stubnames="measurement", i="ID",
                    j="measurement_type", sep="_")
    .dropna()
    .reset_index()
    .astype("int8")
    .iloc[:, [0, -1, 1]]
)
   ID  measurement  measurement_type
0   0            3                 1
1   1            5                 2
2   2            7                 2
Use dropna with thresh=2 to drop every row that has more than one NaN. Then use df.assign to fill measurement_1 from measurement_2 with fillna(), and apply np.where to measurement_2 to derive the measurement type:
df = (df.dropna(thresh=2)  # lambdas below see the filtered frame, so lengths match
        .assign(measurement=lambda d: d.measurement_1.fillna(d.measurement_2),
                measurement_type=lambda d: np.where(d.measurement_2.isna(), 1, 2))
        .drop(columns=["measurement_1", "measurement_2"]))
   ID  measurement  measurement_type
0   0            3                 1
1   1            5                 2
2   2            7                 2
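A note on thresh=2: it counts non-NaN values across all columns, so ID contributes one and the threshold only works because the frame has exactly these three columns. The sketch below compares it with a subset-based dropna, which I am suggesting as an alternative and which does not depend on the column count. The sample frame is reconstructed from the question.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame({
    "ID": [0, 1, 2, 3],
    "measurement_1": [3, np.nan, np.nan, np.nan],
    "measurement_2": [np.nan, 5, 7, np.nan],
})

# Keep rows with at least two non-NaN values; ID always counts as one of them.
by_thresh = df.dropna(thresh=2)

# Keep rows unless *both* measurement columns are NaN, regardless of other columns.
by_subset = df.dropna(subset=["measurement_1", "measurement_2"], how="all")

# Both filters select the same rows for this frame.
print(by_thresh.equals(by_subset))
```

If the real frame carries extra columns, the subset form is probably the safer choice, since thresh would need adjusting for each added column.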