How to combine numeric columns in pandas dataframe with NaN?

Tags: python, pandas, dataframe

I have a dataframe with this format:

ID measurement_1 measurement_2
0      3              NaN
1      NaN            5
2      NaN            7 
3      NaN            NaN

I want to combine to:

ID measurement measurement_type
0      3              1
1      5              2
2      7              2

Each row has a value in at most one of measurement_1 and measurement_2, never in both; the other column is NaN. In some rows both columns are NaN.

I want to add a column for the measurement type (depending on which column holds the value), take the value from whichever column has it, and remove the rows that have NaN in both columns.

Is there an easy way of doing this?

Thanks!

asked Jul 28 '20 10:07

Agustin

3 Answers

Maybe combine_first could help?

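import numpy as np


df["measurement"] = df["measurement_1"].combine_first(df["measurement_2"])
df["measurement_type"] = np.where(df["measurement_1"].notnull(), 1, 2)
# Drop rows where both source columns were NaN, then drop the source columns
df = df.dropna(subset=["measurement"]).drop(columns=["measurement_1", "measurement_2"])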

    ID  measurement measurement_type
0   0   3           1
1   1   5           2
2   2   7           2
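For a version that can be run end to end, here is a minimal sketch. It assumes the sample frame from the question (rebuilt by hand below) and the variable name `result`, neither of which is part of the answer. It also casts `measurement` back to integers, which is only safe after the all-NaN rows have been removed.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the question (assumption: ID is a regular column)
df = pd.DataFrame(
    {
        "ID": [0, 1, 2, 3],
        "measurement_1": [3, np.nan, np.nan, np.nan],
        "measurement_2": [np.nan, 5, 7, np.nan],
    }
)

# Take measurement_1 where present, otherwise fall back to measurement_2
measurement = df["measurement_1"].combine_first(df["measurement_2"])
# Type 1 if measurement_1 holds the value, else 2 (rows with no value are dropped below)
measurement_type = np.where(df["measurement_1"].notna(), 1, 2)

result = (
    df[["ID"]]
    .assign(measurement=measurement, measurement_type=measurement_type)
    .dropna(subset=["measurement"])      # removes ID 3, where both inputs are NaN
    .astype({"measurement": "int64"})    # NaN forced float; cast back once NaN rows are gone
    .reset_index(drop=True)
)
print(result)
```

Building a new frame instead of mutating `df` keeps the original data available for comparison.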

answered Oct 14 '22 09:10

help-ukraine-now

You could use pandas melt:

(
    df.melt("ID", var_name="measurement_type", value_name="measurement")
    .dropna()
    .assign(measurement_type=lambda x: x.measurement_type.str[-1])
    .iloc[:, [0, -1, 1]]
    .astype("int8")
)

or pd.wide_to_long:

(
    pd.wide_to_long(df, stubnames="measurement", i="ID", 
                    j="measurement_type", sep="_")
    .dropna()
    .reset_index()
    .astype("int8")
    .iloc[:, [0, -1, 1]]
)



    ID  measurement measurement_type
0   0          3        1
1   1          5        2
2   2          7        2
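A self-contained sketch of the melt route follows, under a few assumptions: the sample frame is rebuilt by hand from the question, and the name `tidy` is made up for illustration. It splits on the last underscore rather than taking the final character, so a hypothetical column such as `measurement_10` would still parse, and it resets the index, which `melt` plus `dropna` otherwise leave with gaps.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the question
df = pd.DataFrame(
    {
        "ID": [0, 1, 2, 3],
        "measurement_1": [3, np.nan, np.nan, np.nan],
        "measurement_2": [np.nan, 5, 7, np.nan],
    }
)

tidy = (
    df.melt(id_vars="ID", var_name="measurement_type", value_name="measurement")
    .dropna(subset=["measurement"])
    .assign(
        # "measurement_2" -> 2: keep the text after the last underscore
        measurement_type=lambda d: d["measurement_type"]
        .str.rsplit("_", n=1)
        .str[-1]
        .astype(int)
    )
    .astype({"measurement": int})
    .sort_values("ID")
    .reset_index(drop=True)
    .loc[:, ["ID", "measurement", "measurement_type"]]
)
print(tidy)
```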

answered Oct 14 '22 07:10

sammywemmy

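Use dropna(thresh=2) to keep only rows with at least two non-NaN values (ID plus one measurement), which drops the rows where both measurements are NaN. Then use df.assign to fillna() measurement_1 from measurement_2 and apply np.where on measurement_2. The values are passed to assign as callables so that they are computed on the filtered frame; computing them on the original df would produce an array one row too long.

df = (df.dropna(thresh=2)
        .assign(measurement=lambda d: d.measurement_1.fillna(d.measurement_2),
                measurement_type=lambda d: np.where(d.measurement_2.isna(), 1, 2))
        .drop(columns=['measurement_1', 'measurement_2']))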

    ID  measurement  measurement_type
0   0              3              1
1   1              5              2
2   2              7              2
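The following sketch shows what `thresh=2` does on the question's data. It assumes the sample frame rebuilt below; the names `kept` and `result` are illustrative. The callables passed to `assign` receive the already-filtered frame, so the `np.where` array has the same length as the rows that remain.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the question
df = pd.DataFrame(
    {
        "ID": [0, 1, 2, 3],
        "measurement_1": [3, np.nan, np.nan, np.nan],
        "measurement_2": [np.nan, 5, 7, np.nan],
    }
)

# thresh=2 keeps rows with at least two non-NaN cells. ID is always present,
# so a row survives only if one of the measurement columns holds a value.
kept = df.dropna(thresh=2)

result = (
    kept.assign(
        # Each lambda is evaluated on `kept`, not on the original `df`
        measurement=lambda d: d["measurement_1"].fillna(d["measurement_2"]).astype(int),
        measurement_type=lambda d: np.where(d["measurement_2"].isna(), 1, 2),
    )
    .drop(columns=["measurement_1", "measurement_2"])
)
print(result)
```

Note that this relies on ID never being NaN; if it could be, `dropna(subset=["measurement_1", "measurement_2"], how="all")` states the intent more directly.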

answered Oct 14 '22 07:10

wwnde
