Taking the first non-null in Python

Tags: python, pandas, dataframe

I'm trying to get the first non-null value from multiple pandas Series in a DataFrame.

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':[2, np.nan, np.nan, np.nan],
              'b':[np.nan, 5, np.nan, np.nan],
              'c':[np.nan, 55, 13, 14],
              'd':[np.nan, np.nan, np.nan, 4],
              'e':[12, np.nan, np.nan, 22],
          })

     a    b     c    d     e
0  2.0  NaN   NaN  NaN  12.0
1  NaN  5.0  55.0  NaN   NaN
2  NaN  NaN  13.0  NaN   NaN
3  NaN  NaN  14.0  4.0  22.0

In this df I want to create a new column 'f' and set it equal to 'a' if a is not null, otherwise 'b' if b is not null, and so on down to e.

I could nest a bunch of np.where statements, but that is inefficient.

df['f'] = np.where(df.a.notnull(), df.a,
              np.where(df.b.notnull(), df.b,
                   etc.))

I looked into doing df.a or df.b or df.c etc.

The result should look like:

     a    b     c    d     e   f
0  2.0  NaN   NaN  NaN  12.0   2
1  NaN  5.0  55.0  NaN   NaN   5
2  NaN  NaN  13.0  NaN   NaN  13
3  NaN  NaN  14.0  4.0  22.0  14

asked Aug 29 '18 15:08

Matt W.

1 Answer

One solution:

df.groupby(['f']*df.shape[1], axis=1).first()
Out[385]: 
      f
0   2.0
1   5.0
2  13.0
3  14.0
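
For readers on a recent pandas release, where (as far as I know) the axis argument of groupby is deprecated, the same idea can be sketched by transposing first. This is only an illustration under that assumption, not the answer author's code; the variable name first is made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [2, np.nan, np.nan, np.nan],
                   'b': [np.nan, 5, np.nan, np.nan],
                   'c': [np.nan, 55, 13, 14],
                   'd': [np.nan, np.nan, np.nan, 4],
                   'e': [12, np.nan, np.nan, 22]})

# Transpose so the original columns become rows, put all of them into a
# single group labelled 'f', take the first non-null entry per original row,
# then transpose back to get one column 'f' indexed like df.
first = df.T.groupby(['f'] * df.shape[1]).first().T

print(first)
```

GroupBy.first skips NaN by default, which is what makes this return the first non-null value in column order.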

The other:

df.bfill(axis=1)['a']
Out[388]: 
0     2.0
1     5.0
2    13.0
3    14.0
Name: a, dtype: float64
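
To actually create column 'f' as asked in the question, a minimal sketch could look like the following. It assumes the priority order is a, b, c, d, e; the names cols, first_bfill and first_chain are hypothetical and only used for illustration. The second variant chains Series.combine_first, which is roughly the Series counterpart of the "df.a or df.b or df.c" idea from the question.

```python
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [2, np.nan, np.nan, np.nan],
                   'b': [np.nan, 5, np.nan, np.nan],
                   'c': [np.nan, 55, 13, 14],
                   'd': [np.nan, np.nan, np.nan, 4],
                   'e': [12, np.nan, np.nan, 22]})

cols = ['a', 'b', 'c', 'd', 'e']  # assumed priority order, left to right

# Variant 1: back-fill across columns, then take the leftmost column,
# which now holds the first non-null value of each row.
first_bfill = df[cols].bfill(axis=1).iloc[:, 0]

# Variant 2: chain combine_first, so each Series fills the gaps
# left by the Series before it.
first_chain = reduce(lambda left, right: left.combine_first(right),
                     (df[c] for c in cols))

df['f'] = first_bfill
print(df)
```

Selecting by position with iloc rather than by the name 'a' keeps the sketch independent of what the first column happens to be called.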

answered Sep 20 '22 07:09

BENY
