I have a DataFrame which looks like this:
1125400 5430095 1095751
2013-05-22 105.24 NaN 6507.58
2013-05-23 104.63 NaN 6393.86
2013-05-26 104.62 NaN 6521.54
2013-05-27 104.62 NaN 6609.31
2013-05-28 104.54 87.79 6640.24
2013-05-29 103.91 86.88 6577.39
2013-05-30 103.43 87.66 6516.55
2013-06-02 103.56 87.55 6559.43
I would like to find the first non-NaN value in each column.
As Locate first and last non NaN values in a Pandas DataFrame points out, first_valid_index can be used. Unfortunately, on a DataFrame it returns the index of the first row in which at least one element is not NaN; it does not work per column.
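A small sketch of that limitation (toy data, not the DataFrame from the question): the DataFrame-level method looks at whole rows, while calling it on a single column gives the per-column answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, 1.0, 2.0], "b": [3.0, np.nan, 4.0]})

# DataFrame.first_valid_index() considers whole rows: row 0 already has a
# non-NaN value in column "b", so it is returned even though "a" is NaN there.
print(df.first_valid_index())       # 0

# Called on a single column (a Series), it gives the per-column index we want.
print(df["a"].first_valid_index())  # 1
```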
You should use apply, which efficiently applies a function to each column (the default) or each row:
>>> first_valid_indices = df.apply(lambda series: series.first_valid_index())
>>> first_valid_indices
1125400 2013-05-22 00:00:00
5430095 2013-05-28 00:00:00
1095751 2013-05-22 00:00:00
first_valid_indices will then be a Series containing the first valid index for each column.
You could also define the lambda function as a normal function outside:

def first_valid_index(series):
    return series.first_valid_index()
and then call apply like this:
df.apply(first_valid_index)
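Since the question asks for the first non-NaN value itself, not just its index, the same apply pattern can be extended with a lookup. A minimal sketch on toy data (assuming every column has at least one non-NaN entry, so first_valid_index never returns None):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [np.nan, np.nan, 5.0, 6.0],
                   "y": [7.0, np.nan, 8.0, 9.0]})

# Index of the first non-NaN entry in each column.
first_idx = df.apply(lambda s: s.first_valid_index())

# Value at that index, looked up per column with .loc.
first_vals = df.apply(lambda s: s.loc[s.first_valid_index()])
print(first_vals)  # x -> 5.0, y -> 7.0
```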
The built-in method DataFrame.groupby().column.first() returns the first non-null value in the column, while last() returns the last.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.first.html
If you don't want per-group values but rather the first non-null value over the whole DataFrame, you can add a dummy column of 1s so that all rows fall into a single group, then use groupby and first:
from pandas import DataFrame
df = DataFrame({'a':[None,1,None],'b':[None,2,None]})
df['dummy'] = 1
df.groupby('dummy').first()
df.groupby('dummy').last()
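As an alternative that avoids the dummy column entirely (my addition, not from the answer above): bfill() propagates each column's next valid value backwards, so after it the first row holds each column's first non-null value; symmetrically, ffill() plus the last row gives the last non-null values.

```python
import pandas as pd

df = pd.DataFrame({'a': [None, 1, None], 'b': [None, 2, None]})

# After backfilling, row 0 contains each column's first non-null value.
first_vals = df.bfill().iloc[0]   # a -> 1.0, b -> 2.0

# After forward-filling, the last row contains each column's last non-null value.
last_vals = df.ffill().iloc[-1]   # a -> 1.0, b -> 2.0
```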