
Computing the first non-missing value from each column in a DataFrame [duplicate]

Tags: python, pandas

I have a DataFrame which looks like this:

            1125400  5430095  1095751
2013-05-22   105.24      NaN  6507.58
2013-05-23   104.63      NaN  6393.86
2013-05-26   104.62      NaN  6521.54
2013-05-27   104.62      NaN  6609.31
2013-05-28   104.54    87.79  6640.24
2013-05-29   103.91    86.88  6577.39
2013-05-30   103.43    87.66  6516.55
2013-06-02   103.56    87.55  6559.43

I would like to compute the first non-NaN value in each column.

As "Locate first and last non NaN values in a Pandas DataFrame" points out, first_valid_index can be used. Unfortunately, when called on the DataFrame it returns the index of the first row in which at least one element is not NaN; it does not work per column.

yevgeny.bezman asked Apr 26 '14 10:04




2 Answers

Use the apply function, which calls a function on each column (the default) or on each row:

>>> first_valid_indices = df.apply(lambda series: series.first_valid_index())
>>> first_valid_indices
1125400   2013-05-22 00:00:00
5430095   2013-05-28 00:00:00
1095751   2013-05-22 00:00:00

first_valid_indices will then be a Series containing the first_valid_index for each column.

You could also define the lambda function as a normal function outside:

def first_valid_index(series):
    return series.first_valid_index()

and then call apply like this:

df.apply(first_valid_index)
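
Since the question asks for the values rather than their index labels, one possible way to go from the labels to the values is sketched below. The sample frame reproduces the first five rows from the question. Note that first_valid_value is a hypothetical helper name, not a pandas function, and the sketch assumes the index labels are unique so that .loc returns a scalar. The bfill line is an alternative that should give the same result.

```python
import numpy as np
import pandas as pd

# Sample data copied from the first five rows of the question.
df = pd.DataFrame(
    {
        "1125400": [105.24, 104.63, 104.62, 104.62, 104.54],
        "5430095": [np.nan, np.nan, np.nan, np.nan, 87.79],
        "1095751": [6507.58, 6393.86, 6521.54, 6609.31, 6640.24],
    },
    index=pd.to_datetime(
        ["2013-05-22", "2013-05-23", "2013-05-26", "2013-05-27", "2013-05-28"]
    ),
)


def first_valid_value(series):
    """Hypothetical helper: value at the first non-NaN position of a column."""
    idx = series.first_valid_index()
    # first_valid_index() returns None when the whole column is NaN.
    return np.nan if idx is None else series.loc[idx]


# apply() calls the helper once per column and collects the scalars in a Series.
first_values = df.apply(first_valid_value)

# Alternative: back-fill each column, then read the first row.
first_values_bfill = df.bfill().iloc[0]

print(first_values)
```

The helper returns NaN for a column with no valid entries at all, which matches what the back-fill variant would leave in that position.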
Felix Zumstein answered Nov 10 '22 13:11


The built-in method DataFrame.groupby().column.first() returns the first non-null value in the column for each group, while last() returns the last.

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.first.html

If you want the first value for the whole column rather than for each group, add a dummy column of 1s and group by it. The groupby and first functions then return the first non-null value of every column.

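from pandas import DataFrame  # the package name is lowercase

df = DataFrame({'a': [None, 1, None], 'b': [None, 2, None]})
df['dummy'] = 1  # constant key, so every row falls into a single group
df.groupby('dummy').first()  # first non-null value in each column
df.groupby('dummy').last()   # last non-null value in each column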
Johnny V answered Nov 10 '22 12:11