I have a DataFrame which looks like this:
1125400 5430095 1095751
2013-05-22 105.24 NaN 6507.58
2013-05-23 104.63 NaN 6393.86
2013-05-26 104.62 NaN 6521.54
2013-05-27 104.62 NaN 6609.31
2013-05-28 104.54 87.79 6640.24
2013-05-29 103.91 86.88 6577.39
2013-05-30 103.43 87.66 6516.55
2013-06-02 103.56 87.55 6559.43
I would like to find the first non-NaN value in each column.
As Locate first and last non NaN values in a Pandas DataFrame points out, first_valid_index can be used. Unfortunately, on a DataFrame it returns the index of the first row in which at least one element is not NaN; it does not work per column.
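A small sketch of that limitation (toy data, not the DataFrame from the question): the DataFrame-level method looks at whole rows, while calling it on a single column gives the per-column answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, 1.0, 2.0], "b": [3.0, np.nan, 4.0]})

# DataFrame.first_valid_index() considers whole rows: row 0 already has a
# non-NaN value in column "b", so it is returned even though "a" is NaN there.
print(df.first_valid_index())       # 0

# Called on a single column (a Series), it gives the per-column index we want.
print(df["a"].first_valid_index())  # 1
```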
You should use apply, which efficiently applies a function to each column (the default) or each row:
>>> first_valid_indices = df.apply(lambda series: series.first_valid_index())
>>> first_valid_indices
1125400 2013-05-22 00:00:00
5430095 2013-05-28 00:00:00
1095751 2013-05-22 00:00:00
first_valid_indices will then be a Series containing the first valid index for each column.
You could also define the lambda function as a normal function outside:

def first_valid_index(series):
    return series.first_valid_index()
and then call apply like this:
df.apply(first_valid_index)
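Since the question asks for the first non-NaN value itself, not just its index, the same apply pattern can be extended with a lookup. A minimal sketch on toy data (assuming every column has at least one non-NaN entry, so first_valid_index never returns None):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [np.nan, np.nan, 5.0, 6.0],
                   "y": [7.0, np.nan, 8.0, 9.0]})

# Index of the first non-NaN entry in each column.
first_idx = df.apply(lambda s: s.first_valid_index())

# Value at that index, looked up per column with .loc.
first_vals = df.apply(lambda s: s.loc[s.first_valid_index()])
print(first_vals)  # x -> 5.0, y -> 7.0
```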
The built-in method DataFrame.groupby().column.first() returns the first non-null value in the column, while last() returns the last.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.first.html
If you don't want per-group values but rather the first non-null value over the whole DataFrame, you can add a dummy column of 1s so that all rows fall into a single group, then use groupby and first:
from pandas import DataFrame
df = DataFrame({'a':[None,1,None],'b':[None,2,None]})
df['dummy'] = 1
df.groupby('dummy').first()
df.groupby('dummy').last()
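As an alternative that avoids the dummy column entirely (my addition, not from the answer above): bfill() propagates each column's next valid value backwards, so after it the first row holds each column's first non-null value; symmetrically, ffill() plus the last row gives the last non-null values.

```python
import pandas as pd

df = pd.DataFrame({'a': [None, 1, None], 'b': [None, 2, None]})

# After backfilling, row 0 contains each column's first non-null value.
first_vals = df.bfill().iloc[0]   # a -> 1.0, b -> 2.0

# After forward-filling, the last row contains each column's last non-null value.
last_vals = df.ffill().iloc[-1]   # a -> 1.0, b -> 2.0
```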