How do I get a summary count of missing/NaN data by column in 'pandas'?

In R I can quickly see a count of missing data using the summary command, but the equivalent pandas DataFrame method, describe, does not report these values.

I gather I can do something like

len(mydata.index) - mydata.count()

to compute the number of missing values for each column, but I wonder if there's a better idiom (or if my approach is even right).

692

asked Mar 07 '14 18:03

orome

2 Answers

Both describe and info report the count of non-missing values.

In [1]: df = DataFrame(np.random.randn(10,2))

In [2]: df.iloc[3:6,0] = np.nan

In [3]: df
Out[3]: 
          0         1
0 -0.560342  1.862640
1 -1.237742  0.596384
2  0.603539 -1.561594
3       NaN  3.018954
4       NaN -0.046759
5       NaN  0.480158
6  0.113200 -0.911159
7  0.990895  0.612990
8  0.668534 -0.701769
9 -0.607247 -0.489427

[10 rows x 2 columns]

In [4]: df.describe()
Out[4]: 
               0          1
count  7.000000  10.000000
mean  -0.004166   0.286042
std    0.818586   1.363422
min   -1.237742  -1.561594
25%   -0.583795  -0.648684
50%    0.113200   0.216699
75%    0.636036   0.608839
max    0.990895   3.018954

[8 rows x 2 columns]

In [5]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 2 columns):
0    7 non-null float64
1    10 non-null float64
dtypes: float64(2)

To get a count of missing values, your solution is correct:

In [20]: len(df.index)-df.count()
Out[20]: 
0    3
1    0
dtype: int64

You could also do this:

In [23]: df.isnull().sum()
Out[23]: 
0    3
1    0
dtype: int64
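For anyone who wants to run the two idioms side by side outside an IPython session, here is a minimal standalone sketch. The seeded generator and the variable names missing_by_count and missing_by_mask are my own choices for illustration, not part of the session above. It uses isna, which is an alias of isnull in reasonably recent pandas versions.

```python
import numpy as np
import pandas as pd

# Illustrative frame: 10 rows, 2 columns, with rows 3-5 of column 0 set to NaN.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 2)))
df.iloc[3:6, 0] = np.nan

# Idiom 1: total number of rows minus the non-missing count per column.
missing_by_count = len(df.index) - df.count()

# Idiom 2: sum the boolean mask of missing values per column.
missing_by_mask = df.isna().sum()

print(missing_by_mask)
```

Both Series should agree column by column; the mask version tends to read more directly as "count the missing values".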

109

answered Oct 07 '22 12:10

Jeff

As a tiny addition, to get the percentage missing by DataFrame column, combining @Jeff and @userS's answers above gives:

df.isnull().sum()/len(df)*100
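If you want the count and the percentage in one table, something along these lines should work. This is only a sketch: the example frame and the column names n_missing and pct_missing are made up for illustration. Taking the mean of the boolean mask gives the fraction missing, which is equivalent to dividing the sum by len(df).

```python
import numpy as np
import pandas as pd

# Illustrative frame with missing values in a float column and an object column.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": ["x", "y", None, "z"],
})

# Boolean mask of missing entries; True counts as 1 when summed or averaged.
mask = df.isna()

# One row per original column: count and percentage of missing values.
missing_summary = pd.DataFrame({
    "n_missing": mask.sum(),
    "pct_missing": mask.mean() * 100,
})

print(missing_summary)
```

The result is indexed by the original column names, so it can be sorted with sort_values("pct_missing") to see the worst columns first.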

answered Oct 07 '22 13:10

Ricky McMaster

Tags: pandas, nan, missing-data, reporting
