I have a 200,000 x 500 dataframe
loaded into Pandas. Is there a function that can automatically tell me which columns are missing data? Or do I have to iterate over each column and check element by element?
Once I've found a missing element, how do I define a custom function (based on both the column name and some other data in the same row) to do automatic replacements? I see the fillna() method, but I don't think it takes a (lambda) function as an input.
Thanks!
One way of handling missing values is to delete the rows or columns that contain nulls. If a column has more than half of its values missing, you can drop the entire column; in the same way, rows can be dropped if one or more of their values are null.
This approach is known as complete-case analysis, or listwise (case) deletion. Listwise deletion is the most frequently used method for handling missing data, and it is the default option in most statistical software packages.
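For example, a minimal sketch of both deletions, assuming a DataFrame named df (the example frame below is made up):

import numpy as np
import pandas as pd

# hypothetical example frame with some missing values
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, np.nan, np.nan, 4.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})

# drop columns in which more than half of the values are null
df_cols = df.loc[:, df.isnull().mean() <= 0.5]

# listwise deletion: drop every row that contains at least one null
df_rows = df.dropna(axis=0, how="any")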
When dealing with missing data, there are two primary approaches: imputation or removal. Imputation fills in reasonable guesses for the missing values, and it's most useful when the percentage of missing data is low.
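A minimal imputation sketch in the same spirit (the column-mean fill is just one common choice, not the only one):

# impute: replace each missing numeric value with its column's mean
df_imputed = df.fillna(df.mean(numeric_only=True))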
something like:
import pandas as pd
pd.isnull(frame).any()
is probably what you're looking for: it returns a boolean Series with one entry per column, True wherever that column has missing data.
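If you want the actual column names rather than a boolean Series, a small follow-up along the same lines (assuming your DataFrame is called frame):

# boolean Series: True for every column that has at least one missing value
has_missing = frame.isnull().any()

# just the names of those columns
missing_cols = frame.columns[has_missing]

# count of missing values per column
missing_counts = frame.isnull().sum()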
fillna currently does not take lambda functions, though that's in the works as an open issue on GitHub.
You can use DataFrame.apply to do custom filling for now, as sketched below. Though can you be a little more specific about what you need to do to fill the data? Just curious what the use case is.
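As a rough sketch of the apply approach, row by row; the column names and the fill rule here are hypothetical, just to show the shape:

import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "price": [10.0, np.nan, 7.5],
    "quantity": [1, 4, 2],
})

def fill_row(row):
    # hypothetical rule: derive a missing 'price' from 'quantity' in the same row
    if pd.isnull(row["price"]):
        row = row.copy()
        row["price"] = row["quantity"] * 2.5
    return row

# axis=1 passes one row (as a Series) at a time to fill_row
frame = frame.apply(fill_row, axis=1)

Keep in mind that on a 200,000 x 500 frame a row-wise apply can be slow; if the fill rule can be expressed column by column, boolean masking (frame.loc[mask, col] = ...) will usually be much faster.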