Delete row based on nulls in certain columns (pandas)

Tags: pandas

I know how to drop a row from a DataFrame when it contains all nulls or any single null, but can you drop a row based on the nulls in a specified set of columns?

For example, say I am working with data containing geographical info (city, latitude, and longitude) in addition to numerous other fields. I want to keep the rows that at a minimum contain a value for city OR for lat and long but drop rows that have null values for all three.

I am having trouble finding functionality for this in the pandas documentation. Any guidance would be appreciated.

asked Feb 08 '17 22:02

gesingle

1 Answer

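You can use DataFrame.dropna, but instead of how='all' with subset=[...], you can use the thresh parameter, which sets the minimum number of non-NA values a row must have (within subset) in order to be kept. In the city, long/lat example, thresh=2 works on the sample data because the only row with fewer than two non-null values among those three columns is the one where all three are NA. Note that thresh=2 would also drop a row that has a city but no coordinates; if such rows must be kept, how='all' on the same subset drops only the rows where all three are null. Using the great data example set up by MaxU, we would do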

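import pandas as pd

## get the data (copy the sample table to the clipboard first)
df = pd.read_clipboard()

## remove undesired rows: keep rows with at least 2 non-null values
## among the three location columns (subset takes a flat list of labels)
df.dropna(axis=0, subset=['city', 'longitude', 'latitude'], thresh=2)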

This yields:

In [5]: df.dropna(axis=0, subset=['city', 'longitude', 'latitude'], thresh=2)
Out[5]:
  city  latitude  longitude  a  b
0  aaa   11.1111        NaN  1  2
1  bbb       NaN    22.2222  5  6
3  NaN   11.1111    33.3330  1  2
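
For a version that runs without the clipboard, here is a minimal sketch. The sample frame is an assumption: rows 0, 1 and 3 mirror the output above, while rows 2, 4 and 5 are hypothetical rows added to show where the approaches differ. It compares thresh=2, how='all', and a boolean mask that spells out the rule from the question (keep a row if it has a city, or if it has both latitude and longitude).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: rows 0, 1 and 3 mirror the output above.
# Row 2 (all three missing), row 4 (city only) and row 5 (latitude only)
# are assumed here purely for illustration.
df = pd.DataFrame({
    "city": ["aaa", "bbb", np.nan, np.nan, "ccc", np.nan],
    "latitude": [11.1111, np.nan, np.nan, 11.1111, np.nan, 44.4444],
    "longitude": [np.nan, 22.2222, np.nan, 33.333, np.nan, np.nan],
    "a": [1, 5, 3, 1, 7, 9],
    "b": [2, 6, 4, 2, 8, 0],
})

geo_cols = ["city", "latitude", "longitude"]

# Drop a row only when all three location columns are null.
all_null_dropped = df.dropna(subset=geo_cols, how="all")

# Keep rows with at least two non-null values among the three columns.
at_least_two = df.dropna(subset=geo_cols, thresh=2)

# The rule as stated in the question: keep a row if it has a city,
# or if it has both a latitude and a longitude.
has_city = df["city"].notna()
has_coords = df["latitude"].notna() & df["longitude"].notna()
exact_rule = df[has_city | has_coords]

print(all_null_dropped.index.tolist())
print(at_least_two.index.tolist())
print(exact_rule.index.tolist())
```

On this frame, how='all' keeps rows 0, 1, 3, 4 and 5, thresh=2 keeps rows 0, 1 and 3, and the mask keeps rows 0, 1, 3 and 4. The mask is the only one of the three that treats a lone city as enough while rejecting a lone latitude.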

answered Nov 03 '22 15:11

Gene Burinsky
