
Delete row based on nulls in certain columns (pandas)

Tags: python, pandas

I know how to drop rows from a DataFrame that contain all nulls or any single null, but can you drop rows based on the nulls in a specified set of columns?

For example, say I am working with data containing geographical info (city, latitude, and longitude) in addition to numerous other fields. I want to keep the rows that at a minimum contain a value for city OR for lat and long but drop rows that have null values for all three.

I am having trouble finding functionality for this in pandas documentation. Any guidance would be appreciated.

gesingle asked Feb 08 '17 22:02




1 Answer

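You can use DataFrame.dropna with subset, and instead of how='all' you can use the thresh parameter, which sets the minimum number of non-NA values a row must have (within the subset columns) in order to be kept. In the city, long/lat example, thresh=2 keeps rows that have at least two of the three values present, so a row with all three NA is dropped. Note that thresh=2 also drops rows with only one of the three present; if only rows with all three NA should go, use thresh=1, which is equivalent to how='all' with the same subset. Using the great data example set up by MaxU, we would do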

import pandas as pd

## get the data
df = pd.read_clipboard()

## remove undesired rows (subset takes a flat list of column labels)
df.dropna(axis=0, subset=['city', 'longitude', 'latitude'], thresh=2)

This yields:

In [5]: df.dropna(axis=0, subset=['city', 'longitude', 'latitude'], thresh=2)
Out[5]:
  city  latitude  longitude  a  b
0  aaa   11.1111        NaN  1  2
1  bbb       NaN    22.2222  5  6
3  NaN   11.1111    33.3330  1  2
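
To make the difference between the options concrete, here is a minimal sketch on made-up data (the frame below is hypothetical, not MaxU's example, and the variable names are only for illustration). It compares how='all', thresh=1, thresh=2, and a boolean mask for the literal reading "city OR (lat AND long)":

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame(
    {
        "city": ["aaa", "bbb", np.nan, np.nan, "ccc", np.nan],
        "latitude": [11.1111, np.nan, np.nan, 11.1111, np.nan, 44.4444],
        "longitude": [np.nan, 22.2222, np.nan, 33.333, np.nan, np.nan],
        "a": [1, 5, 3, 1, 7, 9],
        "b": [2, 6, 4, 2, 8, 0],
    }
)
geo_cols = ["city", "latitude", "longitude"]

# Drop a row only when all three geo columns are null.
all_null_dropped = df.dropna(subset=geo_cols, how="all")

# Same result via thresh: keep rows with at least 1 non-null geo value.
thresh_1 = df.dropna(subset=geo_cols, thresh=1)

# Stricter: keep rows with at least 2 non-null geo values.
thresh_2 = df.dropna(subset=geo_cols, thresh=2)

# Literal reading of the question: city present, OR both coordinates present.
mask = df["city"].notna() | (df["latitude"].notna() & df["longitude"].notna())
city_or_coords = df[mask]

print(all_null_dropped.index.tolist())  # [0, 1, 3, 4, 5]
print(thresh_2.index.tolist())          # [0, 1, 3]
print(city_or_coords.index.tolist())    # [0, 1, 3, 4]
```

On this data, row 4 (city only) survives how='all' and the mask but not thresh=2, and row 5 (latitude only) survives how='all' but not the mask, so which one to pick depends on how strictly the "city OR lat and long" rule is meant.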
Gene Burinsky answered Nov 03 '22 15:11