I know how to drop a row from a DataFrame containing all nulls OR a single null but can you drop a row based on the nulls for a specified set of columns?
For example, say I am working with data containing geographical info (city, latitude, and longitude) in addition to numerous other fields. I want to keep the rows that at a minimum contain a value for city OR for lat and long but drop rows that have null values for all three.
I am having trouble finding functionality for this in pandas documentation. Any guidance would be appreciated.
To drop rows or columns containing null values from a DataFrame, use the dropna() method. Its main parameters: axis takes 0 or 'index' to drop rows, 1 or 'columns' to drop columns; how takes 'any' (the default) to drop a row/column if any of its values is null, or 'all' to drop it only if every value is null; subset restricts the null check to the given column labels; thresh keeps only rows/columns with at least that many non-null values.
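To make the how='any' vs. how='all' distinction concrete, here is a small self-contained sketch (the toy DataFrame below is made up for illustration):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"a": [1, np.nan, np.nan],
                   "b": [2, 3, np.nan]})

# how='any' drops every row containing at least one null
any_dropped = df.dropna(how="any")   # keeps only row 0

# how='all' drops only rows where every value is null
all_dropped = df.dropna(how="all")   # keeps rows 0 and 1
```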
You can use DataFrame.dropna(), but instead of combining how='all' with subset=[...], use the thresh parameter: it sets the minimum number of non-null values (counted within subset) a row must have in order to be kept. In the city/longitude/latitude example, thresh=1 does what you want, because a row should only be dropped when all three values are missing. Using the great example data set up by MaxU, we would do
## get the data
df = pd.read_clipboard()
## remove undesired rows
df.dropna(axis=0, subset=['city', 'longitude', 'latitude'], thresh=1)
This yields:
In [5]: df.dropna(axis=0, subset=['city', 'longitude', 'latitude'], thresh=1)
Out[5]:
city latitude longitude a b
0 aaa 11.1111 NaN 1 2
1 bbb NaN 22.2222 5 6
3 NaN 11.1111 33.3330 1 2
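If you need the literal "city OR (lat AND long)" condition rather than a count of non-null values, a boolean mask spells the logic out directly. A minimal sketch, with data made up to mirror the example output above:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "city":      ["aaa", "bbb", np.nan, np.nan],
    "latitude":  [11.1111, np.nan, np.nan, 11.1111],
    "longitude": [np.nan, 22.2222, np.nan, 33.3330],
    "a": [1, 5, 3, 1],
    "b": [2, 6, 4, 2],
})

# keep a row if city is present OR both coordinates are present
keep = df["city"].notna() | (df["latitude"].notna() & df["longitude"].notna())
result = df[keep]
```

For this data the mask and thresh=1 give the same result, but the mask would also drop a row that had, say, only latitude filled in, which thresh=1 would keep.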