I know how to drop a row from a DataFrame containing all nulls OR a single null but can you drop a row based on the nulls for a specified set of columns?
For example, say I am working with data containing geographical info (city, latitude, and longitude) in addition to numerous other fields. I want to keep the rows that at a minimum contain a value for city OR for lat and long but drop rows that have null values for all three.
I am having trouble finding functionality for this in pandas documentation. Any guidance would be appreciated.
To drop null values from a DataFrame, use the dropna() function, which drops rows or columns that contain null values in different ways. Parameters: axis: takes an int or string value selecting rows or columns. Use 0 or 'index' for rows and 1 or 'columns' for columns.
Use the pandas.DataFrame.drop() method to remove rows by index label; to drop rows by condition, pass it the index of the rows that match the condition.
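As a minimal sketch of that condition-based approach, assuming a small made-up frame (the column names and values here are illustrative and not taken from the question), drop() can be given the index labels of the rows that match a boolean condition:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"city": ["aaa", None, "ccc"], "a": [1, 3, 5]})

# Collect the index labels of rows where city is null, then drop those labels.
to_drop = df[df["city"].isna()].index
result = df.drop(index=to_drop)
```

In practice, plain boolean indexing such as df[df["city"].notna()] gives the same rows without the intermediate index.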
Pandas DataFrame dropna() function: if axis is 0, drop rows with null values. If 1, drop columns with missing values. how: possible values are {'any', 'all'}, default 'any'. If 'any', drop the row/column if any of the values is null. If 'all', drop it only if all of the values are null.
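To see how the how and subset parameters interact, here is a small sketch. The frame below is made-up sample data with the three geographic columns from the question plus two extra fields; the values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three geographic columns plus two other fields.
df = pd.DataFrame({
    "city": ["aaa", "bbb", np.nan, np.nan],
    "latitude": [11.1111, np.nan, np.nan, 11.1111],
    "longitude": [np.nan, 22.2222, np.nan, 33.333],
    "a": [1, 5, 3, 1],
    "b": [2, 6, 4, 2],
})
geo_cols = ["city", "latitude", "longitude"]

# how='any': drop a row if ANY of the subset columns is null.
any_dropped = df.dropna(axis=0, how="any", subset=geo_cols)

# how='all': drop a row only if ALL of the subset columns are null.
all_dropped = df.dropna(axis=0, how="all", subset=geo_cols)
```

With this data, how='any' removes every row because each one is missing at least one geographic value, while how='all' removes only the row where all three are null. Columns outside subset (a and b here) are ignored when deciding what to drop.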
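You can use DataFrame.dropna, but instead of how='all' you can combine subset with the thresh parameter, which sets the minimum number of non-NA values a row must have in the subset columns in order to be kept. In the city, long/lat example, thresh=2 works on the sample data because every row we want to keep has at least two of the three values, so only the row with 3 NAs is dropped. (If a row with just one of the three values should also be kept, use thresh=1, which drops only rows where all three are null, the same as how='all' with the same subset.) Using the great example data set up by MaxU, we would do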
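import pandas as pd

## get the data (copy the sample table to the clipboard first)
df = pd.read_clipboard()
## remove undesired rows: keep rows with at least 2 non-null values among the three columns
df.dropna(axis=0, subset=['city', 'longitude', 'latitude'], thresh=2)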
This yields:
In [5]: df.dropna(axis=0, subset=['city', 'longitude', 'latitude'], thresh=2)
Out[5]:
  city  latitude  longitude  a  b
0  aaa   11.1111        NaN  1  2
1  bbb       NaN    22.2222  5  6
3  NaN   11.1111    33.3330  1  2
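Because thresh only counts non-null values, it cannot tell "city alone" apart from "latitude alone". If the requirement is read literally as "keep a row when it has a city, or when it has both latitude and longitude", a boolean mask expresses it directly. The sketch below uses made-up data (an assumption, not the original poster's data) that includes a city-only row and a latitude-only row so the three approaches can be compared.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data covering the edge cases.
df = pd.DataFrame({
    "city": ["aaa", np.nan, np.nan, np.nan, "eee"],
    "latitude": [11.1, 22.2, np.nan, 44.4, np.nan],
    "longitude": [np.nan, 33.3, np.nan, np.nan, np.nan],
})
geo_cols = ["city", "latitude", "longitude"]

# Literal reading: city present, OR both coordinates present.
has_city = df["city"].notna()
has_coords = df["latitude"].notna() & df["longitude"].notna()
strict = df[has_city | has_coords]

# thresh=1: keep rows with at least one non-null geographic value.
at_least_one = df.dropna(subset=geo_cols, thresh=1)

# thresh=2: keep rows with at least two non-null geographic values.
at_least_two = df.dropna(subset=geo_cols, thresh=2)
```

Here the mask keeps the city-only row and drops the latitude-only row, thresh=1 keeps both, and thresh=2 drops both, so the right choice depends on which partial rows should count as usable.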