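I need to remove all rows in which the elements from column 3 onwards are all NaN:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 5), index=['a', 'c', 'e', 'f', 'g', 'h'], columns=['one', 'two', 'three', 'four', 'five'])
# Reindexing adds rows 'b' and 'd', which are filled with NaN
df2 = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
# .ix has been removed from pandas; use positional .iloc to set values in row 'b'
df2.iloc[1, 0] = 111
df2.iloc[1, 1] = 222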
In the example above, my final data frame should not contain rows 'b' and 'd'.
How do I use df.dropna() in this case?
Drop all rows having at least one null value
The DataFrame.dropna() method is your friend. When you call dropna() on the whole DataFrame without specifying any arguments (i.e. using the default behaviour), the method drops every row that contains at least one missing value.
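To make the default behaviour concrete, here is a minimal sketch on a small made-up frame (the column names x and y and the values are assumptions for illustration, not the asker's data). It contrasts the default how='any' with how='all':

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 has one missing value, row 2 is entirely missing.
df = pd.DataFrame({
    "x": [1.0, 2.0, np.nan],
    "y": [4.0, np.nan, np.nan],
})

# Default (how='any'): a row is dropped if ANY of its values is NaN.
default_drop = df.dropna()

# how='all': a row is dropped only if ALL of its values are NaN.
all_drop = df.dropna(how="all")

print(default_drop)
print(all_drop)
```

With the default, only the first row survives; with how='all', the partially filled row is kept as well.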
You can call dropna with the arguments subset and how:
df2.dropna(subset=['three', 'four', 'five'], how='all')
As the names suggest: how='all' requires every column (of subset) in the row to be NaN in order for the row to be dropped, as opposed to the default 'any'. subset lists the columns to inspect for NaNs.
As @PaulH points out, we can generalise this to inspect every column from position k onwards with:
subset=df2.columns[k:]
Indeed, we could even do something more complicated if desired:
subset=[col for col in df2.columns if len(col) > 3]
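Putting the pieces together, the following is a sketch of the whole approach on the question's data. It assumes a current pandas release (so .iloc is used instead of the removed .ix) and picks k = 2 to mean "from column 3 onwards"; the variable names by_name, by_position and by_filter are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (random values, so no NaN initially).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((6, 5)),
    index=["a", "c", "e", "f", "g", "h"],
    columns=["one", "two", "three", "four", "five"],
)
# reindex introduces two new all-NaN rows.
df2 = df.reindex(["a", "b", "c", "d", "e", "f", "g", "h"])
# Fill the first two columns of the second row, as in the question.
df2.iloc[1, 0] = 111
df2.iloc[1, 1] = 222

# 1) Name the columns to inspect explicitly.
by_name = df2.dropna(subset=["three", "four", "five"], how="all")

# 2) Inspect every column from position k onwards (k = 2 is an assumption).
k = 2
by_position = df2.dropna(subset=df2.columns[k:], how="all")

# 3) Select the columns with an arbitrary rule, here names longer than 3 characters.
long_names = [col for col in df2.columns if len(col) > 3]
by_filter = df2.dropna(subset=long_names, how="all")

print(by_name)
```

All three calls inspect the same three columns here, so they return the same frame: the two rows added by reindex are removed, even though one of them has values in its first two columns.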