I need to remove all rows in which elements from column 3 onwards are all NaN
    import numpy as np
    from pandas import DataFrame

    df = DataFrame(np.random.randn(6, 5), index=['a', 'c', 'e', 'f', 'g', 'h'],
                   columns=['one', 'two', 'three', 'four', 'five'])
    df2 = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
    df2.iloc[1, 0] = 111  # row 'b', column 'one' (.ix has been removed from pandas)
    df2.iloc[1, 1] = 222  # row 'b', column 'two'

In the example above, my final data frame should not contain rows 'b' and 'd'.
How can I use df.dropna() in this case?
Drop all rows having at least one null value

The DataFrame.dropna() method is your friend. When you call dropna() on the whole DataFrame without specifying any arguments (i.e. using the default behaviour), it drops every row that has at least one missing value.
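To make the default behaviour concrete, here is a minimal sketch. The frame `demo` and its values are made up for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 'x' is complete, 'y' is partially NaN, 'z' is entirely NaN
demo = pd.DataFrame(
    {"one": [1.0, 2.0, np.nan], "two": [3.0, np.nan, np.nan]},
    index=["x", "y", "z"],
)

# No arguments means how='any': one missing value is enough to drop the row
default_result = demo.dropna()

print(list(default_result.index))  # ['x']
```

Note that the partially filled row 'y' is dropped along with the empty row 'z', which is why the default is too aggressive when only some columns matter.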
You can call dropna with arguments subset and how:
    df2.dropna(subset=['three', 'four', 'five'], how='all')

As the names suggest:

how='all' requires every column (of subset) in the row to be NaN in order for the row to be dropped, as opposed to the default 'any'.

subset is the list of columns to inspect for NaNs.

As @PaulH points out, we can generalise this to inspect every column from position k onwards (k = 2 for the example above) with:
    subset=df2.columns[k:]

Indeed, we could even do something more complicated if desired:

    # list(...) because filter returns a lazy iterator in Python 3
    subset=list(filter(lambda x: len(x) > 3, df2.columns))
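Putting the pieces together, the following sketch rebuilds the frame from the question and checks that the three ways of choosing subset agree. It assumes that "column 3 onwards" means positional index 2 onwards, uses `.iloc` because `.ix` is no longer available in current pandas, and the variable names `by_name`, `by_position` and `by_filter` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 5), index=list("acefgh"),
                  columns=["one", "two", "three", "four", "five"])
df2 = df.reindex(list("abcdefgh"))  # new rows 'b' and 'd' are filled with NaN
df2.iloc[1, 0] = 111                # row 'b' gets values in 'one' and 'two' only
df2.iloc[1, 1] = 222

k = 2  # assumption: "column 3 onwards" is positional index 2 onwards

# Three equivalent ways to name the columns that must all be NaN
by_name = df2.dropna(subset=["three", "four", "five"], how="all")
by_position = df2.dropna(subset=df2.columns[k:], how="all")
by_filter = df2.dropna(subset=[c for c in df2.columns if len(c) > 3], how="all")

print(list(by_name.index))  # ['a', 'c', 'e', 'f', 'g', 'h']
```

Row 'b' is removed even though 'one' and 'two' hold values, because only the columns listed in subset are inspected; row 'd' is removed because it is NaN everywhere.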