
How to remove rows with null values from kth column onward in python

Tags:

python

pandas

I need to remove all rows in which the elements from column 3 onwards are all NaN.

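import numpy as np
from pandas import DataFrame

df = DataFrame(np.random.randn(6, 5), index=['a', 'c', 'e', 'f', 'g', 'h'], columns=['one', 'two', 'three', 'four', 'five'])
df2 = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])  # rows 'b' and 'd' become all NaN
df2.iloc[1, 0] = 111  # row 'b', column 'one' (.ix has been removed from pandas)
df2.iloc[1, 1] = 222  # row 'b', column 'two'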

In the example above, my final data frame should not contain rows 'b' and 'd'.

How can I use df.dropna() in this case?

user1140126 asked Feb 20 '13 22:02




1 Answer

You can call dropna with arguments subset and how:

df2.dropna(subset=['three', 'four', 'five'], how='all') 

As the names suggest:

  • how='all' drops a row only if every column in subset is NaN for that row, as opposed to the default 'any', which drops a row if at least one of them is NaN.
  • subset lists the columns to inspect for NaNs.
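To make the difference between 'all' and 'any' concrete, here is a small sketch built on the question's frame. The fixed seed and the extra NaN placed in row 'e' are my own additions for illustration, not part of the original question.

```python
import numpy as np
import pandas as pd

# Same shape as the question's frame; the seed is only there for reproducibility.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((6, 5)),
    index=["a", "c", "e", "f", "g", "h"],
    columns=["one", "two", "three", "four", "five"],
)
df2 = df.reindex(["a", "b", "c", "d", "e", "f", "g", "h"])
df2.loc["b", ["one", "two"]] = [111, 222]

# Illustrative extra: row 'e' is missing just one of the inspected columns.
df2.loc["e", "four"] = np.nan

cols = ["three", "four", "five"]
kept_all = df2.dropna(subset=cols, how="all")  # drops only rows where all three are NaN
kept_any = df2.dropna(subset=cols)             # default how='any' also drops row 'e'

print(kept_all.index.tolist())  # ['a', 'c', 'e', 'f', 'g', 'h']
print(kept_any.index.tolist())  # ['a', 'c', 'f', 'g', 'h']
```

Note that dropna returns a new frame; df2 itself is left unchanged unless you assign the result back.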

As @PaulH points out, we can generalise this to inspect every column from position k onward (0-based, so k=2 for "column 3 onwards") with:

subset=df2.columns[k:] 
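A minimal sketch of the positional version, assuming k is a 0-based column position. The small hand-written frame below is made up for illustration so the result is easy to check by eye.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'b' and 'd' are NaN in every column from 'three' onward.
df2 = pd.DataFrame(
    {
        "one": [1.0, 111.0, 3.0, np.nan],
        "two": [1.5, 222.0, 3.5, np.nan],
        "three": [0.1, np.nan, 0.3, np.nan],
        "four": [0.2, np.nan, np.nan, np.nan],
        "five": [0.3, np.nan, 0.5, np.nan],
    },
    index=["a", "b", "c", "d"],
)

k = 2  # 0-based position of the first column to inspect ('three')
result = df2.dropna(subset=df2.columns[k:], how="all")

print(result.index.tolist())  # ['a', 'c']
```

Row 'c' survives because only 'four' is missing there, while 'b' is dropped even though 'one' and 'two' hold values, since those columns are outside the subset.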

Indeed, we could even do something more complicated if desired:

subset=[col for col in df2.columns if len(col) > 3]
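As a rough sketch of selecting the subset by a condition on the column names, the snippet below builds the list first and then passes it to dropna. The frame is invented for illustration, and the name long_names is mine.

```python
import numpy as np
import pandas as pd

# Hypothetical data: only row 'b' is NaN in all of 'three', 'four' and 'five'.
df2 = pd.DataFrame(
    {
        "one": [1.0, 111.0, 3.0],
        "two": [1.5, 222.0, 3.5],
        "three": [0.1, np.nan, np.nan],
        "four": [0.2, np.nan, 0.4],
        "five": [0.3, np.nan, np.nan],
    },
    index=["a", "b", "c"],
)

# Column names longer than three characters: 'three', 'four', 'five'.
long_names = [col for col in df2.columns if len(col) > 3]
result = df2.dropna(subset=long_names, how="all")

print(long_names)             # ['three', 'four', 'five']
print(result.index.tolist())  # ['a', 'c']
```

Building a concrete list, rather than passing a lazy filter object, keeps the selection easy to print and inspect before it is used.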
Andy Hayden answered Oct 20 '22 15:10