
Selecting a subset using dropna() to select multiple columns

Tags:

python

pandas

I have the following DataFrame:

import pandas as pd
df = pd.DataFrame([[1, 2, 3, 3], [10, 20, 2], [10, 2, 5], [1, 3], [2]], columns=['a', 'b', 'c', 'd'])

From this DataFrame, I want to drop the rows where all values in the subset ['b', 'c', 'd'] are NA, which means the last row should be dropped.

The following code works:

df.dropna(subset=['b', 'c', 'd'], how = 'all')

However, since I will be working with larger DataFrames, I would like to select the same subset using a label range such as 'b':'d' instead of listing every column. How do I select this subset?

dvb9 asked Oct 21 '17 15:10




3 Answers

Similar to @ayhan's idea - using df.columns.slice_indexer:

In [25]: cols = df.columns[df.columns.slice_indexer('b','d')]

In [26]: cols
Out[26]: Index(['b', 'c', 'd'], dtype='object')

In [27]: df.dropna(subset=cols, how='all')
Out[27]:
    a     b    c    d
0   1   2.0  3.0  3.0
1  10  20.0  2.0  NaN
2  10   2.0  5.0  NaN
3   1   3.0  NaN  NaN
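
To run this outside an IPython session, a minimal self-contained sketch of the same approach might look like the following. It assumes the frame from the question with the missing values written out as None; the names `positions` and `result` are only illustrative.

```python
import pandas as pd

# The question's data, with the missing trailing values spelled out as None.
df = pd.DataFrame(
    [[1, 2, 3, 3], [10, 20, 2, None], [10, 2, 5, None],
     [1, 3, None, None], [2, None, None, None]],
    columns=['a', 'b', 'c', 'd'],
)

# slice_indexer converts the label range into a positional slice.
# Both end labels ('b' and 'd') are included.
positions = df.columns.slice_indexer('b', 'd')
cols = df.columns[positions]

# Drop only the rows where every column in the range is NA.
result = df.dropna(subset=cols, how='all')
print(result)
```

Only the last row is missing all of 'b', 'c' and 'd', so it is the only one dropped.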
MaxU - stop WAR against UA answered Oct 30 '22 04:10


You could also slice the column list numerically:

c = df.columns[1:4]  
df = df.dropna(subset=c, how='all')

If using numbers is impractical (e.g. there are too many columns to count), there is a somewhat cumbersome workaround:

start, stop = df.columns.get_loc('b'), df.columns.get_loc('d')
c = df.columns[start:stop+1]
df = df.dropna(subset=c, how='all')
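
If this lookup is needed in several places, the get_loc steps could be wrapped in a small helper. The sketch below is one possible way to do that; the function name `dropna_label_range` is hypothetical, and it assumes unique column labels, since get_loc only returns a single integer position in that case.

```python
import pandas as pd

def dropna_label_range(df, first, last, how='all'):
    """Drop rows by NA values in the columns from `first` through `last`, inclusive."""
    # Assumption: labels are unique, so get_loc yields one integer per label.
    if not df.columns.is_unique:
        raise ValueError('column labels must be unique')
    start = df.columns.get_loc(first)
    stop = df.columns.get_loc(last)
    # +1 because positional slices exclude the end position.
    subset = df.columns[start:stop + 1]
    return df.dropna(subset=list(subset), how=how)

# The question's data, with the missing trailing values spelled out as None.
df = pd.DataFrame(
    [[1, 2, 3, 3], [10, 20, 2, None], [10, 2, 5, None],
     [1, 3, None, None], [2, None, None, None]],
    columns=['a', 'b', 'c', 'd'],
)
result = dropna_label_range(df, 'b', 'd')
print(result)
```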
Turanga1 answered Oct 30 '22 04:10


IIUC, use loc, retrieve those columns, and pass that to dropna.

c = df.loc[0, 'b':'d'].index  # a single row comes back as a Series, so its index holds the column labels
df = df.dropna(subset=c, how='all')

print(df) 
    a     b    c    d
0   1   2.0  3.0  3.0
1  10  20.0  2.0  NaN
2  10   2.0  5.0  NaN
3   1   3.0  NaN  NaN
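
The label slice can also be taken over all rows, which avoids depending on a row labelled 0 existing in the index. A minimal sketch, assuming the question's data but with made-up string row labels to illustrate the point:

```python
import pandas as pd

# The question's data with hypothetical string row labels, so no row is labelled 0.
df = pd.DataFrame(
    [[1, 2, 3, 3], [10, 20, 2, None], [10, 2, 5, None],
     [1, 3, None, None], [2, None, None, None]],
    columns=['a', 'b', 'c', 'd'],
    index=['v', 'w', 'x', 'y', 'z'],
)

# Slice every row by column label; the resulting frame's columns are 'b' through 'd'.
cols = df.loc[:, 'b':'d'].columns
result = df.dropna(subset=cols, how='all')
print(result)
```

The trade-off is that the slice spans every row rather than a single one.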
cs95 answered Oct 30 '22 05:10