
Selecting multiple dataframe columns by position in pandas [duplicate]

I have a (large) dataframe. How can I select specific columns by position, e.g. columns 1..3, 5, 6?

Rather than just dropping Col4, I want to select by position, because there are a ton of rows in my dataset. This is what I tried:

 df=df[df.columns[0:2,4:5]]

but that gives IndexError: too many indices for array

DF input

 Col1     Col2     Col3       Col4        Col5       Col6
 1        apple    tomato     pear        banana     banana
 1        apple    grape      nan         banana     banana
 1        apple    nan        banana      banana     banana
 1        apple    tomato     banana      banana     banana
 1        apple    tomato     banana      banana     banana
 1        apple    tomato     banana      banana     banana
 1        avacado  tomato     banana      banana     banana
 1        toast    tomato     banana      banana     banana
 1        grape    tomato     egg         banana     banana

DF output - desired

 Col1     Col2     Col3       Col5       Col6
 1        apple    tomato     banana     banana
 1        apple    grape      banana     banana
 1        apple    nan        banana     banana
 1        apple    tomato     banana     banana
 1        apple    tomato     banana     banana
 1        apple    tomato     banana     banana     
 1        avacado  tomato     banana     banana     
 1        toast    tomato     banana     banana     
 1        grape    tomato     banana     banana
aiden rosenblatt asked Nov 28 '22 00:11


2 Answers

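What you need is NumPy's np.r_, which concatenates several slices into a single array of integer positions that iloc accepts (this needs import numpy as np):

df.iloc[:, np.r_[0:2, 4:5]]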
Out[265]: 
   Col1     Col2    Col5
0     1    apple  banana
1     1    apple  banana
2     1    apple  banana
3     1    apple  banana
4     1    apple  banana
5     1    apple  banana
6     1  avacado  banana
7     1    toast  banana
8     1    grape  banana
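
The slices above mirror the attempt in the question, so they return only Col1, Col2 and Col5. The original attempt most likely fails because the comma inside the brackets passes two slices at once, which a one-dimensional column index cannot handle. Here is a minimal sketch of how the slices could be widened to match the desired output, assuming a small stand-in frame with the same six column names (the two rows and the names `positions` and `result` are illustrative only):

```python
import numpy as np
import pandas as pd

# Stand-in frame with the same column layout as the question (values are illustrative).
df = pd.DataFrame(
    [[1, "apple", "tomato", "pear", "banana", "banana"],
     [1, "apple", "grape", np.nan, "banana", "banana"]],
    columns=["Col1", "Col2", "Col3", "Col4", "Col5", "Col6"],
)

# np.r_ concatenates the slices into one flat array of integer positions.
# Slice ends are exclusive, so 0:3 covers positions 0, 1, 2 and 4:6 covers 4, 5.
positions = np.r_[0:3, 4:6]

# Select all rows and only the columns at those positions.
result = df.iloc[:, positions]
print(list(result.columns))
```

Because slice ends are exclusive, 0:2 stops before Col3 and 4:5 stops before Col6, which is why the output above has only three columns.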
BENY answered Dec 04 '22 08:12


You can select columns 0, 1, 4 in this way:

df.iloc[:, [0, 1, 4]]
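
For longer runs of columns, the position list does not have to be typed out by hand. Here is a rough sketch, assuming a one-row stand-in frame with the question's column names (the names `positions`, `by_list` and `by_drop` are made up for illustration). It builds the list from plain Python ranges and compares the result with dropping the unwanted column by position:

```python
import pandas as pd

# Stand-in frame; only the column layout matters here (values are illustrative).
df = pd.DataFrame(
    [[1, "apple", "tomato", "pear", "banana", "banana"]],
    columns=["Col1", "Col2", "Col3", "Col4", "Col5", "Col6"],
)

# Positions 0..2 plus 4..5, built with plain Python ranges (no NumPy needed).
positions = [*range(0, 3), *range(4, 6)]
by_list = df.iloc[:, positions]

# Alternative for this case: look up the label at position 3 and drop it.
by_drop = df.drop(columns=df.columns[3])

print(list(by_list.columns))
print(by_list.equals(by_drop))
```

Both routes should give the same frame here. The drop variant goes through the column label, so it would behave differently if column names were duplicated.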

You can read more about this in the pandas user guide, under Indexing and Selecting Data:

• .iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. .iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers, which allow out-of-bounds indexing (this conforms with Python/NumPy slice semantics). Allowed inputs are:

◦ An integer e.g. 5

◦ A list or array of integers [4, 3, 0]

◦ A slice object with ints 1:7

◦ A boolean array

◦ A callable function with one argument (the calling Series or DataFrame) that returns valid output for indexing (one of the above)
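
The input types listed above can be tried on the column axis in a short sketch. It assumes a two-row stand-in frame with the question's column names, and all variable names are illustrative:

```python
import pandas as pd

# Stand-in frame with the question's column layout (values are illustrative).
df = pd.DataFrame(
    [[1, "apple", "tomato", "pear", "banana", "banana"],
     [1, "toast", "tomato", "banana", "banana", "banana"]],
    columns=["Col1", "Col2", "Col3", "Col4", "Col5", "Col6"],
)

single = df.iloc[:, 4]          # integer: one column, returned as a Series
picked = df.iloc[:, [4, 3, 0]]  # list of integers: columns in the order given
sliced = df.iloc[:, 1:3]        # slice: positions 1 and 2 (end is exclusive)

mask = [True, True, True, False, True, True]
masked = df.iloc[:, mask]       # boolean array: keep columns where mask is True

# Callable: receives the DataFrame and returns a valid indexer (here a list).
last_two = df.iloc[:, lambda d: [d.shape[1] - 2, d.shape[1] - 1]]

# A slice may run past the end of the axis; it is truncated, not an error.
clipped = df.iloc[:, 4:100]

# A list containing an out-of-bounds position raises IndexError instead.
raised = False
try:
    df.iloc[:, [0, 10]]
except IndexError as exc:
    raised = True
    print("IndexError:", exc)
```

The boolean mask is another way to leave out a single column by position without listing every position to keep.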

jpp answered Dec 04 '22 09:12