I have a large DataFrame. How can I select specific columns by position, e.g. columns 1 to 3, 5 and 6?
Rather than just dropping Col4, I am trying to do it this way because there are a ton of rows in my dataset and I want to select by position:
df=df[df.columns[0:2,4:5]]
but that gives IndexError: too many indices for array
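The error happens because df.columns is a one-dimensional Index, and df.columns[0:2,4:5] asks it for two dimensions at once (a tuple of two slices). A minimal sketch of one way to keep the original approach: build the positions with NumPy's np.r_ first, look up the matching labels, then select by label. The one-row frame is a made-up stand-in for the real data. Because Python slice ends are exclusive, "columns 1 to 3, 5 and 6" (1-based) correspond to the positions 0:3 and 4:6, not 0:2 and 4:5.

```python
import numpy as np
import pandas as pd

# Hypothetical one-row stand-in for the large DataFrame in the question
df = pd.DataFrame([[1, "apple", "tomato", "pear", "banana", "banana"]],
                  columns=["Col1", "Col2", "Col3", "Col4", "Col5", "Col6"])

# np.r_ joins the two slices into one flat integer array: [0 1 2 4 5]
positions = np.r_[0:3, 4:6]

# An Index accepts an integer array, so this returns the matching column labels
labels = df.columns[positions]

# Ordinary label-based selection with the looked-up names
df = df[labels]
print(list(df.columns))  # ['Col1', 'Col2', 'Col3', 'Col5', 'Col6']
```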
DF input
Col1 Col2 Col3 Col4 Col5 Col6
1 apple tomato pear banana banana
1 apple grape nan banana banana
1 apple nan banana banana banana
1 apple tomato banana banana banana
1 apple tomato banana banana banana
1 apple tomato banana banana banana
1 avacado tomato banana banana banana
1 toast tomato banana banana banana
1 grape tomato egg banana banana
DF output - desired
Col1 Col2 Col3 Col5 Col6
1 apple tomato banana banana
1 apple grape banana banana
1 apple nan banana banana
1 apple tomato banana banana
1 apple tomato banana banana
1 apple tomato banana banana
1 avacado tomato banana banana
1 toast tomato banana banana
1 grape tomato banana banana
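What you need is NumPy's np.r_, which concatenates slices into a single array of integer positions that .iloc accepts:
import numpy as np
df.iloc[:, np.r_[0:2, 4:5]]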
Out[265]:
Col1 Col2 Col5
0 1 apple banana
1 1 apple banana
2 1 apple banana
3 1 apple banana
4 1 apple banana
5 1 apple banana
6 1 avacado banana
7 1 toast banana
8 1 grape banana
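The slices above (0:2 and 4:5) are the ones from the question, so they pick only positions 0, 1 and 4. To get the desired output (Col1 to Col3, Col5, Col6), each slice end has to be one past the last column wanted, since slice ends are exclusive. A sketch that rebuilds the example input (treating the "nan" cells as missing values, which is an assumption about the original data) and applies the adjusted slices:

```python
import numpy as np
import pandas as pd

# Input frame from the question; "nan" cells are assumed to be missing values
df = pd.DataFrame({
    "Col1": [1] * 9,
    "Col2": ["apple"] * 6 + ["avacado", "toast", "grape"],
    "Col3": ["tomato", "grape", np.nan] + ["tomato"] * 6,
    "Col4": ["pear", np.nan] + ["banana"] * 6 + ["egg"],
    "Col5": ["banana"] * 9,
    "Col6": ["banana"] * 9,
})

# Positions 0-2 and 4-5, i.e. Col1..Col3, Col5, Col6 (slice ends are exclusive)
cols = np.r_[0:3, 4:6]
print(cols)  # [0 1 2 4 5]

out = df.iloc[:, cols]
print(out.columns.tolist())  # ['Col1', 'Col2', 'Col3', 'Col5', 'Col6']
```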
You can also select columns 0, 1 and 4 by passing a plain list of positions to .iloc:
df.iloc[:, [0, 1, 4]]
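Since the goal in the question is really "every column except the fourth", the position list can also be built by excluding one position instead of typing the others out. A sketch of that idea; the variable name drop_pos is hypothetical and only for illustration:

```python
import pandas as pd

# Hypothetical one-row stand-in for the question's data
df = pd.DataFrame([[1, "apple", "tomato", "pear", "banana", "banana"]],
                  columns=["Col1", "Col2", "Col3", "Col4", "Col5", "Col6"])

drop_pos = 3  # 0-based position of Col4 (illustrative name)

# Every column position except the one being dropped: [0, 1, 2, 4, 5]
keep = [i for i in range(df.shape[1]) if i != drop_pos]
out = df.iloc[:, keep]

# Alternative: look up the label at that position and drop it by name
out2 = df.drop(columns=df.columns[drop_pos])
```

The list-comprehension version stays purely positional; the drop version goes through the label, so if several columns share that label, all of them would be removed.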
You can read more about this in the Indexing and Selecting Data section of the pandas documentation:
• .iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. .iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers, which allow out-of-bounds indexing (this conforms with Python/NumPy slice semantics). Allowed inputs are:
◦ An integer e.g. 5
◦ A list or array of integers [4, 3, 0]
◦ A slice object with ints 1:7
◦ A boolean array
◦ A callable function with one argument (the calling Series or DataFrame) that returns valid output for indexing (one of the above)
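Each allowed input type can be tried on a small frame. The sketch below uses a made-up 2×6 integer frame and applies every input kind to the column axis, including an out-of-bounds slice (clipped) and an out-of-bounds list (raises IndexError):

```python
import numpy as np
import pandas as pd

# Small hypothetical frame: 2 rows, columns Col1..Col6
df = pd.DataFrame(np.arange(12).reshape(2, 6),
                  columns=[f"Col{i}" for i in range(1, 7)])

a = df.iloc[:, 4]              # integer -> a single column as a Series (Col5)
b = df.iloc[:, [4, 3, 0]]      # list of integers, returned in the order given
c = df.iloc[:, 1:4]            # slice -> positions 1, 2, 3
mask = np.array([True, True, True, False, True, True])
d = df.iloc[:, mask]           # boolean array, one entry per column
e = df.iloc[:, lambda frame: [0, frame.shape[1] - 1]]  # callable -> first and last

f = df.iloc[:, 4:100]          # out-of-bounds slice is clipped to Col5, Col6
try:
    df.iloc[:, [100]]          # out-of-bounds list of positions raises
except IndexError as err:
    print("IndexError:", err)
```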