
Index pandas DataFrame by column numbers, when column names are integers

Tags: python, pandas

I am trying to keep just certain columns of a DataFrame, and it works fine when column names are strings:

In [2]: import numpy as np

In [3]: import pandas as pd

In [4]: a = np.arange(35).reshape(5,7)

In [5]: df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], ['a', 'b', 'c', 'd', 'e', 'f', 'g'])

In [6]: df
Out[6]: 
    a   b   c   d   e   f   g
x   0   1   2   3   4   5   6
y   7   8   9  10  11  12  13
u  14  15  16  17  18  19  20
z  21  22  23  24  25  26  27
w  28  29  30  31  32  33  34

[5 rows x 7 columns]

In [7]: df[[1,3]] #No problem
Out[7]: 
    b   d
x   1   3
y   8  10
u  15  17
z  22  24
w  29  31

However, when column names are integers, I am getting a key error:

In [8]: df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], range(10, 17))

In [9]: df
Out[9]: 
   10  11  12  13  14  15  16
x   0   1   2   3   4   5   6
y   7   8   9  10  11  12  13
u  14  15  16  17  18  19  20
z  21  22  23  24  25  26  27
w  28  29  30  31  32  33  34

[5 rows x 7 columns]

In [10]: df[[1,3]]

Results in:

KeyError: '[1 3] not in index'

I can see why pandas does not allow that: it avoids a mix-up between indexing by column names and indexing by column numbers. However, is there a way to tell pandas that I want to index by column numbers? Of course, one solution is to convert the column names to strings, but I am wondering if there is a better solution.
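
To make the name-versus-number distinction concrete, here is a minimal sketch that rebuilds the integer-named frame from above. It assumes the same data as in the question; the variable names by_label and raised are only illustrative.

```python
import numpy as np
import pandas as pd

a = np.arange(35).reshape(5, 7)
df = pd.DataFrame(a, index=['x', 'y', 'u', 'z', 'w'], columns=range(10, 17))

# Plain [] with a list looks the values up as column labels,
# so the existing labels 11 and 13 are found.
by_label = df[[11, 13]]

# 1 and 3 are not column labels in this frame, so the same syntax fails.
try:
    df[[1, 3]]
    raised = False
except KeyError:
    raised = True
```

In other words, the integers in the list are read as names, not as positions, which is why the lookup for 1 and 3 fails here.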

Akavall asked Nov 26 '14 18:11



3 Answers

This is exactly the purpose of iloc, which selects by integer position regardless of the column labels:

In [37]: df
Out[37]: 
   10  11  12  13  14  15  16
x   0   1   2   3   4   5   6
y   7   8   9  10  11  12  13
u  14  15  16  17  18  19  20
z  21  22  23  24  25  26  27
w  28  29  30  31  32  33  34

In [38]: df.iloc[:,[1,3]]
Out[38]: 
   11  13
x   1   3
y   8  10
u  15  17
z  22  24
w  29  31
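
As a self-contained sketch of the same idea, assuming the frame from the question, the block below shows a few positional forms of iloc. The names picked, sliced and second_col are illustrative only.

```python
import numpy as np
import pandas as pd

a = np.arange(35).reshape(5, 7)
df = pd.DataFrame(a, index=['x', 'y', 'u', 'z', 'w'], columns=range(10, 17))

# All rows, columns at positions 1 and 3 (labels 11 and 13).
picked = df.iloc[:, [1, 3]]

# Slices are positional as well: positions 1, 2 and 3 (the end is excluded).
sliced = df.iloc[:, 1:4]

# A single position returns a Series rather than a DataFrame.
second_col = df.iloc[:, 1]
```

The first indexer selects rows and the second selects columns, so the leading `:` keeps every row.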
Jeff answered Oct 08 '22 12:10


Just convert the headers from integers to strings. I would almost always do this as a best practice when working with pandas datasets, to avoid surprises:

df.columns = df.columns.map(str)
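
A minimal sketch of what the conversion does, assuming the frame from the question. It works on a copy named renamed (an illustrative name) so the original labels stay untouched. Whether plain df[[1, 3]] then selects by position depends on the pandas version, so the sketch does not rely on that and uses the string labels or iloc instead.

```python
import numpy as np
import pandas as pd

a = np.arange(35).reshape(5, 7)
df = pd.DataFrame(a, index=['x', 'y', 'u', 'z', 'w'], columns=range(10, 17))

renamed = df.copy()
renamed.columns = renamed.columns.map(str)

# The labels are now the strings '10' through '16'.
by_label = renamed[['11', '13']]

# Position-based selection is unaffected by the rename.
by_position = renamed.iloc[:, [1, 3]]
```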
Anurag Agarwal answered Oct 08 '22 14:10


This is certainly one of those things that feels like a bug but is really a design decision (I think).

A few workaround options:

Rename the columns with their positions as their names:

 df.columns = np.arange(len(df.columns))

Another way is to get the names from df.columns:

print(df[df.columns[[1, 3]]])
   11  13
x   1   3
y   8  10
u  15  17
z  22  24
w  29  31

I suspect this is the most appealing as it just requires adding a wee bit of code and not changing any column names.
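
The second option can be sketched on its own as follows, assuming the frame from the question. The names labels, via_labels, via_iloc and position are illustrative; the comparison with iloc and the get_loc call are additions for illustration, not part of the workaround itself.

```python
import numpy as np
import pandas as pd

a = np.arange(35).reshape(5, 7)
df = pd.DataFrame(a, index=['x', 'y', 'u', 'z', 'w'], columns=range(10, 17))

# Indexing the column Index by position returns the labels at those positions.
labels = df.columns[[1, 3]]

# Those labels can then be used with plain [] as usual.
via_labels = df[labels]
via_iloc = df.iloc[:, [1, 3]]

# Going the other way, from a label to its position.
position = df.columns.get_loc(13)
```

Both routes select the same two columns; this one simply translates positions into labels first and leaves the column names unchanged.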

JD Long answered Oct 08 '22 14:10