Index pandas DataFrame by column numbers, when column names are integers

I am trying to keep just certain columns of a DataFrame, and it works fine when column names are strings:

In [2]: import numpy as np

In [3]: import pandas as pd

In [4]: a = np.arange(35).reshape(5,7)

In [5]: df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], ['a', 'b', 'c', 'd', 'e', 'f', 'g'])

In [6]: df
Out[6]: 
    a   b   c   d   e   f   g
x   0   1   2   3   4   5   6
y   7   8   9  10  11  12  13
u  14  15  16  17  18  19  20
z  21  22  23  24  25  26  27
w  28  29  30  31  32  33  34

[5 rows x 7 columns]

In [7]: df[[1,3]] #No problem
Out[7]: 
    b   d
x   1   3
y   8  10
u  15  17
z  22  24
w  29  31

However, when column names are integers, I am getting a key error:

In [8]: df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], range(10, 17))

In [9]: df
Out[9]: 
   10  11  12  13  14  15  16
x   0   1   2   3   4   5   6
y   7   8   9  10  11  12  13
u  14  15  16  17  18  19  20
z  21  22  23  24  25  26  27
w  28  29  30  31  32  33  34

[5 rows x 7 columns]

In [10]: df[[1,3]]

Results in:

KeyError: '[1 3] not in index'

I can see why pandas does not allow that: it avoids a mix-up between indexing by column names and indexing by column numbers. However, is there a way to tell pandas that I want to index by column numbers? Of course, one solution is to convert column names to strings, but I am wondering if there is a better solution.

How do you find the index of a column in a data frame?

You can get a column's integer position from its name with the DataFrame.columns.get_loc() method.
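A minimal sketch of that lookup, reusing the integer-labelled frame from the question. The names pos and positions are only illustrative.

```python
import numpy as np
import pandas as pd

# Same frame as in the question: integer column labels 10..16.
df = pd.DataFrame(np.arange(35).reshape(5, 7),
                  index=['x', 'y', 'u', 'z', 'w'],
                  columns=range(10, 17))

# get_loc maps one column label to its 0-based position.
pos = df.columns.get_loc(13)
print(pos)  # 3

# get_indexer does the same for several labels at once.
positions = df.columns.get_indexer([11, 13])
print(list(positions))  # [1, 3]
```

The resulting positions can then be passed to iloc if positional selection is needed.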

How do you check if a column has numeric values in Pandas?

The pandas str.isdigit() method checks whether all characters in each string of a Series are digits. Whitespace or any other non-digit character makes the result False. A decimal number also returns False, because this is a string method and '.' is not a digit.
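A small sketch of that behaviour on a made-up Series. The second check with pd.to_numeric is not part of the answer above; it is shown only as one possible alternative when decimals should count as numeric.

```python
import pandas as pd

# Illustrative values: plain digits, a decimal, digits with a space, letters.
s = pd.Series(['123', '4.5', '1 2', 'abc'])

# True only where every character is a digit.
mask = s.str.isdigit()
print(mask.tolist())  # [True, False, False, False]

# Alternative (an assumption about what you may want): try to parse each
# value as a number, so that '4.5' is accepted as well.
parsed = pd.to_numeric(s, errors='coerce').notna()
print(parsed.tolist())
```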

How do you give the index a column name in Pandas?

You can use the rename() method of pandas.DataFrame to change individual column or index names. Pass a dict of the form {original name: new name} to the columns or index parameter of rename(): columns is for column names, and index is for index (row) labels.
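As a rough illustration, the sketch below renames two integer column labels and one row label. The frame and the new names are made up for the example.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with integer column labels.
df = pd.DataFrame(np.arange(6).reshape(2, 3),
                  index=['x', 'y'], columns=[10, 11, 12])

# Labels not mentioned in the dicts are left unchanged.
# rename returns a new frame; df itself is not modified.
renamed = df.rename(columns={10: 'a', 12: 'c'}, index={'x': 'row_x'})
print(renamed)
```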

What do loc and iloc do?

loc is label-based, which means that you have to specify rows and columns based on their row and column labels. iloc is integer position-based, so you have to specify rows and columns by their integer position values (0-based integer position).
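To make the difference concrete, here is a short sketch on the frame from the question. It selects the same cells once by label and once by position; by_label and by_position are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(35).reshape(5, 7),
                  index=['x', 'y', 'u', 'z', 'w'],
                  columns=range(10, 17))

# loc: rows 'x' and 'z', columns labelled 11 and 13.
by_label = df.loc[['x', 'z'], [11, 13]]

# iloc: rows at positions 0 and 3, columns at positions 1 and 3.
by_position = df.iloc[[0, 3], [1, 3]]

print(by_label.equals(by_position))  # True
```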

This is exactly the purpose of iloc, which selects by integer position regardless of the column labels:

In [37]: df
Out[37]: 
   10  11  12  13  14  15  16
x   0   1   2   3   4   5   6
y   7   8   9  10  11  12  13
u  14  15  16  17  18  19  20
z  21  22  23  24  25  26  27
w  28  29  30  31  32  33  34

In [38]: df.iloc[:,[1,3]]
Out[38]: 
   11  13
x   1   3
y   8  10
u  15  17
z  22  24
w  29  31

Just convert the headers from integers to strings. This is almost always a good practice when working with pandas datasets, to avoid surprises:

df.columns = df.columns.map(str)
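A short sketch of what the conversion gives you, using the frame from the question. As far as I know, recent pandas releases do not fall back to positions for df[[1, 3]] even when the labels are strings, so this example selects by the new string labels and uses iloc when positions are wanted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(35).reshape(5, 7),
                  index=['x', 'y', 'u', 'z', 'w'],
                  columns=range(10, 17))

# Turn the integer labels 10..16 into the strings '10'..'16'.
df.columns = df.columns.map(str)
print(list(df.columns))

# Labels are now unambiguous strings...
by_name = df[['11', '13']]

# ...and positional selection still goes through iloc.
by_pos = df.iloc[:, [1, 3]]

print(by_name.equals(by_pos))  # True
```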

This is certainly one of those things that feels like a bug but is really a design decision (I think).

A few workaround options:

rename the columns with their positions as their name:

 df.columns = np.arange(len(df.columns))

Another way is to get names from df.columns:

print(df[df.columns[[1, 3]]])
   11  13
x   1   3
y   8  10
u  15  17
z  22  24
w  29  31

I suspect this is the most appealing as it just requires adding a wee bit of code and not changing any column names.

Tags:

python

pandas

Akavall

3 Answers

Jeff

Anurag Agarwal

JD Long
