I am trying to keep just certain columns of a DataFrame, and it works fine when column names are strings:
In [2]: import numpy as np
In [3]: import pandas as pd
In [4]: a = np.arange(35).reshape(5,7)
In [5]: df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], ['a', 'b', 'c', 'd', 'e', 'f', 'g'])
In [6]: df
Out[6]:
a b c d e f g
x 0 1 2 3 4 5 6
y 7 8 9 10 11 12 13
u 14 15 16 17 18 19 20
z 21 22 23 24 25 26 27
w 28 29 30 31 32 33 34
[5 rows x 7 columns]
In [7]: df[[1,3]] #No problem
Out[7]:
b d
x 1 3
y 8 10
u 15 17
z 22 24
w 29 31
However, when the column names are integers, I get a KeyError:
In [8]: df = pd.DataFrame(a, ['x', 'y', 'u', 'z', 'w'], range(10, 17))
In [9]: df
Out[9]:
10 11 12 13 14 15 16
x 0 1 2 3 4 5 6
y 7 8 9 10 11 12 13
u 14 15 16 17 18 19 20
z 21 22 23 24 25 26 27
w 28 29 30 31 32 33 34
[5 rows x 7 columns]
In [10]: df[[1,3]]
Results in:
KeyError: '[1 3] not in index'
I can see why pandas does not allow that: to avoid a mix-up between indexing by column names and indexing by column numbers. However, is there a way to tell pandas that I want to index by column number? Of course, one solution is to convert the column names to strings, but I am wondering if there is a better solution.
You can get the column index from the column name in pandas using the DataFrame.columns.get_loc() method.
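As a rough sketch of how get_loc() could be applied to the frame from the question (the variable names pos_11, pos_13 and subset are only illustrative):

```python
import numpy as np
import pandas as pd

# Same frame as in the question: integer column labels 10..16.
df = pd.DataFrame(np.arange(35).reshape(5, 7),
                  index=['x', 'y', 'u', 'z', 'w'],
                  columns=range(10, 17))

# get_loc() maps a column label to its 0-based position.
pos_11 = df.columns.get_loc(11)
pos_13 = df.columns.get_loc(13)

# The positions can then be handed to iloc for positional selection.
subset = df.iloc[:, [pos_11, pos_13]]
print(subset)
```

This goes from label to position, which is the opposite direction from what the question asks for, but it is handy when you know the name and need the number.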
The pandas str.isdigit() method checks whether all characters in each string of a Series are digits. Whitespace or any other non-digit character in the string makes it return False. A decimal number also returns False, since this is a string method and '.' is not a digit.
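A small illustration of that behaviour, using made-up sample strings (the Series s and the name mask are assumptions for this example only):

```python
import pandas as pd

# Hypothetical sample values: two digit-only strings, a decimal,
# a string with leading whitespace, and a string with a letter.
s = pd.Series(['10', '11', '1.5', ' 12', 'a3'])

# True only where every character of the string is a digit.
mask = s.str.isdigit()
print(mask)
```

Only the first two entries pass the check; the decimal point, the space, and the letter each make the result False.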
You can use the rename() method of pandas.DataFrame to change column/index names individually. Pass a dict of the form {original name: new name} to the columns or index parameter of rename(): columns is for column names, and index is for index names.
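For instance, a minimal sketch on the frame from the question; the new names 'b', 'd' and 'first' are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(35).reshape(5, 7),
                  index=['x', 'y', 'u', 'z', 'w'],
                  columns=range(10, 17))

# rename() returns a new frame by default; df itself is left untouched.
# Labels not mentioned in the dicts keep their original names.
renamed = df.rename(columns={11: 'b', 13: 'd'}, index={'x': 'first'})
print(renamed)
```

Note that this leaves the frame with a mix of integer and string column labels, which may or may not be what you want.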
loc is label-based, which means that you have to specify rows and columns based on their row and column labels. iloc is integer position-based, so you have to specify rows and columns by their integer position values (0-based integer position).
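To make the difference concrete, here is a short sketch on the frame from the question, where the labels (10..16) and the positions (0..6) do not coincide:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(35).reshape(5, 7),
                  index=['x', 'y', 'u', 'z', 'w'],
                  columns=range(10, 17))

# loc: select by label, so 11 and 13 are column names here.
by_label = df.loc[:, [11, 13]]

# iloc: select by 0-based position, so 1 and 3 are the 2nd and 4th columns.
by_position = df.iloc[:, [1, 3]]

# Both pick out the same two columns in this frame.
print(by_label.equals(by_position))
```

With this frame, labels 11 and 13 happen to sit at positions 1 and 3, so the two selections agree.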
This is exactly the purpose of iloc:
In [37]: df
Out[37]:
10 11 12 13 14 15 16
x 0 1 2 3 4 5 6
y 7 8 9 10 11 12 13
u 14 15 16 17 18 19 20
z 21 22 23 24 25 26 27
w 28 29 30 31 32 33 34
In [38]: df.iloc[:,[1,3]]
Out[38]:
11 13
x 1 3
y 8 10
u 15 17
z 22 24
w 29 31
Just convert the headers from integers to strings. This should almost always be done as a best practice when working with pandas datasets, to avoid surprises:
df.columns = df.columns.map(str)
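A quick sketch of what that conversion does to the frame from the question (the names by_name and by_position are just for this example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(35).reshape(5, 7),
                  index=['x', 'y', 'u', 'z', 'w'],
                  columns=range(10, 17))

# Turn every column label into its string form: 10 -> '10', 11 -> '11', ...
df.columns = df.columns.map(str)

# Labels are now strings, so selection by name uses quoted values.
by_name = df[['11', '13']]

# Selection by position is unaffected and still goes through iloc.
by_position = df.iloc[:, [1, 3]]
print(by_name.equals(by_position))
```

After the conversion an integer can no longer be mistaken for a column name, so positional access has to be spelled out with iloc.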
This is certainly one of those things that feels like a bug but is really a design decision (I think).
A few workaround options:
Rename the columns so that each name is its position (assuming numpy is imported as np):
df.columns = np.arange(len(df.columns))
Another way is to look up the names by position in df.columns:
print(df[df.columns[[1, 3]]])
11 13
x 1 3
y 8 10
u 15 17
z 22 24
w 29 31
I suspect this is the most appealing as it just requires adding a wee bit of code and not changing any column names.