
pandas read_csv parses the header as strings, but I want integers

Tags: python, pandas

For example, the CSV file is as below, where the first row (1,2,3) is the header:

1,2,3
0,0,0

I read the CSV file using pd.read_csv and print one column:

import pandas as pd
df = pd.read_csv('./test.csv')
print(df[1])

This raises KeyError: 1.

It seems that read_csv parses the header labels as strings.

Is there any way to get integer column labels in the DataFrame?

이승훈 asked Mar 12 '18 06:03


3 Answers

I think the more general approach is to cast the column names to integers with astype:

df = pd.read_csv('./test.csv')
df.columns = df.columns.astype(int)
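
To make the cast concrete, here is a minimal sketch that runs without a file on disk. It assumes the CSV content from the question and feeds it to read_csv through io.StringIO; the variable name csv_text is only illustrative.

```python
import io

import pandas as pd

# In-memory stand-in for test.csv (assumption: same content as in the question).
csv_text = "1,2,3\n0,0,0\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())  # the header labels arrive as strings

# Cast every column label to int; this fails if a label is not numeric.
df.columns = df.columns.astype(int)
print(df[1])  # lookup by integer label now succeeds
```

If any header label is not a valid integer, the astype call should raise a ValueError, so this only suits files whose header is fully numeric.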

Another way is to first read only the header row, then pass it to the names parameter of read_csv:

import csv
import numpy as np
import pandas as pd

with open("file.csv", "r") as f:
    reader = csv.reader(f)
    # read only the header row and cast its labels to integers
    i = np.array(next(reader)).astype(int)

#another way
#i = pd.read_csv("file.csv", nrows=0).columns.astype(int)
print (i)
[1 2 3]

df = pd.read_csv("file.csv", names=i, skiprows=1)
print (df.columns)
Int64Index([1, 2, 3], dtype='int64')
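
The commented-out nrows=0 variant can be sketched end to end as follows. This is only an illustration under assumptions: the CSV content is the one from the question, held in memory via io.StringIO, and the labels are converted to a plain list before being passed to names.

```python
import io

import pandas as pd

# In-memory stand-in for file.csv (assumption: same content as in the question).
csv_text = "1,2,3\n0,0,0\n"

# Pass 1: nrows=0 parses the header row only, with no data rows.
names = pd.read_csv(io.StringIO(csv_text), nrows=0).columns.astype(int).tolist()

# Pass 2: skip the header row in the file and supply the integer labels instead.
df = pd.read_csv(io.StringIO(csv_text), names=names, skiprows=1)
print(df.columns.tolist())
```

Reading the header twice costs little, because the first pass stops after one row.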
jezrael answered Oct 02 '22 23:10


Skip the header row using skiprows=1 and header=None. With no header, pandas labels the columns with integers starting from 0.

df = pd.read_csv('test.csv', skiprows=1, header=None).rename(columns=lambda x: x + 1)

df    
   1  2  3
0  0  0  0

The rename call is optional, but if you want your headers to start from 1, you may keep it in.
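
A small sketch of what this does with and without the rename, assuming the CSV content from the question held in memory (csv_text, plain and shifted are illustrative names):

```python
import io

import pandas as pd

# In-memory stand-in for test.csv (assumption: same content as in the question).
csv_text = "1,2,3\n0,0,0\n"

# header=None makes pandas generate integer labels 0, 1, 2, ...
plain = pd.read_csv(io.StringIO(csv_text), skiprows=1, header=None)

# Shift the generated labels by one so they start from 1.
shifted = plain.rename(columns=lambda x: x + 1)

print(plain.columns.tolist())
print(shifted.columns.tolist())
```

Note that the labels here are generated, not read from the file, so this matches the real header only when it is the sequence 1, 2, 3, and so on.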


If you have a MultiIndex, use set_levels to set just the 0th level to integer:

# set_levels expects the unique labels of the level, so use levels[0]
df.columns = df.columns.set_levels(
     df.columns.levels[0].astype(int), level=0
)
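
As a hedged, self-contained sketch of the MultiIndex case, the example below builds a two-row header in memory and converts only the outer level. It passes columns.levels[0], the distinct labels of that level, because set_levels takes one entry per distinct label; the CSV content is an assumption chosen for illustration.

```python
import io

import pandas as pd

# Two header rows: the outer level is numeric, the inner level is not.
csv_text = "1,1,2,2\na,b,a,b\n1,3,5,7\n0,2,4,6\n"
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# Replace the distinct labels of level 0 with their integer versions.
df.columns = df.columns.set_levels(df.columns.levels[0].astype(int), level=0)

print(df[1])  # selects the columns under outer label 1
```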
cs95 answered Oct 03 '22 00:10


You can use set_axis in conjunction with a lambda and pd.Index.map.

Consider a csv that looks like:

1,1,2,2
a,b,a,b
1,3,5,7
0,2,4,6

Read it like:

df = pd.read_csv('test.csv', header=[0, 1])
df

   1     2   
   a  b  a  b
0  1  3  5  7
1  0  2  4  6

You can set the columns inside a pipeline, with integers in the first level, like this (set_axis returns a new DataFrame; the inplace argument was removed in pandas 2.0):

df.set_axis(df.columns.map(lambda i: (int(i[0]), i[1])), axis=1)

   1     2   
   a  b  a  b
0  1  3  5  7
1  0  2  4  6
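
To show how this fits into a method chain, here is a minimal sketch that wraps the set_axis call in a helper and applies it with DataFrame.pipe. The helper name int_first_level is hypothetical, and the CSV content is the one from this answer held in memory.

```python
import io

import pandas as pd

csv_text = "1,1,2,2\na,b,a,b\n1,3,5,7\n0,2,4,6\n"


def int_first_level(frame):
    """Return a new frame whose first column level is cast to int (hypothetical helper)."""
    new_cols = frame.columns.map(lambda t: (int(t[0]), t[1]))
    return frame.set_axis(new_cols, axis=1)


# Read and relabel in one chain, without an intermediate assignment.
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1]).pipe(int_first_level)
print(df.columns.tolist())
```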
piRSquared answered Oct 02 '22 23:10