
pandas.read_csv moves column names over one

Tags: python, pandas, csv

I am using the ALL.zip file located here. My goal is to create a pandas DataFrame with it. However, if I run

data = pd.read_csv('foo.csv')

the column names do not match up. The first column has no name, the second column is labeled with the first column's name (and so on down the line), and the last column is a Series of NaN. So I tried

colnames = [...]  # list of column names
data = pd.read_csv('foo.csv', names=colnames, header=False)

which gave me the exact same thing, so I ran

data = pd.read_csv('foo.csv', names=colnames)

which lined the colnames up perfectly, but the column names from the csv itself (the first line of the file) then showed up as the first row of data. So I ran

data = data[1:]

which did the trick.

So I found a workaround without solving the actual problem. I looked at the read_csv documentation and found it a bit overwhelming, and could not figure out a way to fix this using only pd.read_csv.

What was the fundamental problem (I am assuming it is either user error or a problem with the file)? Is there a way to fix it with one of the read_csv arguments?

Here are the first 2 rows from the csv file:

cmte_id,cand_id,cand_nm,contbr_nm,contbr_city,contbr_st,contbr_zip,contbr_employer,contbr_occupation,contb_receipt_amt,contb_receipt_dt,receipt_desc,memo_cd,memo_text,form_tp,file_num,tran_id,election_tp
C00458844,"P60006723","Rubio, Marco","HEFFERNAN, MICHAEL","APO","AE","090960009","INFORMATION REQUESTED PER BEST EFFORTS","INFORMATION REQUESTED PER BEST EFFORTS",210,27-JUN-15,"","","","SA17A","1015697","SA17.796904","P2016",
lost asked Oct 01 '15 21:10



2 Answers

It's not the columns that you're having a problem with, it's the index: pandas is using the first column as the index. Passing index_col=False stops it from doing that.

import pandas as pd

df = pd.read_csv('P00000001-ALL.csv', index_col=False, low_memory=False)

print(df.head(1))

     cmte_id    cand_id       cand_nm           contbr_nm contbr_city  \
0  C00458844  P60006723  Rubio, Marco  HEFFERNAN, MICHAEL         APO   

  contbr_st contbr_zip                         contbr_employer  \
0        AE  090960009  INFORMATION REQUESTED PER BEST EFFORTS   

                        contbr_occupation  contb_receipt_amt contb_receipt_dt  \
0  INFORMATION REQUESTED PER BEST EFFORTS                210        27-JUN-15   

  receipt_desc memo_cd memo_text form_tp  file_num      tran_id election_tp  
0          NaN     NaN       NaN   SA17A   1015697  SA17.796904       P2016  

low_memory=False is there because column 6 has mixed data types; without it, pandas warns about that column when it reads the file in chunks.
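If you would rather say what type the mixed column should be instead of relying on low_memory=False, you can pass dtype. This is only a sketch: it assumes the mixed column is contbr_zip and that you want it kept as text, and the inline data is a made-up stand-in for the real file (same trailing comma on each data row).

```python
import io

import pandas as pd

# Made-up stand-in for the real file; each data row ends with a trailing comma.
csv_text = (
    "cmte_id,contbr_zip,contb_receipt_amt\n"
    "C00458844,090960009,210,\n"
    "C00458844,33134,50,\n"
)

# Assumption: contbr_zip should be read as strings so leading zeros survive.
df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=False,
    dtype={"contbr_zip": str},
)

print(df["contbr_zip"].tolist())
```

Reading the zip codes as strings also keeps the leading zero in values like 090960009, which would be lost if the column were parsed as integers.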

Leb answered Sep 20 '22 09:09


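The problem comes from every line in the file except the first ending in a comma (the separator character). Each data row therefore has one more field than the header row. When that happens, pandas uses the first field of every row as the index, so the column names shift over by one and the last column, which only holds the empty field after the trailing comma, comes out as NaN.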

Try

data = pd.read_csv('P00000001-ALL.csv', index_col=False)
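
To see the effect without downloading the file, here is a small reproduction. It is a sketch with a made-up three-column sample (a subset of the real column names) where, as in the real file, every data row ends with a trailing comma and the header does not.

```python
import io

import pandas as pd

# Made-up sample: the header has 3 fields, each data row has 4 (trailing comma).
csv_text = (
    "cmte_id,cand_id,election_tp\n"
    'C00458844,"P60006723","P2016",\n'
    'C00458844,"P60006723","P2016",\n'
)

# Default behaviour: the extra field makes pandas use the first column as the index.
shifted = pd.read_csv(io.StringIO(csv_text))

# index_col=False: keep the first column as data and drop the empty trailing field.
fixed = pd.read_csv(io.StringIO(csv_text), index_col=False)

print(shifted)
print(fixed)
```

In shifted, the cmte_id values end up in the index, the cmte_id label sits over the cand_id values, and election_tp is all NaN, which is the symptom described in the question. In fixed, each name is back over its own column.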
vmg answered Sep 22 '22 09:09