pandas.read_csv moves column names over one

Tags: python, pandas, csv

I am using the ALL.zip file located here. My goal is to create a pandas DataFrame with it. However, if I run

data = pd.read_csv('foo.csv')

the column names do not match up. The first column has no name, and then the second column is labeled with the first, and the last column is a Series of NaN. So I tried

colnames = [...]  # the list of column names
data = pd.read_csv('foo.csv', names=colnames, header=False)

which gave me the exact same thing, so I ran

data = pd.read_csv('foo.csv', names=colnames)

which lined the colnames up perfectly, but the column names assigned by the CSV (the first line of the file) came through as the first row of data. So I ran

data=data[1:]

which did the trick.

So I found a workaround without solving the actual problem. I looked at the read_csv documentation and found it a bit overwhelming, and could not figure out a way to fix this problem using only pd.read_csv.

What was the fundamental problem (I am assuming it is either user error or a problem with the file)? Is there a way to fix it with one of the read_csv parameters?

Here are the first two lines of the csv file

cmte_id,cand_id,cand_nm,contbr_nm,contbr_city,contbr_st,contbr_zip,contbr_employer,contbr_occupation,contb_receipt_amt,contb_receipt_dt,receipt_desc,memo_cd,memo_text,form_tp,file_num,tran_id,election_tp
C00458844,"P60006723","Rubio, Marco","HEFFERNAN, MICHAEL","APO","AE","090960009","INFORMATION REQUESTED PER BEST EFFORTS","INFORMATION REQUESTED PER BEST EFFORTS",210,27-JUN-15,"","","","SA17A","1015697","SA17.796904","P2016",

asked Oct 01 '15 21:10

lost

2 Answers

The problem is not with the columns, it is with the index. Passing index_col=False stops pandas from using the first column as the index:

import pandas as pd

df = pd.read_csv('P00000001-ALL.csv', index_col=False, low_memory=False)

print(df.head(1))

     cmte_id    cand_id       cand_nm           contbr_nm contbr_city  \
0  C00458844  P60006723  Rubio, Marco  HEFFERNAN, MICHAEL         APO   

  contbr_st contbr_zip                         contbr_employer  \
0        AE  090960009  INFORMATION REQUESTED PER BEST EFFORTS   

                        contbr_occupation  contb_receipt_amt contb_receipt_dt  \
0  INFORMATION REQUESTED PER BEST EFFORTS                210        27-JUN-15   

  receipt_desc memo_cd memo_text form_tp  file_num      tran_id election_tp  
0          NaN     NaN       NaN   SA17A   1015697  SA17.796904       P2016

low_memory=False is set because column 6 contains mixed data types; without it, pandas warns about the mixed types while reading the file.
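As a possible alternative for the mixed-type column, here is a minimal sketch that declares the dtype up front instead. It uses a shortened in-memory sample (three of the original columns and the one data row from the question) rather than the real file, and it assumes that treating contbr_zip as text is what you want. Whether this fully replaces low_memory=False on the complete file has not been checked here.

```python
import io
import pandas as pd

# Reduced stand-in for the real file: three of the original columns and the
# single data row shown in the question, with the same trailing comma.
SAMPLE = (
    "cmte_id,contbr_zip,contb_receipt_amt\n"
    'C00458844,"090960009",210,\n'
)

# With default type inference, this sample's ZIP code is parsed as a number.
inferred = pd.read_csv(io.StringIO(SAMPLE), index_col=False)

# Declaring the column as str keeps it as text, including the leading zero.
typed = pd.read_csv(
    io.StringIO(SAMPLE),
    index_col=False,
    dtype={"contbr_zip": str},
)

print(inferred["contbr_zip"].iloc[0])
print(typed["contbr_zip"].iloc[0])
```

Reading ZIP codes as strings also avoids losing leading zeros, which matters for values such as the one in this row.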

answered Sep 20 '22 09:09

Leb

The problem comes from every line in the file except the first ending in a comma (the separator character). Each data row therefore has one more field than the header, so pandas treats the first field as the index column and sees an empty column at the end.

Try

data = pd.read_csv('P00000001-ALL.csv', index_col=False)
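To see the effect without downloading the file, here is a small sketch that reproduces it in memory. The sample is a shortened stand-in (three of the original column names and values from the row in the question), not the real data; the trailing comma on the data line is the only feature that matters.

```python
import io
import pandas as pd

# Three header fields, but the data line ends with a trailing comma and so
# has four fields, just like the lines in the real file.
SAMPLE = (
    "cmte_id,cand_id,election_tp\n"
    'C00458844,"P60006723","P2016",\n'
)

# Default behaviour: the first field becomes the index and every column
# name ends up one position to the right of its data.
shifted = pd.read_csv(io.StringIO(SAMPLE))

# index_col=False: no column is used as the index, and the empty trailing
# field is dropped, so names and data line up.
fixed = pd.read_csv(io.StringIO(SAMPLE), index_col=False)

print(shifted)
print(fixed)
```

In the first frame the committee ID appears as the index and election_tp is NaN, which matches the symptoms described in the question.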

answered Sep 22 '22 09:09

vmg
