Handling Variable Number of Columns with Pandas - Python

Tags: python, pandas

I have a data set that looks like this (at most 5 columns, but there can be fewer):

    1,2,3
    1,2,3,4
    1,2,3,4,5
    1,2
    1,2,3,4
    ....

I am trying to use pandas read_table to read this into a 5 column data frame. I would like to read this in without additional massaging.

If I try

    import pandas as pd
    my_cols = ['A', 'B', 'C', 'D', 'E']
    my_df = pd.read_table(path, sep=',', header=None, names=my_cols)

I get an error - "column names have 5 fields, data has 3 fields".

Is there any way to make pandas fill in NaN for the missing columns while reading the data?


asked Mar 06 '13 08:03

Jackie Shephard

1 Answer

One way which seems to work (at least in 0.10.1 and 0.11.0.dev-fc8de6d):

    >>> !cat ragged.csv
    1,2,3
    1,2,3,4
    1,2,3,4,5
    1,2
    1,2,3,4
    >>> my_cols = ["A", "B", "C", "D", "E"]
    >>> pd.read_csv("ragged.csv", names=my_cols, engine='python')
       A  B   C   D   E
    0  1  2   3 NaN NaN
    1  1  2   3   4 NaN
    2  1  2   3   4   5
    3  1  2 NaN NaN NaN
    4  1  2   3   4 NaN

Note that this approach requires that you give names to the columns you want, though. Not as general as some other ways, but works well enough when it applies.
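For anyone who wants to try this without creating a file, here is a minimal self-contained sketch of the same idea. It reads the sample rows from an in-memory string rather than ragged.csv. The second half is not part of the answer above: `read_ragged_csv` is a hypothetical helper, added only to illustrate one way around the "you must name the columns" limitation by scanning the data once to find the widest row. It assumes the input is non-empty and small enough to scan twice.

```python
import csv
import io

import pandas as pd

# Same sample rows as in the question, held in memory instead of a file.
RAGGED = "1,2,3\n1,2,3,4\n1,2,3,4,5\n1,2\n1,2,3,4\n"

# Known maximum width: name all five columns up front, as in the answer.
my_cols = ["A", "B", "C", "D", "E"]
df = pd.read_csv(io.StringIO(RAGGED), header=None, names=my_cols, engine="python")


def read_ragged_csv(text, sep=","):
    """Hypothetical helper: infer the widest row, then read with integer column names.

    Assumes `text` contains at least one row.
    """
    rows = csv.reader(io.StringIO(text), delimiter=sep)
    width = max(len(row) for row in rows)
    return pd.read_csv(
        io.StringIO(text),
        sep=sep,
        header=None,
        names=list(range(width)),  # columns are labelled 0..width-1
        engine="python",
    )


df_auto = read_ragged_csv(RAGGED)

print(df)
print(df_auto)
```

Short rows come back padded with NaN in both cases, so any column that contains a missing value is read as float rather than int.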


answered Sep 19 '22 12:09

DSM
