
Handling Variable Number of Columns with Pandas - Python

Tags:

python

pandas

I have a data set that looks like this (at most 5 columns per row, but some rows have fewer):

    1,2,3
    1,2,3,4
    1,2,3,4,5
    1,2
    1,2,3,4
    ....

I am trying to use pandas read_table to read this into a 5 column data frame. I would like to read this in without additional massaging.

If I try

    import pandas as pd
    my_cols = ['A', 'B', 'C', 'D', 'E']
    my_df = pd.read_table(path, sep=',', header=None, names=my_cols)

I get an error - "column names have 5 fields, data has 3 fields".

Is there any way to make pandas fill in NaN for the missing columns while reading the data?

Jackie Shephard asked Mar 06 '13 08:03





1 Answer

One way which seems to work (at least in 0.10.1 and 0.11.0.dev-fc8de6d):

    >>> !cat ragged.csv
    1,2,3
    1,2,3,4
    1,2,3,4,5
    1,2
    1,2,3,4
    >>> my_cols = ["A", "B", "C", "D", "E"]
    >>> pd.read_csv("ragged.csv", names=my_cols, engine='python')
       A  B   C   D   E
    0  1  2   3 NaN NaN
    1  1  2   3   4 NaN
    2  1  2   3   4   5
    3  1  2 NaN NaN NaN
    4  1  2   3   4 NaN

Note that this approach requires you to supply names for all the columns you want up front. It is not as general as some other ways, but it works well enough when it applies.
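For anyone who wants to try this without creating a file, here is a minimal self-contained sketch of the same call. It assumes the rows from ragged.csv are held in an in-memory `io.StringIO` buffer instead of on disk, and the name `df` is only for illustration.

```python
import io

import pandas as pd

# Same ragged rows as ragged.csv, kept in memory so the sketch needs no file.
ragged = io.StringIO("1,2,3\n1,2,3,4\n1,2,3,4,5\n1,2\n1,2,3,4\n")

my_cols = ["A", "B", "C", "D", "E"]

# names= fixes the frame at five columns; shorter rows are padded with NaN.
df = pd.read_csv(ragged, names=my_cols, engine="python")

# Columns that received a NaN are upcast to float; A and B stay integer.
print(df.dtypes)

# Count of missing values per column.
print(df.isna().sum())
```

Because the missing cells become NaN, the padded columns come back as floats rather than integers. `my_cols` should be at least as long as the widest row you expect in the data.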

DSM answered Sep 19 '22 12:09
