
Read multi-index on the columns from csv file

I have a .csv file that looks like this:

Male, Male, Male, Female, Female
R, R, L, R, R
.86, .67, .88, .78, .81

I want to read that into a df, so that I have:

    Male        Female
    R       L   R
0   .86 .67 .88 .78 .81

I did:

df = pd.read_csv('file.csv', header=[0,1]) 

But the header argument does not cut it. The result is:

Empty DataFrame
Columns: [(Male, R), (Male, R), (Male, L), (Female, R), (Female, R)]
Index: []

Yet the docs on header say:

(...)Can be a list of integers that specify row locations for a multi-index on the columns E.g. [0,1,3] 

What am I doing wrong? How can I possibly make it work?

nutship asked Jan 23 '14 20:01




1 Answer

I think the problem is that you have duplicated columns: (Male, R) and (Female, R) each appear twice.
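As a quick check of that claim, the two header rows can be combined into a MultiIndex and tested for repeated pairs. This is only a sketch: the lists are typed in by hand from the question's CSV, and the names level_0, level_1 and dupes are illustrative.

```python
import pandas as pd

# Header rows copied by hand from the question's CSV (illustrative names)
level_0 = ["Male", "Male", "Male", "Female", "Female"]
level_1 = ["R", "R", "L", "R", "R"]

columns = pd.MultiIndex.from_arrays([level_0, level_1])

# True marks a (level_0, level_1) pair that repeats an earlier one
dupes = columns.duplicated()
print(columns[dupes].tolist())
```

The pairs it prints are the ones that occur more than once in the header.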

I'm not sure whether this is a bug or whether duplicated columns are simply not supported. Here's a workaround:

First, read the CSV with tupleize_cols=True:

In [61]: df = pd.read_csv('test.csv', header=[0, 1], skipinitialspace=True, tupleize_cols=True)

In [62]: df
Out[62]:
   (Male, R)  (Male, R)  (Male, L)  (Female, R)  (Female, R)
0       0.67       0.67       0.88         0.81         0.81

[1 rows x 5 columns]

Then convert the columns from a plain Index of tuples to a MultiIndex:

In [63]: df.columns = pd.MultiIndex.from_tuples(df.columns)

In [64]: df
Out[64]:
   Male              Female
      R     R     L       R     R
0  0.67  0.67  0.88    0.81  0.81

[1 rows x 5 columns]
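The tupleize_cols argument was deprecated in pandas 0.21 and later removed, so the call above may fail on recent versions. One alternative, offered as a sketch rather than the answer's own method, is to read the two header rows and the data rows separately and attach the MultiIndex by hand. The CSV text is inlined through io.StringIO so the example is self-contained, and the names csv_text, header_rows and data are illustrative.

```python
import io

import pandas as pd

# Inline copy of the question's file (an assumption; in practice pass the path)
csv_text = """Male, Male, Male, Female, Female
R, R, L, R, R
.86, .67, .88, .78, .81
"""

# Read only the two header rows, as plain strings
header_rows = pd.read_csv(
    io.StringIO(csv_text), header=None, nrows=2, skipinitialspace=True
)

# Read the data rows separately so they are parsed as floats
data = pd.read_csv(
    io.StringIO(csv_text), header=None, skiprows=2, skipinitialspace=True
)

# Build a two-level column index from the header rows; duplicate pairs are kept as-is
data.columns = pd.MultiIndex.from_arrays(header_rows.values.tolist())

print(data)
```

Because the header is never handed to the parser, this route does not depend on how read_csv treats duplicated column labels. The resulting frame still has non-unique columns, so selecting ('Male', 'R') returns both matching columns.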
waitingkuo answered Sep 21 '22 13:09
