
Load CSV to Pandas MultiIndex DataFrame

I have a 719 MB CSV file that looks like:

from, to, dep, freq, arr, code, mode   (header row)
RGBOXFD,RGBPADTON,127,0,27,99999,2
RGBOXFD,RGBPADTON,127,0,33,99999,2
RGBOXFD,RGBRDLEY,127,0,1425,99999,2
RGBOXFD,RGBCHOLSEY,127,0,52,99999,2
RGBOXFD,RGBMDNHEAD,127,0,91,99999,2
RGBDIDCOTP,RGBPADTON,127,0,46,99999,2
RGBDIDCOTP,RGBPADTON,127,0,3,99999,2
RGBDIDCOTP,RGBCHOLSEY,127,0,61,99999,2
RGBDIDCOTP,RGBRDLEY,127,0,1430,99999,2
RGBDIDCOTP,RGBPADTON,127,0,115,99999,2
and so on...

I want to load it into a pandas DataFrame. I know there is a method for loading from CSV:

 r = pd.DataFrame.from_csv('test_data2.csv') 

But I specifically want to load it as a 'MultiIndex' DataFrame where from and to are the indexes:

So ending up with:

                   dep, freq, arr, code, mode
RGBOXFD RGBPADTON  127     0   27  99999    2
        RGBRDLEY   127     0   33  99999    2
        RGBCHOLSEY 127     0 1425  99999    2
        RGBMDNHEAD 127     0 1525  99999    2

and so on. I'm not sure how to do that.

Handloomweaver asked Sep 30 '13 20:09



1 Answer

You could use pd.read_csv:

>>> df = pd.read_csv("test_data2.csv", index_col=[0,1], skipinitialspace=True)
>>> df
                       dep  freq   arr   code  mode
from       to
RGBOXFD    RGBPADTON   127     0    27  99999     2
           RGBPADTON   127     0    33  99999     2
           RGBRDLEY    127     0  1425  99999     2
           RGBCHOLSEY  127     0    52  99999     2
           RGBMDNHEAD  127     0    91  99999     2
RGBDIDCOTP RGBPADTON   127     0    46  99999     2
           RGBPADTON   127     0     3  99999     2
           RGBCHOLSEY  127     0    61  99999     2
           RGBRDLEY    127     0  1430  99999     2
           RGBPADTON   127     0   115  99999     2

where I've used skipinitialspace=True to get rid of those annoying spaces in the header row.
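To see what skipinitialspace changes, here is a small self-contained sketch. It assumes an in-memory copy of a few rows from the question (the csv_text string stands in for test_data2.csv and is not the real file), and it passes the index columns by name rather than by position, which should work once the header names are clean:

```python
import io

import pandas as pd

# Hypothetical stand-in for test_data2.csv: a few rows copied from the question.
csv_text = """from, to, dep, freq, arr, code, mode
RGBOXFD,RGBPADTON,127,0,27,99999,2
RGBOXFD,RGBPADTON,127,0,33,99999,2
RGBOXFD,RGBRDLEY,127,0,1425,99999,2
RGBDIDCOTP,RGBPADTON,127,0,46,99999,2
RGBDIDCOTP,RGBCHOLSEY,127,0,61,99999,2
"""

# Without skipinitialspace, every header name after the first keeps its leading space.
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))

# With skipinitialspace=True the names are clean, so the two index
# columns can be referred to by name instead of by position.
df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=["from", "to"],
    skipinitialspace=True,
)
print(df.index.names)

# Selecting on the outer level returns every row whose 'from' is RGBOXFD.
print(df.loc["RGBOXFD"])
```

index_col=[0, 1] and index_col=["from", "to"] should give the same result here; the positional form has the advantage of not depending on how the header is spelled.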

DSM answered Sep 23 '22 00:09