
Giving a column multiple indexes/headers

I am working with pandas dataframes that are essentially time series like this:

                 level
    Date
    1976-01-01  409.67
    1976-02-01  409.58
    1976-03-01  409.66
    …

What I want is multiple indexes/headers for the level column, like so:

               Station1                   #Name of the datasource
               43.1977317,-4.6473648,5    #Lat/Lon of the source
               Precip                     #Type of data
    Date
    1976-01-01  409.67
    1976-02-01  409.58
    1976-03-01  409.66
    …

So essentially I am searching for something like Mydata.columns.level1 = ['Station1'], Mydata.columns.level2 = [Lat,Lon], Mydata.columns.level3 = ['Precip'].

The reason is that a single location can have multiple datasets, and I want to be able to pick either all data from one location, or all data of a certain type from all locations, out of a big dataframe merged from them later.

I can set up an example dataframe from the pandas documentation and test my selection on it, but with my real data I need a different way to set the indexes than the one used in the example.

Example:

Build a small dataframe:

    header = [np.array(['location','location','location','location2','location2','location2']),
              np.array(['S1','S2','S3','S1','S2','S3'])]
    df = pd.DataFrame(np.random.randn(5, 6), index=['a','b','c','d','e'], columns=header)

    df
       location                      location2
             S1        S2        S3         S1        S2        S3
    a -1.469932 -1.544511 -1.373463  -0.317262  0.024832 -0.641000
    b  0.047170 -0.339423  1.351253   0.601172 -1.607339  0.035932
    c -0.257479  1.140829  0.188291  -0.242490  1.019315 -1.163429
    d  0.832949  0.098170 -0.818513  -0.070383  0.557419 -0.489839
    e -0.628549 -0.158419  0.366167  -2.319316 -0.474897 -0.319549

Pick datatype or location:

    df.loc(axis=1)[:,'S1']
       location  location2
             S1         S1
    a -1.469932  -0.317262
    b  0.047170   0.601172
    c -0.257479  -0.242490
    d  0.832949  -0.070383
    e -0.628549  -2.319316

    df['location']
             S1        S2        S3
    a -1.469932 -1.544511 -1.373463
    b  0.047170 -0.339423  1.351253
    c -0.257479  1.140829  0.188291
    d  0.832949  0.098170 -0.818513
    e -0.628549 -0.158419  0.366167

Or am I just looking for the wrong terminology? Because 90% of all examples in the documentation, and the questions here only treat the vertical "stuff" (dates or abcde in my case) as index, and a quick df.index.values on my test data also just gets me the vertical array(['a', 'b', 'c', 'd', 'e'], dtype=object).

JC_CL asked Sep 03 '15 08:09


1 Answer

You can use a MultiIndex to give the columns several header levels, with a name for each level. Use MultiIndex.from_product() to build a MultiIndex from the Cartesian product of multiple iterables.

    import numpy as np
    import pandas as pd

    # Every combination of location and S becomes one column.
    header = pd.MultiIndex.from_product([['location1','location2'],
                                         ['S1','S2','S3']],
                                        names=['loc','S'])
    df = pd.DataFrame(np.random.randn(5, 6),
                      index=['a','b','c','d','e'],
                      columns=header)

The two column levels are named loc and S.

    df
    loc location1                     location2
    S          S1        S2        S3        S1        S2        S3
    a   -1.245988  0.858071 -1.433669  0.105300 -0.630531 -0.148113
    b    1.132016  0.318813  0.949564 -0.349722 -0.904325  0.443206
    c   -0.017991  0.032925  0.274248  0.326454 -0.108982  0.567472
    d    2.363533 -1.676141  0.562893  0.967338 -1.071719 -0.321113
    e    1.921324  0.110705  0.023244 -0.432196  0.172972 -0.50368
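On the terminology point raised in the question: the row labels live in df.index, while the stacked headers live in df.columns, which here is itself a MultiIndex. A minimal sketch of how the column levels could be inspected, rebuilding the same frame as above so that it runs on its own:

```python
import numpy as np
import pandas as pd

header = pd.MultiIndex.from_product(
    [["location1", "location2"], ["S1", "S2", "S3"]],
    names=["loc", "S"],
)
df = pd.DataFrame(np.random.randn(5, 6), index=list("abcde"), columns=header)

# Row labels: the "vertical" index.
print(df.index.tolist())

# Column labels: a MultiIndex with two named levels.
print(df.columns.nlevels)
print(list(df.columns.names))

# Distinct labels found on the 'loc' level.
print(df.columns.get_level_values("loc").unique().tolist())
```

So df.index.values returning only a to e is expected; the header levels are reached through df.columns instead.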

Now you can use xs to slice the dataframe based on levels.

    df.xs('location1',level='loc',axis=1)
    S        S1        S2        S3
    a -1.245988  0.858071 -1.433669
    b  1.132016  0.318813  0.949564
    c -0.017991  0.032925  0.274248
    d  2.363533 -1.676141  0.562893
    e  1.921324  0.110705  0.02324

    df.xs('S1',level='S',axis=1)
    loc  location1  location2
    a    -1.245988   0.105300
    b     1.132016  -0.349722
    c    -0.017991   0.326454
    d     2.363533   0.967338
    e     1.921324  -0.43219
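For the time-series case in the question, where each source starts out as a single level column, the same idea might be applied per frame before merging. The following is only a sketch under assumptions: the helper label_series, the level names station, latlon and type, the Temp series, Station2 and its coordinates are all hypothetical and exist only for illustration. Only the three Precip values and the Station1 coordinates come from the question. Lat/Lon is kept as one string, as in the question's sketch.

```python
import pandas as pd

def label_series(frame, station, latlon, kind):
    """Return a copy of a one-column frame with a three-level column header."""
    out = frame.copy()
    out.columns = pd.MultiIndex.from_tuples(
        [(station, latlon, kind)], names=["station", "latlon", "type"]
    )
    return out

dates = pd.DatetimeIndex(["1976-01-01", "1976-02-01", "1976-03-01"], name="Date")

# First frame uses the values shown in the question; the other two are made up.
precip1 = pd.DataFrame({"level": [409.67, 409.58, 409.66]}, index=dates)
temp1 = pd.DataFrame({"level": [1.0, 2.0, 3.0]}, index=dates)
precip2 = pd.DataFrame({"level": [10.0, 20.0, 30.0]}, index=dates)

# Label each frame, then merge them side by side on the shared Date index.
merged = pd.concat(
    [
        label_series(precip1, "Station1", "43.1977317,-4.6473648,5", "Precip"),
        label_series(temp1, "Station1", "43.1977317,-4.6473648,5", "Temp"),
        label_series(precip2, "Station2", "0.0,0.0,0", "Precip"),  # placeholder coordinates
    ],
    axis=1,
)

# All data from one location.
station1 = merged.xs("Station1", level="station", axis=1)

# All data of one type from every location.
all_precip = merged.xs("Precip", level="type", axis=1)

print(station1)
print(all_precip)
```

By default xs drops the level that was selected on, so station1 keeps the latlon and type levels, and all_precip keeps station and latlon. Passing drop_level=False keeps all three levels if that is preferred.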
kanatti answered Oct 05 '22 12:10