I am working with pandas dataframes that are essentially time series like this:
level Date 1976-01-01 409.67 1976-02-01 409.58 1976-03-01 409.66 …
What I want to have, is multiple indexes/headers for the level column, like so:
Station1 #Name of the datasource 43.1977317,-4.6473648,5 #Lat/Lon of the source Precip #Type of data Date 1976-01-01 409.67 1976-02-01 409.58 1976-03-01 409.66 …
So essentially I am searching for something like Mydata.columns.level1 = ['Station1']
, Mydata.columns.level2 = [Lat,Lon]
, Mydata.columns.level3 = ['Precip']
.
Reason being that a single location can have multiple datasets, and that I want to be able to pick either all data from one location, or all data of a certain type from all locations, from a subsequent merged, big dataframe.
I can set up an example dataframe from the pandas documentation, and test my selection, but with my real data, I need a different way to set the indexes as in the example.
Example:
Built a small dataframe
header = [np.array(['location','location','location','location2','location2','location2']), np.array(['S1','S2','S3','S1','S2','S3'])] df = pd.DataFrame(np.random.randn(5, 6), index=['a','b','c','d','e'], columns = header ) df location location2 S1 S2 S3 S1 S2 S3 a -1.469932 -1.544511 -1.373463 -0.317262 0.024832 -0.641000 b 0.047170 -0.339423 1.351253 0.601172 -1.607339 0.035932 c -0.257479 1.140829 0.188291 -0.242490 1.019315 -1.163429 d 0.832949 0.098170 -0.818513 -0.070383 0.557419 -0.489839 e -0.628549 -0.158419 0.366167 -2.319316 -0.474897 -0.319549
Pick datatype or location:
df.loc(axis=1)[:,'S1'] location location2 S1 S1 a -1.469932 -0.317262 b 0.047170 0.601172 c -0.257479 -0.242490 d 0.832949 -0.070383 e -0.628549 -2.319316 df['location'] S1 S2 S3 a -1.469932 -1.544511 -1.373463 b 0.047170 -0.339423 1.351253 c -0.257479 1.140829 0.188291 d 0.832949 0.098170 -0.818513 e -0.628549 -0.158419 0.366167
Or am I just looking for the wrong terminology? Because 90% of all examples in the documentation, and the questions here only treat the vertical "stuff" (dates or abcde in my case) as index, and a quick df.index.values
on my test data also just gets me the vertical array(['a', 'b', 'c', 'd', 'e'], dtype=object)
.
Select the sheet tab for the sheet for which you want to display multiple header rows or columns. Select the Settings menu. Select the Header Editor icon in the Other Settings section. Select Column Header or Row Header in the Selected Header drop-down box.
To add multiple headers, we need to create a list of lists of headers and use it to rename the columns of the dataframe.
You can also construct a MultiIndex from a DataFrame directly, using the method MultiIndex.
You can use multiIndex to give multiple columns with names for each level. Use MultiIndex.from_product()
to make multiIndex from cartesian products of multiple iterables.
header = pd.MultiIndex.from_product([['location1','location2'], ['S1','S2','S3']], names=['loc','S']) df = pd.DataFrame(np.random.randn(5, 6), index=['a','b','c','d','e'], columns=header)
Two levels will be loc and S.
df loc location1 location2 S S1 S2 S3 S1 S2 S3 a -1.245988 0.858071 -1.433669 0.105300 -0.630531 -0.148113 b 1.132016 0.318813 0.949564 -0.349722 -0.904325 0.443206 c -0.017991 0.032925 0.274248 0.326454 -0.108982 0.567472 d 2.363533 -1.676141 0.562893 0.967338 -1.071719 -0.321113 e 1.921324 0.110705 0.023244 -0.432196 0.172972 -0.50368
Now you can use xs to slice the dateframe based on levels.
df.xs('location1',level='loc',axis=1) S S1 S2 S3 a -1.245988 0.858071 -1.433669 b 1.132016 0.318813 0.949564 c -0.017991 0.032925 0.274248 d 2.363533 -1.676141 0.562893 e 1.921324 0.110705 0.02324 df.xs('S1',level='S',axis=1) loc location1 location2 a -1.245988 0.105300 b 1.132016 -0.349722 c -0.017991 0.326454 d 2.363533 0.967338 e 1.921324 -0.43219
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With