
Using Pandas DataFrame with Multi-name Columns

I'm using Pandas to store a large dataset that has systematically generated column names. Something like this:

import numpy as np
import pandas as pd
df = pd.DataFrame([[0,1,2],[10,11,12],[20,21,22]],columns=["r0","r1","r2"])

These systematic names also have more meaningful names that users would actually understand. So far, I've been mapping them using a dictionary like so:

altName = {"Objective 1":"r0", "Result 5":"r1", "Parameter 2":"r2"}

so that they could then be accessed like this:

print(df[altName["Objective 1"]])

This works, but it leads to code that is very hard to read (think of a plot command with multiple variables, etc.). I can't simply rename the columns to the friendly names because there are times when I need access to both, but I'm not sure how to support both simultaneously without a dictionary.

Is it possible to assign more than one name to a column, or do some sort of implicit mapping that would let me use both of these access methods:

print(df["r0"])
print(df["Objective 1"])

I've thought of making my own subclass that would catch a KeyError, fall back to a secondary dictionary of alternate names, and try the lookup again, but I wasn't sure I'd be able to do that while preserving all other DataFrame functionality (I'd self-assess my Python as beginner bordering on intermediate).
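
For illustration, the fallback idea could be sketched roughly as below. `AliasFrame` and its `aliases` attribute are hypothetical names, not part of pandas, and only plain `df[...]` lookups are covered (not `.loc`, plotting keywords, and so on), so this is a starting point under those assumptions rather than a complete solution.

```python
import pandas as pd


class AliasFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass that also accepts alternate column names."""

    # Listing the attribute in _metadata makes pandas treat it as a regular
    # attribute instead of trying to interpret it as a column.
    _metadata = ["aliases"]

    @property
    def _constructor(self):
        # Keep results of pandas operations as AliasFrame rather than DataFrame.
        return AliasFrame

    def __getitem__(self, key):
        try:
            return super().__getitem__(key)
        except KeyError:
            # Fall back to the alias dictionary, if one was attached.
            aliases = getattr(self, "aliases", None) or {}
            if isinstance(key, str) and key in aliases:
                return super().__getitem__(aliases[key])
            raise


altName = {"Objective 1": "r0", "Result 5": "r1", "Parameter 2": "r2"}
df = AliasFrame([[0, 1, 2], [10, 11, 12], [20, 21, 22]], columns=["r0", "r1", "r2"])
df.aliases = altName

print(df["r0"])           # systematic name
print(df["Objective 1"])  # friendly name, resolved through the alias dictionary
```

A name that is neither a column nor an alias still raises the original KeyError.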

Thanks very much for your suggestions.

Andrew asked May 10 '16 15:05


1 Answer

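Yes, you can. When all columns have the same dtype, a DataFrame is essentially a wrapper around a single NumPy array, so you can create several wrappers around the same data, each with its own column names. This relies on pandas not copying the array: it does not hold for mixed dtypes, and it may not hold when Copy-on-Write is enabled (the default from pandas 3.0).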

An example:

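import pandas as pd

# Two frames built over the same underlying values, with different column labels
df = pd.DataFrame([[0, 1], [2, 3]], index=list('AB'), columns=list('CD'))
df2 = pd.DataFrame(df.values, index=df.index, columns=list('EF'))
df.loc['A', 'C'] = 999  # write through the first frame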

Then df2 is also affected:

In [407]: df2['E']
Out[407]: 
A    999
B      2
Name: E, dtype: int32
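
Applied to the frame from the question, the same idea might look like the sketch below. The name `friendly` is made up for illustration. The `np.shares_memory` check is included because whether pandas reuses or copies the array depends on the dtypes and the pandas version, so the result is printed rather than assumed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 2], [10, 11, 12], [20, 21, 22]],
                  columns=["r0", "r1", "r2"])

# Second frame over the same values, labelled with the friendly names.
friendly = pd.DataFrame(df.to_numpy(), index=df.index,
                        columns=["Objective 1", "Result 5", "Parameter 2"])

print(friendly["Objective 1"])

# True only if both frames are backed by the same memory.
shared = bool(np.shares_memory(df.to_numpy(), friendly.to_numpy()))
print("shares memory with df:", shared)
```

If the check prints False, writes to one frame will not appear in the other, and `friendly` is only a relabelled snapshot of `df`.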
B. M. answered Nov 15 '22 06:11