I'm using Pandas to store a large dataset that has systematically generated column names. Something like this:
import numpy as np
import pandas as pd
df = pd.DataFrame([[0,1,2],[10,11,12],[20,21,22]],columns=["r0","r1","r2"])
These systematic names also have more meaningful equivalents that users would actually understand. So far, I've been mapping them using a dictionary like so:
altName = {"Objective 1":"r0", "Result 5":"r1", "Parameter 2":"r2"}
so that they could then be accessed like this:
print(df[altName["Objective 1"]])
This works, but it leads to very hard-to-read code (think of a plot command with multiple variables, etc.). I can't simply rename the columns to the friendly names because there are times when I need access to both, but I'm not sure how to support both simultaneously without a dictionary.
Is it possible to assign more than one name to a column, or do some sort of implicit mapping that would let me use both of these access methods:
print(df["r0"])
print(df["Objective 1"])
I've thought of making my own subclass that would catch a KeyError and then fall back to a secondary dictionary of alternate names, but I wasn't sure I'd be able to do that while preserving all other DataFrame functionality (I'd assess my Python as beginner bordering on intermediate).
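A minimal sketch of that subclass idea might look like the following. The class name `AliasedFrame` and the `alt_names` attribute are hypothetical, chosen only for illustration. It relies on the `_metadata` and `_constructor` hooks that pandas provides for subclassing, and it only translates names used in plain `df[...]` lookups; `.loc`, attribute access, and other entry points are not covered.

```python
import pandas as pd


class AliasedFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass that resolves friendly names in df[...]."""

    # Ask pandas to propagate this attribute to frames derived from this one.
    _metadata = ["alt_names"]

    @property
    def _constructor(self):
        # Make operations that build a new frame return an AliasedFrame.
        return AliasedFrame

    def __getitem__(self, key):
        alt_names = getattr(self, "alt_names", None) or {}
        # Only translate a single string label that is not a real column.
        if isinstance(key, str) and key not in self.columns and key in alt_names:
            key = alt_names[key]
        return super().__getitem__(key)


df = AliasedFrame([[0, 1, 2], [10, 11, 12], [20, 21, 22]], columns=["r0", "r1", "r2"])
df.alt_names = {"Objective 1": "r0", "Result 5": "r1", "Parameter 2": "r2"}

print(df["r0"])           # systematic name
print(df["Objective 1"])  # friendly name, resolved through alt_names
```

Real column names take precedence here, so an alias that collides with an existing column is never consulted.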
Thanks very much for your suggestions.
Yes, you can. A DataFrame is essentially a wrapper around NumPy arrays, so you can create several wrappers around the same data, each with its own column names:
An example:
df = pd.DataFrame([[0, 1], [2, 3]], index=list('AB'), columns=list('CD'))
df2 = pd.DataFrame(df.values, index=df.index, columns=list('EF'))
df.loc['A', 'C'] = 999
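Then df2 is also affected, because both frames share the same underlying array (this holds for a single-dtype frame; in newer pandas versions with Copy-on-Write enabled, the data may be copied instead):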
In [407]: df2['E']
Out[407]:
A 999
B 2
Name: E, dtype: int32
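Applied to the data from the question, a second frame with friendly column names could be built with `DataFrame.rename`. This is only a sketch: `to_friendly` and `friendly_df` are names made up for the example, and `friendly_df` should be treated as a read-only companion (for plotting, say), since writes to one frame are not guaranteed to show up in the other.

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2], [10, 11, 12], [20, 21, 22]], columns=["r0", "r1", "r2"])
altName = {"Objective 1": "r0", "Result 5": "r1", "Parameter 2": "r2"}

# Invert the mapping (systematic name -> friendly name) for rename().
to_friendly = {systematic: friendly for friendly, systematic in altName.items()}

# rename() returns a new frame; df keeps its systematic names.
friendly_df = df.rename(columns=to_friendly)

print(df["r0"])
print(friendly_df["Objective 1"])
```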