I have a reference DataFrame like below:
Variables Key Values
0 GRTYPE 40 Total exclusions 4-year schools
1 GRTYPE 2 4-year institutions, Adjusted cohort
2 GRTYPE 3 4-year institutions, Completers
41 CHRTSTAT 2 Revised cohort
42 CHRTSTAT 3 Exclusions
43 CHRTSTAT 4 Adjusted cohort
57 SECTION 12 Bachelors/ equiv .
58 SECTION 23 Bachelors or equiv 2009 .
and I would like to use the reference data frame to replace the values in the main DataFrame below:
GRTYPE CHRTSTAT SECTION
0 40 2 12
1 2 3 12
2 2 4 23
3 3 2 12
4 3 3 23
The end result would be:
GRTYPE CHRTSTAT SECTION
0 Total exclusions 4-year schools Revised cohort Bachelors/ equiv .
1 4-year institutions, Adjusted cohort Exclusions Bachelors/ equiv .
2 4-year institutions, Adjusted cohort Adjusted cohort Bachelors or equiv 2009 .
3 4-year institutions, Completers Revised cohort Bachelors/ equiv .
4 4-year institutions, Completers Exclusions Bachelors or equiv 2009 .
What is the best way of doing this in pandas or python? I tried joining and extracting the variables from the first dataframe and loop over the second one and did not get anywhere.
map
You need to set Variables
and Key
as indexes of the mapping dataframe and then simply use map on columns.
mapping_df = mapping_df.set_index(['Variables', 'Key'])
df = df.apply(lambda x: x.map(mapping_df.loc[x.name]['Values']))
Which is same as:
mapping_df = mapping_df.set_index(['Variables', 'Key'])
df['GRTYPE'] = df.GRTYPE.map(mapping_df.loc['GRTYPE']['Values'])
df['CHRTSTAT'] = df.CHRTSTAT.map(mapping_df.loc['CHRTSTAT']['Values'])
df['SECTION'] = df.SECTION.map(mapping_df.loc['SECTION']['Values'])
Output:
GRTYPE CHRTSTAT SECTION
0 Total exclusions 4-year schools Revised cohort Bachelors/ equiv .
1 4-year institutions, Adjusted cohort Exclusions Bachelors/ equiv .
2 4-year institutions, Adjusted cohort Adjusted cohort Bachelors or equiv 2009 .
3 4-year institutions, Completers Revised cohort Bachelors/ equiv .
4 4-year institutions, Completers Exclusions Bachelors or equiv 2009 .
defualtdict
from collections import defaultdict
d = defaultdict(dict)
for i, k, v in df1.itertuples(index=False):
d[i][k] = v
pd.DataFrame(dict(zip(df2, [[d[i][k] for k in df2[i]] for i in df2])), df2.index)
GRTYPE CHRTSTAT SECTION
0 Total exclusions 4-year schools Revised cohort Bachelors/ equiv .
1 4-year institutions, Adjusted cohort Exclusions Bachelors/ equiv .
2 4-year institutions, Adjusted cohort Adjusted cohort Bachelors or equiv 2009 .
3 4-year institutions, Completers Revised cohort Bachelors/ equiv .
4 4-year institutions, Completers Exclusions Bachelors or equiv 2009 .
apply
df2.apply(
lambda s: s.apply(
lambda x, n: df1.set_index(['Variables', 'Key']).Values[(n, x)], n=s.name
)
)
GRTYPE CHRTSTAT SECTION
0 Total exclusions 4-year schools Revised cohort Bachelors/ equiv .
1 4-year institutions, Adjusted cohort Exclusions Bachelors/ equiv .
2 4-year institutions, Adjusted cohort Adjusted cohort Bachelors or equiv 2009 .
3 4-year institutions, Completers Revised cohort Bachelors/ equiv .
4 4-year institutions, Completers Exclusions Bachelors or equiv 2009 .
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With