Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replace pandas values in a column by a look up from a different dataframe

Tags:

python

pandas

I have a reference DataFrame like below:

    Variables   Key Values  
0   GRTYPE      40  Total exclusions 4-year schools
1   GRTYPE      2   4-year institutions, Adjusted cohort
2   GRTYPE      3   4-year institutions, Completers 
41  CHRTSTAT    2   Revised cohort
42  CHRTSTAT    3   Exclusions
43  CHRTSTAT    4   Adjusted cohort 
57  SECTION     12  Bachelors/ equiv .
58  SECTION     23  Bachelors or equiv 2009 .

and I would like to use the reference data frame to replace the values in the main DataFrame below:

    GRTYPE      CHRTSTAT  SECTION
0   40             2    12      
1   2              3    12      
2   2              4    23      
3   3              2    12  
4   3              3    23  

The end result would be:

    GRTYPE                                CHRTSTAT          SECTION
0   Total exclusions 4-year schools         Revised cohort       Bachelors/ equiv . 
1   4-year institutions, Adjusted cohort    Exclusions           Bachelors/ equiv .         
2   4-year institutions, Adjusted cohort    Adjusted cohort      Bachelors or equiv 2009 .      
3   4-year institutions, Completers         Revised cohort       Bachelors/ equiv . 
4   4-year institutions, Completers         Exclusions           Bachelors or equiv 2009 .  

What is the best way of doing this in pandas or python? I tried joining and extracting the variables from the first dataframe and loop over the second one and did not get anywhere.

like image 468
jax Avatar asked Mar 05 '23 15:03

jax


2 Answers

Using map

You need to set Variables and Key as indexes of the mapping dataframe and then simply use map on columns.

mapping_df = mapping_df.set_index(['Variables', 'Key'])
df = df.apply(lambda x: x.map(mapping_df.loc[x.name]['Values']))

Which is same as:

mapping_df = mapping_df.set_index(['Variables', 'Key'])
df['GRTYPE'] = df.GRTYPE.map(mapping_df.loc['GRTYPE']['Values'])
df['CHRTSTAT'] = df.CHRTSTAT.map(mapping_df.loc['CHRTSTAT']['Values'])
df['SECTION'] = df.SECTION.map(mapping_df.loc['SECTION']['Values'])

Output:

                                 GRTYPE         CHRTSTAT                    SECTION
0       Total exclusions 4-year schools   Revised cohort         Bachelors/ equiv .
1  4-year institutions, Adjusted cohort       Exclusions         Bachelors/ equiv .
2  4-year institutions, Adjusted cohort  Adjusted cohort  Bachelors or equiv 2009 .
3       4-year institutions, Completers   Revised cohort         Bachelors/ equiv .
4       4-year institutions, Completers       Exclusions  Bachelors or equiv 2009 .
like image 171
harvpan Avatar answered Apr 25 '23 11:04

harvpan


Using a defualtdict

from collections import defaultdict

d = defaultdict(dict)
for i, k, v in df1.itertuples(index=False):
    d[i][k] = v

pd.DataFrame(dict(zip(df2, [[d[i][k] for k in df2[i]] for i in df2])), df2.index)

                                 GRTYPE         CHRTSTAT                    SECTION
0       Total exclusions 4-year schools   Revised cohort         Bachelors/ equiv .
1  4-year institutions, Adjusted cohort       Exclusions         Bachelors/ equiv .
2  4-year institutions, Adjusted cohort  Adjusted cohort  Bachelors or equiv 2009 .
3       4-year institutions, Completers   Revised cohort         Bachelors/ equiv .
4       4-year institutions, Completers       Exclusions  Bachelors or equiv 2009 .

apply

df2.apply(
    lambda s: s.apply(
        lambda x, n: df1.set_index(['Variables', 'Key']).Values[(n, x)], n=s.name
    )
)

                                 GRTYPE         CHRTSTAT                    SECTION
0       Total exclusions 4-year schools   Revised cohort         Bachelors/ equiv .
1  4-year institutions, Adjusted cohort       Exclusions         Bachelors/ equiv .
2  4-year institutions, Adjusted cohort  Adjusted cohort  Bachelors or equiv 2009 .
3       4-year institutions, Completers   Revised cohort         Bachelors/ equiv .
4       4-year institutions, Completers       Exclusions  Bachelors or equiv 2009 .
like image 23
piRSquared Avatar answered Apr 25 '23 13:04

piRSquared