Map values from one dataframe to new columns in other based on column values - Pandas

I have a problem with mapping values from another dataframe.

These are samples of two dataframes:

df1

product   class_1   class_2   class_3
141A        11        13         5     
53F4        12        11        18  
GS24        14        12        10

df2

id    product_type_0  product_type_1 product_type_2  product_type_3 measure_0 measure_1 measure_2   measure_3
1         141A            GS24             NaN           NaN          1         3           NaN       NaN
2         53F4            NaN              NaN           NaN          1        NaN          NaN       NaN
3         53F4            141A             141A          NaN          2         2            1        NaN
4         141A            GS24             NaN           NaN          3         2           NaN       NaN

What I'm trying to get is this: I need to add new columns called "Max_Class_1", "Max_Class_2", "Max_Class_3", with values taken from df1. For each order number (_1, _2, _3), look at the corresponding product type column (for example product_type_1) and take the row from df1 where product has the same value. Then look at the matching measure column (for example measure_1): if its value is 1 (the original data can contain up to four different values), the new column "Max_Class_1" gets the class_1 value for that product type, in this case 11.

I think it's a little bit simpler than I explained it.

Desired output

id    product_type_0  product_type_1  product_type_2  product_type_3  measure_0  measure_1  measure_2  measure_3  max_class_0  max_class_1  max_class_2  max_class_3
1     141A            GS24            NaN             NaN             1          3          NaN        NaN        11           10           NaN          NaN
2     53F4            NaN             NaN             NaN             1          NaN        NaN        NaN        12           NaN          NaN          NaN
3     53F4            141A            141A            NaN             2          2          1          NaN        11           13           11           NaN
4     141A            GS24            NaN             NaN             3          2          NaN        NaN        5            12           NaN          NaN

The code I have tried with:

df2['max_class_1'] = None
df2['max_class_2'] = None
df2['max_class_3'] = None

def get_max_class(product_df, measure_df, product_type_column, measure_column, max_class_columns):
    for index, row in measure_df.iterrows():
        product_df_new = product_df[product_df['product'] == row[product_type_column]]
        for ind, r in product_df_new.iterrows():
            if row[measure_column] == 1:
                row[max_class_columns] = r['class_1']
            elif row[measure_column] == 2:
                row[max_class_columns] = r['class_2']
            elif row[measure_column] == 3:
                row[max_class_columns] = r['class_3']
            else:
                row[max_class_columns] = "There is no measure or type"
    return measure_df

# And the function call 
first_class = get_max_class(product_df=df1, measure_df=df2, product_type_column='product_type_1', measure_column='measure_1', max_class_columns='max_class_1')

second_class = get_max_class(product_df=df1, measure_df=first_class, product_type_column='product_type_2', measure_column='measure_2', max_class_columns='max_class_2')

third_class = get_max_class(product_df=df1, measure_df=second_class, product_type_column='product_type_3', measure_column='measure_3', max_class_columns='max_class_3')

I'm pretty sure there is a simpler solution, but I don't know why this is not working. I'm getting all None values; nothing changes.

asked Jul 10 '18 13:07

jovicbg

1 Answer

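pd.DataFrame.lookup performs lookups by row and column labels. Note that it was deprecated in pandas 1.2 and removed in pandas 2.0, so the code below requires an older pandas version.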

Your problem is complicated by the existence of null values. But this can be accommodated by modifying your input mapping dataframe.

Step 1

Rename columns in df1 to integers and add an extra row / column. We will use the added data later to deal with null values.

def rename_cols(x):
    return x if not x.startswith('class') else int(x.split('_')[-1])

df1 = df1.rename(columns=rename_cols)

df1 = df1.set_index('product')
df1.loc['X'] = 0
df1[0] = 0

Your mapping dataframe now looks like:

print(df1)

          1   2   3  0
product               
141A     11  13   5  0
53F4     12  11  18  0
GS24     14  12  10  0
X         0   0   0  0

Step 2

Iterate over the number of categories and use pd.DataFrame.lookup. Notice how we fillna with X and 0, exactly the values we added as extra mapping data in Step 1.

n = df2.columns.str.startswith('measure').sum()

for i in range(n):
    rows = df2['product_type_{}'.format(i)].fillna('X')
    cols = df2['measure_{}'.format(i)].fillna(0).astype(int)
    df2['max_{}'.format(i)] = df1.lookup(rows, cols)
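On pandas versions where DataFrame.lookup is unavailable (it was removed in 2.0), the same two steps can be sketched with positional NumPy indexing. This is a minimal sketch, not part of the original answer: the sample frames are rebuilt from the question, and the names mapping, values, row_pos and col_pos are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df1 = pd.DataFrame({
    'product': ['141A', '53F4', 'GS24'],
    'class_1': [11, 12, 14],
    'class_2': [13, 11, 12],
    'class_3': [5, 18, 10],
})
df2 = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'product_type_0': ['141A', '53F4', '53F4', '141A'],
    'product_type_1': ['GS24', None, '141A', 'GS24'],
    'product_type_2': [None, None, '141A', None],
    'product_type_3': [None, None, None, None],
    'measure_0': [1, 1, 2, 3],
    'measure_1': [3, np.nan, 2, 2],
    'measure_2': [np.nan, np.nan, 1, np.nan],
    'measure_3': [np.nan, np.nan, np.nan, np.nan],
})

# Step 1: integer column labels plus a filler row 'X' and filler column 0.
mapping = df1.set_index('product')
mapping.columns = [int(c.split('_')[-1]) for c in mapping.columns]
mapping.loc['X'] = 0
mapping[0] = 0
values = mapping.to_numpy()

# Step 2: translate labels to positions, then index the array pairwise.
n = df2.columns.str.startswith('measure').sum()
for i in range(n):
    rows = df2['product_type_{}'.format(i)].fillna('X')
    cols = df2['measure_{}'.format(i)].fillna(0).astype(int)
    row_pos = mapping.index.get_indexer(rows)
    col_pos = mapping.columns.get_indexer(cols)
    # get_indexer returns -1 for unknown labels; fail loudly instead of
    # silently picking the last row or column.
    if (row_pos < 0).any() or (col_pos < 0).any():
        raise KeyError('unknown product or measure in column set {}'.format(i))
    df2['max_{}'.format(i)] = values[row_pos, col_pos]
```

Index.get_indexer requires unique labels, which holds here because each product appears once in df1.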

Result

print(df2)

   id product_type_0 product_type_1 product_type_2  product_type_3  measure_0  \
0   1           141A           GS24            NaN             NaN          1   
1   2           53F4            NaN            NaN             NaN          1   
2   3           53F4           141A           141A             NaN          2   
3   4           141A           GS24            NaN             NaN          3   

   measure_1  measure_2  measure_3  max_0  max_1  max_2  max_3  
0        3.0        NaN        NaN     11     10      0      0  
1        NaN        NaN        NaN     12      0      0      0  
2        2.0        1.0        NaN     11     13     11      0  
3        2.0        NaN        NaN      5     12      0      0

You can convert the 0 to np.nan if required. This will be at the expense of converting your series from int to float, since NaN is considered float.
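One way to do that conversion is sketched below. The frame named result is a hypothetical stand-in for the max_* columns from the output above, and the nullable Int64 variant is an extra option not mentioned in the original answer, for cases where keeping integers matters.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the max_* columns shown in the result.
result = pd.DataFrame({
    'max_0': [11, 12, 11, 5],
    'max_1': [10, 0, 13, 12],
    'max_2': [0, 0, 11, 0],
})

# Option 1: swap the filler 0 for NaN; affected columns become float64.
as_float = result.replace(0, np.nan)

# Option 2: pandas' nullable integer dtype keeps integers and marks
# missing entries as <NA> instead of NaN.
as_nullable = result.astype('Int64').mask(result == 0)
```

Either option assumes 0 never occurs as a genuine class value.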

Of course, if X and 0 are valid values, you can use alternative filler values from the start.
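As for why the loop in the question left every value as None: DataFrame.iterrows yields each row as a separate Series, and for a frame with mixed dtypes that Series is a copy, so assigning to row[...] never reaches the original frame. The small demonstration below uses a made-up two-row frame (demo is an illustrative name, not from the question) and shows that writing through the frame with .at does persist.

```python
import pandas as pd

# Made-up frame with mixed dtypes, mimicking the question's setup.
demo = pd.DataFrame({'measure_1': [1, 2], 'max_class_1': [None, None]})

# Writing to `row` only changes the temporary Series yielded by iterrows.
for index, row in demo.iterrows():
    row['max_class_1'] = 99
still_empty = demo['max_class_1'].isna().all()

# Writing through the frame itself, by row label and column name, sticks.
for index, row in demo.iterrows():
    demo.at[index, 'max_class_1'] = row['measure_1'] * 10
```

Row-by-row loops like this are slow on large frames, which is another reason to prefer the vectorised lookup approach.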

answered Oct 12 '22 13:10

jpp

Tags: python, pandas, dataframe, mapping
