 

Map values from one dataframe to new columns in another based on column values - Pandas

I have a problem with mapping values from another dataframe.

These are samples of two dataframes:

df1

product   class_1   class_2   class_3
141A        11        13         5     
53F4        12        11        18  
GS24        14        12        10   

df2

id    product_type_0  product_type_1 product_type_2  product_type_3 measure_0 measure_1 measure_2   measure_3
1         141A            GS24             NaN           NaN          1         3           NaN       NaN
2         53F4            NaN              NaN           NaN          1        NaN          NaN       NaN
3         53F4            141A             141A          NaN          2         2            1        NaN
4         141A            GS24             NaN           NaN          3         2           NaN       NaN
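
For anyone who wants to reproduce this, here is a sketch that builds the two sample frames. It assumes the NaN cells are real missing values (np.nan) rather than the string "NaN":

```python
import numpy as np
import pandas as pd

# df1: class values per product (one row per product)
df1 = pd.DataFrame({
    'product': ['141A', '53F4', 'GS24'],
    'class_1': [11, 12, 14],
    'class_2': [13, 11, 12],
    'class_3': [5, 18, 10],
})

# df2: up to four (product_type_i, measure_i) pairs per id; NaN means "no entry"
df2 = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'product_type_0': ['141A', '53F4', '53F4', '141A'],
    'product_type_1': ['GS24', np.nan, '141A', 'GS24'],
    'product_type_2': [np.nan, np.nan, '141A', np.nan],
    'product_type_3': [np.nan] * 4,
    'measure_0': [1, 1, 2, 3],
    'measure_1': [3, np.nan, 2, 2],
    'measure_2': [np.nan, np.nan, 1, np.nan],
    'measure_3': [np.nan] * 4,
})
```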

What I'm trying to get is the following: I need to add new columns called "max_class_0", "max_class_1", "max_class_2" and "max_class_3", whose values are taken from df1. For each order number (_0, _1, _2, _3), look at the matching product_type column (for example product_type_1) and take the row from df1 whose product has the same value. Then look at the matching measure column (for example measure_1): if its value is 1, the new column max_class_1 gets the class_1 value for that product, for example 11 for product 141A. (In the original data there can be up to four different measure values.)

I think it's simpler than my explanation makes it sound.

Desired output

id  product_type_0  product_type_1  product_type_2  product_type_3  measure_0  measure_1  measure_2  measure_3  max_class_0  max_class_1  max_class_2  max_class_3
1   141A            GS24            NaN             NaN             1          3          NaN        NaN        11           10           NaN          NaN
2   53F4            NaN             NaN             NaN             1          NaN        NaN        NaN        12           NaN          NaN          NaN
3   53F4            141A            141A            NaN             2          2          1          NaN        11           13           11           NaN
4   141A            GS24            NaN             NaN             3          2          NaN        NaN        5            12           NaN          NaN

The code I have tried:

df2['max_class_1'] = None
df2['max_class_2'] = None
df2['max_class_3'] = None

def get_max_class(product_df, measure_df, product_type_column, measure_column, max_class_columns):
    for index, row in measure_df.iterrows():
        product_df_new = product_df[product_df['product'] == row[product_type_column]]
        for ind, r in product_df_new.iterrows():
            if row[measure_column] == 1:
                row[max_class_columns] = r['class_1']
            elif row[measure_column] == 2:
                row[max_class_columns] = r['class_2']
            elif row[measure_column] == 3:
                row[max_class_columns] = r['class_3']
            else:
                row[max_class_columns] = "There is no measure or type"
    return measure_df

# And the function calls (column names passed as strings)
first_class = get_max_class(product_df=df1, measure_df=df2, product_type_column='product_type_1', measure_column='measure_1', max_class_columns='max_class_1')

second_class = get_max_class(product_df=df1, measure_df=first_class, product_type_column='product_type_2', measure_column='measure_2', max_class_columns='max_class_2')

third_class = get_max_class(product_df=df1, measure_df=second_class, product_type_column='product_type_3', measure_column='measure_3', max_class_columns='max_class_3')

I'm pretty sure there is a simpler solution, but I don't know why this one isn't working. I'm getting all None values; nothing changes.
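
A likely cause of the all-None result: iterrows() yields a copy of each row (df2 has mixed dtypes), so assigning to row[...] never reaches df2. Below is a minimal sketch of the same loop that writes back through measure_df.at instead. It assumes the sample frames above (rebuilt so the snippet runs alone), also fills max_class_0, and leaves a cell as NaN when the product or measure is missing instead of writing a message string:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'product': ['141A', '53F4', 'GS24'],
    'class_1': [11, 12, 14],
    'class_2': [13, 11, 12],
    'class_3': [5, 18, 10],
})
df2 = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'product_type_0': ['141A', '53F4', '53F4', '141A'],
    'product_type_1': ['GS24', np.nan, '141A', 'GS24'],
    'product_type_2': [np.nan, np.nan, '141A', np.nan],
    'product_type_3': [np.nan] * 4,
    'measure_0': [1, 1, 2, 3],
    'measure_1': [3, np.nan, 2, 2],
    'measure_2': [np.nan, np.nan, 1, np.nan],
    'measure_3': [np.nan] * 4,
})


def get_max_class(product_df, measure_df, product_type_column, measure_column, max_class_columns):
    # Index df1 by product once, so each lookup is a direct label access.
    classes = product_df.set_index('product')
    for index, row in measure_df.iterrows():
        product = row[product_type_column]
        measure = row[measure_column]
        # Leave the cell as NaN when the product or measure is missing or unknown.
        if pd.isna(product) or pd.isna(measure) or product not in classes.index:
            continue
        # row is only a copy, so write through measure_df itself with .at.
        measure_df.at[index, max_class_columns] = classes.at[product, 'class_{}'.format(int(measure))]
    return measure_df


for i in range(4):
    df2['max_class_{}'.format(i)] = np.nan
    df2 = get_max_class(df1, df2, 'product_type_{}'.format(i),
                        'measure_{}'.format(i), 'max_class_{}'.format(i))
```

This is still a row-by-row loop, so the vectorised lookup in the answer below should scale better on large frames.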

asked Jul 10 '18 at 13:07 by jovicbg


1 Answer

pd.DataFrame.lookup was designed for exactly this: lookups by pairs of row and column labels. Note that it was deprecated in pandas 1.2 and removed in pandas 2.0.

Your problem is complicated by the null values, but these can be handled by modifying your input mapping dataframe.

Step 1

Rename the class columns in df1 to integers and add an extra row and an extra column. We will use the added data later to deal with null values.

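def rename_cols(x):
    # 'class_1' -> 1, 'class_2' -> 2, ...; 'product' is left unchanged
    return x if not x.startswith('class') else int(x.split('_')[-1])

df1 = df1.rename(columns=rename_cols)

df1 = df1.set_index('product')
df1.loc['X'] = 0   # dummy product row, used where product_type is NaN
df1[0] = 0         # dummy class column, used where measure is NaN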

Your mapping dataframe now looks like:

print(df1)

          1   2   3  0
product               
141A     11  13   5  0
53F4     12  11  18  0
GS24     14  12  10  0
X         0   0   0  0

Step 2

Iterate over the categories and use pd.DataFrame.lookup. Notice how we fillna with X and 0, exactly the values we used for the additional mapping data in Step 1.

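# number of product_type/measure pairs (4 here: _0 to _3)
n = df2.columns.str.startswith('measure').sum()

for i in range(n):
    rows = df2['product_type_{}'.format(i)].fillna('X')        # missing product -> dummy row 'X'
    cols = df2['measure_{}'.format(i)].fillna(0).astype(int)   # missing measure -> dummy column 0
    df2['max_{}'.format(i)] = df1.lookup(rows, cols)           # DataFrame.lookup needs pandas < 2.0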
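
Because DataFrame.lookup no longer exists in pandas 2.x, here is a minimal sketch of an equivalent: convert the labels to positions with Index.get_indexer, then pick the values with NumPy fancy indexing. It assumes df1 has already been reshaped as in Step 1 (rebuilt here so the snippet runs on its own); label_lookup is a hypothetical helper name, not a pandas API:

```python
import numpy as np
import pandas as pd

# df1 after Step 1: integer class columns plus the dummy row 'X' and column 0
df1 = pd.DataFrame(
    {1: [11, 12, 14, 0], 2: [13, 11, 12, 0], 3: [5, 18, 10, 0], 0: [0, 0, 0, 0]},
    index=pd.Index(['141A', '53F4', 'GS24', 'X'], name='product'),
)
df2 = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'product_type_0': ['141A', '53F4', '53F4', '141A'],
    'product_type_1': ['GS24', np.nan, '141A', 'GS24'],
    'product_type_2': [np.nan, np.nan, '141A', np.nan],
    'product_type_3': [np.nan] * 4,
    'measure_0': [1, 1, 2, 3],
    'measure_1': [3, np.nan, 2, 2],
    'measure_2': [np.nan, np.nan, 1, np.nan],
    'measure_3': [np.nan] * 4,
})


def label_lookup(frame, row_labels, col_labels):
    # Hypothetical stand-in for the removed DataFrame.lookup.
    row_pos = frame.index.get_indexer(row_labels)     # -1 marks an unknown label
    col_pos = frame.columns.get_indexer(col_labels)
    if (row_pos < 0).any() or (col_pos < 0).any():
        raise KeyError('some labels are missing from the mapping frame')
    # Pair the i-th row position with the i-th column position.
    return frame.to_numpy()[row_pos, col_pos]


n = df2.columns.str.startswith('measure').sum()
for i in range(n):
    rows = df2['product_type_{}'.format(i)].fillna('X')
    cols = df2['measure_{}'.format(i)].fillna(0).astype(int)
    df2['max_{}'.format(i)] = label_lookup(df1, rows, cols)
```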

Result

print(df2)

   id product_type_0 product_type_1 product_type_2  product_type_3  measure_0  \
0   1           141A           GS24            NaN             NaN          1   
1   2           53F4            NaN            NaN             NaN          1   
2   3           53F4           141A           141A             NaN          2   
3   4           141A           GS24            NaN             NaN          3   

   measure_1  measure_2  measure_3  max_0  max_1  max_2  max_3  
0        3.0        NaN        NaN     11     10      0      0  
1        NaN        NaN        NaN     12      0      0      0  
2        2.0        1.0        NaN     11     13     11      0  
3        2.0        NaN        NaN      5     12      0      0  

You can convert the 0s to np.nan if required. This comes at the expense of converting the affected series from int to float, since NaN is a float.
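
A small sketch of that conversion on the max_* columns from the result above. It assumes 0 never occurs as a genuine class value, since every 0 is treated as "missing":

```python
import numpy as np
import pandas as pd

# Only the max_* columns from the printed result are rebuilt here.
df2 = pd.DataFrame({
    'max_0': [11, 12, 11, 5],
    'max_1': [10, 0, 13, 12],
    'max_2': [0, 0, 11, 0],
    'max_3': [0, 0, 0, 0],
})

max_cols = [c for c in df2.columns if c.startswith('max_')]
# Columns that contain a 0 become float64, because NaN is a float.
df2[max_cols] = df2[max_cols].replace(0, np.nan)
```

If you would rather keep integers, pandas' nullable 'Int64' dtype (e.g. calling .astype('Int64') on these columns afterwards) can hold missing values without falling back to float.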

Of course, if X or 0 are valid values in your data, choose different filler values from the start.
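
If you would rather not pick filler values at all, a different technique (not part of the answer above) is a plain dictionary keyed on (product, class number): any pair with a missing product or measure simply isn't found and yields NaN. A sketch on the question's original df1/df2, with output columns named max_class_i as in the question; class_map is a hypothetical name:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'product': ['141A', '53F4', 'GS24'],
    'class_1': [11, 12, 14],
    'class_2': [13, 11, 12],
    'class_3': [5, 18, 10],
})
df2 = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'product_type_0': ['141A', '53F4', '53F4', '141A'],
    'product_type_1': ['GS24', np.nan, '141A', 'GS24'],
    'product_type_2': [np.nan, np.nan, '141A', np.nan],
    'product_type_3': [np.nan] * 4,
    'measure_0': [1, 1, 2, 3],
    'measure_1': [3, np.nan, 2, 2],
    'measure_2': [np.nan, np.nan, 1, np.nan],
    'measure_3': [np.nan] * 4,
})

# (product, class number) -> class value, e.g. ('141A', 1) -> 11
class_map = {
    (product, int(col.split('_')[-1])): value
    for product, row in df1.set_index('product').iterrows()
    for col, value in row.items()
}

n = df2.columns.str.startswith('measure').sum()
for i in range(n):
    pairs = zip(df2['product_type_{}'.format(i)], df2['measure_{}'.format(i)])
    # A float measure such as 3.0 matches the int key 3 (equal values hash equally);
    # pairs containing NaN are never found, so they fall back to np.nan.
    df2['max_class_{}'.format(i)] = [class_map.get((p, m), np.nan) for p, m in pairs]
```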

answered Oct 12 '22 at 13:10 by jpp