I have a problem with mapping values from another dataframe.
These are samples of two dataframes:
df1
product class_1 class_2 class_3
141A 11 13 5
53F4 12 11 18
GS24 14 12 10
df2
id product_type_0 product_type_1 product_type_2 product_type_3 measure_0 measure_1 measure_2 measure_3
1 141A GS24 NaN NaN 1 3 NaN NaN
2 53F4 NaN NaN NaN 1 NaN NaN NaN
3 53F4 141A 141A NaN 2 2 1 NaN
4 141A GS24 NaN NaN 3 2 NaN NaN
What I'm trying to get is this: I need to add new columns called "max_class_1", "max_class_2", "max_class_3", with values taken from df1. For each order number (_1, _2, _3), look at the matching product column (for example product_type_1) and take the row from df1 where product has the same value. Then look at the matching measure column (for example measure_1): if its value is 1 (the original data has at most four different values), the new column "max_class_1" should hold the class_1 value for that product; if it is 2, the class_2 value, and so on. For example, product 141A with measure 1 gives 11.
I think it's a little bit simpler than I explained it.
Desired output
id product_type_0 product_type_1 product_type_2 product_type_3 measure_0 measure_1 measure_2 measure_3 max_class_0 max_class_1 max_class_2 max_class_3
1 141A GS24 NaN NaN 1 3 NaN NaN 11 10 NaN NaN
2 53F4 NaN NaN NaN 1 NaN NaN NaN 12 NaN NaN NaN
3 53F4 141A 141A NaN 2 2 1 NaN 11 13 11 NaN
4 141A GS24 NaN NaN 3 2 NaN NaN 5 12 NaN NaN
The code I have tried with:
df2['max_class_1'] = None
df2['max_class_2'] = None
df2['max_class_3'] = None

def get_max_class(product_df, measure_df, product_type_column, measure_column, max_class_columns):
    for index, row in measure_df.iterrows():
        product_df_new = product_df[product_df['product'] == row[product_type_column]]
        for ind, r in product_df_new.iterrows():
            if row[measure_column] == 1:
                row[max_class_columns] = r['class_1']
            elif row[measure_column] == 2:
                row[max_class_columns] = r['class_2']
            elif row[measure_column] == 3:
                row[max_class_columns] = r['class_3']
            else:
                row[max_class_columns] = "There is no measure or type"
    return measure_df

# And the function call
first_class = get_max_class(product_df=df1, measure_df=df2, product_type_column='product_type_1', measure_column='measure_1', max_class_columns='max_class_1')
second_class = get_max_class(product_df=df1, measure_df=first_class, product_type_column='product_type_2', measure_column='measure_2', max_class_columns='max_class_2')
third_class = get_max_class(product_df=df1, measure_df=second_class, product_type_column='product_type_3', measure_column='measure_3', max_class_columns='max_class_3')
I'm pretty sure there is a simpler solution, but don't know why is not working. I'm getting all None values, nothing changes.
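One likely reason nothing changes, sketched here on a made-up two-row frame rather than the real data: the row yielded by iterrows() is generally a copy, so assigning to it does not write back to the DataFrame. Writing through df.at with the row label does. The frame and the value 99 below are purely illustrative.

```python
import pandas as pd

# Tiny illustrative frame (not the question's data).
df = pd.DataFrame({"measure_1": [1, 2], "max_class_1": [None, None]})

# Assigning to the row from iterrows() only changes a temporary copy.
for index, row in df.iterrows():
    row["max_class_1"] = 99

unchanged = df["max_class_1"].isna().all()

# Assigning through df.at[...] with the row label updates the frame itself.
for index, row in df.iterrows():
    df.at[index, "max_class_1"] = 99

updated = (df["max_class_1"] == 99).all()
```

After the first loop the column still holds only missing values; after the second loop every row holds 99.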
You can extract a column of a pandas DataFrame based on another value by using the DataFrame.query() method, which filters a DataFrame with a boolean expression over its columns.
pandas map(): each column of a DataFrame is a Series, so you can use Series.map() to operate on a specific column. When passed a dictionary or a Series, map() replaces each element with the value stored under that key (dictionary) or index label (Series).
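As a small illustration of map() with a Series, the sketch below maps product codes to their class_1 values from the question. The variable names are made up for the example, and it covers only a single fixed class column, not the per-row choice of column that the question needs.

```python
import pandas as pd

# class_1 values from df1, keyed by product code.
class_1 = pd.Series({"141A": 11, "53F4": 12, "GS24": 14})

# A column of product codes with one missing entry.
product_type = pd.Series(["141A", "53F4", None, "GS24"])

# Each element is replaced by the value stored under that label;
# labels that are missing or not found become NaN.
mapped = product_type.map(class_1)
```

Because one entry is missing, the result is a float Series with NaN in that position.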
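pd.DataFrame.lookup was the standard method for lookups by row and column labels. It was deprecated in pandas 1.2 and removed in pandas 2.0, so the code below runs as written only on older versions.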
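On pandas versions that no longer provide DataFrame.lookup, the same pairwise label lookup can be sketched with Index.get_indexer and NumPy indexing. The names table, row_labels and col_labels are illustrative, and the sketch assumes every label exists and that the index and columns are unique.

```python
import pandas as pd

# Mapping table with integer column labels, as built in Step 1 of the answer.
table = pd.DataFrame(
    {1: [11, 12, 14], 2: [13, 11, 12], 3: [5, 18, 10]},
    index=pd.Index(["141A", "53F4", "GS24"], name="product"),
)

# One (row label, column label) pair per value to look up.
row_labels = ["141A", "GS24", "53F4"]
col_labels = [1, 3, 2]

# Translate labels to integer positions, then pick the values pairwise.
row_pos = table.index.get_indexer(row_labels)
col_pos = table.columns.get_indexer(col_labels)
values = table.to_numpy()[row_pos, col_pos]
```

Note that get_indexer returns -1 for a label it cannot find, which NumPy would silently treat as "last position", so unknown labels should be handled before indexing.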
Your problem is complicated by the existence of null values. But this can be accommodated by modifying your input mapping dataframe.
Step 1
Rename columns in df1 to integers and add an extra row / column. We will use the added data later to deal with null values.
def rename_cols(x):
    return x if not x.startswith('class') else int(x.split('_')[-1])
df1 = df1.rename(columns=rename_cols)
df1 = df1.set_index('product')
df1.loc['X'] = 0
df1[0] = 0
Your mapping dataframe now looks like:
print(df1)
1 2 3 0
product
141A 11 13 5 0
53F4 12 11 18 0
GS24 14 12 10 0
X 0 0 0 0
Step 2
Iterate over the number of categories and use pd.DataFrame.lookup. Notice how we fillna with X and 0, exactly what we used for additional mapping data in Step 1.
n = df2.columns.str.startswith('measure').sum()
for i in range(n):
    rows = df2['product_type_{}'.format(i)].fillna('X')
    cols = df2['measure_{}'.format(i)].fillna(0).astype(int)
    df2['max_{}'.format(i)] = df1.lookup(rows, cols)
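For pandas versions without DataFrame.lookup, both steps could be sketched end to end as follows. The helper lookup_pairs and the name lookup for the mapping table are hypothetical, the sample frames are rebuilt from the question, and the approach is the same filler-row / filler-column idea with NumPy indexing swapped in for lookup.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df1 = pd.DataFrame({
    "product": ["141A", "53F4", "GS24"],
    "class_1": [11, 12, 14],
    "class_2": [13, 11, 12],
    "class_3": [5, 18, 10],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "product_type_0": ["141A", "53F4", "53F4", "141A"],
    "product_type_1": ["GS24", None, "141A", "GS24"],
    "product_type_2": [None, None, "141A", None],
    "product_type_3": [None] * 4,
    "measure_0": [1, 1, 2, 3],
    "measure_1": [3, np.nan, 2, 2],
    "measure_2": [np.nan, np.nan, 1, np.nan],
    "measure_3": [np.nan] * 4,
})

# Step 1: integer column labels plus a filler row 'X' and filler column 0.
lookup = df1.set_index("product")
lookup.columns = [int(c.split("_")[-1]) for c in lookup.columns]
lookup.loc["X"] = 0
lookup[0] = 0

def lookup_pairs(table, rows, cols):
    """Hypothetical helper: value at (rows[k], cols[k]) for every k."""
    row_pos = table.index.get_indexer(rows)
    col_pos = table.columns.get_indexer(cols)
    return table.to_numpy()[row_pos, col_pos]

# Step 2: one lookup per product_type_i / measure_i pair.
n = df2.columns.str.startswith("measure").sum()
for i in range(n):
    rows = df2[f"product_type_{i}"].fillna("X")
    cols = df2[f"measure_{i}"].fillna(0).astype(int)
    df2[f"max_{i}"] = lookup_pairs(lookup, rows, cols)
```

Missing products and measures fall through to the filler row and column, so they come out as 0, just as in the lookup-based version.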
Result
print(df2)
id product_type_0 product_type_1 product_type_2 product_type_3 measure_0 \
0 1 141A GS24 NaN NaN 1
1 2 53F4 NaN NaN NaN 1
2 3 53F4 141A 141A NaN 2
3 4 141A GS24 NaN NaN 3
measure_1 measure_2 measure_3 max_0 max_1 max_2 max_3
0 3.0 NaN NaN 11 10 0 0
1 NaN NaN NaN 12 0 0 0
2 2.0 1.0 NaN 11 13 11 0
3 2.0 NaN NaN 5 12 0 0
You can convert the 0 to np.nan if required. This will be at the expense of converting your series from int to float, since NaN is considered float.
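A minimal sketch of that conversion, using two of the result columns as sample data. It assumes the result columns share the max_ prefix and that 0 never occurs as a genuine class value.

```python
import numpy as np
import pandas as pd

# Two of the result columns from above, used as sample data.
result = pd.DataFrame({"max_0": [11, 12, 11, 5], "max_1": [10, 0, 13, 12]})

# Replace the filler value 0 with NaN in every max_* column.
max_cols = [c for c in result.columns if c.startswith("max_")]
for col in max_cols:
    result[col] = result[col].replace(0, np.nan)
```

Only columns that actually contained a 0 are upcast to float; here that is max_1.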
Of course, if X and 0 are valid values, you can use alternative filler values from the start.