I'm currently having an issue while creating a dimension table named payment_types_Owned
that lists the number Products that a customer has, plus their balances, and their limits on each payment. Currently, I have a table that looks like this:
cust_id Payment Type X owned Payment Type Y owned Payment Type Z owned Credit Used_X Limit_X Credit Used_Y Limit_Y Credit Used_Z Limit_Z 0 Person_A 1 3 4 300 700 700 800 400 900 1 Person_B 2 1 3 400 600 100 150 400 500 2 Person_C 2 4 4 500 600 700 800 100 500
My desired output:
cust_id variable value Credit Used Limit 0 Person_A_key Payment Type X 1 300 700 1 Person_A_key Payment Type Y 3 700 800 2 Person_A_key Payment Type Z 4 400 900 3 Person_B_key Payment Type X 2 400 600 4 Person_B_key Payment Type Y 1 100 150 5 Person_B_key Payment Type Z 3 400 500
Assuming that I already have 2 other Dimension tables that capture the following information:
Customer Dimension Table
- Contains cust_id Primary KeysProduct Dimension Table
- Contains the unique Product Primary KeysUsing pd.melt()
, I get the below, but its only partly solving my problem:
(pd.melt(df, id_vars=['cust_id'], value_vars=['Payment Type X owned','Payment Type Y owned', 'Payment Type Z owned'])).sort_values(by=['cust_id'])
cust_id variable value 0 Person_A Payment Type X 1 3 Person_A Payment Type Y 3 6 Person_A Payment Type Z 4 1 Person_B Payment Type X 2 4 Person_B Payment Type Y 1 7 Person_B Payment Type Z 3 2 Person_C Payment Type X 2 5 Person_C Payment Type Y 4 8 Person_C Payment Type Z 4
Any suggestions?
Pandas melt() function is used to change the DataFrame format from wide to long. It's used to create a specific format of the DataFrame object where one or more columns work as identifiers. All the remaining columns are treated as values and unpivoted to the row axis and only two columns - variable and value.
The concat() function can be used to concatenate two Dataframes by adding the rows of one to the other. The merge() function is equivalent to the SQL JOIN clause. 'left', 'right' and 'inner' joins are all possible.
DataFrame - melt() function The melt() function is used to unpivot a given DataFrame from wide format to long format, optionally leaving identifier variables set. Column(s) to use as identifier variables. Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.
Use wide_to_long
, but first is necessary use Series.str.replace
with first group Payment Type
columns:
df.columns = df.columns.str.replace(' owned', '').str.replace('Payment Type ', 'Payment Type_')
print (df)
cust_id Payment Type_X Payment Type_Y Payment Type_Z Credit Used_X \
0 Person_A 1 3 4 300
1 Person_B 2 1 3 400
2 Person_C 2 4 4 500
Limit_X Credit Used_Y Limit_Y Credit Used_Z Limit_Z
0 700 700 800 400 900
1 600 100 150 400 500
2 600 700 800 100 500
df1 = pd.wide_to_long(df, stubnames=['Payment Type','Credit Used', 'Limit'],
i='cust_id',
j='variable',
sep='_',
suffix='\w+').sort_index(level=0).reset_index()
Last add string to variable
column and rename column by dict:
df1 = (df1.assign(variable='Payment Type ' + df1['variable'])
.rename(columns={'Payment Type':'value'}))
print(df1)
cust_id variable value Credit Used Limit
0 Person_A Payment Type X 1 300 700
1 Person_A Payment Type Y 3 700 800
2 Person_A Payment Type Z 4 400 900
3 Person_B Payment Type X 2 400 600
4 Person_B Payment Type Y 1 100 150
5 Person_B Payment Type Z 3 400 500
6 Person_C Payment Type X 2 500 600
7 Person_C Payment Type Y 4 700 800
8 Person_C Payment Type Z 4 100 500
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With