Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how do I use pd.melt() across multiple columns?

I'm currently having an issue while creating a dimension table named payment_types_Owned that lists the number Products that a customer has, plus their balances, and their limits on each payment. Currently, I have a table that looks like this:

    cust_id  Payment Type X owned  Payment Type Y owned  Payment Type Z owned  Credit Used_X  Limit_X  Credit Used_Y  Limit_Y  Credit Used_Z  Limit_Z
0  Person_A                     1                     3                     4            300      700            700      800            400      900
1  Person_B                     2                     1                     3            400      600            100      150            400      500
2  Person_C                     2                     4                     4            500      600            700      800            100      500

My desired output:

        cust_id        variable  value  Credit Used  Limit
0  Person_A_key  Payment Type X      1          300    700
1  Person_A_key  Payment Type Y      3          700    800
2  Person_A_key  Payment Type Z      4          400    900
3  Person_B_key  Payment Type X      2          400    600
4  Person_B_key  Payment Type Y      1          100    150
5  Person_B_key  Payment Type Z      3          400    500

Assuming that I already have 2 other Dimension tables that capture the following information:

  1. Customer Dimension Table - Contains cust_id Primary Keys
  2. Product Dimension Table - Contains the unique Product Primary Keys

Using pd.melt(), I get the below, but its only partly solving my problem:

(pd.melt(df, id_vars=['cust_id'], value_vars=['Payment Type X owned','Payment Type Y owned', 'Payment Type Z owned'])).sort_values(by=['cust_id'])
    cust_id        variable  value
0  Person_A  Payment Type X      1
3  Person_A  Payment Type Y      3
6  Person_A  Payment Type Z      4
1  Person_B  Payment Type X      2
4  Person_B  Payment Type Y      1
7  Person_B  Payment Type Z      3
2  Person_C  Payment Type X      2
5  Person_C  Payment Type Y      4
8  Person_C  Payment Type Z      4


Any suggestions?

like image 785
eugenelor Avatar asked Sep 30 '19 13:09

eugenelor


People also ask

What does melt () do in Python?

Pandas melt() function is used to change the DataFrame format from wide to long. It's used to create a specific format of the DataFrame object where one or more columns work as identifiers. All the remaining columns are treated as values and unpivoted to the row axis and only two columns - variable and value.

How do I merge two PD data frames?

The concat() function can be used to concatenate two Dataframes by adding the rows of one to the other. The merge() function is equivalent to the SQL JOIN clause. 'left', 'right' and 'inner' joins are all possible.

What does DF melt do?

DataFrame - melt() function The melt() function is used to unpivot a given DataFrame from wide format to long format, optionally leaving identifier variables set. Column(s) to use as identifier variables. Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.


1 Answers

Use wide_to_long, but first is necessary use Series.str.replace with first group Payment Type columns:

df.columns = df.columns.str.replace(' owned', '').str.replace('Payment Type ', 'Payment Type_')
print (df)
    cust_id  Payment Type_X  Payment Type_Y  Payment Type_Z  Credit Used_X  \
0  Person_A               1               3               4            300   
1  Person_B               2               1               3            400   
2  Person_C               2               4               4            500   

   Limit_X  Credit Used_Y  Limit_Y  Credit Used_Z  Limit_Z  
0      700            700      800            400      900  
1      600            100      150            400      500  
2      600            700      800            100      500  

df1 = pd.wide_to_long(df, stubnames=['Payment Type','Credit Used', 'Limit'], 
                      i='cust_id', 
                      j='variable', 
                      sep='_',
                      suffix='\w+').sort_index(level=0).reset_index()

Last add string to variable column and rename column by dict:

df1 = (df1.assign(variable='Payment Type ' + df1['variable'])
          .rename(columns={'Payment Type':'value'}))
print(df1)
    cust_id        variable  value  Credit Used  Limit
0  Person_A  Payment Type X      1          300    700
1  Person_A  Payment Type Y      3          700    800
2  Person_A  Payment Type Z      4          400    900
3  Person_B  Payment Type X      2          400    600
4  Person_B  Payment Type Y      1          100    150
5  Person_B  Payment Type Z      3          400    500
6  Person_C  Payment Type X      2          500    600
7  Person_C  Payment Type Y      4          700    800
8  Person_C  Payment Type Z      4          100    500
like image 152
jezrael Avatar answered Sep 21 '22 14:09

jezrael