How do I use pd.melt() across multiple columns?

Tags: python-3.x, pandas, dataframe, numpy

I'm currently having an issue while creating a dimension table named payment_types_Owned that lists the number of products each customer owns, plus their balance (credit used) and limit for each payment type. Currently, I have a table that looks like this:

    cust_id  Payment Type X owned  Payment Type Y owned  Payment Type Z owned  Credit Used_X  Limit_X  Credit Used_Y  Limit_Y  Credit Used_Z  Limit_Z
0  Person_A                     1                     3                     4            300      700            700      800            400      900
1  Person_B                     2                     1                     3            400      600            100      150            400      500
2  Person_C                     2                     4                     4            500      600            700      800            100      500
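To make the question reproducible, the sample table can be rebuilt as a DataFrame. This is a sketch that assumes the column names are exactly as printed above (including the spaces and the " owned" suffix); the values are copied from the printout.

```python
import pandas as pd

# Sample data copied from the printed table; column names assumed to be verbatim
df = pd.DataFrame({
    "cust_id": ["Person_A", "Person_B", "Person_C"],
    "Payment Type X owned": [1, 2, 2],
    "Payment Type Y owned": [3, 1, 4],
    "Payment Type Z owned": [4, 3, 4],
    "Credit Used_X": [300, 400, 500],
    "Limit_X": [700, 600, 600],
    "Credit Used_Y": [700, 100, 700],
    "Limit_Y": [800, 150, 800],
    "Credit Used_Z": [400, 400, 100],
    "Limit_Z": [900, 500, 500],
})
print(df)
```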

My desired output:

        cust_id        variable  value  Credit Used  Limit
0  Person_A_key  Payment Type X      1          300    700
1  Person_A_key  Payment Type Y      3          700    800
2  Person_A_key  Payment Type Z      4          400    900
3  Person_B_key  Payment Type X      2          400    600
4  Person_B_key  Payment Type Y      1          100    150
5  Person_B_key  Payment Type Z      3          400    500

Assume that I already have two other dimension tables that capture the following information:

1. Customer Dimension Table - contains the cust_id primary keys
2. Product Dimension Table - contains the unique product primary keys

Using pd.melt(), I get the result below, but it only partly solves my problem:

(pd.melt(df, id_vars=['cust_id'], value_vars=['Payment Type X owned','Payment Type Y owned', 'Payment Type Z owned'])).sort_values(by=['cust_id'])

    cust_id        variable  value
0  Person_A  Payment Type X      1
3  Person_A  Payment Type Y      3
6  Person_A  Payment Type Z      4
1  Person_B  Payment Type X      2
4  Person_B  Payment Type Y      1
7  Person_B  Payment Type Z      3
2  Person_C  Payment Type X      2
5  Person_C  Payment Type Y      4
8  Person_C  Payment Type Z      4

Any suggestions?
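On the title question itself: each pd.melt call produces a single value column, which is why melting only the "owned" columns leaves Credit Used and Limit behind. One way to extend the attempt above, sketched here under the assumption that df has exactly the columns shown in the question, is to melt each column group separately and stitch the results side by side. This relies on melt stacking the value_vars in the order given, so listing X, Y, Z in the same order for every group keeps the rows aligned. The groups dict is a hypothetical helper, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({
    "cust_id": ["Person_A", "Person_B", "Person_C"],
    "Payment Type X owned": [1, 2, 2],
    "Payment Type Y owned": [3, 1, 4],
    "Payment Type Z owned": [4, 3, 4],
    "Credit Used_X": [300, 400, 500],
    "Limit_X": [700, 600, 600],
    "Credit Used_Y": [700, 100, 700],
    "Limit_Y": [800, 150, 800],
    "Credit Used_Z": [400, 400, 100],
    "Limit_Z": [900, 500, 500],
})

# Hypothetical helper: output column name -> wide columns, listed in X, Y, Z order
groups = {
    "value": ["Payment Type X owned", "Payment Type Y owned", "Payment Type Z owned"],
    "Credit Used": ["Credit Used_X", "Credit Used_Y", "Credit Used_Z"],
    "Limit": ["Limit_X", "Limit_Y", "Limit_Z"],
}

# One melt per group; each stacks its columns in list order (all X rows, then Y, then Z)
melted = {
    name: pd.melt(df, id_vars=["cust_id"], value_vars=cols, value_name=name)
    for name, cols in groups.items()
}

out = melted["value"].copy()
out["variable"] = out["variable"].str.replace(" owned", "", regex=False)
# Rows line up position by position across the melts, so copy the other measures over
out["Credit Used"] = melted["Credit Used"]["Credit Used"].to_numpy()
out["Limit"] = melted["Limit"]["Limit"].to_numpy()

out = out.sort_values(["cust_id", "variable"]).reset_index(drop=True)
print(out)
```

The positional pairing is the fragile part of this approach: if the groups were listed in different orders, the measures would silently be matched to the wrong payment type. The wide_to_long answer pairs columns by name instead.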


asked Sep 30 '19 13:09

eugenelor

1 Answer

Use wide_to_long, but first use Index.str.replace on the column names so that the Payment Type columns follow the same stub_suffix pattern as the Credit Used and Limit columns:

# Drop ' owned' and join 'Payment Type' to its letter with '_'
df.columns = df.columns.str.replace(' owned', '', regex=False).str.replace('Payment Type ', 'Payment Type_', regex=False)
print(df)
    cust_id  Payment Type_X  Payment Type_Y  Payment Type_Z  Credit Used_X  \
0  Person_A               1               3               4            300   
1  Person_B               2               1               3            400   
2  Person_C               2               4               4            500   

   Limit_X  Credit Used_Y  Limit_Y  Credit Used_Z  Limit_Z  
0      700            700      800            400      900  
1      600            100      150            400      500  
2      600            700      800            100      500  
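The renaming matters because wide_to_long picks up value columns by name, roughly by matching stub + sep + suffix against each column. The sketch below applies the same two replacements to a bare Index and checks that every value column now fits that shape. The regex is an approximation written for illustration, not the exact pattern pandas builds internally.

```python
import re

import pandas as pd

cols = pd.Index([
    "cust_id",
    "Payment Type X owned", "Payment Type Y owned", "Payment Type Z owned",
    "Credit Used_X", "Limit_X", "Credit Used_Y", "Limit_Y", "Credit Used_Z", "Limit_Z",
])

# Same two replacements as in the answer, applied to the column labels only
renamed = (cols.str.replace(" owned", "", regex=False)
               .str.replace("Payment Type ", "Payment Type_", regex=False))
print(list(renamed))

# Illustrative check: every non-id column is now <stub>_<suffix>
pattern = re.compile(r"^(Payment Type|Credit Used|Limit)_\w+$")
print(all(pattern.match(c) for c in renamed[1:]))
```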

df1 = pd.wide_to_long(df, stubnames=['Payment Type','Credit Used', 'Limit'], 
                      i='cust_id', 
                      j='variable', 
                      sep='_',
                      suffix=r'\w+').sort_index(level=0).reset_index()
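Before the reset_index and relabelling steps, the wide_to_long result is indexed by (cust_id, variable), with one column per stub and the suffixes X, Y, Z in the variable level. A self-contained sketch, assuming the columns have already been renamed as above, to inspect that intermediate frame:

```python
import pandas as pd

# Frame with the renamed columns (Payment Type_X etc.), values from the question
df = pd.DataFrame({
    "cust_id": ["Person_A", "Person_B", "Person_C"],
    "Payment Type_X": [1, 2, 2],
    "Payment Type_Y": [3, 1, 4],
    "Payment Type_Z": [4, 3, 4],
    "Credit Used_X": [300, 400, 500],
    "Limit_X": [700, 600, 600],
    "Credit Used_Y": [700, 100, 700],
    "Limit_Y": [800, 150, 800],
    "Credit Used_Z": [400, 400, 100],
    "Limit_Z": [900, 500, 500],
})

long_df = pd.wide_to_long(
    df,
    stubnames=["Payment Type", "Credit Used", "Limit"],
    i="cust_id",
    j="variable",
    sep="_",
    suffix=r"\w+",  # raw string avoids the invalid-escape warning for \w
).sort_index(level=0)

print(long_df)
# One row per (customer, payment type)
print(long_df.loc[("Person_A", "Y")])
```

Because X, Y and Z cannot be converted to numbers, the variable level stays as strings, which is what lets the next step prepend 'Payment Type ' to it.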

Finally, prefix the values in the variable column with 'Payment Type ' and rename the Payment Type column to value using a dict:

df1 = (df1.assign(variable='Payment Type ' + df1['variable'])
          .rename(columns={'Payment Type':'value'}))
print(df1)
    cust_id        variable  value  Credit Used  Limit
0  Person_A  Payment Type X      1          300    700
1  Person_A  Payment Type Y      3          700    800
2  Person_A  Payment Type Z      4          400    900
3  Person_B  Payment Type X      2          400    600
4  Person_B  Payment Type Y      1          100    150
5  Person_B  Payment Type Z      3          400    500
6  Person_C  Payment Type X      2          500    600
7  Person_C  Payment Type Y      4          700    800
8  Person_C  Payment Type Z      4          100    500
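The answer keeps the raw names in cust_id, whereas the desired output in the question shows keys such as Person_A_key from the customer dimension table. Assuming a hypothetical dim_customer frame with columns cust_id and cust_key (both names are illustrative, not from the question), a left merge can swap the names for keys. Only the first three rows of the long result are used here to keep the sketch short.

```python
import pandas as pd

# First rows of the long-format result from the answer
fact = pd.DataFrame({
    "cust_id": ["Person_A", "Person_A", "Person_B"],
    "variable": ["Payment Type X", "Payment Type Y", "Payment Type X"],
    "value": [1, 3, 2],
    "Credit Used": [300, 700, 400],
    "Limit": [700, 800, 600],
})

# Hypothetical customer dimension table; column names are assumptions
dim_customer = pd.DataFrame({
    "cust_id": ["Person_A", "Person_B", "Person_C"],
    "cust_key": ["Person_A_key", "Person_B_key", "Person_C_key"],
})

result = (
    fact.merge(dim_customer, on="cust_id", how="left", validate="many_to_one")
        .drop(columns="cust_id")
        .rename(columns={"cust_key": "cust_id"})
)
# Restore the column order of the desired output
result = result[["cust_id", "variable", "value", "Credit Used", "Limit"]]
print(result)
```

validate="many_to_one" makes pandas raise if a customer appears more than once in the dimension table. The same merge pattern could map the variable column to keys from the product dimension table.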


answered Sep 21 '22 14:09

jezrael
