Pivot duplicates rows into new columns Pandas

I have a data frame like the one below, and I'm trying to reshape it with pivot from Pandas so that I keep some values from the original rows while turning the duplicate rows into new columns and renaming them. Sometimes an ID has as many as 5 duplicate rows.

I have been trying, but I can't get it to work.

import pandas as pd
df = pd.read_csv("C:dummy")

df = df.pivot(index=["ID"], columns=["Zone","PTC"], values=["Zone","PTC"])

# Rename columns and reset the index.
df.columns = [["PTC{}","Zone{}"],.format(c) for c in df.columns]
df.reset_index(inplace=True)
# Drop duplicates
df.drop(["PTC","Zone"], axis=1, inplace=True)

Input

ID  Agent   OV  Zone Value  PTC
1   10      26   M1   10    100
2   26.5    8    M2   50    95
2   26.5    8    M1   6     5
3   4.5     6    M3   4     40
3   4.5     6    M4   6     60
4   1.2    0.8   M1   8     100
5   2      0.4   M1   6     10
5   2      0.4   M2   41    86
5   2      0.4   M4   2     4

Output

ID  Agent   OV   Zone1  Value1  PTC1  Zone2  Value2  PTC2  Zone3  Value3  PTC3
1   10      26   M_1    10      100   0      0       0     0      0       0
2   26.5    8    M_2    50      95    M_1    6       5     0      0       0
3   4.5     6    M_3    4       40    M_4    6       60    0      0       0
4   1.2     0.8  M_1    8       100   0      0       0     0      0       0
5   2       0.4  M_1    6       10    M_2    41      86    M_4    2       4

asked Jun 25 '18 14:06

Zesima29

2 Answers

Use cumcount to number the rows within each group, build a MultiIndex with set_index, reshape with unstack, and finally flatten the column names:

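# Number the rows within each (ID, Agent, OV) group, starting at 1.
g = df.groupby(["ID","Agent", "OV"]).cumcount().add(1)
# Append the counter to the index, then move it into the columns.
df = df.set_index(["ID","Agent","OV", g]).unstack(fill_value=0).sort_index(axis=1, level=1)
# Flatten the (name, counter) column pairs into names such as "Zone1".
df.columns = ["{}{}".format(a, b) for a, b in df.columns]

df = df.reset_index()
print(df)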
   ID  Agent    OV Zone1  Value1  PTC1 Zone2  Value2  PTC2 Zone3  Value3  PTC3
0   1   10.0  26.0    M1      10   100     0       0     0     0       0     0
1   2   26.5   8.0    M2      50    95    M1       6     5     0       0     0
2   3    4.5   6.0    M3       4    40    M4       6    60     0       0     0
3   4    1.2   0.8    M1       8   100     0       0     0     0       0     0
4   5    2.0   0.4    M1       6    10    M2      41    86    M4       2     4
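For anyone who wants to run the steps above without a CSV file, here is a minimal sketch that builds the sample data from the question inline. The names `keys`, `wide`, and `ordered` are illustrative and not part of the original answer. The explicit column reordering at the end is also an addition: it replaces the `sort_index` call, on the assumption that the order of Zone/Value/PTC within each counter may differ between pandas versions.

```python
import pandas as pd

# Sample input copied from the question (replaces the read_csv call).
# Zone is stored as object dtype so the integer 0 can sit next to strings.
df = pd.DataFrame({
    "ID":    [1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Agent": [10, 26.5, 26.5, 4.5, 4.5, 1.2, 2, 2, 2],
    "OV":    [26, 8, 8, 6, 6, 0.8, 0.4, 0.4, 0.4],
    "Zone":  pd.Series(["M1", "M2", "M1", "M3", "M4", "M1", "M1", "M2", "M4"],
                       dtype=object),
    "Value": [10, 50, 6, 4, 6, 8, 6, 41, 2],
    "PTC":   [100, 95, 5, 40, 60, 100, 10, 86, 4],
})

keys = ["ID", "Agent", "OV"]

# Counter of rows inside each group: 1, 2, 3, ...
g = df.groupby(keys).cumcount().add(1)

# The counter becomes the innermost index level, then moves into the columns.
wide = df.set_index(keys + [g]).unstack(fill_value=0)

# Flatten (name, counter) pairs into names such as "Zone1".
wide.columns = [f"{name}{n}" for name, n in wide.columns]
wide = wide.reset_index()

# Put the columns in the order the question asks for (illustrative addition).
ordered = [f"{name}{n}"
           for n in range(1, int(g.max()) + 1)
           for name in ["Zone", "Value", "PTC"]]
wide = wide[keys + ordered]
print(wide)
```

Printing `g` on its own before reshaping is a quick way to check that the grouping keys identify the duplicates as intended.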

To replace missing values with 0 in the numeric columns only:

g = df.groupby(["ID","Agent","OV"]).cumcount().add(1)
df = df.set_index(["ID","Agent","OV", g]).unstack().sort_index(axis=1, level=1)

# Fill and convert only the Value and PTC columns.
idx = pd.IndexSlice
df.loc[:, idx[['Value','PTC']]] = df.loc[:, idx[['Value','PTC']]].fillna(0).astype(int)
df.columns = ["{}{}".format(a, b) for a, b in df.columns]

# Missing Zone entries become empty strings.
df = df.fillna('').reset_index()
print(df)
   ID  Agent    OV Zone1  Value1  PTC1 Zone2  Value2  PTC2 Zone3  Value3  PTC3
0   1   10.0  26.0    M1      10   100             0     0             0     0
1   2   26.5   8.0    M2      50    95    M1       6     5             0     0
2   3    4.5   6.0    M3       4    40    M4       6    60             0     0
3   4    1.2   0.8    M1       8   100             0     0             0     0
4   5    2.0   0.4    M1       6    10    M2      41    86    M4       2     4
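The numeric-only fill can also be done after the columns are flattened, by picking columns by name prefix. This is a sketch of a variant, not the answer's own code: it swaps the `IndexSlice` assignment for `fillna` and `astype` with dictionaries, on the assumption that every column starting with "Value" or "PTC" is numeric. The names `wide`, `num_cols`, and `zone_cols` are illustrative.

```python
import pandas as pd

# Sample input copied from the question.
df = pd.DataFrame({
    "ID":    [1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Agent": [10, 26.5, 26.5, 4.5, 4.5, 1.2, 2, 2, 2],
    "OV":    [26, 8, 8, 6, 6, 0.8, 0.4, 0.4, 0.4],
    "Zone":  ["M1", "M2", "M1", "M3", "M4", "M1", "M1", "M2", "M4"],
    "Value": [10, 50, 6, 4, 6, 8, 6, 41, 2],
    "PTC":   [100, 95, 5, 40, 60, 100, 10, 86, 4],
})

keys = ["ID", "Agent", "OV"]
g = df.groupby(keys).cumcount().add(1)

# No fill_value here, so gaps stay missing for now.
wide = df.set_index(keys + [g]).unstack()
wide.columns = [f"{name}{n}" for name, n in wide.columns]

# Split the flattened columns by prefix (assumes these prefixes are unique).
num_cols = [c for c in wide.columns if c.startswith(("Value", "PTC"))]
zone_cols = [c for c in wide.columns if c.startswith("Zone")]

# Numeric gaps become 0 and the columns go back to integers.
wide = wide.fillna({c: 0 for c in num_cols}).astype({c: int for c in num_cols})
# Zone gaps become empty strings.
wide = wide.fillna({c: "" for c in zone_cols}).reset_index()
print(wide)
```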

answered Oct 17 '22 04:10

jezrael

You can use cumcount to create a helper key, then unstack on it and flatten the resulting MultiIndex columns. (PS: you can add fillna(0) at the end. I did not add it because I do not think 0 is a correct value for Zone.)

df['New'] = df.groupby(['ID','Agent','OV']).cumcount() + 1
new_df = df.set_index(['ID','Agent','OV','New']).unstack('New').sort_index(axis=1, level=1)
new_df.columns = new_df.columns.map('{0[0]}{0[1]}'.format)
new_df
Out[40]: 
              Zone1  Value1   PTC1 Zone2  Value2  PTC2 Zone3  Value3  PTC3
ID Agent OV                                                               
1  10.0  26.0    M1    10.0  100.0  None     NaN   NaN  None     NaN   NaN
2  26.5  8.0     M2    50.0   95.0    M1     6.0   5.0  None     NaN   NaN
3  4.5   6.0     M3     4.0   40.0    M4     6.0  60.0  None     NaN   NaN
4  1.2   0.8     M1     8.0  100.0  None     NaN   NaN  None     NaN   NaN
5  2.0   0.4     M1     6.0   10.0    M2    41.0  86.0    M4     2.0   4.0
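If the gaps should stay missing, as this answer suggests for Zone, one option is to convert the numeric columns to the nullable integer dtype so they no longer display as floats. This is only a sketch of that idea and not part of the original answer. It assumes the Value and PTC entries are whole numbers, and the names `wide` and `num_cols` are illustrative.

```python
import pandas as pd

# Sample input copied from the question.
df = pd.DataFrame({
    "ID":    [1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Agent": [10, 26.5, 26.5, 4.5, 4.5, 1.2, 2, 2, 2],
    "OV":    [26, 8, 8, 6, 6, 0.8, 0.4, 0.4, 0.4],
    "Zone":  ["M1", "M2", "M1", "M3", "M4", "M1", "M1", "M2", "M4"],
    "Value": [10, 50, 6, 4, 6, 8, 6, 41, 2],
    "PTC":   [100, 95, 5, 40, 60, 100, 10, 86, 4],
})

keys = ["ID", "Agent", "OV"]

# Helper key: position of each row inside its group, starting at 1.
df["New"] = df.groupby(keys).cumcount() + 1

wide = df.set_index(keys + ["New"]).unstack("New")
wide.columns = [f"{name}{n}" for name, n in wide.columns]

# Nullable integers keep the gaps as <NA> instead of forcing floats.
num_cols = [c for c in wide.columns if c.startswith(("Value", "PTC"))]
wide = wide.astype({c: "Int64" for c in num_cols}).reset_index()
print(wide)
```

The Zone columns are left untouched, so their gaps remain missing values rather than 0 or an empty string.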

answered Oct 17 '22 03:10

BENY

Tags: python, merge, pandas, duplicates
