Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pivot duplicates rows into new columns Pandas

I have a data frame like this and I'm trying reshape my data frame using Pivot from Pandas in a way that I can keep some values from the original rows while making the duplicates row into columns and renaming them. Sometimes I have rows with 5 duplicates

I have been trying, but I don't get it.

import pandas as pd
df = pd.read_csv("C:dummy")

df = df.pivot(index=["ID"], columns=["Zone","PTC"], values=["Zone","PTC"])

# Rename columns and reset the index.
df.columns = [["PTC{}","Zone{}"],.format(c) for c in df.columns]
df.reset_index(inplace=True)
# Drop duplicates
df.drop(["PTC","Zone"], axis=1, inplace=True)

Input

ID  Agent   OV  Zone Value  PTC
1   10      26   M1   10    100
2   26.5    8    M2   50    95
2   26.5    8    M1   6     5
3   4.5     6    M3   4     40
3   4.5     6    M4   6     60
4   1.2    0.8   M1   8     100
5   2      0.4   M1   6     10
5   2      0.4   M2   41    86
5   2      0.4   M4   2     4

Output

ID  Agent   OV  Zone1   Value1  PTC1    Zone2   Value2  PTC2    Zone3   Value3  PTC3
1   10      26  M_1     10       100    0          0      0      0         0      0
2   26.5    8   M_2     50        95    M_1        6      5      0         0      0
3   4.5     6   M_3     4         40    M_4        6     60      0         0      0
4   1.2    0.8  M_1     8        100    0          0      0      0         0      0
5   2      0.4  M_1     6         10    M_2        41    86     M_4        2      4
like image 629
Zesima29 Avatar asked Jun 25 '18 14:06

Zesima29


People also ask

How do I get rid of duplicate rows in Pandas?

Pandas drop_duplicates() Function Syntax keep: allowed values are {'first', 'last', False}, default 'first'. If 'first', duplicate rows except the first one is deleted. If 'last', duplicate rows except the last one is deleted. If False, all the duplicate rows are deleted.

How do you drop duplicate rows in Pandas based on multiple columns?

By using pandas. DataFrame. drop_duplicates() method you can remove duplicate rows from DataFrame. Using this method you can drop duplicate rows on selected multiple columns or all columns.

How do I convert rows to columns in Pandas?

The transpose() function is used to transpose index and columns. Reflect the DataFrame over its main diagonal by writing rows as columns and vice-versa. If True, the underlying data is copied.

How do I avoid duplicates in Pandas?

To remove duplicates on specific column(s), use subset . To remove duplicates and keep last occurrences, use keep .


2 Answers

Use cumcount for count groups, create MultiIndex by set_index with unstack and last flatten values of columns:

g = df.groupby(["ID","Agent", "OV"]).cumcount().add(1)
df = df.set_index(["ID","Agent","OV", g]).unstack(fill_value=0).sort_index(axis=1, level=1)
df.columns = ["{}{}".format(a, b) for a, b in df.columns]

df = df.reset_index()
print (df)
   ID  Agent    OV Zone1  Value1  PTC1 Zone2  Value2  PTC2 Zone3  Value3  PTC3
0   1   10.0  26.0    M1      10   100     0       0     0     0       0     0
1   2   26.5   8.0    M2      50    95    M1       6     5     0       0     0
2   3    4.5   6.0    M3       4    40    M4       6    60     0       0     0
3   4    1.2   0.8    M1       8   100     0       0     0     0       0     0
4   5    2.0   0.4    M1       6    10    M2      41    86    M4       2     4

If want replace to 0 only numeric columns:

g = df.groupby(["ID","Agent"]).cumcount().add(1)
df = df.set_index(["ID","Agent","OV", g]).unstack().sort_index(axis=1, level=1)

idx = pd.IndexSlice
df.loc[:, idx[['Value','PTC']]] = df.loc[:, idx[['Value','PTC']]].fillna(0).astype(int)
df.columns = ["{}{}".format(a, b) for a, b in df.columns]

df = df.fillna('').reset_index()
print (df)
   ID  Agent    OV Zone1  Value1  PTC1 Zone2  Value2  PTC2 Zone3  Value3  PTC3
0   1   10.0  26.0    M1      10   100             0     0             0     0
1   2   26.5   8.0    M2      50    95    M1       6     5             0     0
2   3    4.5   6.0    M3       4    40    M4       6    60             0     0
3   4    1.2   0.8    M1       8   100             0     0             0     0
4   5    2.0   0.4    M1       6    10    M2      41    86    M4       2     4
like image 157
jezrael Avatar answered Oct 17 '22 04:10

jezrael


You can using cumcount create the help key , then we do unstack with multiple index flatten (PS : you can add fillna(0) at the end , I did not add it cause I do not think for Zone value 0 is correct )

df['New']=df.groupby(['ID','Agent','OV']).cumcount()+1
new_df=df.set_index(['ID','Agent','OV','New']).unstack('New').sort_index(axis=1 , level=1)
new_df.columns=new_df.columns.map('{0[0]}{0[1]}'.format) 
new_df
Out[40]: 
              Zone1  Value1   PTC1 Zone2  Value2  PTC2 Zone3  Value3  PTC3
ID Agent OV                                                               
1  10.0  26.0    M1    10.0  100.0  None     NaN   NaN  None     NaN   NaN
2  26.5  8.0     M2    50.0   95.0    M1     6.0   5.0  None     NaN   NaN
3  4.5   6.0     M3     4.0   40.0    M4     6.0  60.0  None     NaN   NaN
4  1.2   0.8     M1     8.0  100.0  None     NaN   NaN  None     NaN   NaN
5  2.0   0.4     M1     6.0   10.0    M2    41.0  86.0    M4     2.0   4.0
like image 2
BENY Avatar answered Oct 17 '22 03:10

BENY