Basically, I think the easiest way to explain this is, I am trying expand a multi-indexed table, but the indexes are both in the same column.
My data is structured like this:
| Row Labels | Sum | 
|---|---|
| Collection1 | 22 | 
| Data 1 | 10 | 
| Data 2 | 12 | 
| Collection2 | 33 | 
| Data 1 | 33 | 
| Collection3 | 45 | 
| Data 1 | 14 | 
| Data 2 | 31 | 
| Total | 100 | 
What I would like out is a Dataframe like this:
| Row Labels | Data1 | Data2 | Sum | 
|---|---|---|---|
| Collection1 | 10 | 12 | 22 | 
| Collection2 | 33 | 0 | 33 | 
| Collection3 | 14 | 31 | 45 | 
| Total | 57 | 43 | 100 | 
Is there any built in pandas methods, or a straight forward approach to handling this type of translation?
I have tried manually breaking down the table and recreating it by collecting the row labels that are repeated, and making columns from them, with the data from rows with that label, but, the tricky spot is where child data is missing; like in the example above Collection2 Data2 doesn't exist. With this approach I could calculate for each row if Data1 equals Collection1 and if it does, add a 0 to Data2 at that index. But, it seems super ugly, and figured there is probably a much more elegant approach.
Using a pivot_table:
# identify groups
m = df['Row Labels'].str.match(r'Collection\d+|Total')
# reshape
out = (df
   .assign(index=df['Row Labels'].where(m).ffill(),
           col=df['Row Labels'].mask(m, 'Sum')
          )
   .pivot_table(index='index', columns='col', values='Sum', fill_value=0)
   .rename_axis(columns=None)
)
# recompute Total
out.loc['Total'] = out.drop('Total').sum()
out = out.reset_index()
Output:
         index  Data 1  Data 2  Sum
0  Collection1      10      12   22
1  Collection2      33       0   33
2  Collection3      14      31   45
3        Total      57      43  100
                        I'm not sure if exists some straightforward Pandas solution, but you can try this example:
# remove the Total row - will recreate it as last step
df = df[df["Row Labels"] != "Total"]
# find the indices for pivoting
mask = df["Row Labels"].str.startswith("Collection")
df["idx"] = mask.cumsum()
# do the actual transformation here: pivot + merge
df = (
    pd.merge(
        df[mask],
        df[~mask].pivot(index="idx", columns="Row Labels", values="Sum"),
        left_on="idx",
        right_index=True,
    )
    .drop(columns=["idx"])
    .fillna(0)
)
# add Total row back
df = pd.concat(
    [
        df,
        pd.DataFrame(
            {"Row Labels": ["Total"], **{c: [df[c].sum()] for c in df.loc[:, "Sum":]}}
        ),
    ]
)
print(df)
Prints:
    Row Labels  Sum  Data 1  Data 2
0  Collection1   22    10.0    12.0
3  Collection2   33    33.0     0.0
5  Collection3   45    14.0    31.0
0        Total  100    57.0    43.0
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With