Pandas: Unpivot Excel Data Where Category and Children Labels Are In Same Column

Question

Basically, I think the easiest way to explain this is, I am trying expand a multi-indexed table, but the indexes are both in the same column.

My data is structured like this:

Row Labels	Sum
Collection1	22
Data 1	10
Data 2	12
Collection2	33
Data 1	33
Collection3	45
Data 1	14
Data 2	31
Total	100

What I would like out is a Dataframe like this:

Row Labels	Data1	Data2	Sum
Collection1	10	12	22
Collection2	33	0	33
Collection3	14	31	45
Total	57	43	100

Is there any built in pandas methods, or a straight forward approach to handling this type of translation?

I have tried manually breaking down the table and recreating it by collecting the row labels that are repeated, and making columns from them, with the data from rows with that label, but, the tricky spot is where child data is missing; like in the example above Collection2 Data2 doesn't exist. With this approach I could calculate for each row if Data1 equals Collection1 and if it does, add a 0 to Data2 at that index. But, it seems super ugly, and figured there is probably a much more elegant approach.

mozway · Accepted Answer

Using a pivot_table:

# identify groups
m = df['Row Labels'].str.match(r'Collection\d+|Total')

# reshape
out = (df
   .assign(index=df['Row Labels'].where(m).ffill(),
           col=df['Row Labels'].mask(m, 'Sum')
          )
   .pivot_table(index='index', columns='col', values='Sum', fill_value=0)
   .rename_axis(columns=None)
)

# recompute Total
out.loc['Total'] = out.drop('Total').sum()

out = out.reset_index()

Output:

         index  Data 1  Data 2  Sum
0  Collection1      10      12   22
1  Collection2      33       0   33
2  Collection3      14      31   45
3        Total      57      43  100

Andrej Kesely · Answer

I'm not sure if exists some straightforward Pandas solution, but you can try this example:

# remove the Total row - will recreate it as last step
df = df[df["Row Labels"] != "Total"]

# find the indices for pivoting
mask = df["Row Labels"].str.startswith("Collection")
df["idx"] = mask.cumsum()

# do the actual transformation here: pivot + merge
df = (
    pd.merge(
        df[mask],
        df[~mask].pivot(index="idx", columns="Row Labels", values="Sum"),
        left_on="idx",
        right_index=True,
    )
    .drop(columns=["idx"])
    .fillna(0)
)

# add Total row back
df = pd.concat(
    [
        df,
        pd.DataFrame(
            {"Row Labels": ["Total"], **{c: [df[c].sum()] for c in df.loc[:, "Sum":]}}
        ),
    ]
)

print(df)

Prints:

    Row Labels  Sum  Data 1  Data 2
0  Collection1   22    10.0    12.0
3  Collection2   33    33.0     0.0
5  Collection3   45    14.0    31.0
0        Total  100    57.0    43.0

Pandas: Unpivot Excel Data Where Category and Children Labels Are In Same Column

Tags:

python

pandas

dataframe

user3552225

2 Answers

mozway

Andrej Kesely

Recent Activity

Donate For Us

Pandas: Unpivot Excel Data Where Category and Children Labels Are In Same Column

Tags:

python

pandas

dataframe

user3552225

2 Answers

mozway

Andrej Kesely

Related questions

Recent Activity

Donate For Us