Suppose I have a pandas DataFrame structured like the one below. In practice it may be much larger, and the number of level 1 index values, as well as the number of level 2 index values per level 1 value, will vary, so the solution shouldn't make assumptions about either:
import pandas

index = pandas.MultiIndex.from_tuples([
    ("a", "s"),
    ("a", "u"),
    ("a", "v"),
    ("b", "s"),
    ("b", "u")])
result = pandas.DataFrame([
    [1, 2],
    [3, 4],
    [5, 6],
    [7, 8],
    [9, 10]], index=index, columns=["x", "y"])
Which looks like this:
     x   y
a s  1   2
  u  3   4
  v  5   6
b s  7   8
  u  9  10
Now let's say I want to create a "total" row for each of the "a" and "b" levels. So given the above as input I would want my code to produce something like this:
      x   y
a s   1   2
  u   3   4
  v   5   6
  t   9  12
b s   7   8
  u   9  10
  t  16  18
Here's the code I have so far:
# Calculate totals
for level, _ in result.groupby(level=0):
    # work out the global total for that desk:
    x_sum = result.loc[level]["x"].sum()
    y_sum = result.loc[level]["y"].sum()
    result = result.append(pandas.DataFrame(
        [[x_sum, y_sum]],
        columns=result.columns,
        index=pandas.MultiIndex.from_tuples([(level, "t")])))
But this results in the "total" rows being appended to the end:
      x   y
a s   1   2
  u   3   4
  v   5   6
b s   7   8
  u   9  10
a t   9  12
b t  16  18
Sorting using result.sort_index() doesn't do what I want either:
      x   y
a s   1   2
  t   9  12
  u   3   4
  v   5   6
b s   7   8
  t  16  18
  u   9  10
What am I doing wrong?
It is really annoying, but the reason for keeping a MultiIndex sorted is better performance. Also, if the MultiIndex is not sorted, you may get an UnsortedIndexError when you select by MultiIndex.
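To make the point about unsorted indexes concrete, here is a minimal sketch. It assumes the question's sample data with the two total rows already appended at the end; the variable names (`unsorted`, `sorted_df`, `slice_failed`) are illustrative only.

```python
import pandas as pd
from pandas.errors import UnsortedIndexError

# Question's sample data, with the totals appended at the end (unsorted)
index = pd.MultiIndex.from_tuples(
    [("a", "s"), ("a", "u"), ("a", "v"), ("b", "s"), ("b", "u"),
     ("a", "t"), ("b", "t")]
)
unsorted = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [9, 12], [16, 18]],
    index=index, columns=["x", "y"],
)

is_sorted_before = unsorted.index.is_monotonic_increasing

# Slicing between two full keys relies on the index being lexsorted
try:
    unsorted.loc[("a", "s"):("b", "s")]
    slice_failed = False
except UnsortedIndexError:
    slice_failed = True

# After sorting, the same slice is allowed
sorted_df = unsorted.sort_index()
is_sorted_after = sorted_df.index.is_monotonic_increasing
sliced = sorted_df.loc[("a", "s"):("b", "s")]
```

This is why sort_index() is usually worth keeping in the pipeline, even if the label order is adjusted afterwards.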
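But if you really need to change the order of the labels, you can use reindex. (The outputs printed below come from a smaller sample with only 's' and 'u' under each first-level key.)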
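import pandas as pd

# Per-group totals, labelled 't' on the second index level
df = result.groupby(level=0).sum()
df.index = [df.index, ['t'] * len(df.index)]

# One-liner: append the totals, sort, then move 't' last within each group
df1 = pd.concat([result, df]).sort_index().reindex(['s','u','t'], level=1)

# The same in two steps; after sort_index() 't' still sits between 's' and 'u'
df1 = pd.concat([result, df]).sort_index()
print (df1)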
      x   y
a s   1   2
  t   4   6
  u   3   4
b s   5   6
  t  12  14
  u   7   8
df1 = df1.reindex(['s','u','t'], level=1)
print (df1)
      x   y
a s   1   2
  u   3   4
  t   4   6
b s   5   6
  u   7   8
  t  12  14
More dynamic solution:
print (result.index.get_level_values(1).unique().tolist())
['s', 'u']
df1 = df1.reindex(result.index.get_level_values(1).unique().tolist() + ['t'], level=1)
print (df1)
      x   y
a s   1   2
  u   3   4
  t   4   6
b s   5   6
  u   7   8
  t  12  14
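A variant that avoids the sort-then-reindex step altogether is sketched below. It is not from the answer above: it assumes it is acceptable to rebuild the frame group by group, and `add_totals` and its `label` parameter are hypothetical names chosen for illustration. It uses the question's sample data, including the 'v' row.

```python
import pandas as pd

def add_totals(frame, label="t"):
    """Return a copy of frame with a total row after each first-level group."""
    pieces = []
    # sort=False keeps the groups in their order of first appearance
    for key, group in frame.groupby(level=0, sort=False):
        total = group.sum().to_frame().T  # one-row frame holding the column sums
        total.index = pd.MultiIndex.from_tuples([(key, label)])
        pieces.append(group)
        pieces.append(total)
    return pd.concat(pieces)

index = pd.MultiIndex.from_tuples(
    [("a", "s"), ("a", "u"), ("a", "v"), ("b", "s"), ("b", "u")]
)
result = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]],
    index=index, columns=["x", "y"],
)

with_totals = add_totals(result)
print(with_totals)
```

The resulting index is not lexsorted, so the performance and UnsortedIndexError caveats mentioned earlier still apply to it.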
Another solution uses setting with enlargement in a custom function passed to GroupBy.apply:
def f(x):
    # x.name is the first-level key of the current group; add its total as row 't'
    x.loc[(x.name, 't'), :] = x.sum()
    return x

df = result.groupby(level=0, group_keys=False).apply(f)
print (df)
        x     y
a s   1.0   2.0
  u   3.0   4.0
  t   4.0   6.0
b s   5.0   6.0
  u   7.0   8.0
  t  12.0  14.0
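Note that the values in this last output are floats. If integers are wanted, one option is to cast back afterwards. The sketch below rebuilds the printed frame by hand so that it is self-contained, and assumes every value is a whole number with no missing entries; `df_int` is an illustrative name.

```python
import pandas as pd

# Frame matching the float output printed above
index = pd.MultiIndex.from_tuples(
    [("a", "s"), ("a", "u"), ("a", "t"), ("b", "s"), ("b", "u"), ("b", "t")]
)
df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [4.0, 6.0], [5.0, 6.0], [7.0, 8.0], [12.0, 14.0]],
    index=index, columns=["x", "y"],
)

# Cast every column back to integers (only safe without NaN values)
df_int = df.astype("int64")
print(df_int)
```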