How can I insert into a specific location of a MultiIndex DataFrame?

Question

Suppose I have a pandas DataFrame structured like the one below. In practice it may be much larger, and both the number of level 1 index values and the number of level 2 index values per level 1 value will vary, so the solution shouldn't make assumptions about either:

import pandas

index = pandas.MultiIndex.from_tuples([
    ("a", "s"),
    ("a", "u"),
    ("a", "v"),
    ("b", "s"),
    ("b", "u")])

result = pandas.DataFrame([
    [1, 2],
    [3, 4],
    [5, 6],
    [7, 8],
    [9, 10]], index=index, columns=["x", "y"])

Which looks like this:

      x   y
a s   1   2
  u   3   4
  v   5   6
b s   7   8
  u   9  10

Now let's say I want to create a "total" row for each of the "a" and "b" levels. So given the above as input I would want my code to produce something like this:

      x   y
a s   1   2
  u   3   4
  v   5   6
  t   9  12
b s   7   8
  u   9  10
  t  16  18

Here's the code I have so far:

# Calculate totals
for level, _ in result.groupby(level=0):

    # work out the global total for that desk:
    x_sum = result.loc[level]["x"].sum()
    y_sum = result.loc[level]["y"].sum()

    result = result.append(pandas.DataFrame([[x_sum, y_sum]], columns=result.columns, index=pandas.MultiIndex.from_tuples([(level, "t")])))

But this results in the "total" rows being appended to the end:

      x   y
a s   1   2
  u   3   4
  v   5   6
b s   7   8
  u   9  10
a t   9  12
b t  16  18

Sorting using result.sort_index() doesn't do what I want either:

      x   y
a s   1   2
  t   9  12
  u   3   4
  v   5   6
b s   7   8
  t  16  18
  u   9  10

What am I doing wrong?

jezrael · Accepted Answer

It is really annoying, but a MultiIndex is kept sorted for better performance. Also, if the MultiIndex is not sorted, selecting by it can raise an UnsortedIndexError.
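To make the sortedness point concrete, here is a minimal sketch, assuming pandas is imported as pd. The frame `unsorted` and its values are hypothetical and only illustrate the idea; whether the slice actually raises may depend on the pandas version, so the code records the outcome instead of assuming it.

```python
import pandas as pd

# Hypothetical frame whose level-1 labels are deliberately not in sorted order.
unsorted = pd.DataFrame(
    {"x": [1, 3, 4, 5, 7, 12]},
    index=pd.MultiIndex.from_tuples(
        [("a", "s"), ("a", "u"), ("a", "t"), ("b", "s"), ("b", "u"), ("b", "t")]
    ),
)

# False here, because "t" comes after "u" within each group.
print(unsorted.index.is_monotonic_increasing)

try:
    # Slicing between two full keys relies on a lexsorted index.
    unsorted.loc[("a", "s"):("a", "u")]
    slice_error = None
except KeyError as exc:  # UnsortedIndexError is a subclass of KeyError
    slice_error = type(exc).__name__
print(slice_error)

# After sorting, the same slice is well defined.
resorted = unsorted.sort_index()
print(resorted.index.is_monotonic_increasing)
print(len(resorted.loc[("a", "s"):("a", "u")]))  # rows s, t and u of group a
```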

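But if you really need to change the order of the labels, you can use reindex. Note that the outputs below come from a smaller four-row sample in which a and b each have only the s and u rows, so the hard-coded list ['s','u','t'] does not include v.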

import pandas as pd

# one total row per level-0 label, re-keyed as (label, 't')
df = result.groupby(level=0).sum()
df.index = [df.index, ['t'] * len(df.index)]
df1 = pd.concat([result, df]).sort_index().reindex(['s','u','t'], level=1)

df1 = pd.concat([result, df]).sort_index()
print (df1)
      x   y
a s   1   2
  t   4   6
  u   3   4
b s   5   6
  t  12  14
  u   7   8

df1 = df1.reindex(['s','u','t'], level=1)
print (df1)
      x   y
a s   1   2
  u   3   4
  t   4   6
b s   5   6
  u   7   8
  t  12  14

A more dynamic solution reads the level-1 labels from the data instead of hard-coding them:

print (result.index.get_level_values(1).unique().tolist())
['s', 'u']

df1 = df1.reindex(result.index.get_level_values(1).unique().tolist() + ['t'], level=1)
print (df1)
      x   y
a s   1   2
  u   3   4
  t   4   6
b s   5   6
  u   7   8
  t  12  14
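Applied to the five-row frame from the question, where group a also has a v row, the steps above could be sketched as follows. This is a minimal sketch, assuming pandas is imported as pd; the names `totals`, `order` and `out` are illustrative and not part of the original answer.

```python
import pandas as pd

# The frame from the question.
index = pd.MultiIndex.from_tuples(
    [("a", "s"), ("a", "u"), ("a", "v"), ("b", "s"), ("b", "u")]
)
result = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]], index=index, columns=["x", "y"]
)

# One total row per level-0 label, re-keyed as (label, "t").
totals = result.groupby(level=0).sum()
totals.index = pd.MultiIndex.from_arrays([totals.index, ["t"] * len(totals)])

# Level-1 labels in order of first appearance, with "t" placed last.
order = result.index.get_level_values(1).unique().tolist() + ["t"]

# Sort first so each group is contiguous, then reorder within level 1.
out = pd.concat([result, totals]).sort_index().reindex(order, level=1)
print(out)
```

Because `order` is derived from the data, the v row is kept and only labels that actually occur in a group appear in it. The resulting index is no longer sorted, so the caveat about unsorted MultiIndex selection applies to `out`.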

Another solution uses setting with enlargement inside a custom function passed to GroupBy.apply:

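def f(x):
    # x.name is the level-0 label of the current group; assigning to the
    # new key (label, 't') appends the total row to that group
    x.loc[(x.name, 't'),:] = x.sum()
    return x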

df = result.groupby(level=0, group_keys=False).apply(f)
print (df)
        x     y
a s   1.0   2.0
  u   3.0   4.0
  t   4.0   6.0
b s   5.0   6.0
  u   7.0   8.0
  t  12.0  14.0
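The output above shows the values as floats after the enlargement. As an alternative (not from the original answer), one could build each group's total row explicitly and concatenate the pieces in order. This is a hedged sketch, assuming pandas is imported as pd; `pieces`, `total` and `out` are illustrative names. It also avoids DataFrame.append, which was removed in pandas 2.0.

```python
import pandas as pd

# The frame from the question.
index = pd.MultiIndex.from_tuples(
    [("a", "s"), ("a", "u"), ("a", "v"), ("b", "s"), ("b", "u")]
)
result = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]], index=index, columns=["x", "y"]
)

pieces = []
for key, group in result.groupby(level=0):
    # Column sums as a one-row frame, keyed as (key, "t").
    total = group.sum().to_frame().T
    total.index = pd.MultiIndex.from_tuples([(key, "t")])
    # Keep the group's rows first, then its total row.
    pieces.append(group)
    pieces.append(total)

out = pd.concat(pieces)
print(out)
```

Since nothing is sorted or reindexed afterwards, each total row stays directly below its group, and the integer columns are not converted to float.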

Tags:

python

pandas

quant
