
Iteratively concatenate pandas dataframe with multiindex

I am iteratively processing several "groups" and would like to combine them into a single DataFrame, with each group identified by a second index level.

This:

print(pd.concat([df1, df2, df3], keys=["A", "B", "C"]))

was suggested to me - but it doesn't play well with iteration.

I am currently doing

data_all = pd.DataFrame([])
for a in a_list:
    group = some.function(a, etc)
    group = group.set_index(['CoI'], append=True, drop=True)
    group = group.reorder_levels(['CoI','oldindex'])
    data_all = pd.concat([data_all, group], ignore_index=False)

But the last line totally destroys my multi-index and I cannot reconstruct it.

Can you give me a hand?

TheChymera asked Mar 21 '23 13:03



1 Answer

You should be able to make data_all a list and concatenate once at the end:

data_all = []
for a in a_list:
    group = some.function(a, etc)
    group = group.set_index(['CoI'], append=True, drop=True)
    group = group.reorder_levels(['CoI','oldindex'])
    data_all.append(group)

data_all = pd.concat(data_all, ignore_index=False)
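
For reference, here is a minimal runnable sketch of the same pattern. The make_group helper and its toy data are hypothetical stand-ins for some.function(a, etc), and the list is named frames only to avoid reusing data_all for two different things; the index names 'CoI' and 'oldindex' are taken from the question.

```python
import pandas as pd


def make_group(label):
    # Hypothetical stand-in for some.function(a, etc): a tiny frame whose
    # existing index is named 'oldindex' and which has a 'CoI' column.
    df = pd.DataFrame({"value": [1, 2], "CoI": [label, label]})
    df.index.name = "oldindex"
    return df


a_list = ["A", "B", "C"]

frames = []
for a in a_list:
    group = make_group(a)
    # Move 'CoI' into the index next to the existing level ...
    group = group.set_index(["CoI"], append=True, drop=True)
    # ... and make it the outer level.
    group = group.reorder_levels(["CoI", "oldindex"])
    frames.append(group)

# Concatenate once, after the loop; every frame has the same two index
# levels, so the result keeps them.
data_all = pd.concat(frames, ignore_index=False)
print(data_all)
```

Concatenating once also avoids copying the accumulated data on every pass through the loop.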

Also keep in mind that pandas' concat works with iterators. Something like yield group may be more efficient than appending to a list each time. I haven't profiled it though!
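
A rough sketch of the generator variant, again with a hypothetical make_group helper and toy data standing in for some.function(a, etc). The iter_groups name is also made up for illustration. As far as I know, concat collects the iterable into a list internally anyway, so any gain is likely small.

```python
import pandas as pd


def make_group(label):
    # Hypothetical stand-in for some.function(a, etc).
    df = pd.DataFrame({"value": [1, 2], "CoI": [label, label]})
    df.index.name = "oldindex"
    return df


def iter_groups(labels):
    # Yield one prepared group at a time instead of appending to a list.
    for a in labels:
        group = make_group(a)
        group = group.set_index(["CoI"], append=True, drop=True)
        group = group.reorder_levels(["CoI", "oldindex"])
        yield group


# pd.concat accepts any iterable of DataFrames, including a generator.
data_all = pd.concat(iter_groups(["A", "B", "C"]), ignore_index=False)
print(data_all)
```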

TomAugspurger answered Apr 01 '23 00:04
