Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there an easy way to group columns in a Pandas DataFrame?

I am trying to use Pandas to represent motion-capture data, which has T measurements of the (x, y, z) locations of each of N markers. For example, with T=3 and N=4, the raw CSV data looks like:

T,Ax,Ay,Az,Bx,By,Bz,Cx,Cy,Cz,Dx,Dy,Dz
0,1,2,1,3,2,1,4,2,1,5,2,1
1,8,2,3,3,2,9,9,1,3,4,9,1
2,4,5,7,7,7,1,8,3,6,9,2,3

This is really simple to load into a DataFrame, and I've learned a few tricks that are easy (converting marker data to z-scores, or computing velocities, for example).

One thing I'd like to do, though, is convert the "flat" data shown above into a format that has a hierarchical index on the column (marker), so that there would be N columns at level 0 (one for each marker), and each one of those would have 3 columns at level 1 (one each for x, y, and z).

  A     B     C     D
  x y z x y z x y z x y z
0 1 2 1 3 2 1 4 2 1 5 2 1
1 8 2 3 3 2 9 9 1 3 4 9 1
2 4 5 7 7 7 1 8 3 6 9 2 3

I know how do this by loading up the flat file and then manipulating the Series objects directly, perhaps by using append or just creating a new DataFrame using a manually-created MultiIndex.

As a Pandas learner, it feels like there must be a way to do this with less effort, but it's hard to discover. Is there an easier way?

like image 856
lmjohns3 Avatar asked Jun 11 '15 21:06

lmjohns3


People also ask

Can you group by multiple columns in pandas?

Grouping by Multiple ColumnsYou can do this by passing a list of column names to groupby instead of a single string value.

How do I group data in pandas Python?

Explanation: Pandas agg() function can be used to handle this type of computing tasks. Relevant columns and the involved aggregate operations are passed into the function in the form of dictionary, where the columns are keys and the aggregates are values, to get the aggregation done.


1 Answers

You basically just need to manipulate the column names, in your case.

Starting with your original DataFrame (and a tiny index manipulation):

from StringIO import StringIO
import numpy as np
a = pd.read_csv(StringIO('T,Ax,Ay,Az,Bx,By,Bz,Cx,Cy,Cz,Dx,Dy,Dz\n\
    0,1,2,1,3,2,1,4,2,1,5,2,1\n\
    1,8,2,3,3,2,9,9,1,3,4,9,1\n\
    2,4,5,7,7,7,1,8,3,6,9,2,3'))
a.set_index('T', inplace=True)

So that:

>> a
Ax  Ay  Az  Bx  By  Bz  Cx  Cy  Cz  Dx  Dy  Dz
T                                               
0   1   2   1   3   2   1   4   2   1   5   2   1
1   8   2   3   3   2   9   9   1   3   4   9   1
2   4   5   7   7   7   1   8   3   6   9   2   3

Then simply create a list of tuples for your columns, and use MultiIndex.from_tuples:

a.columns = pd.MultiIndex.from_tuples([(c[0], c[1]) for c in a.columns])

>> a
    A           B           C           D
    x   y   z   x   y   z   x   y   z   x   y   z
T                                               
0   1   2   1   3   2   1   4   2   1   5   2   1
1   8   2   3   3   2   9   9   1   3   4   9   1
2   4   5   7   7   7   1   8   3   6   9   2   3
like image 154
Ami Tavory Avatar answered Oct 14 '22 09:10

Ami Tavory