Is there an easy way to group columns in a Pandas DataFrame?

I am trying to use Pandas to represent motion-capture data, which has T measurements of the (x, y, z) locations of each of N markers. For example, with T=3 and N=4, the raw CSV data looks like:

T,Ax,Ay,Az,Bx,By,Bz,Cx,Cy,Cz,Dx,Dy,Dz
0,1,2,1,3,2,1,4,2,1,5,2,1
1,8,2,3,3,2,9,9,1,3,4,9,1
2,4,5,7,7,7,1,8,3,6,9,2,3

This is really simple to load into a DataFrame, and I've learned a few tricks that are easy (converting marker data to z-scores, or computing velocities, for example).

One thing I'd like to do, though, is convert the "flat" data shown above into a format that has a hierarchical index on the column (marker), so that there would be N columns at level 0 (one for each marker), and each one of those would have 3 columns at level 1 (one each for x, y, and z).

  A     B     C     D
  x y z x y z x y z x y z
0 1 2 1 3 2 1 4 2 1 5 2 1
1 8 2 3 3 2 9 9 1 3 4 9 1
2 4 5 7 7 7 1 8 3 6 9 2 3

I know how to do this by loading the flat file and then manipulating the Series objects directly, perhaps by using append or by creating a new DataFrame with a manually created MultiIndex.

As a Pandas learner, it feels like there must be a way to do this with less effort, but it's hard to discover. Is there an easier way?

856

asked Jun 11 '15 21:06

lmjohns3

1 Answer

In your case, you just need to manipulate the column names.

Starting with your original DataFrame (and a tiny index manipulation):

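from io import StringIO  # Python 3 location (Python 2 used the StringIO module)

import pandas as pd

# Raw CSV text from the question, one row per time step T.
csv_text = ('T,Ax,Ay,Az,Bx,By,Bz,Cx,Cy,Cz,Dx,Dy,Dz\n'
            '0,1,2,1,3,2,1,4,2,1,5,2,1\n'
            '1,8,2,3,3,2,9,9,1,3,4,9,1\n'
            '2,4,5,7,7,7,1,8,3,6,9,2,3')
a = pd.read_csv(StringIO(csv_text))
a.set_index('T', inplace=True)  # use T as the row index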

So that:

>>> a
   Ax  Ay  Az  Bx  By  Bz  Cx  Cy  Cz  Dx  Dy  Dz
T                                               
0   1   2   1   3   2   1   4   2   1   5   2   1
1   8   2   3   3   2   9   9   1   3   4   9   1
2   4   5   7   7   7   1   8   3   6   9   2   3

Then simply create a list of tuples for your columns, and use MultiIndex.from_tuples:

a.columns = pd.MultiIndex.from_tuples([(c[0], c[1]) for c in a.columns])

>>> a
    A           B           C           D
    x   y   z   x   y   z   x   y   z   x   y   z
T                                               
0   1   2   1   3   2   1   4   2   1   5   2   1
1   8   2   3   3   2   9   9   1   3   4   9   1
2   4   5   7   7   7   1   8   3   6   9   2   3
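
The one-liner above relies on every marker name being a single character. If your real marker names are longer, a small variation might look like the sketch below. It assumes the last character of each column name is always the axis (x, y, or z) and everything before it is the marker; the level names "marker" and "axis" are my own choice for illustration, not something pandas requires. It uses only the first two markers from the question to keep the example short.

```python
from io import StringIO

import pandas as pd

# First two markers (A and B) from the question's sample data.
csv_text = (
    "T,Ax,Ay,Az,Bx,By,Bz\n"
    "0,1,2,1,3,2,1\n"
    "1,8,2,3,3,2,9\n"
    "2,4,5,7,7,7,1\n"
)
df = pd.read_csv(StringIO(csv_text), index_col="T")

# Split each flat name into (marker, axis). Assumption: the final character
# is the axis, so a longer name such as "Headx" would become ("Head", "x").
df.columns = pd.MultiIndex.from_tuples(
    [(name[:-1], name[-1]) for name in df.columns],
    names=["marker", "axis"],  # hypothetical level names
)

# Selecting by the top level returns the x, y, z columns of one marker.
marker_a = df["A"]

# A cross-section on the second level returns one axis for every marker.
all_x = df.xs("x", axis=1, level="axis")
```

Once the columns are hierarchical, `df["A"]` gives you a three-column frame for marker A, and `df.xs("x", axis=1, level="axis")` gives you the x coordinate of every marker, which is usually the point of grouping the columns in the first place.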

154

answered Oct 14 '22 09:10

Ami Tavory

Tags: pandas, dataframe, indices, columnname
