How can I iterate and apply a function over a single level of a DataFrame with MultiIndex?

Tags:

Thanks to the response to my initial question, I now have a multi-indexed DataFrame the way that I want it. Now that I have the data in the data structure, I'm trying to get it out and wonder if there is a better way to do this. My two problems are related, but may have separate "ideal" solutions:

Sample DataFrame (truncated)

Experiment           IWWGCW         IWWGDW       
Lead Time                24     48      24     48
2010-11-27 12:00:00   0.997  0.991   0.998  0.990
2010-11-28 12:00:00   0.998  0.987   0.997  0.990
2010-11-29 12:00:00   0.997  0.992   0.997  0.992
2010-11-30 12:00:00   0.997  0.987   0.997  0.987
2010-12-01 12:00:00   0.996  0.986   0.996  0.986

Iteration

I'd like to be able to loop over this DataFrame where the iteration would take me down only 1 index dimension, i.e. an iteritems behavior that would return [('IWWGCW', df['IWWGCW']), ('IWWGDW', df['IWWGDW'])] and yield 2 DataFrames with Lead Time columns. My brute-force solution is to use a wrapper routine that basically does [(key, df[key] for key in df.columns.levels[0]]. Is there a better way to do this?

Apply

I'd also like to do things like "subtract the IWWGDW entries from everybody else" to compute paired differences. I tried to do df.apply(lambda f: f - df['IWWGDW']) but get a KeyError: ('IWWGDW', 'occurred at index 2010-11-26 12:00:00') regardless of if I use axis=1 or axis=0. I've tried rebuilding a new DataFrame using the iteration workaround identified above, but I always worry when I brute-force things. Is there a more "pandasic" way to do this sort of computation?

445

asked Jun 15 '12 19:06

Tim Whitcomb

1 Answers

I would suggest using groupby for iteration:

In [25]: for exp, group in df.groupby(level=0, axis=1):
   ....:     print exp, group
   ....:     
IWWGCW Experiment           IWWGCW       
Lead Time                24     48
2010-11-27 12:00:00   0.997  0.991
2010-11-28 12:00:00   0.998  0.987
2010-11-29 12:00:00   0.997  0.992
2010-11-30 12:00:00   0.997  0.987
2010-12-01 12:00:00   0.996  0.986
IWWGDW Experiment           IWWGDW       
Lead Time                24     48
2010-11-27 12:00:00   0.998  0.990
2010-11-28 12:00:00   0.997  0.990
2010-11-29 12:00:00   0.997  0.992
2010-11-30 12:00:00   0.997  0.987
2010-12-01 12:00:00   0.996  0.986

However, I see that this doesn't drop the top level as you're looking for. Ideally you would be able to write something like:

df.groupby(level=0, axis=1).sub(df['IWWGCW'])

and have that do the pair-wise subtraction, but because df['IWWGCW'] drops the level, the column names don't line up. This works, though:

In [29]: df.groupby(level=0, axis=1).sub(df['IWWGCW'].values)
Out[29]: 
Experiment           IWWGCW      IWWGDW       
Lead Time                24  48      24     48
2010-11-27 12:00:00       0   0   0.001 -0.001
2010-11-28 12:00:00       0   0  -0.001  0.003
2010-11-29 12:00:00       0   0   0.000  0.000
2010-11-30 12:00:00       0   0   0.000  0.000
2010-12-01 12:00:00       0   0   0.000  0.000

I'll think a bit more about this.

answered Sep 18 '22 22:09

Wes McKinney

Related questions
                            
                                pylint complains on py.test: "Module 'pytest' has no 'raises' member"
                            
                                Mongo: avoid duplicate files in gridfs
                            
                                Use entry combobox when migrating to pygobject, glade and gtk3
                            
                                pyinstaller with a non-trivial directory structor
                            
                                Why does the pygame window not close properly?
                            
                                Plot a cube of 3D intensity data
                            
                                In Python, when is platlib != purelib?
                            
                                Deployment of application with embedded Python 3
                            
                                Repeating Tile Background pygame
                            
                                django set_test_cookie fails to set cookie
                            
                                Recursively unzip archives, store (filename, extracted-contents) in dictionary
                            
                                cursort.execute/cursor.callproc returns no error but nothing is executed
                            
                                Python flask app validate text field - allow no space
                            
                                Python subprocess.call works but subprocess.check_call does not - what are the differences?
                            
                                Prevent Sphinx from executing the module
                            
                                ^[[A and ^[[B show up in python interpreter when I press arrow keys
                            
                                re.RegexObject doesn't exist (raises AttributeError)
                            
                                What better way to concatenate string in python?
                            
                                Double-indexed dictionary
                            
                                Regex Python adding characters after a certain word

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How can I iterate and apply a function over a single level of a DataFrame with MultiIndex?

Tags:

python

pandas

Tim Whitcomb

People also ask

1 Answers

Wes McKinney

Recent Activity

Donate For Us