I have a dataset containing the feeding data of three animals: the animal's tag id (Animal1, Animal2, Animal3), the feed type (A, B), and the amount (kg) of feed given at each 'meal':
Animal FeedType Amount(kg)
Animal1 A 10
Animal2 B 7
Animal3 A 4
Animal2 A 2
Animal1 B 5
Animal2 B 6
Animal3 A 2
In base R, I can easily produce the matrix below, which has unique(Animal) as its rows, unique(FeedType) as its columns, and the cumulative Amount (kg) in the corresponding cells, by using tapply() as below:
out <- with(mydf, tapply(Amount, list(Animal, FeedType), sum))
A B
Animal1 10 5
Animal2 2 13
Animal3 6 NA
Is there equivalent functionality for a pandas DataFrame in Python? What is the most elegant and fastest way to achieve this in pandas?
P.S. I want to be able to specify which column, in this case Amount, to perform the aggregation on.
Thanks in advance.
EDIT:
I tried both approaches in the two answers. Performance results with my actual Pandas data-frame of 216,347 rows and 15 columns:
import timeit

start_time1 = timeit.default_timer()
mydf.groupby(['Animal', 'FeedType'])['Amount'].sum()
elapsed_groupby = timeit.default_timer() - start_time1

start_time2 = timeit.default_timer()
# rows=/cols= are the pre-0.14 pivot_table argument names;
# on pandas 0.14 and later, use index=/columns= (see the answers below)
mydf.pivot_table(rows='Animal', cols='FeedType', values='Amount', aggfunc='sum')
elapsed_pivot = timeit.default_timer() - start_time2

print('elapsed_groupby: ' + str(elapsed_groupby))
print('elapsed_pivot: ' + str(elapsed_pivot))
gives:
elapsed_groupby: 10.172213
elapsed_pivot: 8.465783
So in my case, pivot_table() works faster.
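For anyone rerunning this comparison on a current pandas release, here is a minimal self-contained sketch of the same timing harness. It builds a small synthetic frame as a stand-in (my actual 216,347-row data is not included), uses the post-0.14 index=/columns= argument names, and also unstacks the groupby result so both calls produce the same wide layout:

import timeit
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset: random animals, feed types, amounts
rng = np.random.default_rng(0)
n = 200_000
mydf = pd.DataFrame({
    'Animal': rng.choice(['Animal1', 'Animal2', 'Animal3'], size=n),
    'FeedType': rng.choice(['A', 'B'], size=n),
    'Amount': rng.integers(1, 11, size=n),
})

start_time1 = timeit.default_timer()
mydf.groupby(['Animal', 'FeedType'])['Amount'].sum().unstack()
elapsed_groupby = timeit.default_timer() - start_time1

start_time2 = timeit.default_timer()
mydf.pivot_table(index='Animal', columns='FeedType', values='Amount', aggfunc='sum')
elapsed_pivot = timeit.default_timer() - start_time2

print('elapsed_groupby:', elapsed_groupby)
print('elapsed_pivot:', elapsed_pivot)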
First I read in your data:
In [6]: import pandas as pd

In [7]: df = pd.read_clipboard(sep=r"\s+", index_col=False)
In [8]: df
Out[8]:
Animal FeedType Amount(kg)
0 Animal1 A 10
1 Animal2 B 7
2 Animal3 A 4
3 Animal2 A 2
4 Animal1 B 5
5 Animal2 B 6
6 Animal3 A 2
Then I can groupby the two columns to aggregate:
In [9]: df.groupby(['Animal','FeedType']).sum()
Out[9]:
Amount(kg)
Animal FeedType
Animal1 A 10
B 5
Animal2 A 2
B 13
Animal3 A 6
To get it in the same format, I can unstack the dataframe:
In [10]: df.groupby(['Animal','FeedType']).sum().unstack()
Out[10]:
Amount(kg)
FeedType A B
Animal
Animal1 10 5
Animal2 2 13
Animal3 6 NaN
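If you don't want to rely on the clipboard, a self-contained version of the same approach looks like this (building the frame from a dict is just for reproducibility; the column names match the question):

import pandas as pd

# Reconstruct the question's data directly instead of via read_clipboard
df = pd.DataFrame({
    'Animal': ['Animal1', 'Animal2', 'Animal3', 'Animal2',
               'Animal1', 'Animal2', 'Animal3'],
    'FeedType': ['A', 'B', 'A', 'A', 'B', 'B', 'A'],
    'Amount(kg)': [10, 7, 4, 2, 5, 6, 2],
})

# Sum per (Animal, FeedType) pair, then pivot FeedType into columns
wide = df.groupby(['Animal', 'FeedType'])['Amount(kg)'].sum().unstack()
print(wide)

Note that selecting the Amount(kg) column before summing drops the extra column level you see in the output above, leaving just A and B as columns.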
The approach of @Zelazny7 with groupby and unstack is certainly fine, but for completeness, you can also do this directly with pivot_table (see the docs) [version 0.13 and below]:
In [13]: df.pivot_table(rows='Animal', cols='FeedType', values='Amount(kg)', aggfunc='sum')
Out[13]:
FeedType A B
Animal
Animal1 10 5
Animal2 2 13
Animal3 6 NaN
In newer versions of pandas (version 0.14 and later), the arguments of pivot_table have changed:
In [13]: df.pivot_table(index='Animal', columns='FeedType', values='Amount(kg)', aggfunc='sum')
Out[13]:
FeedType A B
Animal
Animal1 10 5
Animal2 2 13
Animal3 6 NaN
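As a side note, if you would rather see 0 than NaN for the missing Animal3/B combination (matching a zero-fill rather than R's NA), pivot_table accepts a fill_value argument; use it only if a zero is actually what you mean:

In [14]: df.pivot_table(index='Animal', columns='FeedType', values='Amount(kg)',
   ....:                aggfunc='sum', fill_value=0)
Out[14]:
FeedType   A   B
Animal
Animal1   10   5
Animal2    2  13
Animal3    6   0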