Apply function to 2nd column in pandas dataframe groupby

Tags: python, pandas, dataframe

In a pandas dataframe, a function can be used to group its index. I'm looking to define a function that instead is applied to a column.

I'm looking to group by two columns, except I need the second column to be grouped by an arbitrary function, foo:

group_sum = df.groupby(['name', foo])['tickets'].sum()

How would foo be defined to group the second column into two groups, demarcated by whether values are > 0, for example? Or, is an entirely different approach or syntax used?

533

asked Oct 25 '16 23:10

Brian Bien


2 Answers

groupby accepts any combination of column labels and Series/arrays (as long as each array has the same length as your DataFrame), so you can map the function over your column and pass the result to groupby. Here df[1] selects the column labelled 1; substitute your own column label:

df.groupby(['name', df[1].map(foo)])

Alternatively, you might want to add the condition as a new column to your DataFrame before you perform the groupby. This has the advantage of giving that level a name in the resulting index:

df['>0'] = df[1] > 0
group_sum = df.groupby(['name', '>0'])['tickets'].sum()
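As a minimal sketch of how foo might be defined for the "> 0" case in the question: the sample data and the column name 'value' below are assumptions made for illustration, not taken from the asker's DataFrame.

```python
import pandas as pd

# Hypothetical sample data; the column names 'value' and 'tickets'
# are assumptions made for illustration.
df = pd.DataFrame({
    'name': ['a', 'a', 'a', 'b'],
    'value': [-1, 1, 1, 0],
    'tickets': [20, 50, 50, 100],
})

def foo(v):
    """Return True when the value is positive, False otherwise."""
    return v > 0

# Map foo over the second column and pass the resulting Series
# to groupby alongside the 'name' label.
group_sum = df.groupby(['name', df['value'].map(foo)])['tickets'].sum()
print(group_sum)
```

For a simple comparison like this, the vectorized expression df['value'] > 0 should produce the same key without calling foo once per row; map is mainly useful when the grouping rule cannot be written as a vectorized expression.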

145

answered Oct 17 '22 01:10

maxymoo

Something like this will work:

x.groupby(['name', x['value']>0])['tickets'].sum()

As mentioned above, groupby can accept both labels and Series. This should give you the answer you are looking for. Here is an example:

import numpy as np
import pandas as pd

data = np.array([[1, -1, 20], [1, 1, 50], [1, 1, 50], [2, 0, 100]])
x = pd.DataFrame(data, columns=['name', 'value', 'value2'])
x.groupby(['name', x['value'] > 0])['value2'].sum()

name  value
1     False     20
      True     100
2     False    100
Name: value2, dtype: int64
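In the output above, the second index level is still called value even though it now holds booleans. One possible way to label it without adding a column is to rename the boolean Series before passing it to groupby. This is a sketch reusing the example data; the label 'positive' is a hypothetical name chosen for illustration.

```python
import numpy as np
import pandas as pd

data = np.array([[1, -1, 20], [1, 1, 50], [1, 1, 50], [2, 0, 100]])
x = pd.DataFrame(data, columns=['name', 'value', 'value2'])

# Rename the boolean key so the resulting index level carries a
# descriptive name; 'positive' is a hypothetical label.
key = (x['value'] > 0).rename('positive')
result = x.groupby(['name', key])['value2'].sum()

# Show the names of the two index levels.
print(list(result.index.names))
```

The sums themselves are unchanged; only the name of the second index level differs.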

answered Oct 17 '22 01:10

RDizzl3
