Pandas - possible to aggregate two columns using two different aggregations?

Tags: pandas, aggregation

I'm loading a csv file, which has the following columns: date, textA, textB, numberA, numberB

I want to group by the columns date, textA and textB, applying "sum" to numberA but "min" to numberB.

data = pd.read_table("file.csv", sep=",", thousands=',')
grouped = data.groupby(["date", "textA", "textB"], as_index=False)

...but I cannot see how to then apply two different aggregate functions to two different columns, i.e. sum(numberA) and min(numberB).

asked Sep 16 '13 21:09

marcus adamski

1 Answer

The agg method can accept a dict, in which case the keys indicate the column to which the function is applied:

grouped.agg({'numberA':'sum', 'numberB':'min'})
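Applied to the column names from the question, this might look like the sketch below. The sample rows are made up for illustration and stand in for the contents of file.csv, which the question does not show.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
data = pd.DataFrame({
    "date": ["2013-09-16", "2013-09-16", "2013-09-16", "2013-09-17"],
    "textA": ["x", "x", "y", "x"],
    "textB": ["p", "p", "q", "p"],
    "numberA": [1, 2, 3, 4],
    "numberB": [10, 5, 7, 8],
})

# as_index=False keeps the group keys as ordinary columns in the result.
grouped = data.groupby(["date", "textA", "textB"], as_index=False)

# Each key selects a column; each value names the aggregation applied to it.
result = grouped.agg({"numberA": "sum", "numberB": "min"})
print(result)
```

Each distinct (date, textA, textB) combination becomes one row, with numberA summed and numberB reduced to its minimum within that group.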

For example,

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'number A': np.arange(8),
                   'number B': np.arange(8) * 2})
grouped = df.groupby('A')

print(grouped.agg({
    'number A': 'sum',
    'number B': 'min'}))

yields

     number B  number A
A
bar         2         9
foo         0        19

This also shows that Pandas can handle spaces in column names. I'm not sure what the origin of the problem was, but literal spaces should not have posed a problem. If you wish to investigate this further,

print(df.columns)

without reassigning the column names, will show us the repr of the names. Maybe there was a hard-to-see character in the column name that looked like a space (or some other character) but was actually a u'\xa0' (NO-BREAK SPACE), for example.
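If reading the repr by eye is not enough, a small helper can point out such characters directly. This is only a sketch: suspicious_chars is a hypothetical name, not part of pandas, and it uses a plain list of names in place of df.columns (passing df.columns itself should also work, since it is iterable).

```python
import unicodedata

def suspicious_chars(names):
    """Map each name to the non-ASCII or non-printable characters it contains."""
    report = {}
    for name in names:
        odd = [
            (repr(ch), unicodedata.name(ch, "UNKNOWN"))
            for ch in str(name)
            if ord(ch) > 127 or not ch.isprintable()
        ]
        if odd:
            report[name] = odd
    return report

# "number\xa0A" hides a NO-BREAK SPACE; "number B" contains an ordinary space.
columns = ["A", "B", "number\xa0A", "number B"]
print(suspicious_chars(columns))
```

Names made up only of printable ASCII characters, including ordinary spaces, are left out of the report, so an empty dict suggests the problem lies elsewhere.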

answered Oct 04 '22 10:10

unutbu
