I'm loading a CSV file with the following columns: date, textA, textB, numberA, numberB.
I want to group by the columns date, textA and textB, but apply "sum" to numberA and "min" to numberB.

import pandas as pd

data = pd.read_table("file.csv", sep=",", thousands=',')
grouped = data.groupby(["date", "textA", "textB"], as_index=False)

...but I cannot see how to then apply two different aggregate functions to two different columns, i.e. sum(numberA) and min(numberB).
Pandas comes with a whole host of SQL-like aggregation functions you can apply when grouping on one or more columns; this is Python's closest equivalent to dplyr's group_by + summarise logic.
To apply more than one aggregation with a pandas GroupBy, pass a dictionary to the .agg method. Each key is a column name, and the value is the operation (or a list of operations) to perform on that column; if you pass lists, the result is a DataFrame with MultiIndex columns.
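For instance, here is a minimal sketch of the list form (using a made-up two-column DataFrame) to show the resulting MultiIndex columns:

import pandas as pd

# Toy data purely for illustration
df = pd.DataFrame({'key': ['x', 'x', 'y'], 'val': [1, 2, 3]})

# A list of functions per column produces one output column per function
out = df.groupby('key').agg({'val': ['sum', 'min']})
print(out.columns)  # MultiIndex([('val', 'sum'), ('val', 'min')], ...)
print(out)

Whether you get flat columns or a MultiIndex depends on whether you pass a single function name or a list of functions per column.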
The agg method can accept a dict, in which case the keys indicate the columns to which the functions are applied:
grouped.agg({'numberA':'sum', 'numberB':'min'})
For example,
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'number A': np.arange(8),
                   'number B': np.arange(8) * 2})
grouped = df.groupby('A')
print(grouped.agg({'number A': 'sum', 'number B': 'min'}))
yields
     number B  number A
A                      
bar         2         9
foo         0        19
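Applying the same pattern to the columns from the question, a minimal sketch (assuming file.csv has the layout described above):

import pandas as pd

# Equivalent to the original read_table call; thousands=',' strips thousands separators
data = pd.read_csv('file.csv', thousands=',')

# as_index=False keeps date/textA/textB as ordinary columns in the result
result = data.groupby(['date', 'textA', 'textB'], as_index=False).agg(
    {'numberA': 'sum', 'numberB': 'min'})
print(result)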
The example above also shows that Pandas can handle spaces in column names. I'm not sure what the origin of the problem was, but literal spaces should not have posed a problem. If you wish to investigate further,

print(df.columns)

without reassigning the column names, will show us the repr of the names. Maybe there was a hard-to-see character in the column name that looked like a space (or some other character) but was actually a u'\xa0' (NO-BREAK SPACE), for example.
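As a quick way to check for that, here is a short sketch (assuming the stray character is a no-break space and reusing the question's file name) that prints the repr of each column name and normalizes it:

import pandas as pd

df = pd.read_csv('file.csv')

# repr() exposes invisible characters, e.g. 'number\xa0A' instead of 'number A'
print([repr(c) for c in df.columns])

# Replace no-break spaces with ordinary spaces and trim surrounding whitespace
df.columns = [c.replace('\xa0', ' ').strip() for c in df.columns]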