Perform aggregation(s) on MultiIndex columns

Tags: python, pandas, dataframe

I'm starting with this dataframe:

import pandas as pd

df = pd.DataFrame(
    [
        ["a", "aa", "2020-12-20", 10],
        ["a", "ab", "2020-12-26", 11],
        ["a", "aa", "2020-12-22", 10],
        ["b", "bb", "2020-12-25", 111],
        ["c", "bb", "2020-12-20", 20],
        ["d", "dd", "2020-12-05", 1111]
    ],
    columns=["cat", "user", "date", "value"]
)
df["date"] = pd.to_datetime(df.date)

  cat user       date  value
0   a   aa 2020-12-20     10
1   a   ab 2020-12-26     11
2   a   aa 2020-12-22     10
3   b   bb 2020-12-25    111
4   c   bb 2020-12-20     20
5   d   dd 2020-12-05   1111

Next, I'm running the following aggregation:

gb = (
    df.set_index("date")
    .groupby("cat")
    .resample("W")
    .agg(
        {"value": "sum", "user": ["nunique", lambda x: x.unique()]}
    )
    .rename({"<lambda>": "unqiue_users"}, axis=1)
)

This yields a table with a MultiIndex on the columns:

               value    user             
                 sum nunique unqiue_users
cat date                                 
a   2020-12-20    10       1           aa
    2020-12-27    21       2     [aa, ab]
b   2020-12-27   111       1           bb
c   2020-12-20    20       1           bb
d   2020-12-06  1111       1           dd

Finally, I'm trying to run aggregations on that result, for example:

gb.groupby(level=0)[["value", "sum"]].mean()

I don't know how to "access" columns that are part of a MultiIndex. Any ideas?


asked Jan 12 '21 07:01

Dror

1 Answer

To select from MultiIndex columns, use tuples. Here a one-element list containing a tuple is used:

print (gb.groupby(level=0)[[("value", "sum")]].mean())
      value
        sum
cat        
a      15.5
b     111.0
c      20.0
d    1111.0
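Why the tuple is needed can be checked directly. A minimal sketch, assuming the question's df and dropping the lambda column from the aggregation to keep it short (this does not change how the column MultiIndex is built): it lists the column keys, shows that a list of plain strings is read as top-level labels, and then selects with a tuple instead.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["a", "aa", "2020-12-20", 10],
        ["a", "ab", "2020-12-26", 11],
        ["a", "aa", "2020-12-22", 10],
        ["b", "bb", "2020-12-25", 111],
        ["c", "bb", "2020-12-20", 20],
        ["d", "dd", "2020-12-05", 1111],
    ],
    columns=["cat", "user", "date", "value"],
)
df["date"] = pd.to_datetime(df["date"])

# The question's aggregation, minus the lambda column (assumption for brevity).
gb = (
    df.set_index("date")
    .groupby("cat")
    .resample("W")
    .agg({"value": "sum", "user": ["nunique"]})
)

# Every column key is a (level-0, level-1) tuple.
print(gb.columns.tolist())  # [('value', 'sum'), ('user', 'nunique')]

# A list of plain strings is read as several level-0 labels, so "sum" is
# looked up at the top level next to "value" and is not found there.
try:
    gb[["value", "sum"]]
except KeyError as err:
    print("KeyError:", err)

# A tuple addresses one column across both levels.
print(gb[[("value", "sum")]])
```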

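Or, in pandas versions before 2.0, you can use a simpler solution with mean per level (the level argument of mean was deprecated in pandas 1.3 and removed in 2.0):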

print (gb[[("value", "sum")]].mean(level=0))
      value
        sum
cat        
a      15.5
b     111.0
c      20.0
d    1111.0
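In pandas 2.0 and later, reductions such as mean no longer accept a level argument, so mean(level=0) raises an error there. A sketch of the equivalent groupby(level=0) form for both the DataFrame and the Series result, again rebuilding gb without the lambda column (an assumption that keeps the example short):

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["a", "aa", "2020-12-20", 10],
        ["a", "ab", "2020-12-26", 11],
        ["a", "aa", "2020-12-22", 10],
        ["b", "bb", "2020-12-25", 111],
        ["c", "bb", "2020-12-20", 20],
        ["d", "dd", "2020-12-05", 1111],
    ],
    columns=["cat", "user", "date", "value"],
)
df["date"] = pd.to_datetime(df["date"])

# The question's aggregation, minus the lambda column (assumption for brevity).
gb = (
    df.set_index("date")
    .groupby("cat")
    .resample("W")
    .agg({"value": "sum", "user": ["nunique"]})
)

# DataFrame result: select with a one-element list of tuples, then group on
# the first index level ("cat"). Replaces gb[[...]].mean(level=0).
frame_mean = gb[[("value", "sum")]].groupby(level=0).mean()

# Series result: a bare tuple selects a single column as a Series.
series_mean = gb[("value", "sum")].groupby(level=0).mean()

print(frame_mean)
print(series_mean)
```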

To get a Series instead, omit the nested list:

print (gb[("value", "sum")].mean(level=0))
cat
a      15.5
b     111.0
c      20.0
d    1111.0
Name: (value, sum), dtype: float64
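A bare tuple is not the only way to reach that column. A sketch of a few equivalent selections, assuming the same rebuilt gb (lambda column dropped for brevity); the xs call instead picks every column whose second level is "sum" and drops that level:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["a", "aa", "2020-12-20", 10],
        ["a", "ab", "2020-12-26", 11],
        ["a", "aa", "2020-12-22", 10],
        ["b", "bb", "2020-12-25", 111],
        ["c", "bb", "2020-12-20", 20],
        ["d", "dd", "2020-12-05", 1111],
    ],
    columns=["cat", "user", "date", "value"],
)
df["date"] = pd.to_datetime(df["date"])

# The question's aggregation, minus the lambda column (assumption for brevity).
gb = (
    df.set_index("date")
    .groupby("cat")
    .resample("W")
    .agg({"value": "sum", "user": ["nunique"]})
)

by_tuple = gb[("value", "sum")]        # full key as a tuple
by_loc = gb.loc[:, ("value", "sum")]   # same key through .loc
by_chain = gb["value"]["sum"]          # level 0 first, then level 1

# Cross-section on the second column level: all "sum" columns, level dropped.
sums = gb.xs("sum", axis=1, level=1)

print(by_tuple.tolist(), by_loc.tolist(), by_chain.tolist())
print(sums.columns.tolist())
```

The chained form returns a Series named "sum" rather than ("value", "sum"), which matters only if the name is used later.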

Alternatively, change your aggregation to use named aggregation, which avoids a MultiIndex in the columns:

gb = (
    df.set_index("date")
    .groupby(["cat", pd.Grouper(freq='W')])
    .agg(val = ("value",  "sum"),
         nuniq = ("user", "nunique"),
         unqiue_users = ("user", lambda x: x.unique()))
    )
    
print (gb)
                 val  nuniq unqiue_users
cat date                                
a   2020-12-20    10      1           aa
    2020-12-27    21      2     [ab, aa]
b   2020-12-27   111      1           bb
c   2020-12-20    20      1           bb
d   2020-12-06  1111      1           dd


print (gb['val'].groupby(level=0).mean())
cat
a      15.5
b     111.0
c      20.0
d    1111.0
Name: val, dtype: float64
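If gb already has MultiIndex columns, another common option (not part of the answer above) is to flatten them by joining the levels into plain strings. A sketch, rebuilding gb without the lambda column; the underscore-joined names such as value_sum are an arbitrary choice:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["a", "aa", "2020-12-20", 10],
        ["a", "ab", "2020-12-26", 11],
        ["a", "aa", "2020-12-22", 10],
        ["b", "bb", "2020-12-25", 111],
        ["c", "bb", "2020-12-20", 20],
        ["d", "dd", "2020-12-05", 1111],
    ],
    columns=["cat", "user", "date", "value"],
)
df["date"] = pd.to_datetime(df["date"])

# The question's aggregation, minus the lambda column (assumption for brevity).
gb = (
    df.set_index("date")
    .groupby("cat")
    .resample("W")
    .agg({"value": "sum", "user": ["nunique"]})
)

# Iterating a MultiIndex yields tuples; join each into one flat label.
gb.columns = ["_".join(col) for col in gb.columns]
print(gb.columns.tolist())  # ['value_sum', 'user_nunique']

# Plain string keys now work directly.
print(gb.groupby(level=0)["value_sum"].mean())
```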


answered Sep 23 '22 19:09

jezrael
