Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does python have a similar function to cube() function in sql?

Tags:

python

I am new to python. Currently, I have a dataset that contains more than 40 columns needed to be grouped and aggregated. I was wondering if python has a function similar to cube() function in SQL. If not, how can I have the expected output? Really appreciate it if someone could answer for me. Thanks!

Below Example that I simplified to 2 columns only (Country & Attribute A):

| CustomerID | Country |Attribute A|Sales| No.of product| No. of transaction|
| ---------- | --------|-----------|-----|--------------|-------------------|
| 1          | US      |A          |20   |2             |2                  |
| 2          | US      |B          |25   |3             |3                  |
|3           |CA       |A          |100  |10            |10                 |
|4           |CA       |B          |50   |5             |5                  |
|5           |UK       |A          |40   |4             |4                  |

Expected Output:

| Country|Attribute A|Sum of Sales|Total no. of product| Total no. of transaction| Total no. of customer|
|--------|-----------|------------|--------------------|-------------------------|----------------------|
|US      |(null)     |45          |5                   |5                        |2                     |
|CA      |(null)     |150         |15                  |155                      |2                     |
|UK      |(null)     |40          |4                   |4                        |1                     |
|(null)  |A          |160         |16                  |16                       |3                     |
|(null)  |B          |75          |8                   |8                        |2                     |
|US      |A          |20          |2                   |2                        |1                     |
|US      |B          |25          |3                   |3                        |1                     |
|CA      |A          |100         |10                  |10                       |1                     |
|CA      |B          |50          |5                   |5                        |1                     |
|UK      |A          |40          |4                   |4                        |1                     |
like image 872
aaah_ooooh9305 Avatar asked Jun 09 '26 15:06

aaah_ooooh9305


1 Answers

Here's an extension of the other two answers that generalizes to arbitrarily many columns.

import pandas as pd
from itertools import combinations
    
def cube_sum(df,cols):
    dfs = []
    for n in range(len(cols),0,-1):
        for subset in combinations(cols,n):
            dfs.append(df.groupby(list(subset)).sum().reset_index())
    dfs.append(df.drop(cols, axis = 1).sum().to_frame().T)
    return pd.concat(dfs)


df = pd.DataFrame(
    {'CustomerID':[1,2,3,4,5],
    'Country':['US','US','CA','CA','UK'],
    'Attribute A':list("ABABA"),
    'Sales':[20,50,100,50,40],
    'No. of Product':[2,3,10,5,4],
    'No. of Transaction':[2,3,10,5,4]}).set_index('CustomerID')
all_groups = cube_sum(df,["Attribute A","Country"])

The result:

  Attribute A Country  Sales  No. of Product  No. of Transaction
0           A      CA    100              10                  10
1           A      UK     40               4                   4
2           A      US     20               2                   2
3           B      CA     50               5                   5
4           B      US     50               3                   3
0           A     NaN    160              16                  16
1           B     NaN    100               8                   8
0         NaN      CA    150              15                  15
1         NaN      UK     40               4                   4
2         NaN      US     70               5                   5
0         NaN     NaN    260              24                  24

In response to the comment:

import pandas as pd 
from itertools import combinations 

def cube_agg(df,cols):     
    dfs = []     
    for n in range(len(cols),0,-1):         
        for subset in combinations(cols,n):             
            dfs.append(df.groupby(list(subset))
                 .aggregate({'Population': 'mean', 'Area': ['min', 'max']})
                 .reset_index())
    dfs.append((df.drop(["Attribute A","Country"], axis = 1)
                 .aggregate({'Population': 'mean', 'Area': ['min', 'max']})
                 .T.stack()
                 .to_frame().T))
    return pd.concat(dfs)

df = pd.DataFrame(
    {'CustomerID':[1,2,3,4,5],
    'Country':['US','US','CA','CA','UK'],
    'Attribute A':list("ABABA"),
    'Population':[20,50,100,50,40],
    'Area':[2,3,10,5,4]}).set_index('CustomerID')
all_groups = cube_agg(df,["Attribute A","Country"])

Result:

   Area       Attribute A Country  Population
    max   min                            mean
0  10.0  10.0           A      CA  100.000000
1   4.0   4.0           A      UK   40.000000
2   2.0   2.0           A      US   20.000000
3   5.0   5.0           B      CA   50.000000
4   3.0   3.0           B      US   50.000000
0  10.0   2.0           A     NaN   53.333333
1   5.0   3.0           B     NaN   50.000000
0  10.0   5.0         NaN      CA   75.000000
1   4.0   4.0         NaN      UK   40.000000
2   3.0   2.0         NaN      US   35.000000
0  10.0   2.0         NaN     NaN   52.000000

The trickiest aspect of this is getting the bottom row (overall aggregate) to work as expected. Note that the columns seem to be arranged in alphabetical order, hence the area coming first.

like image 66
Ben Grossmann Avatar answered Jun 12 '26 04:06

Ben Grossmann



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!