Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use Pandas to get the count of every combination inclusive

Tags:

python

pandas

I am trying to figure out what combination of clothing customers are buying together. I can figure out the exact combination, but the problem I can't figure out is the count that includes the combination + others.

For example, I have:

Cust_num  Item    Rev
Cust1     Shirt1  $40
Cust1     Shirt2  $40
Cust1     Shorts1 $40
Cust2     Shirt1  $40
Cust2     Shorts1 $40

This should result in:

Combo                  Count
Shirt1,Shirt2,Shorts1    1
Shirt1,Shorts1           2

The best I can do is unique combinations:

Combo                 Count
Shirt1,Shirt2,Shorts1   1
Shirt1,Shorts1          1

I tried:

df = df.pivot(index='Cust_num',columns='Item').sum()
df[df.notnull()] = "x"
df = df.loc[:,"Shirt1":].replace("x", pd.Series(df.columns, df.columns))
col = df.stack().groupby(level=0).apply(','.join)
df2 = pd.DataFrame(col)
df2.groupby([0]).size().reset_index(name='counts')

But that is just the unique counts.

like image 986
lys_dad Avatar asked Apr 08 '19 03:04

lys_dad


3 Answers

Using pandas.DataFrame.groupby:

grouped_item = df.groupby('Cust_num')['Item']
subsets = grouped_item.apply(lambda x: set(x)).tolist()
Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]
combo = grouped_item.apply(lambda x:','.join(x))
combo = combo.reset_index()
combo['Count']=Count

Output:

  Cust_num                   Item  Count
0    Cust1  Shirt1,Shirt2,Shorts1      1
1    Cust2         Shirt1,Shorts1      2
like image 188
Chris Avatar answered Oct 03 '22 00:10

Chris


Late answer, but you can use:

df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})
df['Count'] = df['Count'].str.replace(r'Cust','')

combo                   Count                 
Shirt1,Shirt2,Shorts1     1
Shirt1,Shorts1            2
like image 28
Pedro Lobito Avatar answered Oct 02 '22 22:10

Pedro Lobito


I think you need to create a combination of items first.

How to get all possible combinations of a list’s elements?

I used the function from Dan H's answer.

from itertools import chain, combinations
def all_subsets(ss):
    return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))

uq_items = df.Item.unique()

list(all_subsets(uq_items))

[(),
 ('Shirt1',),
 ('Shirt2',),
 ('Shorts1',),
 ('Shirt1', 'Shirt2'),
 ('Shirt1', 'Shorts1'),
 ('Shirt2', 'Shorts1'),
 ('Shirt1', 'Shirt2', 'Shorts1')]

And use groupby each customer to get their items combination.

ls = []

for _, d in df.groupby('Cust_num', group_keys=False):
    # Get all possible subset of items
    pi = np.array(list(all_subsets(d.Item)))

    # Fliter only > 1
    ls.append(pi[[len(l) > 1 for l in pi]])

Then convert to Series and use value_counts().

pd.Series(np.concatenate(ls)).value_counts()

(Shirt1, Shorts1)            2
(Shirt2, Shorts1)            1
(Shirt1, Shirt2, Shorts1)    1
(Shirt1, Shirt2)             1
like image 39
ResidentSleeper Avatar answered Oct 02 '22 22:10

ResidentSleeper