Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: For every row in df calculate number of times that value exist in another column

Tags:

python

pandas

I have following data frame.

>>> df = pd.DataFrame({'selected': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'D'], 'presented': ['A|B|D', 'B|D|A', 'A|B|C', 'D|C|B|A','A|C|D|B', 'D|B|C','D|C|B|A','D|B|C']})
>>> df

This is a large data set and have 500K rows (date column taken out to keep example simple)

  selected presented
0        A     A|B|D
1        B     B|D|A
2        C     A|B|C
3        A   D|C|B|A
4        B   A|C|D|B
5        C     D|B|C
6        A   D|C|B|A
7        D     D|B|C

Goal is to calculate selected/presented ratio for each item in the selected column. Example A was presented in 8 times but it was only selected 6 times out of those 8 times it was presented to the user.

I would like to create following resulting data.frame:

item, selected, presented, ratio
A, 3, 6, 0.5
B, 2, 8, 0.25

I started with following but can't figure out the grouping because if I just group by selected and start counting it would only capture the time it was shown.

>>> df['ratio'] = df.apply(lambda x:1 if x.selected in x.presented.split('|') else 0, axis=1)
>>> df
  selected presented  ratio
0        A     A|B|D      1
1        B     B|D|A      1
2        C     A|B|C      1
3        A   D|C|B|A      1
4        B   A|C|D|B      1
5        C     D|B|C      1
6        A   D|C|B|A      1
7        D     D|B|C      1
like image 387
add-semi-colons Avatar asked Dec 12 '25 01:12

add-semi-colons


2 Answers

You can using get_dummies + value_counts, then concat the result

s1=df.presented.str.get_dummies('|').sum().to_frame('presented')
s2=df.selected.value_counts()
yourdf=pd.concat([s1,s2],1,sort=True)
yourdf['ratio']=yourdf['selected']/yourdf['presented']

yourdf
Out[488]: 
   presented  selected     ratio
A          6         3  0.500000
B          8         2  0.250000
C          6         2  0.333333
D          7         1  0.142857
like image 109
BENY Avatar answered Dec 14 '25 15:12

BENY


How about this one-liner:

df['presented'].str.split('|', expand=True).stack().value_counts(sort=False).to_frame('presented')\
               .assign(selected = df['selected'].value_counts())\
               .eval('ratio = selected / presented')

Output:

   presented  selected     ratio
A          6         3  0.500000
C          6         2  0.333333
B          8         2  0.250000
D          7         1  0.142857
like image 42
Scott Boston Avatar answered Dec 14 '25 13:12

Scott Boston



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!