Explode column of strings and count character frequencies

I have a dataset with 2 columns that looks like this:

|group| |sequence|
A        BX
A        X
B        SFS
B        BCX
B        BSS*B1S
A        BBX

I'd like to group by `group` and find the frequency of each character in `sequence`, to get something like this:

 |group| |char| |freq|
 A       B       3
 A       X       3
 B       S       5
 ...
Justin asked Jan 02 '23 13:01


1 Answer

You could use an efficient `repeat`-based solution to flatten the dataframe into one row per character, followed by a `groupby`:

from itertools import chain

import pandas as pd

# Step 1 - flatten your dataframe: repeat each group label once per character
df = pd.DataFrame({
    'group' : df['group'].repeat(df.sequence.str.len()), 
    'char' : list(chain.from_iterable(df.sequence.tolist()))
})
# Step 2 - filter out non-alphabetic characters (e.g. `*`, `1`) and group by `group` and `char`
df[df.char.str.isalpha()].groupby(['group', 'char']).size().reset_index(name='freq')

  group char  freq
0     A    B     3
1     A    X     3
2     B    B     3
3     B    C     1
4     B    F     1
5     B    S     5
6     B    X     1
cs95 answered Jan 12 '23 18:01