Explode column of strings and count character frequencies

Question

I have a dataset with two columns that looks like this:

group    sequence
A        BX
A        X
B        SFS
B        BCX
B        BSS*B1S
A        BBX

I'd like to group by `group` and count how often each character occurs in `sequence`, to get something like this:

group   char    freq
A       B       3
A       X       3
B       S       5
...

cs95 · Accepted Answer

You could use an efficient repeat-based solution: flatten the frame to one row per character with `Series.repeat`, then count with `groupby`:

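import pandas as pd
from itertools import chain

# Step 1 - flatten the dataframe: one row per character
df = pd.DataFrame({
    # repeat each group label once per character in its sequence
    'group': df['group'].repeat(df['sequence'].str.len()),
    # split every sequence into characters and chain them into one flat list
    'char': list(chain.from_iterable(df['sequence'].tolist()))
})
# Step 2 - keep only alphabetic characters, then count each (group, char) pair
df[df['char'].str.isalpha()].groupby(['group', 'char']).size().reset_index(name='freq')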

  group char  freq
0     A    B     3
1     A    X     3
2     B    B     3
3     B    C     1
4     B    F     1
5     B    S     5
6     B    X     1
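
As an alternative to the repeat-based answer above (not part of it), the same flattening can be sketched with `DataFrame.explode`, which is available in pandas 0.25 and later. The sample frame below is rebuilt from the question's data; the names `chars` and `freq` are only illustrative, and the sketch assumes every sequence is a non-empty string.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'B', 'A'],
    'sequence': ['BX', 'X', 'SFS', 'BCX', 'BSS*B1S', 'BBX'],
})

# Turn each string into a list of characters, then give every
# character its own row; reset the index so it is unique again.
chars = (
    df.assign(char=df['sequence'].apply(list))
      .explode('char')
      .reset_index(drop=True)
)

# Keep alphabetic characters only, then count each (group, char) pair.
freq = (
    chars[chars['char'].str.isalpha()]
      .groupby(['group', 'char'])
      .size()
      .reset_index(name='freq')
)

print(freq)
```

For the sample data this should produce the same seven rows as the output above. Which variant is faster likely depends on the data, so it is worth timing both on your own frame.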

Tags: python, string, pandas, dataframe

Justin

1 Answer

cs95
