I have a dataset with 2 columns that look like:
group  sequence
A      BX
A      X
B      SFS
B      BCX
B      BSS*B1S
A      BBX
I'd like to group by the group column and count how often each character occurs in the sequences, to get something like this:
group  char  freq
A      B     3
A      X     3
B      S     5
...
You could use an efficient repeat-based solution followed by groupby:
import pandas as pd
from itertools import chain
# Step 1 - flatten your dataframe: one row per character, with its group repeated to match
df = pd.DataFrame({
    'group' : df['group'].repeat(df.sequence.str.len()),
    'char' : list(chain.from_iterable(df.sequence.tolist()))
})
# Step 2 - keep only alphabetic characters, then groupby on `group` and `char`
df[df.char.str.isalpha()].groupby(['group', 'char']).size().reset_index(name='freq')
  group char  freq
0     A    B     3
1     A    X     3
2     B    B     3
3     B    C     1
4     B    F     1
5     B    S     5
6     B    X     1
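If you want to sanity-check those counts without pandas, here is a minimal pure-Python sketch using collections.Counter. It is a different technique from the repeat/groupby approach above, not a replacement for it. The rows list is just the sample data from the question, and char_frequencies is a hypothetical helper name made up for this illustration.

```python
from collections import Counter

# Sample rows copied from the question: (group, sequence) pairs.
rows = [
    ("A", "BX"),
    ("A", "X"),
    ("B", "SFS"),
    ("B", "BCX"),
    ("B", "BSS*B1S"),
    ("A", "BBX"),
]


def char_frequencies(pairs):
    """Count alphabetic characters per group (hypothetical helper)."""
    counts = Counter(
        (group, char)
        for group, sequence in pairs
        for char in sequence
        if char.isalpha()  # same filter idea as str.isalpha(): drops '*' and '1'
    )
    # Sort by (group, char) so the rows come out in the same order as the groupby result.
    return [(group, char, freq) for (group, char), freq in sorted(counts.items())]


result = char_frequencies(rows)
for group, char, freq in result:
    print(group, char, freq)
```

For large frames the vectorised pandas version is likely the better fit; this one is mainly handy for checking small inputs by hand.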