My dataframe is a long list of strings made up of 4 letters: 'A', 'T', 'G', 'C'.
I need to count the frequency of each letter at each index (position) across all strings.
import pandas as pd
df = pd.DataFrame({'cases': ['ACCTTGTAGTGTATTTTATGACCAAATGACTTTTTCCCCCCAGTGGCTAATTTGTCTCAGGCCTGCGTCTTAAAGAGACACGGTAATGAGTAGGAAGTCCAGCGTGGTCTGGA','ACCTTGTACTGTATCTTATGACCAGATGACTTTTTCCACCCAGTGGCTAATTTGTCTCAGGCCTCCGTCTTAAAGAGACACGGTAATGAGTAGGAAGTCCAACGTGGTCTAGA','GCCTTGTACTGTATATTATGACCAAATGACTTTTTCCACCCATTGGCTAATTTGTCTCAGGCCTCCGTCTTAAAGAGACACGGAAATGAGTAGGAAGTCCAGCGTGGTCTAGA','ACCTTGTACTGTATATTATGACCAGATGACTTTTTCCACCCAGTGGCTAATTTGTCTCAGGCCTCCGTCTTAAAGAGACACGGTAATGAGTAGGAAGTCCAGCGTGGTCTAGA']})
cases
0 ACCTTGTAGTGTATTTTATGACCAAATGACTTTTTCCCCCCAGTGG...
1 ACCTTGTACTGTATCTTATGACCAGATGACTTTTTCCACCCAGTGG...
2 GCCTTGTACTGTATATTATGACCAAATGACTTTTTCCACCCATTGG...
3 ACCTTGTACTGTATATTATGACCAGATGACTTTTTCCACCCAGTGG...
4 ACCTTGTACTGTATATTATGACCAGATGACTTTTTCCACCCAGTGG...
5 ACCTTGTAGTGTATTTTATGACCAAATGACTTTTTCCCCCCAGTGG...
6 ACCTTGTACTGTATCTTATGACCAGATGACTTTTTCCACCCAGTGG...
7 GCCTTGTACTGTATATTATGACCAAATGACTTTTTCCACCCATTGG...
8 ACCTTGTACTGTATATTATGACCAGATGACTTTTTCCACCCAGTGG...
9 ACCTTGTACTGTATATTATGACCAGATGACTTTTTCCACCCAGTGG...
The result would be a new df of shape 113x4 (one row per position, one column per letter).
I cannot figure out a pandas way to do this. Below is my non-pandas solution:
def freq_lists(dna_list):
    n = len(dna_list[0])
    A = [0]*n
    T = [0]*n
    G = [0]*n
    C = [0]*n
    for dna in dna_list:
        for index, base in enumerate(dna):
            if base == 'A':
                A[index] += 1
            elif base == 'C':
                C[index] += 1
            elif base == 'G':
                G[index] += 1
            elif base == 'T':
                T[index] += 1
    return {'A': A, 'C': C, 'G': G, 'T': T}
fdf = pd.DataFrame(freq_lists(df['cases'].to_list()))
A C G T
0 3 0 1 0
1 0 4 0 0
2 0 4 0 0
3 0 0 0 4
4 0 0 0 4
.. .. .. .. ..
108 0 4 0 0
109 0 0 0 4
110 3 0 1 0
111 0 0 4 0
112 4 0 0 0
To clarify, the first row is obtained by counting the first character of each string in the cases
column, which gives AAGA -> A: 3, C: 0, G: 1, T: 0
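As a quick sanity check of that first row, here is a minimal sketch that counts the letters in AAGA with the standard library's Counter. The names first_chars and first_row are made up for illustration and do not appear in the code above.

```python
from collections import Counter

# First character of each of the four strings in the df built above.
first_chars = "AAGA"

counts = Counter(first_chars)
# A Counter returns 0 for letters that never occur, so C and T come out as 0.
first_row = {base: counts[base] for base in "ACGT"}
print(first_row)  # {'A': 3, 'C': 0, 'G': 1, 'T': 0}
```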
You can do this with explode and crosstab:
s = df.cases.map(list).explode()
out = pd.crosstab(s.groupby(level=0).cumcount(),s)
Out[583]:
cases A C G T
row_0
0 3 0 1 0
1 0 4 0 0
2 0 4 0 0
3 0 0 0 4
4 0 0 0 4
.. .. .. ..
108 0 4 0 0
109 0 0 0 4
110 3 0 1 0
111 0 0 4 0
112 4 0 0 0
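To see what each step contributes, here is a rough step-by-step sketch of the same idea on a small toy frame. The frame toy, the names pos, position and base, and the choice to pass plain NumPy arrays to crosstab are assumptions made for illustration; the arrays just keep the repeated row labels out of the tabulation.

```python
import pandas as pd

# Hypothetical toy data: four sequences of length 3 instead of 113.
toy = pd.DataFrame({'cases': ['ACG', 'ACT', 'GCT', 'ACT']})

# One row per character; the original row label is repeated for each character.
s = toy['cases'].map(list).explode()

# Running counter within each original row, i.e. the position of the
# character inside its string.
pos = s.groupby(level=0).cumcount()

# Cross-tabulate position against base. Plain arrays are passed so that the
# repeated index labels play no part in the tabulation.
out = pd.crosstab(pos.to_numpy(), s.to_numpy(),
                  rownames=['position'], colnames=['base'])
print(out)
```

On this toy input, position 0 holds A, A, G, A, so the first row should come out as A: 3, C: 0, G: 1, T: 0, the same pattern as the first row of the real data.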