
Cut string within a specific pattern in python

I have a string of some length consisting only of the four characters 'A', 'T', 'G' and 'C'. The pattern 'GAATTC' occurs multiple times in the string, and I have to cut the string at every occurrence of this pattern, between 'GA' and 'ATTC'. For example, for the string 'ATCGAATTCATA', I should get the output:

  • string one - ATCGA
  • string two - ATTCATA

I am a newbie in Python, but I have come up with the following (incomplete) code:

seq = seq.upper()
str1 = "GAATTC"
seqlen = len(seq)
seq = list(seq)

for i in range(0,seqlen-1):
    site = seq.find(str1)
    print(site[0:(i+2)])

Any help would be really appreciated.

Srk asked Apr 11 '26 01:04



2 Answers

First, let's develop your idea of using find, so you can see where your own code goes wrong.

seq = 'ATCGAATTCATAATCGAATTCATAATCGAATTCATA'
seq = seq.upper()
pattern = "GAATTC"
split_at = 2
seqlen = len(seq)
i = 0

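while i < seqlen:
    # Look for the next site, starting at the current position
    site = seq.find(pattern, i)
    if site != -1:
        # Emit everything up to the cut point inside the site
        print(seq[i: site + split_at])
        i = site + split_at
    else:
        # No more sites: emit the rest of the sequence
        print(seq[i:])
        break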
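
The same find loop can be wrapped in a function that returns the fragments instead of printing them. This is only a sketch: the name cut_at_sites and its parameters pattern and offset are made up for illustration, and offset=2 assumes the cut falls between 'GA' and 'ATTC' as in the question.

```python
def cut_at_sites(seq, pattern="GAATTC", offset=2):
    """Cut seq inside every occurrence of pattern, offset characters into the match."""
    seq = seq.upper()
    fragments = []
    start = 0                      # beginning of the current fragment
    site = seq.find(pattern)       # index of the first site, or -1
    while site != -1:
        cut = site + offset
        fragments.append(seq[start:cut])
        start = cut
        # Continue searching just past the start of the site we handled
        site = seq.find(pattern, site + 1)
    fragments.append(seq[start:])  # whatever remains after the last cut
    return fragments


print(cut_at_sites("ATCGAATTCATA"))  # ['ATCGA', 'ATTCATA']
```

Returning a list makes it easy to check the result on a sequence with several sites before using it on real data.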

However, Python strings also have a replace method that substitutes fragments of a string directly. The snippet below uses replace to insert a separator at each cut site and then splits on that separator:

seq = 'ATCGAATTCATAATCGAATTCATAATCGAATTCATA'
seq = seq.upper()
pattern = "GA","ATTC"
pattern1 = ''.join(pattern) # 'GAATTC'
pattern2 = ' '.join(pattern) # 'GA ATTC'
splited_seq = seq.replace(pattern1, pattern2) # 'ATCGA ATTCATAATCGA ATTCATAATCGA ATTCATA'
print(splited_seq.split())  # ['ATCGA', 'ATTCATAATCGA', 'ATTCATAATCGA', 'ATTCATA']

I believe this is more intuitive than a regular expression and should also be faster, although actual performance depends on the library and how it is used.
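
For comparison, a regular-expression version could look like the sketch below. It assumes Python 3.7 or later, where re.split accepts patterns that match an empty string; the names CUT_SITE and cut_with_regex are made up for illustration. No timing is claimed here, so measure on your own data if speed matters.

```python
import re

# Zero-width cut point: preceded by "GA" and followed by "ATTC",
# i.e. the position between the two halves of "GAATTC".
CUT_SITE = re.compile(r"(?<=GA)(?=ATTC)")


def cut_with_regex(seq):
    """Split seq at every GA|ATTC boundary without consuming any characters."""
    return CUT_SITE.split(seq.upper())


print(cut_with_regex("ATCGAATTCATA"))  # ['ATCGA', 'ATTCATA']
```

Because the lookbehind and lookahead consume nothing, the fragments join back into the original sequence.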

Serge answered Apr 12 '26 15:04



Here is a simple solution:

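seq = 'ATCGAATTCATA'
seq_split = seq.upper().split('GAATTC')
last = len(seq_split) - 1
# Every piece after a site starts with 'ATTC'; every piece before a site ends with 'GA'.
# (Choosing by i % 2 only works when the pattern occurs exactly once.)
result = [
    ('ATTC' if i > 0 else '') + part + ('GA' if i < last else '')
    for i, part in enumerate(seq_split)
]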

Result:

print(result)
['ATCGA', 'ATTCATA']
t.m.adam answered Apr 12 '26 14:04



