I have a gene sequence:
"acguccgcaagagaagccuuaauauauucaaaaagcuacgccucagauuucgcgcucgagcccaaaacaacugguguacggguugaucacaucaaaugaagucgcuaaagucggugaucucacuauccuugucuucggcuuuugcucucucggcuaucaucuaagcaggcgaguuccauggugaccggaacgacggcuacuggaguccaugaucgcaagcgucgggcugggguaaaagaggcucagcucauaauaguccgccccaccaguacgggacucgauaggccccgucguugccguagaaacgcaauuuuccucagacccacuauacgcaccucgauuuagcaugguuccgggguugcgcuuugagaaucauacguaaggaucggaaccuaggaaugcaccacagaacuuugaaauacuagaacaaguugauugacaacggaguaucggcgccccacauuuaacgaauaauugcaggcgccagacgaugcuaggugcguccguaucaagauucgaggucgcuacuggcuucgcuugccgaucgagcucagaguuugugagaguuguuacuaauugcguggucgccuaauauccuugauacuacguggguguacuagacaucccggacagaaaaucucuuaaacgcuagaguucucuuggaagcgccugcacuucuugugaacauacgaugauagccacucuaagcccaacgcacuucgcuuggcccacauugcccccagagcuuauucaucgacaggcguuccacucuuggauucaucaguaaacuuuauuauacgugguaagcgugcuuauagcugucggaaucucacuuaggcggauugaagugagacagccugaaaguaaccguguacaggcgccgucaauguguuuugagugugcaccuacaaaaaguguuauuuaggcaggggagcuuuguaguuucuuuagaagagccgcgaaugaaccaacgguagacugcgagcgcguucaaccuaau"
I want to splice the RNA and extract two lists, exons and introns. An intron starts with gu and ends with the next ag. However, if ag appears before gu, it is part of the exon and not an intron.
def splice(sequence):
    introns = list()
    exons = list()
    while(sequence.count("gu")):
        if "gu" not in sequence:
            break
        else:
            exons.append(sequence[:sequence.find("gu")])
            sequence = sequence[sequence.find("gu"):]
        if "ag" not in sequence:
            break
        else:
            introns.append(sequence[:sequence.find("ag")+2])
            sequence = sequence[sequence.find("ag")+2:]
    return introns, exons
This is what I have so far. It works for most of the sequence, but the problem shows up at the end, when gu appears with no ag left in the remaining string: the loop breaks and the rest of the sequence is dropped.
Output:
Exons:
['ac',
'agaagccuuaauauauucaaaaagcuacgccucagauuucgcgcucgagcccaaaacaacug',
'ucgcuaaa',
'caggcga',
'uccaugaucgcaagc',
'aggcucagcucauaaua',
'uacgggacucgauaggcccc',
'aaacgcaauuuuccucagacccacuauacgcaccucgauuuagcaug',
'aaucauac',
'gaucggaaccuaggaaugcaccacagaacuuugaaauacuagaacaa',
'uaucggcgccccacauuuaacgaauaauugcaggcgccagacgaugcuag',
'auucgag',
'cucaga',
'a',
'acaucccggacagaaaaucucuuaaacgcuaga',
'cgccugcacuucuu',
'ccacucuaagcccaacgcacuucgcuuggcccacauugcccccagagcuuauucaucgacaggc',
'uaaacuuuauuauac',
'c',
'cu',
'gcggauugaa',
'acagccugaaa',
'gcgcc',
'u',
'u',
'gcaggggagcuuu',
'uuucuuuagaagagccgcgaaugaaccaacg',
'acugcgagcgc']
Introns:
['guccgcaag',
'guguacggguugaucacaucaaaugaag',
'gucggugaucucacuauccuugucuucggcuuuugcucucucggcuaucaucuaag',
'guuccauggugaccggaacgacggcuacuggag',
'gucgggcugggguaaaag',
'guccgccccaccag',
'gucguugccguag',
'guuccgggguugcgcuuugag',
'guaag',
'guugauugacaacggag',
'gugcguccguaucaag',
'gucgcuacuggcuucgcuugccgaucgag',
'guuugugag',
'guuguuacuaauugcguggucgccuaauauccuugauacuacguggguguacuag',
'guucucuuggaag',
'gugaacauacgaugauag',
'guuccacucuuggauucaucag',
'gugguaag',
'gugcuuauag',
'gucggaaucucacuuag',
'gugag',
'guaaccguguacag',
'gucaauguguuuugag',
'gugcaccuacaaaaag',
'guuauuuag',
'guag',
'guag']
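For comparison, the loop itself can be adjusted so that the tail is not lost. The following is only a sketch, assuming that a gu with no later ag should stay in the final exon; the name `splice_keep_tail` and the short test string are made up for illustration.

```python
def splice_keep_tail(sequence):
    """Loop-based splice that keeps the trailing remainder as a final exon."""
    introns = []
    exons = []
    while True:
        start = sequence.find("gu")
        if start == -1:
            break  # no further intron can start
        end = sequence.find("ag", start + 2)
        if end == -1:
            break  # "gu" without a closing "ag": leave the rest as exon
        exons.append(sequence[:start])
        introns.append(sequence[start:end + 2])
        sequence = sequence[end + 2:]
    # Assumption: whatever is left (including an unmatched "gu") is exon.
    exons.append(sequence)
    return introns, exons


introns, exons = splice_keep_tail("acguccgcaagagaagguuc")
print(introns)  # ['guccgcaag']
print(exons)    # ['ac', 'agaagguuc']
```

The second exon shows both edge cases at once: it begins with an ag that precedes any gu, and it ends with a gu that never gets a closing ag.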
I fixed this by using regular expressions.
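import re

def splice(gene_Sequence):
    # An intron runs from "gu" to the nearest following "ag" (non-greedy).
    regex = r"gu\w*?ag"
    introns = re.findall(regex, gene_Sequence)
    # Splitting on the same pattern yields the exons as a list; an unmatched
    # trailing "gu" stays in the last exon.
    exons = re.split(regex, gene_Sequence)
    return introns, exons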
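One way to sanity-check a splice result is to confirm that interleaving the exons and introns rebuilds the input. Below is a small sketch of that check, assuming a non-greedy pattern equivalent to the one above; `splice_with_spans`, `INTRON`, and the sample string are hypothetical names chosen for the example.

```python
import re

# Non-greedy: "gu", then as few characters as possible, then "ag".
INTRON = re.compile(r"gu\w*?ag")


def splice_with_spans(gene_sequence):
    """Return (introns, exons, spans); spans are (start, end) of each intron."""
    introns, exons, spans = [], [], []
    last_end = 0
    for match in INTRON.finditer(gene_sequence):
        exons.append(gene_sequence[last_end:match.start()])
        introns.append(match.group())
        spans.append(match.span())
        last_end = match.end()
    exons.append(gene_sequence[last_end:])  # tail after the last intron
    return introns, exons, spans


sample = "acguccgcaagagaagguuc"
introns, exons, spans = splice_with_spans(sample)

# There is always one more exon than introns, so pad introns with "".
rebuilt = "".join(e + i for e, i in zip(exons, introns + [""]))
print(introns, exons, spans)
print(rebuilt == sample)  # True
```

Keeping the spans alongside the strings also makes it easy to see where each intron sits in the original sequence.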