
RNA Splicing Python

I have a gene sequence:

"acguccgcaagagaagccuuaauauauucaaaaagcuacgccucagauuucgcgcucgagcccaaaacaacugguguacggguugaucacaucaaaugaagucgcuaaagucggugaucucacuauccuugucuucggcuuuugcucucucggcuaucaucuaagcaggcgaguuccauggugaccggaacgacggcuacuggaguccaugaucgcaagcgucgggcugggguaaaagaggcucagcucauaauaguccgccccaccaguacgggacucgauaggccccgucguugccguagaaacgcaauuuuccucagacccacuauacgcaccucgauuuagcaugguuccgggguugcgcuuugagaaucauacguaaggaucggaaccuaggaaugcaccacagaacuuugaaauacuagaacaaguugauugacaacggaguaucggcgccccacauuuaacgaauaauugcaggcgccagacgaugcuaggugcguccguaucaagauucgaggucgcuacuggcuucgcuugccgaucgagcucagaguuugugagaguuguuacuaauugcguggucgccuaauauccuugauacuacguggguguacuagacaucccggacagaaaaucucuuaaacgcuagaguucucuuggaagcgccugcacuucuugugaacauacgaugauagccacucuaagcccaacgcacuucgcuuggcccacauugcccccagagcuuauucaucgacaggcguuccacucuuggauucaucaguaaacuuuauuauacgugguaagcgugcuuauagcugucggaaucucacuuaggcggauugaagugagacagccugaaaguaaccguguacaggcgccgucaauguguuuugagugugcaccuacaaaaaguguuauuuaggcaggggagcuuuguaguuucuuuagaagagccgcgaaugaaccaacgguagacugcgagcgcguucaaccuaau"

I want to splice the RNA and extract two lists: exons and introns. An intron starts with gu and ends with the next ag. However, if an ag appears before any gu, it is part of the exon, not an intron.

def splice(sequence):
    introns = list()
    exons = list()

    while(sequence.count("gu")):

        if "gu" not in sequence:
            break
        else:    

            exons.append(sequence[:sequence.find("gu")])
            sequence = sequence[sequence.find("gu"):]

        if "ag" not in sequence:
            break
        else:

            introns.append(sequence[:sequence.find("ag")+2])
            sequence = sequence[sequence.find("ag")+2:]

    return introns, exons

This is what I have so far. It works for most of the sequence, but the problem appears at the end, when a gu occurs with no ag left in the remaining string.

Output:

Exons:
['ac',
 'agaagccuuaauauauucaaaaagcuacgccucagauuucgcgcucgagcccaaaacaacug',
 'ucgcuaaa',
 'caggcga',
 'uccaugaucgcaagc',
 'aggcucagcucauaaua',
 'uacgggacucgauaggcccc',
 'aaacgcaauuuuccucagacccacuauacgcaccucgauuuagcaug',
 'aaucauac',
 'gaucggaaccuaggaaugcaccacagaacuuugaaauacuagaacaa',
 'uaucggcgccccacauuuaacgaauaauugcaggcgccagacgaugcuag',
 'auucgag',
 'cucaga',
 'a',
 'acaucccggacagaaaaucucuuaaacgcuaga',
 'cgccugcacuucuu',
 'ccacucuaagcccaacgcacuucgcuuggcccacauugcccccagagcuuauucaucgacaggc',
 'uaaacuuuauuauac',
 'c',
 'cu',
 'gcggauugaa',
 'acagccugaaa',
 'gcgcc',
 'u',
 'u',
 'gcaggggagcuuu',
 'uuucuuuagaagagccgcgaaugaaccaacg',
 'acugcgagcgc']

Introns:
['guccgcaag',
 'guguacggguugaucacaucaaaugaag',
 'gucggugaucucacuauccuugucuucggcuuuugcucucucggcuaucaucuaag',
 'guuccauggugaccggaacgacggcuacuggag',
 'gucgggcugggguaaaag',
 'guccgccccaccag',
 'gucguugccguag',
 'guuccgggguugcgcuuugag',
 'guaag',
 'guugauugacaacggag',
 'gugcguccguaucaag',
 'gucgcuacuggcuucgcuugccgaucgag',
 'guuugugag',
 'guuguuacuaauugcguggucgccuaauauccuugauacuacguggguguacuag',
 'guucucuuggaag',
 'gugaacauacgaugauag',
 'guuccacucuuggauucaucag',
 'gugguaag',
 'gugcuuauag',
 'gucggaaucucacuuag',
 'gugag',
 'guaaccguguacag',
 'gucaauguguuuugag',
 'gugcaccuacaaaaag',
 'guuauuuag',
 'guag',
 'guag']
Ibtihaj Tahir asked Jun 10 '26 05:06



1 Answer

I solved the problem by using a regular expression.

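import re

def splice(gene_sequence):
    # An intron starts at "gu" and runs to the nearest following "ag" (non-greedy).
    regex = r"gu\w*?ag"
    introns = re.findall(regex, gene_sequence)

    # Splitting on the same pattern leaves the exon segments between the introns.
    # A trailing "gu" with no later "ag" never matches, so it stays in the last exon.
    exons = re.split(regex, gene_sequence)

    return introns, exons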
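
For comparison, here is a minimal sketch of the loop-based approach from the question with the trailing-gu case handled and no regular expression. The name `splice_loop` and the short test sequence are made up for illustration, and the sketch assumes an intron always ends at the first ag after its opening gu.

```python
def splice_loop(sequence):
    """Hypothetical helper: split RNA into introns ("gu"..."ag") and exons."""
    introns, exons = [], []
    pos = 0
    while True:
        start = sequence.find("gu", pos)
        # Look for the closing "ag" only after the opening "gu".
        end = sequence.find("ag", start + 2) if start != -1 else -1
        if start == -1 or end == -1:
            # No complete intron remains: the rest is exon, even if it contains "gu".
            exons.append(sequence[pos:])
            break
        exons.append(sequence[pos:start])
        introns.append(sequence[start:end + 2])
        pos = end + 2
    return introns, exons


introns, exons = splice_loop("acguccgcaagagaaguccc")
print(introns)  # ['guccgcaag']
print(exons)    # ['ac', 'agaaguccc']
```

The final gu in the test sequence has no ag after it, so it is kept inside the last exon instead of ending the loop early.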
Ibtihaj Tahir answered Jun 11 '26 20:06



