Biologists use a sequence of the letters A, C, T, and G to model a genome. A gene is a substring of a genome that starts after a triplet ATG and ends before a triplet TAG, TAA, or TGA. Furthermore, the length of a gene string is a multiple of 3, and the gene does not contain any of the triplets ATG, TAG, TAA, or TGA.
Ideally, the program would behave like this:
Enter a genome string: TTATGTTTTAAGGATGGGGCGTTAGTT #Enter
TTT
GGGCGT
-----------------
Enter a genome string: TGTGTGTATAT
No Genes Were Found
So far, I have:
def findGene(gene):
    final = ""
    genep = gene.split("ATG")
    for part in genep:
        for chr in part:
            for i in range(0, len(chr)):
                if genePool(chr[i:i + 3]) == 1:
                    break
                else:
                    final += (chr[i+i + 3] + "\n")
    return final

def genePool(part):
    g1 = "ATG"
    g2 = "TAG"
    g3 = "TAA"
    g4 = "TGA"
    if (part.count(g1) != 0) or (part.count(g2) != 0) or (part.count(g3) != 0) or (part.count(g4) != 0):
        return 1

def main():
    geneinput = input("Enter a genome string: ")
    print(findGene(geneinput))

main()
# TTATGTTTTAAGGATGGGGCGTTAGTT
I keep running into errors.
To be completely honest, this is really not working for me. I think I have hit a dead end with these lines of code, so a new approach may be helpful.
Thanks in advance!
The error that I have been getting:
Enter a genome string: TTATGTTTTAAGGATGGGGCGTTAGTT
Traceback (most recent call last):
  File "D:\Python\Chapter 8\Bioinformatics.py", line 40, in <module>
    main()
  File "D:\Python\Chapter 8\Bioinformatics.py", line 38, in main
    print(findGene(geneinput))
  File "D:\Python\Chapter 8\Bioinformatics.py", line 25, in findGene
    final += (chr[i+i + 3] + "\n")
IndexError: string index out of range
Like I said before, I'm not really sure if I am on the right track to solve the issue with my current code. Any new ideas with pseudocode are appreciated!
This can be done with a regular expression:
import re
pattern = re.compile(r'ATG((?:[ACTG]{3})+?)(?:TAG|TAA|TGA)')
pattern.findall('TTATGTTTTAAGGATGGGGCGTTAGTT')
pattern.findall('TGTGTGTATAT')
Output:
['TTT', 'GGGCGT']
[]
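To get the output format from the question, the pattern can be wrapped in a small function. This is only a sketch: the names GENE_PATTERN, find_genes, and format_genes are made up for illustration, and the genome strings are hard-coded instead of being read with input().

```python
import re

# Same pattern as above: ATG, a lazily matched run of triplets, then a stop triplet.
GENE_PATTERN = re.compile(r'ATG((?:[ACTG]{3})+?)(?:TAG|TAA|TGA)')


def find_genes(genome):
    """Return the captured gene strings (hypothetical helper name)."""
    return GENE_PATTERN.findall(genome)


def format_genes(genome):
    """Join the genes one per line, or return the message from the question."""
    genes = find_genes(genome)
    return "\n".join(genes) if genes else "No Genes Were Found"


print(format_genes("TTATGTTTTAAGGATGGGGCGTTAGTT"))
print(format_genes("TGTGTGTATAT"))
```

In the real program, the hard-coded strings would be replaced by the value returned from input("Enter a genome string: ").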
Explanation extracted from https://regex101.com/r/yI4tN9/3
"ATG((?:[ACTG]{3})+?)(?:TAG|TAA|TGA)"g
  ATG matches the characters ATG literally (case sensitive)
  1st Capturing group ((?:[ACTG]{3})+?)
    (?:[ACTG]{3})+? Non-capturing group
      Quantifier: +? Between one and unlimited times, as few times as possible, expanding as needed [lazy]
      [ACTG]{3} match a single character present in the list below
        Quantifier: {3} Exactly 3 times
        ACTG a single character in the list ACTG literally (case sensitive)
  (?:TAG|TAA|TGA) Non-capturing group
    1st Alternative: TAG
      TAG matches the characters TAG literally (case sensitive)
    2nd Alternative: TAA
      TAA matches the characters TAA literally (case sensitive)
    3rd Alternative: TGA
      TGA matches the characters TGA literally (case sensitive)
  g modifier: global. All matches (don't return on first match)
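If you would rather not use a regular expression, the same idea can be written as an explicit scan. The sketch below is one possible reading of the problem, not the only one: it assumes "does not contain any of the triplets" means the gene must not contain ATG, TAG, TAA, or TGA at any offset, and the name find_genes_scan is made up for illustration.

```python
STOP_TRIPLETS = ("TAG", "TAA", "TGA")
MARKERS = ("ATG",) + STOP_TRIPLETS


def find_genes_scan(genome):
    """Scan for genes without a regex (hypothetical helper, assumptions above)."""
    genes = []
    pos = genome.find("ATG")
    while pos != -1:
        start = pos + 3
        end = None
        # Step forward one triplet at a time until the first stop triplet.
        for i in range(start, len(genome) - 2, 3):
            if genome[i:i + 3] in STOP_TRIPLETS:
                end = i
                break
        candidate = genome[start:end] if end is not None else ""
        # Assumption: reject empty candidates and any candidate that
        # contains a marker triplet at any offset.
        if candidate and not any(m in candidate for m in MARKERS):
            genes.append(candidate)
            pos = genome.find("ATG", end + 3)  # resume after the stop triplet
        else:
            pos = genome.find("ATG", pos + 1)  # try the next ATG instead
    return genes


print(find_genes_scan("TTATGTTTTAAGGATGGGGCGTTAGTT"))  # ['TTT', 'GGGCGT']
print(find_genes_scan("TGTGTGTATAT"))                  # []
```

Note that this is stricter than the pattern above in one edge case: for "ATGTAGCCCTAA" the regex returns 'TAGCCC' (because +? needs at least one triplet before the stop), while the scan returns an empty list. Which behavior you want depends on how you interpret the exercise.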