When I run the program it always prints True. For example, if I enter AAJJ it prints True because it only checks whether the first letter is valid. Can someone point me in the right direction? Thanks!
import string

squence_str = raw_input("Enter either A DNA, Protein or RNA sequence:")

def DnaCheck():
    for i in (squence_str):
        if string.upper(i) =="A":
            return True
        elif string.upper(i) == "T":
            return True
        elif string.upper(i) == "C":
            return True
        elif string.upper(i) == "G":
            return True
        else:
            return False

print "DNA ", DnaCheck()
You need to check that all of the bases in the DNA sequence are valid.
def DnaCheck(sequence):
    dna = set('ACTG')
    return all(base.upper() in dna for base in sequence)
all(...) uses a generator expression to iterate over the nucleotides in the given DNA sequence, converting each to upper case and checking whether it is contained in the DNA set {'A', 'C', 'T', 'G'}. If any value is not in this set, the function immediately returns False without processing the remaining characters in sequence; otherwise it returns True once all characters have been processed and each is in the set.

For example, the sequence "axctgACTGACT" would return False after processing only the first two characters, because "x" converted to the uppercase "X" is not in the DNA set {'A', 'C', 'T', 'G'}, so the remaining characters in the sequence don't need to be checked.
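To make the short-circuit behaviour visible, here is a minimal sketch that records how many characters were actually examined. The names dna_check_counting and is_base are hypothetical helpers introduced only for this illustration, and the code assumes Python 3.

```python
def dna_check_counting(sequence):
    """Return (is_valid, characters_examined) for a sequence.

    Hypothetical helper for illustration only; it wraps the same
    all(...) check and records each character it looks at.
    """
    dna = set("ACTG")
    examined = []

    def is_base(base):
        # Record the character, then test it against the DNA alphabet.
        examined.append(base)
        return base.upper() in dna

    # all() stops pulling from the generator at the first False result.
    valid = all(is_base(base) for base in sequence)
    return valid, len(examined)


result_bad = dna_check_counting("axctgACTGACT")
result_good = dna_check_counting("actgACTG")
print(result_bad)   # (False, 2)
print(result_good)  # (True, 8)
```

Note that all() returns True for an empty sequence, so an empty input string passes this check; add an explicit length test if that is not what you want.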
I like @Alexander's answer, but for variety you could see if
def dna_check(sequence):
    return set(sequence.upper()).issubset("ACGT")
    # another possibility:
    # return set(sequence).issubset("ACGTacgt")
might be faster on long sequences, especially if the odds of being a legal sequence are good (i.e., most of the time you will have to iterate over the whole sequence anyway).
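Whether it really is faster depends on your data and interpreter, so it is worth measuring. Below is a rough sketch using the standard timeit module, assuming Python 3. The function names dna_check_all and dna_check_subset, the test sequences, and the repeat count are arbitrary choices for illustration; no timings are quoted here because they vary from machine to machine.

```python
import timeit


def dna_check_all(sequence):
    # Generator-based check: stops at the first invalid character.
    dna = set("ACTG")
    return all(base.upper() in dna for base in sequence)


def dna_check_subset(sequence):
    # Set-based check: always upper-cases and scans the whole sequence.
    return set(sequence.upper()).issubset("ACGT")


valid_seq = "ACGT" * 25000      # 100,000 valid bases
invalid_seq = "X" + valid_seq   # invalid character at the very start

for label, seq in (("valid", valid_seq), ("invalid at start", invalid_seq)):
    # Both implementations should agree before we compare their speed.
    assert dna_check_all(seq) == dna_check_subset(seq)
    t_all = timeit.timeit(lambda: dna_check_all(seq), number=10)
    t_subset = timeit.timeit(lambda: dna_check_subset(seq), number=10)
    print("%-18s all(): %.4fs  issubset(): %.4fs" % (label, t_all, t_subset))
```

The second case is the one where the all(...) version can stop after a single character, while the set-based version still has to process the full string.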