Converting codons (base 64) to a base 10 number

Tags: python, numbers

The July 2012 issue of "Mensa Bulletin" has an article entitled "The Digital Brain," in which the author relates the human brain to base 64 computing. It is an interesting and fun article with a prompt at the end: convert Cytosine Guanine Adenine Guanine Adenine Guanine (cgagag for short) to a base 10 number, using the fact that Cytosine Cytosine Guanine Cytosine Adenine Guanine (ccgcag for short) equals 2011.

In other words, you have to convert a base 64 number to base 10, using a table in the article that lists all of the possible codons in order: aug = 0, uuu = 1, uuc = 2, ... , gga = 61, ggg = 62, uag = 63.

I decided to give this a go and wrote a Python program that converts codon numbers to base 10 and base 10 numbers to codons. The program runs without errors and produces codons for my numbers and vice versa. However, the numbers are wrong! I cannot see what is going wrong and would greatly appreciate any help.

Without further ado, the code:

codons = ['aug', 'uuu', 'uuc', 'uua', 'uug', 'ucu', 'ucc', 'uca', 'ucg', 'uau', 'uac', 'uaa', 'ugu', 'ugc', 'uga', 'ugg', 'cuu', 'cuc', 'cua', 'cug', 'ccu', 'ccc', 'cca', 'ccg', 'cau', 'cac', 'caa', 'cag', 'cgu', 'cgc', 'cga', 'cgg', 'auu', 'auc', 'aua', 'acu', 'acc', 'aca', 'acg', 'aau', 'aac', 'aaa', 'aag', 'agu', 'agc', 'aga', 'agg', 'guu', 'guc', 'gua', 'gug', 'gcu', 'gcc', 'gca', 'gcg', 'gau', 'gac', 'gaa', 'gag', 'ggu', 'ggc', 'gga', 'ggg', 'uag' ]

def codonNumToBase10 ( codonValue ) :

    numberOfChars = len( codonValue )

    # check to see if contains sets of threes
    if len( codonValue ) % 3 != 0 :
        return -1

    # check to see if it contains the correct characters
    for i in range(0, numberOfChars ) :
        if codonValue[i] != 'a' :
            if codonValue[i] != 'u' :
                if codonValue[i] != 'c' :
                    if codonValue[i] != 'g' :
                        return -2

    # populate an array with decimal versions of each codon in the input
    codonNumbers = []
    base10Value = 0
    numberOfCodons = int(numberOfChars / 3 )
    for i in range(0, numberOfCodons) :
        charVal = codonValue[ 0 + (i*3) ] + codonValue[ 1 + (i*3) ] + codonValue[ 2 + (i*3) ]
        val = 0
        for j in codons :
            if j == charVal :
                codonNumbers.append( val )
                break
            val += 1
        base10Value += ( pow( 64, numberOfCodons - i - 1 ) ) * codonNumbers[i]

    return base10Value

def base10ToCodonNum ( number ) :
    codonNumber = ''
    hitZeroCount = 0
    while( 1==1 ) :
        val = number % 64
        number = int( number / 64 )
        codonNumber = codons[val] + codonNumber
        if number == 0 :
            if hitZeroCount > 0:
                break
            hitZeroCount += 1
    return codonNumber

val_2011 = 'ccgcag'
val_unknown = 'cgagag'

print( base10ToCodonNum( codonNumToBase10( val_2011 ) ), '::', codonNumToBase10( val_2011 ) )
print( base10ToCodonNum( codonNumToBase10( val_unknown ) ), '::', codonNumToBase10( val_unknown ) )

EDIT 1: The values I am getting are 1499 for ccgcag and 1978 for cgagag.

EDIT 2: base10ToCodonNum function fixed thanks to Ashwini Chaudhary.

asked Jul 06 '12 06:07 by Sonryell


1 Answer

I could not follow your code, so I made another implementation, but I got the same results:

CODONS = [
    'aug', 'uuu', 'uuc', 'uua', 'uug', 'ucu', 'ucc', 'uca',
    'ucg', 'uau', 'uac', 'uaa', 'ugu', 'ugc', 'uga', 'ugg',
    'cuu', 'cuc', 'cua', 'cug', 'ccu', 'ccc', 'cca', 'ccg',
    'cau', 'cac', 'caa', 'cag', 'cgu', 'cgc', 'cga', 'cgg',
    'auu', 'auc', 'aua', 'acu', 'acc', 'aca', 'acg', 'aau',
    'aac', 'aaa', 'aag', 'agu', 'agc', 'aga', 'agg', 'guu',
    'guc', 'gua', 'gug', 'gcu', 'gcc', 'gca', 'gcg', 'gau',
    'gac', 'gaa', 'gag', 'ggu', 'ggc', 'gga', 'ggg', 'uag',
]

def codon2decimal(s):
    """Convert a codon string (base 64, most significant codon first) to an int."""
    if len(s) % 3 != 0:
        raise ValueError("%s doesn't look like a codon number." % s)
    # Split into 3-letter digits, least significant digit first.
    digits = reversed([ s[i*3:i*3+3] for i in range(len(s) // 3) ])
    val = 0
    for i, digit in enumerate(digits):
        if digit not in CODONS:
            raise ValueError("invalid sequence: %s." % digit)
        # The position in CODONS is the digit's value; weight it by 64 ** i.
        val += CODONS.index(digit) * 64 ** i
    return val

def main():
    for number in ('cggcag', 'ccgcag', 'cgagag', 'auguuuuuc'):
        print(number, ':', codon2decimal(number))

if __name__ == '__main__':
    main()

results:

cggcag : 2011
ccgcag : 1499
cgagag : 1978
auguuuuuc : 66
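
As a hand check of these results: cgg sits at index 31 and cag at index 27 in the table, and 31 * 64 + 27 = 2011, whereas ccg is at index 23, giving 23 * 64 + 27 = 1499. So the conversion itself appears to be correct, and it is cggcag rather than ccgcag that maps to 2011, which suggests the sequence in the question may have been mistyped or misprinted. For the reverse direction, here is a minimal sketch assuming the same table order; decimal2codon and CODON_TABLE are names made up for this illustration, not something from the article:

```python
# Same codon order as the table in the question: aug = 0, uuu = 1, ..., uag = 63.
CODON_TABLE = (
    'aug uuu uuc uua uug ucu ucc uca ucg uau uac uaa ugu ugc uga ugg '
    'cuu cuc cua cug ccu ccc cca ccg cau cac caa cag cgu cgc cga cgg '
    'auu auc aua acu acc aca acg aau aac aaa aag agu agc aga agg guu '
    'guc gua gug gcu gcc gca gcg gau gac gaa gag ggu ggc gga ggg uag'
).split()

def decimal2codon(number):
    """Convert a non-negative int to a codon string (hypothetical helper)."""
    if number < 0:
        raise ValueError("negative numbers are not supported: %d" % number)
    if number == 0:
        # Zero is a single 'aug' digit.
        return CODON_TABLE[0]
    digits = []
    while number > 0:
        # Peel off the least significant base 64 digit on each pass.
        number, remainder = divmod(number, 64)
        digits.append(CODON_TABLE[remainder])
    # Digits were collected least significant first, so reverse them.
    return ''.join(reversed(digits))

if __name__ == '__main__':
    for n in (2011, 1499, 1978, 66):
        print(n, ':', decimal2codon(n))
```

Leading aug codons behave like leading zeros, so auguuuuuc and uuuuuc both stand for 66; this sketch returns the shorter form.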

answered Oct 03 '22 06:10 by Paulo Scardine