
Generate a three dimensional array in python by iterating through a list

I appreciate that this task is probably a bit ambitious given my level (or lack) of knowledge, but still.

I have a list of about 3000 strings, each 16 characters long, where each character denotes another list of numbers. To be concrete: it is a list of peptides, each 16 amino acids long, where each amino acid (1 of 20) is represented by 5 numbers.

I want to iterate through that list (of peptides), and then for each character (amino acid) add the relevant 5 numbers (Atchley factors, if you're interested) to an array, making a 3 dimensional array, where my axes are: instance of peptide (3000) x amino acid within that peptide (16) x factors (5).

I'm incredibly out of my depth, so I'm not sure if what I've got is helpful, but here it is (using numpy):

array = np.empty(shape=(len(peptides),16,5))

for i in peptides:

    for j in str(i):

(and at this point I tried a bunch of different things as I trawled the forums, ending with something a little like this, but I'm sure I've missed even what I was aiming for here)

        if j == 'A':
            L16Afctrs = np.append([-0.59145974, -1.30209266, -0.7330651, 1.5703918, -0.14550842], axis=1)
        elif j == 'C':
            L16Afctrs = np.append([-1.34267179, 0.46542300, -0.8620345, -1.0200786, -0.25516894], axis=1)
        ...
        elif j == 'Y':
            L16Afctrs = np.append([0.25999617, 0.82992312, 3.0973596, -0.8380164, 1.51150958], axis=1)

Like I say, I'm honestly struggling, any help would be much appreciated.

Edit: clarification (hopefully)

I have a list of around 3000 different 16 character strings, where each character in those strings denotes a further 5 numbers.

I want to generate a 3 dimensional array or structure, so that I can (eventually) plot those 5 numbers for a given position across all 3000 strings by looking across a given plane in the 3 dimensional array (where the dimensions I envisage are: original string x 16 characters x 5 factors).

I'm currently in the process of making a dictionary of the different characters, relating to the post from @Winston, then trying to fold that into a 3d array.

Edit 2: Success!

Winston's fix works beautifully!

jayemee asked Nov 04 '22 11:11


1 Answer

Store your data in a dictionary:

import numpy

# Map each one-letter amino acid code to its 5 Atchley factors.
DATA = {
    'A' : numpy.array([-0.59145974, -1.30209266, -0.7330651, 1.5703918, -0.14550842]),
    'C' : numpy.array([-1.34267179, 0.46542300, -0.8620345, -1.0200786, -0.25516894]),
    'D' : numpy.array([1.05015062, 0.30242411, -3.6559147, -0.2590236, -3.24176791]),
    ...
}

Use a Python list comprehension to build a flat list of the factor arrays, one per letter of every peptide, and then have numpy convert that list into a numpy array:

counters = numpy.array([DATA[letter] for peptide in peptides for letter in peptide])
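
To see what this step yields, here is a minimal sketch on toy input. The three factor rows are the ones quoted in this thread; the two 3-letter peptides and the names `toy_peptides` and `flat` are made up for illustration, standing in for the real 16-letter strings.

```python
import numpy

# Atchley factors for three residues, as quoted in the post; the real table has 20 entries.
DATA = {
    'A': numpy.array([-0.59145974, -1.30209266, -0.7330651, 1.5703918, -0.14550842]),
    'C': numpy.array([-1.34267179, 0.46542300, -0.8620345, -1.0200786, -0.25516894]),
    'D': numpy.array([1.05015062, 0.30242411, -3.6559147, -0.2590236, -3.24176791]),
}

# Hypothetical toy peptides (3 residues each instead of 16) to keep things small.
toy_peptides = ['ACD', 'DCA']

# One row of 5 factors per letter, with the peptides laid end to end.
flat = numpy.array([DATA[letter] for peptide in toy_peptides for letter in peptide])

print(flat.shape)  # (6, 5): 2 peptides * 3 letters, 5 factors each
```

The result is two dimensional: the peptide boundaries are not yet visible in the shape, only in the row order.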

Reshape the array into your three dimensions, since the previous step produces a 2D array of shape (len(peptides) * 16, 5):

counters = counters.reshape( len(peptides), 16, 5 )
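
Once the array is 3D, the "plane" mentioned in the question is an ordinary slice. The sketch below assumes toy data again: the peptides, the `length` variable and the names `factors`, `position_0` and `factor_2` are hypothetical, and only three of the 20 residues are included. It is one way to pull out the values, not the only one.

```python
import numpy

# Atchley factors for three residues, as quoted in the post; the real table has 20 entries.
DATA = {
    'A': numpy.array([-0.59145974, -1.30209266, -0.7330651, 1.5703918, -0.14550842]),
    'C': numpy.array([-1.34267179, 0.46542300, -0.8620345, -1.0200786, -0.25516894]),
    'D': numpy.array([1.05015062, 0.30242411, -3.6559147, -0.2590236, -3.24176791]),
}

# Hypothetical toy input: 4 peptides of length 3 (the real data: ~3000 of length 16).
toy_peptides = ['ACD', 'DCA', 'AAA', 'CDC']
length = 3

# The reshape only makes sense if every peptide has the same length, so check first.
if any(len(p) != length for p in toy_peptides):
    raise ValueError("all peptides must have the same length")

factors = numpy.array([DATA[letter] for p in toy_peptides for letter in p])
factors = factors.reshape(len(toy_peptides), length, 5)  # peptide x position x factor

# All 5 factors at position 0, across every peptide.
position_0 = factors[:, 0, :]
print(position_0.shape)  # (4, 5)

# A single factor (index 2) at every position of every peptide.
factor_2 = factors[:, :, 2]
print(factor_2.shape)  # (4, 3)
```

With the real data, `factors[:, k, :]` would give a 3000 x 5 table for position `k`, which is the kind of thing you could hand to a plotting library.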
Winston Ewert answered Nov 08 '22 07:11