
Python, divide string into several substrings

I have an RNA string, e.g.:

AUGGCCAUA

I would like to generate all substrings of length 3 in the following way:

# starting from character 0
AUG, GCC, AUA
# starting from character 1
UGG, CCA
# starting from character 2
GGC, CAU

I wrote code that solves the first sub-problem:

from math import fmod

rna = 'AUGGCCAUA'
for i in range(0,len(rna)):
  if fmod(i,3)==0:
    print(rna[i:i+3])

I tried changing the starting position, i.e.:

 for i in range(1,len(rna)):

But it produces incorrect results:

 GCC, UA #instead of UGG, CCA

Could you please give me a hint as to where my mistake is?

mr.M asked Nov 10 '13 17:11



1 Answer

The problem with your code is that it only ever extracts substrings starting at indices divisible by 3, no matter where the loop begins. Instead, step through the string in strides of 3 from the desired starting position:

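a = 'AUGGCCAUA'

def getSubStrings(RNA, position):
    # Step through the string in strides of 3, starting at `position`;
    # stopping at len(RNA) - 2 skips any incomplete trailing chunk.
    return [RNA[i:i+3] for i in range(position, len(RNA) - 2, 3)]

print(getSubStrings(a, 0))
print(getSubStrings(a, 1))
print(getSubStrings(a, 2))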

Output

['AUG', 'GCC', 'AUA']
['UGG', 'CCA']
['GGC', 'CAU']
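
To see why changing only the start of range() did not help, note that the test "index divisible by 3" does not depend on where the loop starts. A minimal sketch that keeps the shape of the original loop but tests the offset from the starting position instead (the function name codons_with_offset_test is made up for this illustration, and the code is written for Python 3):

```python
rna = "AUGGCCAUA"

def codons_with_offset_test(rna, start):
    """Hypothetical helper: same loop shape as in the question,
    but the modulo test is taken relative to `start`."""
    result = []
    # Stop at len(rna) - 2 so every slice has a full three characters.
    for i in range(start, len(rna) - 2):
        # Keep i only when it is a multiple of 3 away from `start`.
        if (i - start) % 3 == 0:
            result.append(rna[i:i + 3])
    return result

print(codons_with_offset_test(rna, 1))
```

This is only meant to show where the original test goes wrong; passing a step of 3 to range(), as above, does the same job without the explicit check.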

Explanation

range(position, len(RNA) - 2, 3) generates numbers with a common difference of 3, starting at position and stopping before len(RNA) - 2, the length of the string minus 2. For example,

print(list(range(1, 8, 3)))

Here 1 is the starting number, 8 is the stop value (which is itself never included), and 3 is the common difference, so it gives

[1, 4, 7]

These are our starting indices. We then use a list comprehension to slice a three-character substring at each of them:

[RNA[i:i+3] for i in range(position, len(RNA) - 2, 3)]
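
If all three starting positions are needed at once, the same comprehension can be nested. The following is a sketch only, written for Python 3; the function name reading_frames and the size parameter are assumptions made for this example and are not part of the answer above. The stop value len(rna) - size + 1 generalizes len(RNA) - 2, so chunks shorter than size are still dropped.

```python
def reading_frames(rna, size=3):
    """Hypothetical helper: one list of full-length chunks
    for each starting offset 0 .. size - 1."""
    return [
        # Inner comprehension: the same slicing as getSubStrings,
        # with the chunk length made a parameter.
        [rna[i:i + size] for i in range(start, len(rna) - size + 1, size)]
        for start in range(size)
    ]

frames = reading_frames("AUGGCCAUA")
for offset, frame in enumerate(frames):
    print(offset, frame)
```

Each inner list corresponds to one call of getSubStrings with position 0, 1 and 2 respectively.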
thefourtheye answered Oct 03 '22 04:10