I have long amino-acid strings that I would like to split based on start-stop values in a list. An example is probably the clearest way to explain it:
str = "MSEPAGDVRQNPCGSKAC"
split_points = [[1,3], [7,10], [12,13]]
output >> ['M', '(SEP)', 'AGD', '(VRQN)', 'P', '(CG)', 'SKAC']
The extra parentheses are to show which elements were selected from the split_points list. I don't expect the start-stop points to ever overlap.
I have a bunch of ideas that would work but seem terribly inefficient (in terms of code length), and it seems like there must be a nice Pythonic way of doing this.
Strange way to split strings you have there:
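def splitter(s, points):
    c = 0  # index just past the previously selected range
    for x, y in points:
        yield s[c:x]                # text between the previous range and this one
        yield "(%s)" % s[x:y+1]     # selected range; the stop index is inclusive
        c = y + 1
    yield s[c:]                     # whatever is left after the last range

print(list(splitter(str, split_points)))
# => ['M', '(SEP)', 'AGD', '(VRQN)', 'P', '(CG)', 'SKAC']
# If a range starts at index 0, ends at the last character, or directly
# follows the previous range, the gaps are empty strings; filter them out.
print([x for x in splitter(str, split_points) if x != ''])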
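If you want something a bit more defensive, here is a minimal sketch of the same idea as a plain function. The name split_ranges and the mark parameter are made up for illustration, and it assumes the ranges are 0-based with an inclusive stop, as in the question. It sorts the ranges first, raises an error if two of them overlap, and drops empty pieces on its own.

```python
def split_ranges(seq, ranges, mark=True):
    """Split seq at inclusive (start, stop) index pairs.

    Hypothetical helper for illustration; assumes 0-based,
    inclusive ranges as in the question's example.
    """
    pieces = []
    cursor = 0  # index just past the previously selected range
    for start, stop in sorted(ranges):
        if start < cursor or stop < start:
            raise ValueError("ranges overlap or are reversed: %r" % ([start, stop],))
        pieces.append(seq[cursor:start])      # unselected text before this range
        selected = seq[start:stop + 1]        # stop index is inclusive
        pieces.append("(%s)" % selected if mark else selected)
        cursor = stop + 1
    pieces.append(seq[cursor:])               # tail after the last range
    return [p for p in pieces if p]           # drop empty strings


seq = "MSEPAGDVRQNPCGSKAC"
split_points = [[1, 3], [7, 10], [12, 13]]
print(split_ranges(seq, split_points))
# => ['M', '(SEP)', 'AGD', '(VRQN)', 'P', '(CG)', 'SKAC']
```

Passing mark=False returns the selected pieces without the parentheses, which is probably what you want once you no longer need to tell them apart by eye.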