Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently partition a string at arbitrary index

Given an arbitrary string (i.e., not based on a pattern), say:

>>> string.ascii_letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

I am trying to partition a string based a list of indexes.

Here is what I tried, which does work:

import string

def split_at_idx(txt, idx):
    new_li=[None]*2*len(idx)
    new_li[0::2]=idx
    new_li[1::2]=[e for e in idx]
    new_li=[0]+new_li+[len(txt)]
    new_li=[new_li[i:i+2] for i in range(0,len(new_li),2)]  
    print(new_li)
    return [txt[st:end] for st, end in new_li]

print(split_at_idx(string.ascii_letters, [3,10,12,40]))  
# ['abc', 'defghij', 'kl', 'mnopqrstuvwxyzABCDEFGHIJKLMN', 'OPQRSTUVWXYZ']

The split is based on a list of indexes [3,10,12,40]. This list then needs to be transformed into a list of start, end pairs like [[0, 3], [3, 10], [10, 12], [12, 40], [40, 52]]. I used a slice assignment to set the evens and odds, then a list comprehension to group into pairs and a second LC to return the partitions.

This seems a little complex for such a simple function. Is there a better / more efficient / more idiomatic way to do this?

like image 860
dawg Avatar asked Dec 20 '13 04:12

dawg


1 Answers

I have a feeling someone asked this question very recently, but I can't find it now. Assuming that the dropped letters were an accident, couldn't you just do:

def split_at_idx(s, idx):
    return [s[i:j] for i,j in zip([0]+idx, idx+[None])]

after which we have

>>> split_at_idx(string.ascii_letters, [3, 10, 12, 40])
['abc', 'defghij', 'kl', 'mnopqrstuvwxyzABCDEFGHIJKLMN', 'OPQRSTUVWXYZ']
>>> split_at_idx(string.ascii_letters, [])
['abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ']
>>> split_at_idx(string.ascii_letters, [13, 26, 39])
['abcdefghijklm', 'nopqrstuvwxyz', 'ABCDEFGHIJKLM', 'NOPQRSTUVWXYZ']
like image 84
DSM Avatar answered Sep 22 '22 05:09

DSM