Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: delete substring by indices

Tags:

python

string

I have the following rather simple snippet:

def delete_substring_blocks(s, blocks):                                                                             
  '''                                                                                                                   
  s: original input string                                                                                   
  blocks: list of indices (start, end) to be deleted                                                                
                                                                                                                        
  return string `out` where blocks are deleted from s                                                      
  '''                                                                                                                   
  out = ''                                                                                                              
  p = 0                                                                                                                 
  for start, end in blocks:                                                                                             
      out += s[p:start]                                                                                               
      p = end                                                                                                           
  out += s[p:]                                                                                                        
  return out

This function takes a string s and delete all s[start:end] from s, where pairs of indices (start, end) are given in a list blocks.

Is there a builtin function somewhere that does the same thing?


There is an assumption in my code:

blocks are sorted by the first index in ascending order (done by list.sort() in-place)

As for if the blocks can overlap, in my use case I ensure they don't before invoking the function. But for fun we can also assume they do.

like image 477
Patrick the Cat Avatar asked Sep 27 '22 00:09

Patrick the Cat


2 Answers

My approach transforms blocks into a set of indices which I call exclude. After that, loop through the string and exclude those characters whose index is in the exclude set. I am using set instead of list because it nicely handles the duplicates (in case of overlapping ranges).

Build the exclude set

Given an unordered, potentially overlapped list of ranges:

blocks = [(5, 7), (2, 4), (6, 10)]

I want to convert this into:

exclude = {2, 3, 5, 6, 7, 8, 9}

How:

exclude = set()
for block in blocks:
    exclude.update(range(*block))

Putting it all together

Here is my code and a little example at the end. Note that I chose to rename the function since this function is generic enough to work with string, list, tuple, and other iterable objects, not just strings. Also, because the function returns a list, when dealing with string, we need to join the list of characters back together.

def delete_blocks(iterable, blocks):                                                                             
    exclude = set()
    for block in blocks:
        exclude.update(range(*block))
    return [cell for i, cell in enumerate(iterable) if i not in exclude]

# Try it out
test_string = '0123456789abc'
blocks = [(5, 7), (2, 4), (6, 10)]
result = ''.join(delete_blocks(test_string, blocks))

print('Before: {!r}'.format(test_string))
print('Blocks:', blocks)
print('After: {!r}'.format(result))

Update: implement delete_substring_blocks

To really answer Mai's question, I implemented delete_substring_blocks using delete_blocks:

def delete_substring_blocks(s, blocks):
    return ''.join(delete_blocks(s, blocks))
like image 84
Hai Vu Avatar answered Sep 28 '22 12:09

Hai Vu


Since it is unspecified we must assume that the list of blocks may contain overlaps.

A relatively inefficient expression, but one that will handle overlapping and non-sorted blocks, is:

def delete_substring_blocks(s, blocks):
    return ''.join(
        [c for i, c in enumerate(s) 
         if not any(blk for blk in blocks 
                    if i >= blk[0] and i < blk[1])])

Here we simply test each character's position to see if it is within any of the block intervals and accept it if it is not.

Here is an example with an overlapping block:

>>> delete_substring_blocks(
        "hello there how are you", 
        [[0, 3], [7, 9], [7, 10]])
'lo te how are you'

As you appear to find this expression unreadable, here it is broken down a little more:

def delete_substring_blocks(s, blocks):
    def check_pos(i):
        return not any(1 for start, end in blocks 
                       if i >= start and i < end)

    return ''.join([c for i, c in enumerate(s) 
                    if check_pos(i)])
like image 45
donkopotamus Avatar answered Sep 28 '22 12:09

donkopotamus