delete first X words and delimiters of a string - with multiple delimiters

Question

I have a string such as manipulate widgets add,1,2,3 (sorry, but I can't change the format).

I want to delete the first X words and any delimiters which precede them.

Let's take 3 as an example, thus deleting manipulate widgets add and leaving ,1,2,3

Or take manipulate,widgets,add,1,2,3: delete two words (manipulate,widgets) and leave ,add,1,2,3

I can split the string into a list with words = re.split('[' + delimiters + ']',inputString.strip()) but I can't simply delete the first X words

with, say,

for i in range(numWordsToRemove):
    del words[0]

and then return ' '.join(words), because that gives me 1 2 3 rather than ,1,2,3.

How can I do it and retain the original delimiters of the non-deleted words?

Just to make it interesting, the input string can contain multiple spaces or tabs between words. There is only ever one comma between words, but it might also have spaces preceding or following it:

manipulate ,widgets add , 1, 2 , 3

Note that words are not guaranteed to be unique, so I can't take the index of the word after those to be deleted and use it to return a positional substring.

[Update] I accepted Kasramvd's solution, but then found that it didn't correctly handle remover('LET FOUR = 2 + 2', 2) or remover('A -1 B text.txt', 2), so now I am offering a bounty.

[Update++] Delimiters are space, tab and comma. Everything else (including equals sign, minus sign, etc.) is part of a word, although I would be happy if answerers would tell me how to add a new delimiter in future, should it become necessary.

thefourtheye · Accepted Answer

You can define a RegEx like this

>>> import re
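>>> regEx = re.compile(r'(\s*,\s*|\s+)')

This matches either a comma with optional whitespace on both sides, or a run of one or more whitespace characters. The pattern is written so that it can never match the empty string, because re.split in Python 3.7 and later also splits on empty matches. The parentheses create a capturing group, which makes re.split keep the separators in its result.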
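To see what the capturing group changes, here is a minimal sketch that uses a deliberately simpler pattern (a bare comma, chosen only for illustration) and compares re.split with and without parentheses:

```python
import re

text = "add,1,2"

# Without a group, the separators are discarded.
without_group = re.split(r",", text)

# With a capturing group, each separator is returned between the words.
with_group = re.split(r"(,)", text)

print(without_group)  # ['add', '1', '2']
print(with_group)     # ['add', ',', '1', ',', '2']
```

Because the separators survive the split, joining the pieces with an empty string reproduces the original text exactly.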

Now split on the RegEx, then skip the words you don't want together with the separators between them. For example, three words have two separators between them, so to skip three words you drop the first five elements of the split result. In general that is count + (count - 1) elements. Finally, join what remains.
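The index arithmetic can be checked on a hand-written list with the alternating word/separator shape described above. The list and the helper name skip_words are illustrative assumptions, not output from the answer's code:

```python
def skip_words(parts, count):
    """Drop `count` words and the `count - 1` separators between them."""
    # parts alternates word, separator, word, separator, ...
    return "".join(parts[count + (count - 1):])

parts = ["manipulate", " ", "widgets", " ", "add", ",", "1", ",", "2", ",", "3"]

print(skip_words(parts, 3))        # ,1,2,3
print(repr(skip_words(parts, 1)))  # ' widgets add,1,2,3'
```

Note that count = 0 would produce a slice start of -1, so a caller would presumably want to handle that case separately.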

For example,

>>> def splitter(data, count):
...     return "".join(re.split(regEx, data)[count + (count - 1):])
... 
>>> splitter("manipulate,widgets,add,1,2,3", 2)
',add,1,2,3'
>>> splitter("manipulate widgets add,1,2,3", 3)
',1,2,3'
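
The question also asks how a new delimiter could be added later. One way to keep the delimiter set in a single place is to avoid splitting altogether and instead find where the count-th word ends with re.finditer. The following is only a sketch of that different technique, not part of the original answer. The name remover is borrowed from the question's update, while the delimiters parameter is an assumption introduced here.

```python
import re

def remover(data, count, delimiters=" \t,"):
    """Return `data` without its first `count` words and the delimiters before them."""
    if count <= 0:
        return data
    # A word is any run of characters that are not delimiters.
    # `delimiters` must be non-empty; re.escape keeps characters like ']' or '-' literal.
    word = re.compile("[^%s]+" % re.escape(delimiters))
    for index, match in enumerate(word.finditer(data), start=1):
        if index == count:
            # Everything after the count-th word is kept untouched.
            return data[match.end():]
    return ""  # fewer than `count` words: nothing is left

print(repr(remover("manipulate widgets add,1,2,3", 3)))        # ',1,2,3'
print(repr(remover("manipulate ,widgets add , 1, 2 , 3", 2)))  # ' add , 1, 2 , 3'
print(repr(remover("LET FOUR = 2 + 2", 2)))                    # ' = 2 + 2'
print(repr(remover("A -1 B text.txt", 2)))                     # ' B text.txt'
```

Under this sketch, adding a delimiter means passing a longer string, for example remover("a;b;c", 1, delimiters=" \t,;").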

roadrunner66 · Answer

s1='manipulate widgets add,1,2,3'
# output desired ',1,2,3'
s2='manipulate,widgets,add,1,2,3'
# delete two words (manipulate,widgets) and leave ,add,1,2,3
s3='manipulate  ,widgets     add ,  1, 2  ,    3'
# delete 2 or 3 words

import re

# for illustration
print(re.findall(r'\w+', s1))
print(re.findall(r'\w+', s2))
print(re.findall(r'\w+', s3))
print()


def deletewords(s, n):
    # collect the words, drop the first n, and rejoin them with commas
    a = re.findall(r'\w+', s)
    return ','.join(a[n:])

# examples for use
print(deletewords(s1, 1))
print(deletewords(s2, 2))
print(deletewords(s3, 3))

output:

['manipulate', 'widgets', 'add', '1', '2', '3']
['manipulate', 'widgets', 'add', '1', '2', '3']
['manipulate', 'widgets', 'add', '1', '2', '3']

widgets,add,1,2,3
add,1,2,3
1,2,3
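
Note that \w+ only matches letters, digits and underscores, so under the question's later clarification (only space, tab and comma are delimiters) it would break up words such as -1 or text.txt. The sketch below compares it with a negated character class built from those three delimiters. The pattern [^ \t,]+ is an assumption introduced here, not part of this answer.

```python
import re

sample = "A -1 B text.txt"

# \w+ treats '-' and '.' as separators, breaking up "-1" and "text.txt".
print(re.findall(r"\w+", sample))       # ['A', '1', 'B', 'text', 'txt']

# A negated class keeps everything except space, tab and comma inside a word.
print(re.findall(r"[^ \t,]+", sample))  # ['A', '-1', 'B', 'text.txt']
```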

Tags:

python

string

regex

split

Mawg says reinstate Monica
