delete first X words and delimiters of a string - with multiple delimiters

I have a string such as manipulate widgets add,1,2,3 (sorry, but I can't change the format).

I want to delete the first X words and any delimiters which precede them.

Let's take 3 as an example, thus deleting manipulate widgets add and leaving ,1,2,3

Or take manipulate,widgets,add,1,2,3: delete two words (manipulate,widgets) and leave ,add,1,2,3

I can split the string into a list with words = re.split('[' + delimiters + ']',inputString.strip()) but I can't simply delete the first X words

with, say,

for i in range(numWordsToRemove):
    del words[0]

and then return ' '.join(words) because that gives me 1 2 3, with the original delimiters lost.

How can I do it and retain the original delimiters of the non-deleted words?

Just to make it interesting, the input string can contain multiple spaces or tabs between words. There is only ever one comma between words, but it might also have spaces before or after it:

manipulate ,widgets add , 1, 2 , 3

Note that words are not guaranteed to be unique, so I can't take the index of the word after those to be deleted and use it to return a positional substring.


[Update] I accepted Kasramvd's solution, but then found that it didn't correctly handle remover('LET FOUR = 2 + 2', 2) or remover('A -1 B text.txt', 2), so now I am offering a bounty.


[Update++] The delimiters are space, tab and comma. Everything else (including equals sign, minus sign, etc.) is part of a word, although I would be happy if answerers would tell me how to add a new delimiter in future, should it become necessary.

Mawg says reinstate Monica asked Jan 02 '26 03:01


2 Answers

You can define a RegEx like this

>>> import re
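>>> regEx = re.compile(r'(\s*,\s*|\s+)')

It matches either a comma with optional whitespace on both sides, or a run of one or more whitespace characters. The parentheses create a capturing group, which makes re.split keep the separators in the result. The pattern must not be able to match the empty string: since Python 3.7, re.split also splits on empty matches, so a pattern such as (\s*,?\s*) would break the input into single characters.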

Now split the string with the RegEx, skip the leading elements you don't want, and join the rest. The split result alternates words and separators, so removing count words means skipping count + (count - 1) elements: the words themselves plus the separators between them. For example, three words have two separators between them, so you remove the first five elements of the split data. The separator that follows the last removed word is kept.
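To make the counting concrete, here is a small sketch of what the split list looks like. The pattern below is an assumption chosen for the illustration (it cannot match the empty string, so it behaves the same on Python 3.7 and later); the sample string is the one from the question.

```python
import re

# Assumed separator pattern: a comma with optional surrounding whitespace,
# or a run of whitespace. The capturing group makes re.split keep separators.
separator = re.compile(r'(\s*,\s*|\s+)')

parts = separator.split("manipulate widgets add,1,2,3")
print(parts)
# ['manipulate', ' ', 'widgets', ' ', 'add', ',', '1', ',', '2', ',', '3']
# Words sit at even indices, separators at odd indices.

count = 3
skip = count + (count - 1)      # 3 words + the 2 separators between them
print("".join(parts[skip:]))    # ,1,2,3
```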

For example,

>>> def splitter(data, count):
...     return "".join(re.split(regEx, data)[count + (count - 1):])
... 
>>> splitter("manipulate,widgets,add,1,2,3", 2)
',add,1,2,3'
>>> splitter("manipulate widgets add,1,2,3", 3)
',1,2,3'
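For the cases added in the question's updates (repeated spaces around commas, words such as -1 or text.txt, and a delimiter set that may grow), one possible alternative is to match the prefix to remove with a single anchored pattern and slice the original string after it. This is only a sketch: the name remover comes from the question, while the delimiters parameter and the choice to return an empty string when there are too few words are assumptions made here.

```python
import re

def remover(text, count, delimiters=" \t,"):
    """Drop the first `count` words and the delimiters that precede them.

    `delimiters` is a hypothetical parameter: a non-empty string of
    single-character delimiters; everything else is part of a word.
    """
    if count <= 0:
        return text
    cls = re.escape(delimiters)
    # One unit = optional leading delimiters, then a whole word.
    # The lookahead stops backtracking from cutting a word in two.
    unit = r'[%s]*[^%s]+(?![^%s])' % (cls, cls, cls)
    match = re.match(r'(?:%s){%d}' % (unit, count), text)
    # Assumption: with fewer than `count` words, nothing is left over.
    return text[match.end():] if match else ""

print(repr(remover("manipulate widgets add,1,2,3", 3)))   # ',1,2,3'
print(repr(remover("LET FOUR = 2 + 2", 2)))               # ' = 2 + 2'
print(repr(remover("A -1 B text.txt", 2)))                # ' B text.txt'
```

Adding a delimiter later would then only mean passing a longer string, for example remover("a;b;c", 2, delimiters=" \t,;").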
thefourtheye answered Jan 03 '26 15:01


s1='manipulate widgets add,1,2,3'
# output desired ',1,2,3'
s2='manipulate,widgets,add,1,2,3'
# delete two words (manipulate,widgets) and leave ,add,1,2,3
s3='manipulate  ,widgets     add ,  1, 2  ,    3'
# delete 2 or 3 words

import re

# for illustration
print(re.findall(r'\w+', s1))
print(re.findall(r'\w+', s2))
print(re.findall(r'\w+', s3))
print()


def deletewords(s, n):
    # collect the words, drop the first n, and rejoin with commas
    # (the original delimiters are not preserved)
    a = re.findall(r'\w+', s)
    return ','.join(a[n:])

# examples for use
print(deletewords(s1, 1))
print(deletewords(s2, 2))
print(deletewords(s3, 3))

output:

['manipulate', 'widgets', 'add', '1', '2', '3']
['manipulate', 'widgets', 'add', '1', '2', '3']
['manipulate', 'widgets', 'add', '1', '2', '3']

widgets,add,1,2,3
add,1,2,3
1,2,3
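The output above is always comma-joined, so the original delimiters are lost. A variant that keeps them could use re.finditer to locate where the n-th word ends and then slice the original string from that position. The sketch below is an assumption-laden illustration, not part of the answer: the function name delete_words_keep and the delimiters parameter are made up here, and a word is taken to be any run of characters other than space, tab and comma, as stated in the question's update.

```python
import re

def delete_words_keep(s, n, delimiters=" \t,"):
    # A word is a maximal run of non-delimiter characters.
    word = re.compile('[^%s]+' % re.escape(delimiters))
    if n <= 0:
        return s
    for i, m in enumerate(word.finditer(s), start=1):
        if i == n:
            # Slice the untouched original string after the n-th word.
            return s[m.end():]
    # Assumption: fewer than n words leaves nothing.
    return ""

s1 = 'manipulate widgets add,1,2,3'
s3 = 'manipulate  ,widgets     add ,  1, 2  ,    3'
print(repr(delete_words_keep(s1, 3)))   # ',1,2,3'
print(repr(delete_words_keep(s3, 3)))   # ' ,  1, 2  ,    3'
```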
roadrunner66 answered Jan 03 '26 15:01


