
Python removing delimiters from strings

Tags:

python

I have two related questions about the code below.

def remove_delimiters (delimiters, s):
    for d in delimiters:
        ind = s.find(d)
        while ind != -1:
            s = s[:ind] + s[ind+1:]
            ind = s.find(d)

    return ' '.join(s.split())


delimiters = [",", ".", "!", "?", "/", "&", "-", ":", ";", "@", "'", "..."]
d_dataset_list = ['hey-you...are you ok?']
d_list = []

for d in d_dataset_list:
    # pass the whole string, not just its second character (d[1])
    d_list.append(remove_delimiters(delimiters, d))

print(d_list)

Output: ['heyyouare you ok']

  1. What is the best way to avoid words being joined together when a delimiter is removed? For example, I would like the output to be 'hey you are you ok'.

  2. The dots may come in runs of different lengths, for example .. or .........., not only .... How can I implement a rule that removes any run of more than one consecutive . without hard-coding every sequence in my delimiters list? Thank you.

user47467 asked Sep 13 '25 12:09



2 Answers

You could try something like this:

  1. Given the delimiters as a string d, join them into a regular expression character class, escaping each character with a backslash

    >>> d = ",.!?/&-:;@'..."
    >>> "["+"\\".join(d)+"]"
    "[,\\.\\!\\?\\/\\&\\-\\:\\;\\@\\'\\.\\.\\.]"
    
  2. Split the string using this regex with re.split

    >>> import re
    >>> s = 'hey-you...are you ok?'
    >>> re.split("["+"\\".join(d)+"]", s)
    ['hey', 'you', '', '', 'are you ok', '']
    
  3. Join all the non-empty fragments back together

    >>> ' '.join(w for w in re.split("["+"\\".join(d)+"]", s) if w)
    'hey you are you ok'
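
The three steps above can also be wrapped in one function. This is only a sketch, not the answer's own code: `split_on_delimiters` is a hypothetical name, and it swaps the manual backslash join for `re.escape` plus a `+` quantifier so that a run of any length (`..`, `..........`) is treated as one separator.

```python
import re

def split_on_delimiters(delimiters, s):
    """Split s on runs of delimiters and rejoin the pieces with single spaces."""
    # Longest first, so a multi-character delimiter such as "..." is tried
    # before its one-character prefix ".".
    ordered = sorted(delimiters, key=len, reverse=True)
    # re.escape makes each delimiter literal; (?:...)+ matches a whole run.
    pattern = "(?:" + "|".join(re.escape(d) for d in ordered) + ")+"
    return " ".join(w for w in re.split(pattern, s) if w)

delimiters = [",", ".", "!", "?", "/", "&", "-", ":", ";", "@", "'", "..."]
print(split_on_delimiters(delimiters, "hey-you...are you ok?"))    # hey you are you ok
print(split_on_delimiters(delimiters, "wait..what..........now"))  # wait what now
```

Because whole runs are matched at once, the "..." entry in the list becomes redundant, though keeping it does no harm.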
    

Also, if you just want to remove all non-word characters, you can use the character class \W (which also matches whitespace) instead of manually enumerating all the delimiters:

>>> ' '.join(w for w in re.split(r"\W", s) if w)
'hey you are you ok'
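
An equivalent way to look at the \W approach is to collect the word characters directly with `re.findall(r"\w+", ...)`, which skips the empty fragments without a filter. The sketch below also shows one caveat worth knowing: \w includes the underscore, while the apostrophe counts as a non-word character. The sample string "don't_stop" is made up for illustration.

```python
import re

s = "hey-you...are you ok?"

# \w+ grabs each maximal run of word characters, so no empty strings appear
words = re.findall(r"\w+", s)
print(" ".join(words))  # hey you are you ok

# The apostrophe splits the word, the underscore does not
print(re.findall(r"\w+", "don't_stop"))  # ['don', 't_stop']
```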
tobias_k answered Sep 15 '25 01:09



First of all, your function for removing delimiters can be simplified greatly with the string replace method (http://www.tutorialspoint.com/python/string_replace.htm).

This also answers your first question. Instead of just removing each delimiter, replace it with a space, then collapse the extra spaces with the pattern you already use (split() with no arguments treats consecutive whitespace as a single separator).

A better function, which does this, would be:

def remove_delimiters (delimiters, s):
    new_s = s
    for i in delimiters: #replace each delimiter in turn with a space
        new_s = new_s.replace(i, ' ')
    return ' '.join(new_s.split())
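
As a quick check, here is the same replace-based idea run on the string from the question and on two made-up dot runs. It is a minimal sketch: the function is restated so the snippet runs on its own, and the "..." entry is left out of the list on the assumption that replacing every single "." already covers runs of any length.

```python
def remove_delimiters(delimiters, s):
    # Replace each delimiter with a space instead of deleting it
    for d in delimiters:
        s = s.replace(d, " ")
    # split() with no arguments collapses runs of whitespace
    return " ".join(s.split())

delimiters = [",", ".", "!", "?", "/", "&", "-", ":", ";", "@", "'"]
for text in ["hey-you...are you ok?", "hey..you", "hey..........you"]:
    print(remove_delimiters(delimiters, text))
# hey you are you ok
# hey you
# hey you
```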

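To answer your second question, I'd say it's time for regular expressions. The pattern \.+ matches a run of one or more dots, so each run is replaced by a single space:

>>> import re
>>> ss = 'hey ... you are ....... what?'
>>> print(re.sub(r'\.+', ' ', ss))
hey   you are   what?

The leftover extra spaces can then be collapsed with ' '.join(...split()) as before.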
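
If only runs of two or more dots should go while a single full stop stays, a bounded quantifier is one option. A minimal sketch, where `collapse_dot_runs` is a hypothetical helper name and the sample sentence is made up:

```python
import re

def collapse_dot_runs(s):
    # \.{2,} matches two or more consecutive literal dots
    without_runs = re.sub(r"\.{2,}", " ", s)
    # Normalise the whitespace left behind by the substitution
    return " ".join(without_runs.split())

print(collapse_dot_runs("wait.. what.......... ok. v1.5"))  # wait what ok. v1.5
```

Single dots, such as the sentence-ending one and the one in "v1.5", are left alone because they never form a run of length two.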
greg_data answered Sep 15 '25 03:09
