Strip punctuation from the lines of a file

Tags:

python

regex

list

3   3   how are you doing???
2   5   dear, where abouts!!!!!!........
4   6   don't worry i'll be there for ya///

I have a file containing sentences like these, and I want to strip the punctuation from them. How can I loop over the lines and strip it with a regex?

>>> import re
>>> a="what is. your. name?"
>>> b=re.findall(r'\w+',a)
>>> b
['what', 'is', 'your', 'name']

I only know how to do it for a single sentence; when it comes to a list like the one above, I get confused. I am new to Python and regular expressions. When I don't strip the punctuation from my sentences, I get an error like this:

  File "/usr/lib/python2.7/re.py", line 137, in match
    return _compile(pattern, flags).match(string)
  File "/usr/lib/python2.7/re.py", line 242, in _compile
    raise error, v # invalid expression
sre_constants.error: multiple repeat
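The question does not show the failing call, so this is an assumption: since the traceback comes from `re.match`, one plausible cause is that a sentence is being passed as the *pattern* argument. A sentence such as `how are you doing???` is not a valid regex, because `g??` is already a (lazy) repeat and the third `?` tries to repeat it again. A minimal Python 3 sketch reproducing this and avoiding it with `re.escape`, which makes the text match literally:

```python
import re

sentence = "how are you doing???"

# Treating the raw sentence as a regex: "g??" is a lazy optional "g",
# and the third "?" tries to repeat that repeat, so compiling fails.
try:
    re.compile(sentence)
    error_message = None
except re.error as exc:
    error_message = str(exc)  # the "multiple repeat" error from the traceback

print(error_message)

# re.escape() backslash-escapes the regex metacharacters, so the
# sentence is matched as literal text instead of being parsed as a pattern.
safe_pattern = re.escape(sentence)
print(re.match(safe_pattern, sentence) is not None)
```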

EDIT: The sentences are in the 3rd column and the delimiter is a tab, so how do I remove the punctuation from the 3rd column only?

The Third asked Dec 20 '25 19:12



1 Answer

Iterate over the lines using a for loop:

import re

with open('/path/to/file.txt') as f:
    for line in f:
        words = re.findall(r'\w+', line)  # list of words, punctuation dropped
        # do something with words
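A minimal runnable sketch of that loop, using an `io.StringIO` holding the three sample lines from the question as a stand-in for the real file (an assumption for demonstration only; with a real file, use `open()` as above). Because the whole line is searched, the column numbers end up in each word list too, and contractions are split at the apostrophe:

```python
import io
import re

# Stand-in for /path/to/file.txt: the tab-delimited sample lines.
sample = io.StringIO(
    "3\t3\thow are you doing???\n"
    "2\t5\tdear, where abouts!!!!!!........\n"
    "4\t6\tdon't worry i'll be there for ya///\n"
)

all_words = []
for line in sample:
    # \w+ matches runs of letters, digits and underscores;
    # everything else (punctuation, tabs, newline) is skipped.
    words = re.findall(r'\w+', line)
    all_words.append(words)

for words in all_words:
    print(words)
```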

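For the edited question, where the sentence is the 3rd tab-delimited column:

with open('/path/to/file.txt') as f:
    for line in f:
        col1, col2, rest = line.split('\t', 2)  # split into 3 columns
        words = re.findall(r'\w+', rest)  # words of the 3rd column only
        line = '\t'.join([col1, col2, ' '.join(words)])  # join() takes a single iterable
        # do something with words or line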
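The same approach wrapped in a function and run end to end, again with `io.StringIO` standing in for the input and output files. `clean_line` and its `pattern` parameter are hypothetical names for this sketch, and it assumes every line has at least two tabs (otherwise the three-way unpacking raises `ValueError`):

```python
import io
import re


def clean_line(line, pattern=r"\w+"):
    """Strip punctuation from the 3rd tab-separated column of one line.

    Hypothetical helper for illustration; not part of any library.
    """
    # Drop the newline, then split into at most 3 columns.
    col1, col2, rest = line.rstrip('\n').split('\t', 2)
    words = re.findall(pattern, rest)
    # Columns 1 and 2 are kept untouched; column 3 is rebuilt from its words.
    return '\t'.join([col1, col2, ' '.join(words)])


src = io.StringIO(
    "3\t3\thow are you doing???\n"
    "2\t5\tdear, where abouts!!!!!!........\n"
    "4\t6\tdon't worry i'll be there for ya///\n"
)
dst = io.StringIO()
for line in src:
    dst.write(clean_line(line) + '\n')

print(dst.getvalue())
```

With the default `r"\w+"`, `don't` becomes `don t`. Passing `r"[\w']+"` instead keeps apostrophes inside words (`clean_line(line, r"[\w']+")`), at the cost of also keeping stray quote characters such as the ones in `'hello'`.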
falsetru answered Dec 22 '25 10:12
