clean line of punctuation and split into words python

I'm currently learning Python and have run into a problem. I'm trying to take a line passed in from another subprogram and split it into separate words, stripping out punctuation except for a few characters (such as the apostrophe in don't). The program's output is supposed to be each word followed by the line numbers it appears on. It should look like this -> word: [1]

input file:

please. let! this3 work.
I: hope. it works
and don't shut up

Code:

    def createWordList(line):
        wordList2 =[]
        wordList1 = line.split()
        cleanWord = ""
        for word in wordList1: 
            if word != " ":
                for char in word:
                    if char in '!,.?":;0123456789':
                        char = ""
                    cleanWord += char
                    print(cleanWord," cleaned")
                wordList2.append(cleanWord)
        return wordList2

output:

anddon't:[3]
anddon'tshut:[3]
anddon'tshutup:[3]
ihope:[2]
ihopeit:[2]
ihopeitworks:[2]
pleaselet:[1]
pleaseletthis3:[1]
pleaseletthis3work:[1]

I'm not sure what is causing this. I learned Ada first and am transitioning to Python in a short period of time.

Procrastinator asked Oct 26 '12 16:10



2 Answers

Of course, you could also use a regular expression:

    >>> import re
    >>> s = """please. let! this3 work.
    ... I: hope. it works
    ... and don't shut up"""
    >>> re.findall(r'[^\s!,.?":;0-9]+', s)
    ['please', 'let', 'this', 'work', 'I', 'hope', 'it', 'works', 'and', "don't",
     'shut', 'up']
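
To get from that list of words to the word: [line numbers] output the question asks for, the same pattern can be applied one line at a time. The following is only a sketch: the names build_index and WORD_RE are made up for illustration, and lowercasing the words is an assumption based on the lowercase output shown in the question.

```python
import re
from collections import defaultdict

# Same character class as the findall() call above: a word is any run of
# characters that are not whitespace, the listed punctuation, or digits.
WORD_RE = re.compile(r'[^\s!,.?":;0-9]+')

def build_index(text):
    """Map each lowercased word to the line numbers it appears on."""
    index = defaultdict(list)
    for line_no, line in enumerate(text.splitlines(), start=1):
        for word in WORD_RE.findall(line):
            key = word.lower()  # assumption: the index is case-insensitive
            if line_no not in index[key]:  # record each line only once per word
                index[key].append(line_no)
    return dict(index)

sample = "please. let! this3 work.\nI: hope. it works\nand don't shut up"
for word, lines in sorted(build_index(sample).items()):
    print(f"{word}: {lines}")
```

Note that because digits are excluded from the character class, a token such as this3work would come out as two words (this and work) rather than one.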
Tim Pietzcker answered Sep 28 '22 07:09


You should set cleanWord back to an empty string at the top of each iteration of the outer loop:

    def createWordList(line):
        wordList2 =[]
        wordList1 = line.split()
        for word in wordList1:
            cleanWord = ""
            for char in word:
                if char in '!,.?":;0123456789':
                    char = ""
                cleanWord += char
            wordList2.append(cleanWord)
        return wordList2

Note that I also removed the if word != " " check, since line.split() with no arguments splits on runs of whitespace, so the resulting words never contain spaces.

    >>> createWordList('please. let! this3 work.')
    ['please', 'let', 'this', 'work']
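
As a possible alternative to the character-by-character loop (not part of the fix above, just a different technique), the same characters can be deleted with str.translate() and a table from str.maketrans(). This is a minimal sketch assuming Python 3; create_word_list, STRIP_CHARS, and DELETE_TABLE are illustrative names.

```python
# Characters to delete, copied from the loop above.
STRIP_CHARS = '!,.?":;0123456789'

# With three arguments, str.maketrans maps every character in the third
# argument to None, so translate() removes those characters.
DELETE_TABLE = str.maketrans("", "", STRIP_CHARS)

def create_word_list(line):
    """Split a line on whitespace and strip the unwanted characters."""
    cleaned = (word.translate(DELETE_TABLE) for word in line.split())
    # Unlike the loop version, skip tokens that end up empty (e.g. "42").
    return [word for word in cleaned if word]

print(create_word_list("please. let! this3 work."))
```

One behavioural difference: the loop version would append an empty string for a token made up entirely of stripped characters, while this sketch drops it.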
Andrew Clark answered Sep 28 '22 08:09