Having trouble figuring out how to lemmatize words from a txt file. I've gotten as far as listing the words, but I'm not sure how to lemmatize them after the fact.
Here's what I have:
```python
import re

import nltk
from nltk.stem.wordnet import WordNetLemmatizer

nltk.download('wordnet')

def lemfile():
    with open('1865-Lincoln.txt', 'r') as f:
        text = f.read().lower()
    # keep only lowercase letters, spaces and apostrophes
    text = re.sub(r"[^a-z' ]+", " ", text)
    words = text.split()
    return words
```
Initialise a `WordNetLemmatizer` object and lemmatize each word in your lines. You can perform in-place file I/O using the `fileinput` module.
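Applied to the list of words the question already builds, the lemmatizing step is just a list comprehension. Below is a minimal sketch that takes the lemmatizer as a function argument. `lemmatize_words` and `toy_lemmatize` are hypothetical names, and `toy_lemmatize` is only a stand-in that strips a trailing "s" so the example runs without downloading WordNet data. It is not a real lemmatizer.

```python
from typing import Callable, List


def lemmatize_words(words: List[str], lemmatize: Callable[[str], str]) -> List[str]:
    """Apply a lemmatizer to each word, preserving order."""
    return [lemmatize(w) for w in words]


def toy_lemmatize(word: str) -> str:
    # Hypothetical stand-in for WordNetLemmatizer().lemmatize:
    # drops a trailing "s" from longer words. NOT a real lemmatizer.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


words = ["nations", "war", "years"]
print(lemmatize_words(words, toy_lemmatize))  # ['nation', 'war', 'year']
```

With NLTK you would pass `WordNetLemmatizer().lemmatize` in place of `toy_lemmatize`, applied to the `words` list from `lemfile()` (which needs to `return words` for that). To write the lemmas back into the file itself, `fileinput` can edit it in place: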
```python
# https://stackoverflow.com/a/5463419/4909087
import fileinput

from nltk.stem.wordnet import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
for line in fileinput.input('1865-Lincoln.txt', inplace=True, backup='.bak'):
    line = ' '.join(
        [lemmatizer.lemmatize(w) for w in line.rstrip().split()]
    )
    # with inplace=True, print() writes to the file, replacing the current line
    print(line)
```
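When `inplace=True`, `fileinput.input` redirects standard output to the file being processed, so each `print()` call replaces the corresponding line. With `backup='.bak'`, the original is kept as `1865-Lincoln.txt.bak`.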
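The loop above lemmatizes whitespace-separated tokens as they are, so it skips the lowercasing and punctuation stripping from the question, and a token like `nations,` keeps its comma. The sketch below combines that normalisation with in-place rewriting and runs on a temporary file so the redirect can be observed safely. `lemmatize_file_inplace` and `toy_lemmatize` are hypothetical names, and `toy_lemmatize` again stands in for `WordNetLemmatizer().lemmatize`.

```python
import fileinput
import os
import re
import tempfile


def lemmatize_file_inplace(path, lemmatize):
    """Rewrite `path` line by line; print() goes into the file while inplace=True."""
    with fileinput.input(path, inplace=True) as f:
        for line in f:
            # Same normalisation as the question: lowercase, keep letters, spaces, apostrophes.
            cleaned = re.sub(r"[^a-z' ]+", " ", line.lower())
            print(" ".join(lemmatize(w) for w in cleaned.split()))


def toy_lemmatize(word):
    # Hypothetical stand-in for WordNetLemmatizer().lemmatize (not a real lemmatizer).
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


# Demo on a temporary file instead of the real 1865-Lincoln.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("With malice toward none,\nwith charity for all nations.\n")
    path = tmp.name

lemmatize_file_inplace(path, toy_lemmatize)
with open(path) as fh:
    result = fh.read()
os.remove(path)
print(result)
```

No `backup` is passed here, so `fileinput` deletes its temporary backup once the file is closed. Once the `with` block exits, standard output points back at the console, which is why the final `print(result)` shows up there.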