Lemmatizing txt file and replacing only lemmatized words

Question

I'm having trouble figuring out how to lemmatize the words in a txt file. I've gotten as far as building a list of the words, but I'm not sure how to lemmatize them from there.

Here's what I have:

import re
import nltk
from nltk.stem.wordnet import WordNetLemmatizer

nltk.download('wordnet')

def lemfile():
    with open('1865-Lincoln.txt', 'r') as f:
        text = f.read().lower()
    # keep only lowercase letters, spaces and apostrophes
    text = re.sub(r"[^a-z ']+", " ", text)
    words = text.split()
    return words

cs95 · Accepted Answer

Initialise a WordNetLemmatizer object and lemmatize each word in each line. You can perform in-place file I/O using the fileinput module.

# https://stackoverflow.com/a/5463419/4909087
import fileinput
from nltk.stem.wordnet import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
for line in fileinput.input('1865-Lincoln.txt', inplace=True, backup='.bak'):
    line = ' '.join(
        [lemmatizer.lemmatize(w) for w in line.rstrip().split()]
    )
    # with inplace=True, print() writes the line back to the file
    print(line)

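When inplace=True, fileinput.input redirects stdout to the file being processed, so each print call writes to the file instead of the console. With backup='.bak', the original contents are kept in 1865-Lincoln.txt.bak.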
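If you want to replace only the words and leave punctuation and spacing untouched, one option is to substitute word by word with a regex instead of splitting and re-joining. The following is a minimal sketch, not part of the original answer: the helper name rewrite_words_in_place, its word_fn parameter, and the WORD_RE pattern are all hypothetical, chosen here for illustration.

```python
import fileinput
import re

# Assumed definition of a "word": a run of ASCII letters and apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")

def rewrite_words_in_place(path, word_fn, backup=".bak"):
    """Apply word_fn to every word in the file at `path`, editing it in place.

    Hypothetical helper: word_fn is any callable that maps a word to its
    replacement, for example a lemmatizer's lemmatize method.
    """
    with fileinput.input(path, inplace=True, backup=backup) as lines:
        for line in lines:
            # Only the matched words are replaced; everything else is kept.
            new_line = WORD_RE.sub(lambda m: word_fn(m.group(0)), line)
            # stdout is redirected to the file, so this writes the line back.
            # The line still carries its own newline, hence end="".
            print(new_line, end="")
```

With NLTK installed, you could presumably call it as rewrite_words_in_place('1865-Lincoln.txt', WordNetLemmatizer().lemmatize). Note that lemmatize treats every word as a noun unless you pass a pos argument, so verb forms may come back unchanged.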

Tags:

python

nltk

lemmatization

ArchivistG
