 

Lemmatizing a txt file and replacing only the lemmatized words

Having trouble figuring out how to lemmatize words from a txt file. I've gotten as far as listing the words, but I'm not sure how to lemmatize them after the fact.

Here's what I have:

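import re

import nltk
from nltk.stem.wordnet import WordNetLemmatizer

nltk.download('wordnet')  # one-time download of the WordNet data

def lemfile():
    # Read the whole speech and lowercase it
    with open('1865-Lincoln.txt', 'r') as f:
        text = f.read().lower()
    # Replace everything except letters, spaces and apostrophes with a space
    text = re.sub(r"[^a-z ']+", " ", text)
    words = text.split()
    return words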
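A minimal sketch of how `lemfile` could finish the job by lemmatizing the word list it builds. It assumes the lemmatizer is passed in as a plain word-to-lemma function (for example `WordNetLemmatizer().lemmatize`), so the text handling can be tried without NLTK installed. The `path` and `lemmatize` parameters and the `normalise_words` helper are hypothetical names for illustration, and reading the file as UTF-8 is an assumption.

```python
import re
from typing import Callable, List


def normalise_words(text: str) -> List[str]:
    """Lowercase `text` and split it into words, using the question's character filter."""
    # Replace each run of characters other than a-z, space or apostrophe with one space.
    return re.sub(r"[^a-z ']+", " ", text.lower()).split()


def lemfile(path: str, lemmatize: Callable[[str], str]) -> List[str]:
    """Return the lemmatized words of the file at `path`, in their original order."""
    with open(path, encoding="utf-8") as f:
        words = normalise_words(f.read())
    # `lemmatize` is any word -> lemma function, e.g. WordNetLemmatizer().lemmatize.
    return [lemmatize(w) for w in words]
```

For example, `lemfile('1865-Lincoln.txt', WordNetLemmatizer().lemmatize)` returns the lemmatized words as a list; writing them back into the file is a separate step.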
ArchivistG asked Jan 01 '26 00:01



1 Answer

Initialise a WordNetLemmatizer object and lemmatize each word in each line. You can edit the file in place with the fileinput module.

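# https://stackoverflow.com/a/5463419/4909087
import fileinput

from nltk.stem.wordnet import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
# inplace=True redirects print() into the file; backup='.bak' keeps the original
for line in fileinput.input('1865-Lincoln.txt', inplace=True, backup='.bak'):
    line = ' '.join(
        lemmatizer.lemmatize(w) for w in line.rstrip().split()
    )
    # print() re-adds the newline removed by rstrip() and replaces the current line
    print(line)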

When inplace=True, fileinput.input redirects stdout to the file being processed, so each print() writes the lemmatized line back in place of the original.
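The split()-based loop lemmatizes whitespace-separated tokens, so a token with punctuation attached (such as `nations.`) is typically not found in WordNet and passes through unchanged, and the line's spacing is collapsed to single spaces. A minimal sketch that replaces only the words themselves, assuming a "word" is a run of ASCII letters with an optional internal apostrophe. `WORD_RE`, `lemmatize_line` and `lemmatize_file_in_place` are hypothetical names; the lemmatizer is passed in as a function so `lemmatize_line` can be tested without NLTK.

```python
import fileinput
import re
from typing import Callable

# Assumed definition of a "word": a run of ASCII letters, optionally with one
# internal apostrophe ("Lincoln's"). Digits, punctuation and whitespace are left as-is.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def lemmatize_line(line: str, lemmatize: Callable[[str], str]) -> str:
    """Lemmatize each word in `line`, leaving punctuation, spacing and newline untouched."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        lowered = word.lower()
        lemma = lemmatize(lowered)
        if lemma == lowered:
            return word  # unchanged lemma: keep the original spelling and case
        # Re-apply a leading capital ("Wars" -> "War"); other casing is not preserved.
        return lemma[:1].upper() + lemma[1:] if word[:1].isupper() else lemma

    return WORD_RE.sub(replace, line)


def lemmatize_file_in_place(path: str) -> None:
    """Rewrite `path` with every word lemmatized; the original is kept as path + '.bak'."""
    # Imported lazily so lemmatize_line works without NLTK installed.
    from nltk.stem.wordnet import WordNetLemmatizer  # needs nltk.download('wordnet')

    lemmatizer = WordNetLemmatizer()
    with fileinput.input(path, inplace=True, backup=".bak") as lines:
        for line in lines:
            # stdout is redirected into the file; `line` keeps its newline, hence end="".
            print(lemmatize_line(line, lemmatizer.lemmatize), end="")
```

Calling `lemmatize_file_in_place('1865-Lincoln.txt')` rewrites the speech. Note that `WordNetLemmatizer.lemmatize` treats every word as a noun by default (`pos='n'`), so a verb form such as `ended` is left as is; passing `lambda w: lemmatizer.lemmatize(w, pos='v')` instead would turn it into `end`, but then noun plurals without a verb reading are no longer reduced, so choosing `pos` well generally needs a part-of-speech tag per word.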

cs95 answered Jan 03 '26 18:01



