Suppose I have a string that contains Python code.
input = "import nltk
from nltk.stem import PorterStemmer
porter_stemmer=PorterStemmer()
words=["connect","connected","connection","connections","connects"]
stemmed_words=[porter_stemmer.stem(word) for word in words]
stemmed_words"
How can I tokenize the code? I found the tokenize module (https://docs.python.org/3/library/tokenize.html). However, it is not clear to me how to use the module. It has tokenize.tokenize(readline) but the parameter takes a generator, not a string.
import tokenize
import io
inp = """import nltk
from nltk.stem import PorterStemmer
porter_stemmer=PorterStemmer()
words=["connect","connected","connection","connections","connects"]
stemmed_words=[porter_stemmer.stem(word) for word in words]
stemmed_words"""
for token in tokenize.generate_tokens(io.StringIO(inp).readline):
    print(token)
tokenize.tokenize takes a callable, not a string. That callable should be the readline method of an IO object.
In addition, tokenize.tokenize expects the readline method to return bytes. To work with a readline method that returns strings, use tokenize.generate_tokens instead.
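If you would rather call tokenize.tokenize itself, one option is to encode the string and wrap it in io.BytesIO. This is only a minimal sketch: the sample source line and the variable names are made up for illustration, and UTF-8 is an assumed encoding.

```python
import io
import tokenize

# Hypothetical one-line sample; any valid Python source string works here.
source = "words = ['connect', 'connected']\n"

# tokenize.tokenize needs a readline callable that returns bytes,
# so encode the string (UTF-8 assumed) and wrap it in BytesIO.
readline = io.BytesIO(source.encode("utf-8")).readline
byte_tokens = list(tokenize.tokenize(readline))

# Unlike generate_tokens, the bytes-based API emits an ENCODING token first.
print(byte_tokens[0])
```

The practical difference is that leading ENCODING token, which generate_tokens does not produce.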
Your input should also be written as a triple-quoted string, since it spans multiple lines; an ordinary quoted string literal cannot contain raw line breaks.
See io.TextIOBase, tokenize.generate_tokens for more info.
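Each token that generate_tokens yields is a named tuple, so you can pull out just the fields you need instead of printing the whole thing. A small sketch, where the summarize helper and the sample source are hypothetical names chosen for illustration:

```python
import io
import tokenize

# Hypothetical sample source; replace with your own code string.
source = "x = 1\n"

def summarize(code):
    """Return (token type name, token text) pairs for a source string."""
    reader = io.StringIO(code).readline
    # tok_name maps numeric token types to readable names such as NAME or OP.
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(reader)]

pairs = summarize(source)
for name, text in pairs:
    print(name, repr(text))
```

The token's start and end attributes hold (row, column) positions if you also need to know where each token sits in the source.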