How to tokenize Python code using the tokenize module?

Question

Suppose I have a string that contains Python code:

input = "import nltk
 from nltk.stem import PorterStemmer
 porter_stemmer=PorterStemmer()
 words=["connect","connected","connection","connections","connects"]
 stemmed_words=[porter_stemmer.stem(word) for word in words]
 stemmed_words"

How can I tokenize the code? I found the tokenize module (https://docs.python.org/3/library/tokenize.html). However, it is not clear to me how to use the module. It has tokenize.tokenize(readline) but the parameter takes a generator, not a string.

Minion3665 · Accepted Answer

import io
import tokenize

# Triple quotes let the string literal span multiple lines.
inp = """import nltk
from nltk.stem import PorterStemmer
porter_stemmer=PorterStemmer()
words=["connect","connected","connection","connections","connects"]
stemmed_words=[porter_stemmer.stem(word) for word in words]
stemmed_words"""

# generate_tokens expects a readline callable that returns str.
for token in tokenize.generate_tokens(io.StringIO(inp).readline):
    print(token)

tokenize.tokenize takes a callable, not a string. The callable should be the readline method of an I/O object. In addition, tokenize.tokenize expects that readline method to return bytes; to work with a readline method that returns strings, use tokenize.generate_tokens instead.
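As a minimal sketch of the bytes-based alternative, tokenize.tokenize can be given the readline method of an io.BytesIO object wrapping the encoded source. The short sample string below is an illustrative assumption, not the asker's code. The first token this API yields is an ENCODING token, which generate_tokens does not produce.

```python
import io
import tokenize

# Illustrative source string (an assumption for this sketch).
source = "x = 1 + 2\n"

# tokenize.tokenize needs a readline callable that returns bytes,
# so encode the string and wrap it in BytesIO.
byte_tokens = list(tokenize.tokenize(io.BytesIO(source.encode("utf-8")).readline))

# generate_tokens accepts a readline callable that returns str.
str_tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# The bytes-based API reports the detected encoding as its first token.
print(tokenize.tok_name[byte_tokens[0].type], byte_tokens[0].string)

# Apart from that leading ENCODING token, both yield the same token strings.
print([t.string for t in byte_tokens[1:]] == [t.string for t in str_tokens])
```

If the code is already held in a str, generate_tokens is usually the simpler choice because no encoding step is needed.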

Your input should also be in a triple-quoted string, since it spans multiple lines.

See the documentation for io.TextIOBase and tokenize.generate_tokens for more information.
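Each item yielded by generate_tokens is a TokenInfo named tuple, so printing it whole is fairly verbose. A minimal sketch, assuming you only want the token type name and its text (the one-line sample input is an illustrative assumption):

```python
import io
import tokenize

# Illustrative one-line input (an assumption for this sketch).
code = 'words = ["connect", "connected"]\n'

pairs = []
for tok in tokenize.generate_tokens(io.StringIO(code).readline):
    # tok.type is an integer constant; tok_name maps it to a readable name.
    pairs.append((tokenize.tok_name[tok.type], tok.string))

for name, text in pairs:
    print(f"{name:10} {text!r}")
```

Operators such as = and [ are all reported with the generic type OP here; the exact_type attribute of each token distinguishes them if that level of detail is needed.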

Tags:

python-3.x

tokenize

Muhammad Asaduzzaman
