Vocabulary Processor function

Question

I am researching input embeddings for convolutional neural networks, and I understand Word2vec. However, in his CNN text classification example, dennybritz uses the function learn.preprocessing.VocabularyProcessor. The documentation says it "maps documents to sequences of word ids." I am not quite sure how this function works. Does it create a list of ids and then map the ids to words, or does it hold a dictionary of words and their ids, and return only the ids when it is run?

Kashyap · Accepted Answer

Let's say that you have just two documents, "I like pizza" and "I like pasta". Your whole vocabulary consists of the words (I, like, pizza, pasta). Every word in the vocabulary has an index associated with it, like so: (1, 2, 3, 4). Given a document like "I like pasta", it can now be converted into the vector [1, 2, 4]. This is what learn.preprocessing.VocabularyProcessor does: it builds the word-to-id dictionary from the documents it is fitted on, and when you transform a document it returns only the ids. The parameter max_document_length makes sure that every document is represented by a vector of length max_document_length, either by padding with zeros if the document is shorter than max_document_length or by clipping it if it is longer. Hope this helps.
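To make the mapping concrete, here is a minimal pure-Python sketch of the same idea. It is not the TensorFlow implementation: the class name SimpleVocabularyProcessor, the whitespace tokenizer, and the choice to use id 0 both for padding and for words not seen during fitting are assumptions made for illustration.

```python
class SimpleVocabularyProcessor:
    """Hypothetical stand-in that mimics the behaviour described above."""

    def __init__(self, max_document_length):
        self.max_document_length = max_document_length
        # Word -> id dictionary. Ids start at 1; id 0 is kept for padding
        # and for unseen words (an assumption of this sketch).
        self.word_to_id = {}

    def fit(self, documents):
        # Assign the next free id to every word not seen before.
        for doc in documents:
            for word in doc.split():
                if word not in self.word_to_id:
                    self.word_to_id[word] = len(self.word_to_id) + 1
        return self

    def transform(self, documents):
        result = []
        for doc in documents:
            # Look up each word; unknown words fall back to id 0.
            ids = [self.word_to_id.get(word, 0) for word in doc.split()]
            # Clip documents that are too long ...
            ids = ids[: self.max_document_length]
            # ... and pad documents that are too short with zeros.
            ids += [0] * (self.max_document_length - len(ids))
            result.append(ids)
        return result


processor = SimpleVocabularyProcessor(max_document_length=4)
processor.fit(["I like pizza", "I like pasta"])
encoded = processor.transform(["I like pasta", "I like pizza and pasta too"])
print(encoded)  # [[1, 2, 4, 0], [1, 2, 3, 0]]
```

The first document is shorter than four words, so it is padded with a trailing 0. The second is longer, so it is clipped to its first four words, and "and" becomes 0 because it was never seen during fitting.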

Tags: python, tensorflow, text-classification

ngoduyvu
