Tensorflow.js tokenizer

Tags: javascript, machine-learning, natural-language-processing, tensorflow.js

I'm new to machine learning and TensorFlow. Since I don't know Python, I decided to use the JavaScript version (which may be more of a wrapper).

The problem is that I'm trying to build a model that processes natural language, so the first step is to tokenize the text in order to feed the data to the model. I did a lot of research, but most examples use the Python version of TensorFlow with methods like tf.keras.preprocessing.text.Tokenizer, for which I can't find an equivalent in TensorFlow.js. I'm stuck at this step and don't know how to convert text into vectors that can be fed to the model. Please help :)

271

asked Aug 02 '18 22:08

Dacredible

2 Answers

There are many ways to transform text into vectors, depending on the use case. The most intuitive one uses term frequency: given the vocabulary of the corpus (all the possible words), each text document is represented as a vector in which each entry is the number of occurrences of the corresponding vocabulary word in that document.

With this vocabulary:

["machine", "learning", "is", "a", "new", "field", "in", "computer", "science"]

the following text:

["machine", "is", "a", "field", "machine", "is", "is"]

is transformed into this vector:

[2, 0, 3, 1, 0, 1, 0, 0, 0]

One disadvantage of this technique is that the vector, which has the same size as the vocabulary of the corpus, may contain lots of zeros. That is why other techniques exist. However, the bag-of-words representation is still often referred to, and there is a slightly different version of it that uses tf-idf weighting.

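// Vocabulary of the corpus and one tokenized document
const vocabulary = ["machine", "learning", "is", "a", "new", "field", "in", "computer", "science"]
const text = ["machine", "is", "a", "field", "machine", "is", "is"]
// For each vocabulary word, count how many times it occurs in the document
const parse = (t) => vocabulary.map((w) => t.reduce((count, token) => token === w ? count + 1 : count, 0))
console.log(parse(text)) // [2, 0, 3, 1, 0, 1, 0, 0, 0]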
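
As a rough illustration of the tf-idf variant mentioned above, the sketch below weights each word count by how rare the word is across a set of documents. The helper names (termFrequency, inverseDocumentFrequency, tfidf), the second example document, and the smoothed idf formula ln((1 + N) / (1 + df)) + 1 are assumptions made for this example. They are not part of TensorFlow.js, and other tf-idf definitions exist.

```javascript
// Term frequency: raw count of each vocabulary word in one tokenized document.
const termFrequency = (vocab, tokens) =>
  vocab.map((word) => tokens.filter((t) => t === word).length);

// Inverse document frequency with add-one smoothing (an assumed variant):
// idf(w) = ln((1 + N) / (1 + df(w))) + 1, where N is the number of documents
// and df(w) is the number of documents that contain w.
const inverseDocumentFrequency = (vocab, docs) =>
  vocab.map((word) => {
    const df = docs.filter((doc) => doc.includes(word)).length;
    return Math.log((1 + docs.length) / (1 + df)) + 1;
  });

// One tf-idf vector per document, each with the same length as the vocabulary.
const tfidf = (vocab, docs) => {
  const idf = inverseDocumentFrequency(vocab, docs);
  return docs.map((doc) => termFrequency(vocab, doc).map((tf, i) => tf * idf[i]));
};

const corpusVocabulary = ["machine", "learning", "is", "a", "new", "field", "in", "computer", "science"];
const corpus = [
  ["machine", "is", "a", "field", "machine", "is", "is"],
  ["computer", "science", "is", "a", "field"], // hypothetical second document
];

const tfidfVectors = tfidf(corpusVocabulary, corpus);
console.log(tfidfVectors);
```

Words that occur in every document (such as "is" here) keep their raw count, while words that occur in fewer documents are weighted up.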

There is also the following module that might help to achieve what you want

114

answered Oct 02 '22 10:10

edkeveked

I ran into this issue and handled it with the following steps:

1. After tokenizer.fit_on_texts([data]), print tokenizer.word_index in your Python code.
2. Copy the word_index output and save it as a JSON file.
3. Refer to this JSON object to generate tokenized words, like this:

    function getTokenisedWord(seedWord) {
      const _token = word2index[seedWord.toLowerCase()]
      return tf.tensor1d([_token])
    }

4. Feed the token to the model:

    const seedWordToken = getTokenisedWord('Hello');
    model.predict(seedWordToken).data().then(predictions => {
      const resultIdx = tf.argMax(predictions).dataSync()[0];
      console.log('Predicted Word ::', index2word[resultIdx]);
    })

5. index2word is the reverse mapping of the word2index JSON object.
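
A minimal sketch of how the steps above might be extended from a single word to whole sentences, assuming word2index was exported from Keras as described. The sample word2index contents and the helpers tokenize, textToSequence and padSequence are hypothetical. They only approximate what texts_to_sequences and pad_sequences do in Python (lowercasing, dropping unknown words, left-padding with 0) and may not match a tokenizer configured with custom filters or an oov_token.

```javascript
// Hypothetical word index in the shape of Keras' tokenizer.word_index
// (indices start at 1; 0 is left free for padding).
const word2index = { hello: 1, world: 2, machine: 3, learning: 4 };

// Lowercase, replace anything that is not a-z, 0-9 or an apostrophe with a
// space, then split on whitespace. Non-ASCII letters are dropped in this sketch.
const tokenize = (text) =>
  text.toLowerCase().replace(/[^a-z0-9']+/g, " ").trim().split(/\s+/).filter(Boolean);

// Map each known word to its index; words missing from the index are skipped.
const textToSequence = (text) =>
  tokenize(text)
    .filter((w) => Object.prototype.hasOwnProperty.call(word2index, w))
    .map((w) => word2index[w]);

// Keep at most the last maxLen indices and left-pad with zeros up to maxLen
// (maxLen is assumed to be a positive integer).
const padSequence = (seq, maxLen) => {
  const trimmed = seq.slice(-maxLen);
  return new Array(maxLen - trimmed.length).fill(0).concat(trimmed);
};

const padded = padSequence(textToSequence("Hello, machine learning world!"), 6);
console.log(padded); // [0, 0, 1, 3, 4, 2]
```

The padded array can then be wrapped in a tensor, for example with tf.tensor2d([padded], [1, padded.length], 'int32'), before calling model.predict. The length 6 is arbitrary here and would need to match the input length the model was trained with.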

answered Oct 02 '22 10:10

Deepak P
