TensorflowJS text/string classification

Subject

Hello. I want to implement a text classification feature using TensorFlow.js in Node.js.
Its job will be to match a string with some pre-defined topics.

Examples:

Input: String: "My dog loves walking on the beach"
Pre-defined topics: Array<String>: ["dog", "cat", "cow"]
Output: I am comfortable with several output formats. Here are some examples, but if you can suggest a better one, please do!

  • String (the most likely topic) - Example: "dog"
  • Object (every topic with a predicted score)
    Example: {"dog": 0.9, "cat": 0.08, "cow": 0.02}
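Both output formats can be derived from the same softmax scores. The sketch below uses two hypothetical helpers (toScoreObject and mostLikelyTopic are not TensorFlow.js functions) and assumes the scores arrive as an array in the same order as the topic list, as one row of a model's prediction would after reading it with dataSync().

```javascript
// Hypothetical helpers: `probs` is one row of softmax output, in the same
// order as `topics`.
function toScoreObject(topics, probs) {
  if (topics.length !== probs.length) {
    throw new Error('topics and probs must have the same length');
  }
  const scores = {};
  topics.forEach((topic, i) => {
    scores[topic] = probs[i];
  });
  return scores;
}

// Returns the topic with the highest score (argmax).
function mostLikelyTopic(topics, probs) {
  let best = 0;
  for (let i = 1; i < probs.length; i++) {
    if (probs[i] > probs[best]) best = i;
  }
  return topics[best];
}

const topics = ['dog', 'cat', 'cow'];
const probs = [0.9, 0.08, 0.02];
console.log(mostLikelyTopic(topics, probs)); // dog
console.log(toScoreObject(topics, probs));   // { dog: 0.9, cat: 0.08, cow: 0.02 }
```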

Research

I know similar results can be achieved by searching the strings for the topic names and applying some heuristics, but they can also be achieved with ML.
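For comparison, the non-ML approach can be as small as the sketch below: count how often each topic name appears as a word and normalize the counts. The function name keywordScores and the lowercase word tokenizer are assumptions made for illustration.

```javascript
// Non-ML baseline (hypothetical): score each topic by how often its name
// appears as a whole word in the input, then normalize the counts.
function keywordScores(text, topics) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  const counts = topics.map((t) => words.filter((w) => w === t.toLowerCase()).length);
  const total = counts.reduce((a, b) => a + b, 0);
  const scores = {};
  topics.forEach((t, i) => {
    // With no matches at all, every topic gets 0.
    scores[t] = total === 0 ? 0 : counts[i] / total;
  });
  return scores;
}

console.log(keywordScores('My dog loves walking on the beach', ['dog', 'cat', 'cow']));
// { dog: 1, cat: 0, cow: 0 }
```

It also shows the limitation that motivates ML: "My puppy loves the beach" or "Dogs everywhere" score 0 for every topic, because only exact word matches count.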

There are already some posts about classifying text and creating autocomplete with TensorFlow (though I am not sure about TF.js), such as these:

  • https://www.tensorflow.org/hub/tutorials/text_classification_with_tf_hub
  • http://ruder.io/text-classification-tensorflow-estimators/
  • https://machinelearnings.co/tensorflow-text-classification-615198df9231

How you can help

My goal is to do the topic prediction with TensorFlow.js. I just need an example of the best way to train a model on strings, or of how to classify text; I will extend the rest myself.

asked Oct 29 '22 07:10 by Radi Cho


1 Answer

Text classification has an added challenge: the words must first be turned into vectors. There are various approaches, depending on the nature of the problem being solved. Before building the model, one should make sure that every word of the corpus has an associated vector. Vectors built directly from the corpus (for example, one-hot or bag-of-words vectors) suffer from another issue: sparsity. Hence the need for word embeddings. The two most popular algorithms for this task are Word2Vec and GloVe, and there are some implementations in JavaScript. Alternatively, one can create vectors using the bag-of-words model, as outlined here.
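A minimal bag-of-words sketch in plain JavaScript, assuming a tiny made-up corpus and a simple lowercase word tokenizer (the helper names are hypothetical). Each text becomes a vector of word counts whose length is the vocabulary size.

```javascript
// Lowercase the text and split it into words (simplifying assumption).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

// Assign each distinct word of the corpus a column index.
function buildVocabulary(corpus) {
  const vocab = new Map(); // word -> column index
  for (const text of corpus) {
    for (const word of tokenize(text)) {
      if (!vocab.has(word)) vocab.set(word, vocab.size);
    }
  }
  return vocab;
}

// Count how often each vocabulary word occurs in `text`.
function bagOfWords(text, vocab) {
  const vector = new Array(vocab.size).fill(0);
  for (const word of tokenize(text)) {
    // Words not seen while building the vocabulary are ignored.
    if (vocab.has(word)) vector[vocab.get(word)] += 1;
  }
  return vector;
}

const corpus = ['My dog loves the beach', 'The cat sleeps', 'The cow eats grass'];
const vocab = buildVocabulary(corpus);
const v = bagOfWords('The dog and the cat', vocab);
// v is [0, 1, 0, 2, 0, 1, 0, 0, 0, 0]
```

Most entries are zero, which is the sparsity problem described in the answer; with a realistic vocabulary of thousands of words it gets much worse. If such vectors are fed to the dense model, its input size would be vocab.size.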

Once the vectors are available, a fully connected neural network (FCNN) is enough to predict the topic of a text. Another thing to consider is the length of the input: a fixed length has to be chosen, so texts that are too short can be padded, texts that are too long can be truncated, and so on. Here is a model:

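const tf = require('@tensorflow/tfjs-node'); // or '@tensorflow/tfjs'
// lengthSentence: length of each input vector; numTopics: number of topics
const model = tf.sequential();
model.add(tf.layers.dense({units: 100, activation: 'relu', inputShape: [lengthSentence]}));
model.add(tf.layers.dense({units: numTopics, activation: 'softmax'}));
model.compile({optimizer: 'sgd', loss: 'categoricalCrossentropy', metrics: ['accuracy']});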
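Because the model is compiled with categoricalCrossentropy, each training label has to be a one-hot vector of length numTopics. A small sketch of that encoding, with a hypothetical helper name and made-up labels:

```javascript
// Hypothetical helper: turn one topic name per training text into a
// one-hot row of length topicNames.length.
function oneHotLabels(labels, topicNames) {
  return labels.map((label) => {
    const index = topicNames.indexOf(label);
    if (index === -1) throw new Error(`Unknown topic: ${label}`);
    const row = new Array(topicNames.length).fill(0);
    row[index] = 1;
    return row;
  });
}

const topicNames = ['dog', 'cat', 'cow'];
const ys = oneHotLabels(['dog', 'cow', 'cat'], topicNames);
// ys is [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

These rows, together with the input vectors, could then be wrapped with tf.tensor2d(...) and passed to model.fit(xs, ys, {epochs: ...}). For integer label tensors, TensorFlow.js also provides tf.oneHot.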

Key Takeaways of the model

The model simply maps the input vector to the categorical output through a single hidden layer; it is a very simple model. In some scenarios, an embedding layer can be added right after the input (that is, as the first layer, taking word indices as input):

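// inputDim: vocabulary size (max index + 1); output is [lengthSentence, embeddingDims], so flatten (or use an LSTM) before the dense layers
model.add(tf.layers.embedding({inputDim: inputDimSize, inputLength: lengthSentence, outputDim: embeddingDims}));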
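The embedding layer expects integer word indices rather than count vectors, with each sequence padded or truncated to lengthSentence. A sketch of that preprocessing, assuming index 0 is reserved for padding, 1 for unknown words, and that padding and truncation happen at the end of the text (the helper names are hypothetical):

```javascript
const PAD = 0;     // assumption: index reserved for padding
const UNKNOWN = 1; // assumption: index reserved for out-of-vocabulary words

// Give every distinct word an index starting at 2.
function buildWordIndex(corpus) {
  const index = new Map();
  for (const text of corpus) {
    for (const word of text.toLowerCase().match(/[a-z']+/g) || []) {
      if (!index.has(word)) index.set(word, index.size + 2); // 0 and 1 are reserved
    }
  }
  return index;
}

// Map words to indices, then truncate or pad to exactly lengthSentence.
function toPaddedSequence(text, wordIndex, lengthSentence) {
  const ids = (text.toLowerCase().match(/[a-z']+/g) || [])
    .map((word) => wordIndex.get(word) ?? UNKNOWN)
    .slice(0, lengthSentence); // truncate long texts
  while (ids.length < lengthSentence) ids.push(PAD); // pad short texts
  return ids;
}

const wordIndex = buildWordIndex(['My dog loves the beach', 'The cat sleeps']);
const seq = toPaddedSequence('The dog swims', wordIndex, 5);
// seq is [5, 3, 1, 0, 0]
```

With this numbering, inputDimSize would be wordIndex.size + 2, since every index fed to the embedding layer must be smaller than inputDim.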

In other cases, an LSTM layer (placed after the embedding layer, since it expects a sequence as input) can be relevant:

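// returnSequences: true only when another LSTM follows; the last LSTM before the softmax layer should return just its final output
model.add(tf.layers.lstm({units: lstmUnits, returnSequences: false}));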
answered Nov 15 '22 12:11 by edkeveked