Doing Multi-Label classification with BERT

I want to use a BERT model to do multi-label classification with TensorFlow.

To do so, I want to adapt the example run_classifier.py from the BERT GitHub repository, which shows how to do simple classification with BERT using the pre-trained weights published by Google Research (for example, BERT-Base, Cased).

I have X different labels, each with a value of either 0 or 1, so I want to add to the original BERT model a new Dense layer of size X and train it with the sigmoid_cross_entropy_with_logits loss function.

So, for the theoretical part, I think I am OK.

The problem is that I don't know how to append a new output layer and retrain only that new layer on my dataset, using the existing BertModel class.
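
To be explicit about what I mean by retraining only the new layer, I imagine something like restricting the optimizer to the new variables (an untested sketch, not code from run_classifier.py; I don't know how to wire it into the model_fn there):

# Untested sketch: train only the new classification layer by restricting
# the optimizer's var_list to the freshly created variables.
# (loss, output_weights, and output_bias come from create_model() below.)
train_op = tf.train.AdamOptimizer(learning_rate=2e-5).minimize(
    loss, var_list=[output_weights, output_bias])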

Here is the original create_model() function from run_classifier.py, where I guess I have to make my modifications. But I am a bit lost on what to do.

def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,
                 labels, num_labels, use_one_hot_embeddings):
  """Creates a classification model."""
  model = modeling.BertModel(
      config=bert_config,
      is_training=is_training,
      input_ids=input_ids,
      input_mask=input_mask,
      token_type_ids=segment_ids,
      use_one_hot_embeddings=use_one_hot_embeddings)

  output_layer = model.get_pooled_output()

  hidden_size = output_layer.shape[-1].value

  output_weights = tf.get_variable(
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):
    if is_training:
      # I.e., 0.1 dropout
      output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    probabilities = tf.nn.softmax(logits, axis=-1)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    loss = tf.reduce_mean(per_example_loss)

    return (loss, per_example_loss, logits, probabilities)

And here is the same function with some of my modifications, but where things are still missing (and possibly wrong too):

def create_model(bert_config, is_training, input_ids, input_mask, segment_ids, labels, num_labels):
  """Creates a classification model."""
  model = modeling.BertModel(
      config=bert_config,
      is_training=is_training,
      input_ids=input_ids,
      input_mask=input_mask,
      token_type_ids=segment_ids)

  output_layer = model.get_pooled_output()

  hidden_size = output_layer.shape[-1].value

  output_weights = tf.get_variable("output_weights", [num_labels, hidden_size],initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable("output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):
    if is_training:
      # I.e., 0.1 dropout
      output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    probabilities = tf.nn.softmax(logits, axis=-1)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    per_example_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)

    loss = tf.reduce_mean(per_example_loss)

    return (loss, per_example_loss, logits, probabilities)

The other things I have adapted in the code, and for which I had no problem:

  • DataProcessor to load and parse my custom dataset
  • Changing the type of the labels variable from numerical values to arrays everywhere it is used (a sketch of what I mean by these arrays follows this list)
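
For clarity, here is a minimal sketch (simplified, not my actual code) of what I mean by arrays: each example gets a multi-hot float vector with a 1 for every label that applies.

import numpy as np

def to_multi_hot(active_label_ids, num_labels):
  """E.g. active_label_ids=[0, 3], num_labels=5 -> [1., 0., 0., 1., 0.]"""
  label_vector = np.zeros(num_labels, dtype=np.float32)
  label_vector[active_label_ids] = 1.0
  return label_vector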

So, if anyone knows what I should do to resolve my problem, or can even point out some obvious mistake I may have made, I would be glad to hear it.

Notes:

  • I found this article that corresponds pretty well to what I am trying to do, but it uses PyTorch, and I cannot translate it into TensorFlow.
asked May 06 '19 by Nakeuh


1 Answer

You want to replace the softmax, which models a single distribution over all possible outputs (all scores sum up to one), with a sigmoid, which models an independent distribution for each class (there is a yes/no distribution for each output).

So, you correctly changed the loss function, but you also need to change how you compute the probabilities. It should be:

probabilities = tf.sigmoid(logits)

In this case, you don't need the log_probs.
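
Putting both changes together, the loss block of create_model() would look roughly like this (a sketch against the same TF1 API the original code uses; it assumes labels arrives as a float multi-hot tensor of shape [batch_size, num_labels], as described in the question):

with tf.variable_scope("loss"):
  if is_training:
    # I.e., 0.1 dropout
    output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

  logits = tf.matmul(output_layer, output_weights, transpose_b=True)
  logits = tf.nn.bias_add(logits, output_bias)

  # Independent per-class probabilities instead of a softmax distribution.
  probabilities = tf.sigmoid(logits)

  # One binary cross-entropy term per class, summed over classes; the
  # softmax/log_softmax lines from the original are gone.
  per_example_loss = tf.reduce_sum(
      tf.nn.sigmoid_cross_entropy_with_logits(
          labels=tf.cast(labels, tf.float32), logits=logits),
      axis=-1)
  loss = tf.reduce_mean(per_example_loss)

  return (loss, per_example_loss, logits, probabilities)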

answered Oct 22 '22 by Jindřich