I want to use the BERT model to do multi-label classification with TensorFlow.
To do so, I want to adapt the example run_classifier.py from the BERT GitHub repository, which shows how to use BERT for simple classification using the pre-trained weights provided by Google Research (for example BERT-Base, Cased).
I have X different labels, each with a value of either 0 or 1, so I want to add a new dense layer of size X on top of the original BERT model and use sigmoid_cross_entropy_with_logits as the loss.
So, for the theoretical part, I think I am OK.
The problem is that I don't know how to append a new output layer and retrain only this new layer with my dataset, using the existing BertModel class.
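To make it a bit more concrete, this is roughly the kind of thing I imagine for training only the new layer (just a sketch of my idea, not working code, and I am not sure it fits with how run_classifier.py builds its optimizer in optimization.py):

# Sketch of my idea (untested): keep the pre-trained BERT weights frozen and
# only optimize the variables of the new output layer, selected by name.
new_layer_vars = [v for v in tf.trainable_variables()
                  if v.name.startswith(("output_weights", "output_bias"))]
train_op = tf.train.AdamOptimizer(learning_rate=2e-5).minimize(
    loss, var_list=new_layer_vars)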
Here is the original create_model() function from run_classifier.py, where I guess I have to make my modifications, but I am a bit lost on what to do.
def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,
                 labels, num_labels, use_one_hot_embeddings):
  """Creates a classification model."""
  model = modeling.BertModel(
      config=bert_config,
      is_training=is_training,
      input_ids=input_ids,
      input_mask=input_mask,
      token_type_ids=segment_ids,
      use_one_hot_embeddings=use_one_hot_embeddings)

  output_layer = model.get_pooled_output()

  hidden_size = output_layer.shape[-1].value

  output_weights = tf.get_variable(
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):
    if is_training:
      # I.e., 0.1 dropout
      output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    probabilities = tf.nn.softmax(logits, axis=-1)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    loss = tf.reduce_mean(per_example_loss)

    return (loss, per_example_loss, logits, probabilities)
And here is the same function with some of my modifications, but where things are missing (and possibly wrong too):
def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,
                 labels, num_labels):
  """Creates a classification model."""
  model = modeling.BertModel(
      config=bert_config,
      is_training=is_training,
      input_ids=input_ids,
      input_mask=input_mask,
      token_type_ids=segment_ids)

  output_layer = model.get_pooled_output()

  hidden_size = output_layer.shape[-1].value

  output_weights = tf.get_variable(
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):
    if is_training:
      # I.e., 0.1 dropout
      output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    probabilities = tf.nn.softmax(logits, axis=-1)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    per_example_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
    loss = tf.reduce_mean(per_example_loss)

    return (loss, per_example_loss, logits, probabilities)
There are other things I have adapted in the code, and with those I had no problems.
So, if anyone knows what I should do to resolve my problem, or can point out an obvious mistake I may have made, I would be glad to hear it.
You want to replace the softmax, which models a single distribution over the possible outputs (all scores sum to one), with a sigmoid, which models an independent distribution for each class (a yes/no distribution per output).
So, you correctly changed the loss function, but you also need to change how you compute the probabilities. It should be:
probabilities = tf.sigmoid(logits)
In this case, you don't need the log_probs.
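Putting it together, the loss part of your create_model() could look roughly like this (a sketch, not tested; note that sigmoid_cross_entropy_with_logits expects float labels of shape [batch_size, num_labels], so cast them if your input pipeline produces integers):

  with tf.variable_scope("loss"):
    if is_training:
      # I.e., 0.1 dropout
      output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)

    # Independent probability per label instead of a softmax distribution.
    probabilities = tf.sigmoid(logits)

    # Multi-label loss: one sigmoid cross-entropy term per label,
    # summed per example, then averaged over the batch.
    labels = tf.cast(labels, tf.float32)
    per_example_loss = tf.nn.sigmoid_cross_entropy_with_logits(
        labels=labels, logits=logits)
    per_example_loss = tf.reduce_sum(per_example_loss, axis=-1)
    loss = tf.reduce_mean(per_example_loss)

    return (loss, per_example_loss, logits, probabilities)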