
Unable to train my Keras model: "Data cardinality is ambiguous"

I am using the bert-for-tf2 library for a multi-class classification problem. I created the model, but training throws the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-25-d9f382cba5d4> in <module>()
----> 1 model.fit([INPUT_IDS,INPUT_MASKS,INPUT_SEGS], list(train.SECTION))

5 frames
/tensorflow-2.0.0/python3.6/tensorflow_core/python/keras/engine/data_adapter.py in __init__(self, x, y, sample_weights, batch_size, epochs, steps, shuffle, **kwargs)
243             label, ", ".join([str(i.shape[0]) for i in nest.flatten(data)]))
244       msg += "Please provide data which shares the same first dimension."
--> 245       raise ValueError(msg)
246     num_samples = num_samples.pop()
247 

ValueError: Data cardinality is ambiguous:
x sizes: 3
y sizes: 6102
Please provide data which shares the same first dimension.

I am following the Medium article "Simple BERT using TensorFlow 2.0". The Git repo for the bert-for-tf2 library can be found here.

Please find the entire code here.

Here is a link to my colab notebook

Really appreciate your help!

Amal Vijayan asked Dec 03 '19 11:12



1 Answer

I had the same issue. I'm not sure why the number of inputs and outputs has to match, but this error appears to be raised by one of the Keras data adapters when x.shape[0] != y.shape[0]. In this case:

x = [INPUT_IDS,INPUT_MASKS,INPUT_SEGS]
y = list(train.SECTION)
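
One plausible reading of "x sizes: 3" is that the three inputs were plain Python lists, so the outer list was converted into a single array whose first dimension is the number of inputs rather than the number of samples. The NumPy-only sketch below illustrates that effect with toy data; the variable names and sizes are hypothetical stand-ins, and it does not reproduce the actual Keras adapter code.

```python
import numpy as np

# Hypothetical stand-ins for the question's data: 5 samples, sequence length 4.
n_samples, seq_len = 5, 4
input_ids = [[1] * seq_len for _ in range(n_samples)]    # plain Python lists
input_masks = [[1] * seq_len for _ in range(n_samples)]
input_segs = [[0] * seq_len for _ in range(n_samples)]
labels = list(range(n_samples))

# Converting the outer list as a whole yields ONE array whose first
# dimension is the number of inputs (3), not the number of samples.
x_as_one_array = np.asarray([input_ids, input_masks, input_segs])
y_array = np.asarray(labels)
print(x_as_one_array.shape)  # (3, 5, 4)
print(y_array.shape)         # (5,)

# Converting each input separately keeps the sample count as the first dimension.
x_separate = [np.asarray(a) for a in (input_ids, input_masks, input_segs)]
print([a.shape[0] for a in x_separate])  # [5, 5, 5]
```

With the whole-list conversion, the first dimensions are 3 and 5, which is the same kind of mismatch as the 3 versus 6102 in the error message.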

So instead of

model.fit([INPUT_IDS,INPUT_MASKS,INPUT_SEGS], list(train.SECTION))

try passing the inputs and outputs as dictionaries keyed by layer name (check the model summary for the names; suitable names can also be given explicitly). This worked for me:

model.fit(
    {
        "input_word_ids": INPUT_IDS,
        "input_mask": INPUT_MASKS,
        "segment_ids": INPUT_SEGS,
    },
    {"dense_1": list(train.SECTION)},
)

Also make sure that the inputs and outputs are NumPy arrays (for example, by converting them with np.asarray()), because the data adapter reads each one's .shape attribute.
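
As a rough sketch of that advice, the helper below converts every input and the labels to NumPy arrays and checks that they share the same first dimension before anything is handed to model.fit. The helper name prepare_fit_data and the toy data are hypothetical, not part of Keras or bert-for-tf2, and the layer names are the ones used above, which may differ in your model.

```python
import numpy as np

def prepare_fit_data(named_inputs, labels):
    """Convert inputs and labels to NumPy arrays and compare their sample counts.

    Hypothetical helper: `named_inputs` maps input-layer names to array-likes.
    """
    x = {name: np.asarray(values) for name, values in named_inputs.items()}
    y = np.asarray(labels)
    sizes = {name: arr.shape[0] for name, arr in x.items()}
    if any(size != y.shape[0] for size in sizes.values()):
        raise ValueError(
            f"First dimensions differ: x sizes {sizes}, y size {y.shape[0]}"
        )
    return x, y

# Toy data standing in for INPUT_IDS, INPUT_MASKS, INPUT_SEGS and train.SECTION.
x, y = prepare_fit_data(
    {
        "input_word_ids": [[101, 7, 102], [101, 9, 102]],
        "input_mask": [[1, 1, 1], [1, 1, 1]],
        "segment_ids": [[0, 0, 0], [0, 0, 0]],
    },
    [0, 1],
)
# The call from the answer would then be: model.fit(x, {"dense_1": y})
```

Running the check up front surfaces a size mismatch with the offending input names, before Keras raises its own, less specific error.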

Yoganand answered Oct 21 '22 11:10
