This page shows the results on a test set I created by changing temporal phrases; labels are predicted by bert-base-uncased, using the Hugging Face BERT implementation. The accuracy is 23.75%.
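For concreteness, here is a minimal sketch of the kind of perturbation described above: swap a temporal phrase in the hypothesis and flip the entailment label accordingly. The phrase pairs, the label-flipping rule, and the example sentence are my own illustration, not the actual rules used to build the test set.

```python
# Hypothetical temporal-phrase perturbation (illustration only, not the
# exact procedure used to build the test set).
SWAPS = {"before": "after", "after": "before"}

def perturb(premise: str, hypothesis: str, label: str):
    """Swap a temporal phrase in the hypothesis and flip the label; None if no swap applies."""
    for old, new in SWAPS.items():
        if f" {old} " in hypothesis:
            flipped = {"entailment": "contradiction",
                       "contradiction": "entailment"}.get(label, label)
            return premise, hypothesis.replace(f" {old} ", f" {new} "), flipped
    return None

print(perturb(
    "The meeting ended at noon.",
    "The meeting ended before the afternoon.",
    "entailment",
))
# -> ('The meeting ended at noon.', 'The meeting ended after the afternoon.', 'contradiction')
```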
The pretrained BERT model was fine-tuned for 3 epochs on MNLI, and gets 84.16% on the matched dev set and 84.35% on the mismatched dev set.
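As a rough sketch, scoring the test set with a fine-tuned checkpoint via the transformers library can look like the following; the checkpoint path and the helper functions are placeholders for illustration, not the exact evaluation code.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder path: stands in for the BERT checkpoint fine-tuned on MNLI.
MODEL = "path/to/finetuned-bert-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

def predict(premise: str, hypothesis: str) -> str:
    """Predict an NLI label for a premise/hypothesis pair."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Label names come from the model config; check id2label matches your training setup.
    return model.config.id2label[logits.argmax(dim=-1).item()]

def accuracy(examples):
    """Accuracy over (premise, hypothesis, gold_label) triples."""
    hits = sum(predict(p, h) == gold for p, h, gold in examples)
    return hits / len(examples)
```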
I was a bit surprised that BERT does not do any better than the GLUE baseline models:
| model  | accuracy (%) |
|--------|--------------|
| CBOW   | 34.8         |
| BiLSTM | 22.9         |
| ESIM   | 19.4         |
I argue this technique should be used as a sanity check for neural network models, to verify that they at least understand basic concepts governing our universe. I cannot imagine an intelligent system that does not understand that time goes forward.
For more details, see my report.