Testing Encoders

I wanted to get a first-hand feel for how well masked language models (MLMs) perform at filling in masked tokens. It’s a fairly trivial exercise, but doing it myself gives a better intuition for relative performance. Maybe it’ll be helpful to you as well; the code is at the bottom.

The task: fill in the [MASK] token in each of these prompts:

| Question | Expected answer |
| --- | --- |
| The cat chased the [MASK]. | mouse |
| The capital of [MASK] is Paris. | France |
| Bread is typically baked in an [MASK]. | oven |
| When it’s raining, you use an [MASK] to stay dry. | umbrella |
| You brush your [MASK] to keep them clean. | teeth |
| Money doesn’t grow on [MASK]. | trees |
| A doctor often works in a [MASK]. | hospital |
| You don’t put metal in a [MASK]. | microwave |
| When you’re sleepy, you go to [MASK]. | bed |
| The [MASK] is the center of our solar system. | sun |
| People breathe [MASK] to live. | oxygen |
| Plants need [MASK] to grow. | water |
| The [MASK] is a natural satellite of the Earth. | moon |
| Cars usually run on [MASK]. | gasoline |
| A fridge is used to keep food [MASK]. | cold |
| To see stars, you look at the [MASK]. | sky |
| Humans have [MASK] fingers on each hand. | five |
| Fish live in [MASK]. | water |
| Fire needs [MASK] to burn. | oxygen |
| Birds use [MASK] to fly. | wings |
| The [MASK] rises in the East. | sun |

Models tested include BERT, RoBERTa, ELECTRA, DeBERTa, XLM-RoBERTa Base, BERT Large Cased, Legal-BERT, InfoXLM Large, and ALBERT Base v1. The results are displayed in tables for each question, comparing the top 5 predicted tokens and their scores from each model.




I’m underwhelmed by the models’ performance. I’m sure I could’ve tweaked things further to get better outcomes, but I still expected more out of the box. RoBERTa did reasonably well, but still nothing too exciting.
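The evaluation loop sketched below reproduces the setup described above, assuming the Hugging Face `transformers` library and its `fill-mask` pipeline. The checkpoint names are illustrative defaults, not necessarily the exact checkpoints used here; `PROMPTS` is truncated to keep the listing short.

```python
# A minimal sketch of the evaluation harness, assuming the Hugging Face
# `transformers` library is installed. Checkpoint names are illustrative.

PROMPTS = [
    ("The cat chased the [MASK].", "mouse"),
    ("The capital of [MASK] is Paris.", "France"),
    ("Bread is typically baked in an [MASK].", "oven"),
    # ... remaining prompts from the table above
]


def to_model_mask(prompt: str, mask_token: str) -> str:
    """Swap the generic [MASK] placeholder for the model's own mask token.

    BERT-style models expect [MASK], while RoBERTa-style models expect <mask>,
    so each prompt is rewritten per model before scoring.
    """
    return prompt.replace("[MASK]", mask_token)


def evaluate(model_name: str, top_k: int = 5) -> None:
    """Print the top-k fill-mask predictions for every prompt."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_name)
    for prompt, expected in PROMPTS:
        preds = fill(to_model_mask(prompt, fill.tokenizer.mask_token), top_k=top_k)
        tokens = [p["token_str"].strip() for p in preds]
        hit = "HIT " if expected.lower() in (t.lower() for t in tokens) else "miss"
        print(f"[{hit}] {prompt}  expected={expected}  top{top_k}={tokens}")


# Usage (downloads each checkpoint on first run):
#     evaluate("bert-base-uncased")
#     evaluate("roberta-base")
```

Normalizing the mask token per model is the one step that is easy to get wrong when comparing BERT- and RoBERTa-family checkpoints side by side, which is why it is factored out here.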
