Thank you!

There are two distinct stages involved in getting a pre-trained Transformer model to perform a specific task.

  1. Pre-training (the terminology is a little confusing)
    Pre-training is the training procedure performed on a Transformer before it is adapted to any particular task. In the case of BERT, this means training the model on a very large corpus with two pre-training objectives: masked word prediction (masked language modelling) and next sentence prediction.
  2. Fine-tuning (on a particular task)
    A suitable linear layer is added on top of the Transformer model, and the entire model is then trained on the required task (classification, NER, QA, etc.).
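To make step 2 a little more concrete, here is a minimal sketch of what "adding a linear layer on top" means. The pooled Transformer output is replaced by a random stand-in vector (in practice it would come from the pre-trained model); the hidden size of 768 matches BERT-base, and the label count is an arbitrary example:

```python
import numpy as np

rng = np.random.default_rng(0)

hidden_size = 768   # hidden size of BERT-base
num_labels = 3      # e.g. a hypothetical 3-class classification task

# Stand-in for the pooled [CLS] output of a pre-trained Transformer.
pooled_output = rng.standard_normal(hidden_size)

# The "suitable linear layer" added on top: a task-specific head
# mapping the hidden size down to the number of labels.
W = rng.standard_normal((num_labels, hidden_size)) * 0.02
b = np.zeros(num_labels)

logits = W @ pooled_output + b

# Softmax turns the logits into class probabilities.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print(probs.shape)  # one probability per label
```

During fine-tuning, both this head and the Transformer weights underneath it are updated together.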

For the vast majority of cases, you will only ever need to do the second step: take a pre-trained model and fine-tune it on your task with your own data.
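As a toy illustration of that second step, the sketch below simulates a frozen pre-trained encoder with a fixed random projection and trains a small classification head on top by gradient descent. For brevity only the head is updated here (strictly speaking that is feature extraction; full fine-tuning would update the encoder weights too), and all dimensions and data are made-up toy values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated "pre-trained encoder": a fixed (frozen) random projection.
hidden_size, feat_dim, n = 16, 8, 64
encoder_W = rng.standard_normal((feat_dim, hidden_size))

# Toy inputs and binary labels for the downstream task.
X = rng.standard_normal((n, hidden_size))
y = (X[:, 0] > 0).astype(float)

features = np.tanh(X @ encoder_W.T)  # frozen encoder output

# Trainable task head: a single linear layer + sigmoid.
w = np.zeros(feat_dim)
b = 0.0
lr = 0.5

def loss_and_grad(w, b):
    z = features @ w + b
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    grad_w = features.T @ (p - y) / n
    grad_b = np.mean(p - y)
    return loss, grad_w, grad_b

initial_loss, _, _ = loss_and_grad(w, b)
for _ in range(200):
    _, gw, gb = loss_and_grad(w, b)
    w -= lr * gw
    b -= lr * gb
final_loss, _, _ = loss_and_grad(w, b)

print(final_loss < initial_loss)  # the head has adapted to the task
```

In practice you would do this with a library such as Hugging Face Transformers rather than by hand, but the principle is the same: reuse the pre-trained weights and train further on your own labelled data.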

