Hey Tanya,

The time finetuning takes depends heavily on the hardware used and the size of the training data. With BERT (and similar transformers), all layers are usually finetuned rather than frozen, since this generally gives the best performance.

There is a fully-connected linear layer on top of the BERT layers that produces the final output for a specific task.
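
To make that last point concrete, here is a rough sketch in plain Python of what such a linear layer computes for a classification task. The function names, the 4-dimensional vector, and the weight values are all made up for illustration (BERT-base actually produces 768-dimensional vectors). In a real model the weights are not written by hand; they are learned during finetuning.

```python
import math


def linear_head(pooled, weights, bias):
    """Fully-connected layer: logits[j] = sum_i weights[j][i] * pooled[i] + bias[j]."""
    return [
        sum(w_i * x_i for w_i, x_i in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical encoder output for one input sequence (illustrative values only).
pooled = [0.5, -1.0, 0.25, 2.0]

# Hypothetical head for a 2-class task: one weight row and one bias per class.
weights = [
    [0.1, 0.2, -0.3, 0.4],
    [-0.2, 0.1, 0.5, -0.1],
]
bias = [0.0, 0.1]

logits = linear_head(pooled, weights, bias)
probs = softmax(logits)
predicted_class = probs.index(max(probs))
```

Changing the task (say, from 2 classes to 5) only changes the number of rows in `weights` and entries in `bias`; the layers underneath keep the same shape.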

AI researcher, avid reader, fantasy and Sci-Fi geek, and fan of the Oxford comma. www.linkedin.com/in/t-rajapakse/
