AI researcher, avid reader, fantasy and Sci-Fi geek, and fan of the Oxford comma. www.linkedin.com/in/t-rajapakse/

The mT5 model is pre-trained on over a hundred different languages. Let’s see how we can leverage this to train a bilingual translation model for a low-resource language — Sinhalese.


mT5 is a multilingual Transformer model pre-trained on a dataset (mC4) containing text from 101 different languages. The architecture of the mT5 model (based on T5) is designed to support any Natural Language Processing task (classification, NER, question answering, etc.) by reframing the required task as a sequence-to-sequence task.

In other words — text goes in, and text comes out. For example, in a classification task, the input to the model can be the text sequence to be classified, and the output from the model will be the class label for the sequence. For translation, this is even more straight…
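To make that concrete, here is a minimal sketch of what fine-tuning mT5 for translation can look like with Simple Transformers; the checkpoint name, the prefixes, and the placeholder sentence pairs are illustrative assumptions, not the exact setup used in the article:

```python
from simpletransformers.t5 import T5Model, T5Args
import pandas as pd

# A couple of placeholder sentence pairs; a real run would use a parallel
# English-Sinhala corpus with many thousands of rows.
train_df = pd.DataFrame({
    "prefix": ["translate english to sinhala", "translate sinhala to english"],
    "input_text": ["Hello, how are you?", "<Sinhala sentence here>"],
    "target_text": ["<Sinhala translation here>", "Hello, how are you?"],
})

# Hypothetical training settings; tune these for a real dataset.
model_args = T5Args(num_train_epochs=1, max_seq_length=96, overwrite_output_dir=True)
model = T5Model("mt5", "google/mt5-base", args=model_args)

model.train_model(train_df)
print(model.predict(["translate english to sinhala: Nice to meet you."]))
```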


Cross-lingual, zero-shot training with mT5 — Training an mT5 model in English and using it with other languages!


The original T5 (Text-To-Text Transfer Transformer) model achieved state-of-the-art performance on a variety of NLP benchmarks by leveraging a unified text-to-text format and a gigantic training dataset (C4). With the unified text-to-text approach, all downstream tasks were reframed such that both the input and the output of the model are text sequences. At a whopping 750 GB, the C4 (Colossal Clean Crawled Corpus) dataset was orders of magnitude larger than most existing datasets. Released back in October 2019 by Google, T5 still sits pretty at the top of the SuperGLUE benchmark as a testament to its capabilities.

More information regarding…
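As a rough sketch of the zero-shot idea, assuming an mT5 model that has already been fine-tuned on English-only data and saved locally (the directory name and the task prefix below are placeholders):

```python
from simpletransformers.t5 import T5Model

# "outputs" is the default Simple Transformers output directory; here it is
# assumed to hold an mT5 checkpoint fine-tuned on English-only data.
model = T5Model("mt5", "outputs", use_cuda=False)

# The same (hypothetical) task prefix, applied to a sentence in a language
# the model was never fine-tuned on, relying on the multilingual pre-training.
print(model.predict(["sentiment: <sentence in another language>"]))
```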


Real-life data is messy, complex, and hard to understand! Let’s see how we can make it a little simpler!


One of my favourite approaches to dealing with large, complex problems is to break them down into smaller, more manageable sub-problems. This makes it easier to focus on the important bits without being overwhelmed by the details. At Skil.AI, we often deal with enormous, complicated datasets which can seem quite daunting to tackle! As with most real-life datasets, our data is often messy, noisy, and generally the opposite of clean. In such cases, it can be helpful to break down the dataset into smaller chunks so that it can be analyzed and understood easily.

Sometimes, there may be obvious logical…
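As a toy illustration of chunking a dataset, here is a small pandas/NumPy sketch with made-up data (not taken from the article):

```python
import numpy as np
import pandas as pd

# A toy stand-in for a large, messy dataset.
df = pd.DataFrame({
    "source": np.random.choice(["web", "app", "call_centre"], size=1000),
    "value": np.random.rand(1000),
})

# Option 1: fixed-size chunks that can be inspected one at a time.
chunks = np.array_split(df, 10)
print(len(chunks), len(chunks[0]))

# Option 2: logical sub-problems, one group per source.
for source, group in df.groupby("source"):
    print(source, len(group), round(group["value"].mean(), 3))
```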


John Smith. J Smith. Smith, John. How to find if this John is the same as that John!


Have you ever searched for a contact on your phone and come up with several duplicate or near-duplicate entries? (I’m fairly certain this isn’t just me!) For me, this tends to happen when I forget that I already have a particular contact saved and create a new one for a new number. Duplicate contacts on a phone are a fairly minor annoyance and, despite my crappy memory, a fairly infrequent one at that.

However, for companies and organizations with huge databases of client information maintained by many different people, it is quite common to have multiple entries for the same…
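As a naive, standard-library baseline for spotting near-duplicate names (the normalization step and the 0.6 threshold are arbitrary choices for illustration; a real system would need far more than plain string similarity):

```python
from difflib import SequenceMatcher
from itertools import combinations

contacts = ["John Smith", "J Smith", "Smith, John", "Jane Doe"]

def normalize(name: str) -> str:
    # Reorder "Last, First" entries and lowercase everything before comparing.
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    return name.lower()

# Flag pairs whose similarity clears a hand-picked threshold.
for a, b in combinations(contacts, 2):
    score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    if score > 0.6:
        print(f"possible duplicate: {a!r} vs {b!r} ({score:.2f})")
```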


Paraphrasing is the act of expressing something using different words while retaining the original meaning. Let’s see how we can do it with BART, a Sequence-to-Sequence Transformer Model.


Introduction

BART is a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text.

- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension -

Don’t worry if that sounds a little complicated; we are going to break it down and see what it all means. To add a little bit of background before we dive into BART, it’s time for the now-customary ode to Transfer Learning with self-supervised models. …
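Here is a minimal sketch of fine-tuning BART for paraphrasing with Simple Transformers; the single training pair and the sampling settings below are placeholders, and a real run would use a proper paraphrase dataset (such as Quora Question Pairs or PAWS):

```python
from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs
import pandas as pd

# One example pair purely to show the expected DataFrame shape;
# real training needs thousands of paraphrase pairs.
train_df = pd.DataFrame({
    "input_text": ["How can I be a good geologist?"],
    "target_text": ["What should I do to be a great geologist?"],
})

# Sampling settings here are illustrative, chosen for varied outputs.
model_args = Seq2SeqArgs(
    num_train_epochs=1,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    overwrite_output_dir=True,
)

model = Seq2SeqModel(
    encoder_decoder_type="bart",
    encoder_decoder_name="facebook/bart-large",
    args=model_args,
)

model.train_model(train_df)
print(model.predict(["The weather was far too hot for a long walk."]))
```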


How to tune your hyperparameters with Simple Transformers for better Natural Language Processing.


The goal of any Deep Learning model is to take in an input and generate the correct output. The nature of these inputs and outputs, which can vary wildly from application to application, depends on the specific job that the model should perform. For example, a dog breed classification model might take images as its input and generate the name of the dog breed (or a numeric label corresponding to the breed) as the output. Another model might accept a text description of a dog as its input and generate the name of the dog breed as its output. …
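To make “hyperparameters” concrete, here is a tiny manual sweep with Simple Transformers; the model type, the toy DataFrames, and the two value grids are illustrative assumptions, and in practice you would likely drive a sweep with tooling such as W&B rather than nested loops:

```python
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import pandas as pd

# Toy data purely to show the shape of the inputs.
train_df = pd.DataFrame(
    [["best movie ever", 1], ["utterly boring", 0]], columns=["text", "labels"]
)
eval_df = pd.DataFrame(
    [["really enjoyed it", 1], ["fell asleep halfway", 0]], columns=["text", "labels"]
)

# A tiny manual grid over two hyperparameters.
for lr in (4e-5, 2e-5):
    for epochs in (1, 2):
        args = ClassificationArgs(
            learning_rate=lr, num_train_epochs=epochs, overwrite_output_dir=True
        )
        model = ClassificationModel("roberta", "roberta-base", args=args, use_cuda=False)
        model.train_model(train_df)
        result, _, _ = model.eval_model(eval_df)
        print(f"lr={lr}, epochs={epochs}, eval={result}")
```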


The T5 Transformer can perform any NLP task. It can even perform multiple tasks at the same time with a single model. Here’s how!


The T5 (Text-To-Text Transfer Transformer) model was the product of a large-scale study (paper) conducted to explore the limits of transfer learning. It builds upon popular architectures like GPT, BERT, and RoBERTa (to name only a few) that utilized Transfer Learning with incredible success. While BERT-like models can be fine-tuned to perform a variety of tasks, the constraints of the architecture mean that each model can perform only one task.

Typically, this is done by adding a task-specific layer on top of the Transformer model. For example, a BERT Transformer can be adapted for binary classification by adding a fully-connected…
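Here is a minimal sketch of how multiple tasks can share one T5 model in Simple Transformers, with the prefix column doing the routing; the task names and examples below are made up for illustration:

```python
from simpletransformers.t5 import T5Model, T5Args
import pandas as pd

# One model, several tasks, distinguished only by the prefix column.
train_df = pd.DataFrame({
    "prefix": ["binary classification", "binary classification", "generate question"],
    "input_text": [
        "The movie was wonderful.",
        "The service was dreadful.",
        "The Eiffel Tower is in Paris.",
    ],
    "target_text": ["1", "0", "Where is the Eiffel Tower?"],
})

model_args = T5Args(num_train_epochs=1, overwrite_output_dir=True)
model = T5Model("t5", "t5-base", args=model_args)
model.train_model(train_df)

# At inference time, the prefix selects the task.
print(model.predict([
    "binary classification: I would happily watch it again.",
    "generate question: The Nile is the longest river in Africa.",
]))
```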


The T5 Transformer frames any NLP task as a text-to-text task enabling pre-trained models to easily learn new tasks. Let’s teach the old dog a new trick!


I’ve been itching to try the T5 (Text-To-Text Transfer Transformer) ever since it came out way, way back in October 2019 (it’s been a long couple of months). I messed around with open-sourced code from Google a couple of times, but I never managed to get it to work properly. Some of it went a little over my head (TensorFlow 😫 ), so I figured I’d wait for Hugging Face to ride to the rescue! As always, the Transformers implementation is much easier to work with, and I adapted it for use with Simple Transformers.

Before we get to the…


ELECTRA is the new kid on the block. Let’s take a look at how it stacks up against the old guard!


One of the “secrets” behind the success of Transformer models is the technique of Transfer Learning. In Transfer Learning, a model (in our case, a Transformer model) is pre-trained on a gigantic dataset using an unsupervised pre-training objective. This same model is then fine-tuned (typically supervised training) on the actual task at hand. The beauty of this approach is that the fine-tuning dataset can be as small as 500–1000 training samples! A number small enough to be potentially scoffed out of the room if one were to call it Deep Learning. This also means that the expensive and time-consuming part…
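As a sketch of that fine-tuning step with Simple Transformers, using a publicly available ELECTRA checkpoint (the two-row dataset is only there to show the shape of the inputs; a real task would use the few hundred to few thousand samples mentioned above):

```python
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import pandas as pd

# A deliberately tiny labeled dataset, just to show the expected format.
train_df = pd.DataFrame(
    [["this product exceeded my expectations", 1], ["arrived broken and late", 0]],
    columns=["text", "labels"],
)

args = ClassificationArgs(num_train_epochs=1, overwrite_output_dir=True)

# Fine-tune a publicly available ELECTRA checkpoint on the task at hand.
model = ClassificationModel(
    "electra", "google/electra-small-discriminator", args=args, use_cuda=False
)
model.train_model(train_df)

predictions, raw_outputs = model.predict(["would definitely recommend"])
print(predictions)
```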


A guide on language generation and fine-tuning language generation Transformer models with Simple Transformers. It’s easier than you think!


Transformer models are now state-of-the-art in most, if not all, Natural Language Processing tasks. Personally, I find language generation to be one of the most intriguing out of the myriad NLP tasks. There’s almost something human in being able to generate text that is not only grammatically correct but also cohesive and meaningful.

Transformers have risen admirably to the challenge of language generation, with many models capable of generating impressive sequences of text. Out of these, the GPT-2 model, released over a year ago by OpenAI, remains one of the best at language generation.

GPT-2 is a large transformer-based…
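Here is a minimal sketch of generating text with a pre-trained GPT-2 through Simple Transformers; the prompt is arbitrary and no fine-tuning is involved:

```python
from simpletransformers.language_generation import LanguageGenerationModel

# Load a pre-trained GPT-2; no fine-tuning is needed just to generate text.
model = LanguageGenerationModel("gpt2", "gpt2", use_cuda=False)

# Generate one or more continuations for an arbitrary prompt.
for text in model.generate("Transformer models have changed NLP because"):
    print(text)
```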
