The capabilities of Artificial Intelligence technology have improved rapidly over the past few years. Natural Language Processing (NLP), in particular, has seen incredible improvements as a result of the Transformer architecture and its derivatives. Current state-of-the-art systems are capable of nearly human-level (or beyond human-level in some cases!) performance in many language tasks such as language generation, text classification, and question answering.
Similarly, current Speech Synthesis (Text-to-Speech/TTS) models can generate speech that is virtually indistinguishable from human speech. On the opposite end of the spectrum, Automatic Speech Recognition (ASR) is generally quite successful at understanding human speech, even accounting for…
mT5 is a multilingual Transformer model pre-trained on a dataset (mC4) containing text from 101 different languages. The architecture of the mT5 model (based on T5) is designed to support any Natural Language Processing task (classification, NER, question answering, etc.) by reframing the required task as a sequence-to-sequence task.
In other words — text goes in, and text comes out. For example, in a classification task, the input to the model can be the text sequence to be classified, and the output from the model will be the class label for the sequence. For translation, this is even more straight…
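To make the text-in, text-out idea concrete, here is a tiny sketch of how a classification example and a translation example might be framed. The prefixes, inputs, and targets below are invented for illustration; they aren't prescribed by mT5 itself.

```python
# Two examples in the text-to-text framing. The prefixes and targets are
# invented for illustration; the point is that the output is just text.
examples = [
    {   # classification: the target is simply the class label as a string
        "prefix": "classify sentiment",
        "input_text": "The film was an absolute delight.",
        "target_text": "positive",
    },
    {   # translation: the target is the translated sentence
        "prefix": "translate English to French",
        "input_text": "Where is the train station?",
        "target_text": "Où est la gare ?",
    },
]
```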
The original T5 (Text-To-Text Transfer Transformer) model achieved state-of-the-art performance on a variety of NLP benchmarks by leveraging a unified text-to-text format and a gigantic training dataset (C4). With the unified text-to-text approach, all downstream tasks were reframed such that both the input and the output of the model are text sequences. At a whopping 750 GB, the C4 (Colossal Clean Crawled Corpus) dataset was orders of magnitude larger than most existing pre-training datasets. Released back in October 2019 by Google, T5 still sits pretty at the top of the SuperGLUE benchmark as a testament to its capabilities.
More information regarding…
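To give a feel for how this works in practice, here is a minimal sketch using the Hugging Face Transformers implementation of T5 and the public t5-base checkpoint. Since the released T5 checkpoints were trained with translation as one of their supervised tasks, a task prefix alone is enough for this particular example; no fine-tuning is needed.

```python
# A small sketch of the unified text-to-text format, assuming the Hugging Face
# Transformers T5 implementation and the public "t5-base" checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# The task is specified purely through the input text (a task prefix).
input_ids = tokenizer(
    "translate English to German: The house is wonderful.", return_tensors="pt"
).input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Expected to print something along the lines of: "Das Haus ist wunderbar."
```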
One of my favourite approaches to dealing with large, complex problems is to break them down into smaller, more manageable sub-problems. This makes it easier to focus on the important bits without being overwhelmed by the details. At Skil.AI, we often deal with enormous, complicated datasets which can seem quite daunting to tackle! As with most real-life datasets, our data is often messy, noisy, and generally the opposite of clean. In such cases, it can be helpful to break down the dataset into smaller chunks so that it can be analyzed and understood easily.
Sometimes, there may be obvious logical…
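As a rough illustration of the chunking idea, here is a small pandas sketch. The file name and column are placeholders; how you actually split a dataset will depend on its structure.

```python
# A rough sketch of analyzing a large dataset in smaller chunks.
# "reviews.csv" and the "text" column are placeholders for illustration.
import pandas as pd

chunk_summaries = []
for chunk in pd.read_csv("reviews.csv", chunksize=100_000):
    # Each chunk is an ordinary DataFrame, so it can be inspected and
    # summarized on its own instead of loading everything into memory at once.
    chunk_summaries.append(chunk["text"].str.len().describe())

print(chunk_summaries[0])
```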
Have you ever searched for a contact on your phone and come up with several duplicate or near-duplicate entries? (I’m fairly certain this isn’t just me!) For me, this tends to happen when I forget that I already have a particular contact saved and create a new one for a new number. Duplicate contacts on a phone are a fairly minor annoyance and, despite my crappy memory, a fairly infrequent one at that.
However, for companies and organizations with huge databases of client information maintained by many different people, it is quite common to have multiple entries for the same…
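Just to make the idea of near-duplicate entries concrete, here is a toy sketch using plain string similarity from the Python standard library. A real deduplication pipeline would, of course, use something far more robust than this.

```python
# A toy sketch of flagging near-duplicate entries with simple string similarity.
from difflib import SequenceMatcher
from itertools import combinations

contacts = ["John Smith", "Jon Smith", "Jane Doe", "John  Smith"]

for a, b in combinations(contacts, 2):
    # Normalize lightly (lowercase, collapse whitespace) before comparing.
    similarity = SequenceMatcher(None, " ".join(a.lower().split()),
                                 " ".join(b.lower().split())).ratio()
    if similarity > 0.9:
        print(f"Possible duplicate: {a!r} vs {b!r} (similarity={similarity:.2f})")
```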
BART is a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text.
Don’t worry if that sounds a little complicated; we are going to break it down and see what it all means. To add a little bit of background before we dive into BART, it’s time for the now-customary ode to Transfer Learning with self-supervised models. …
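To see what corrupting and reconstructing looks like in practice, here is a quick sketch assuming the Hugging Face Transformers implementation of BART and the public facebook/bart-large checkpoint (which was pre-trained to fill in masked-out spans).

```python
# A quick sketch of the "corrupt and reconstruct" idea, assuming the Hugging
# Face Transformers BART implementation and the "facebook/bart-large" checkpoint.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# "Corrupted" input: part of the sentence has been replaced with a <mask> span.
corrupted = "The tower is 324 metres tall, about the same height as <mask>."
input_ids = tokenizer(corrupted, return_tensors="pt").input_ids

# The model generates its reconstruction of the full sentence.
outputs = model.generate(input_ids, max_length=40, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```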
The goal of any Deep Learning model is to take in an input and generate the correct output. The nature of these inputs and outputs, which can vary wildly from application to application, depends on the specific job that the model should perform. For example, a dog breed classification model might take images as its input and generate the name of the dog breed (or a numeric label corresponding to the breed) as the output. Another model might accept a text description of a dog as its input and generate the name of the dog breed as its output. …
The T5 (Text-To-Text Transfer Transformer) model was the product of a large-scale study (paper) conducted to explore the limits of transfer learning. It builds upon popular architectures like GPT, BERT, and RoBERTa (to name only a few) that utilized Transfer Learning with incredible success. While BERT-like models can be fine-tuned to perform a variety of tasks, the constraints of the architecture mean that each model can perform only one task.
Typically, this is done by adding a task-specific layer on top of the Transformer model. For example, a BERT Transformer can be adapted for binary classification by adding a fully-connected…
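As a bare-bones sketch of what that can look like in code (using the Hugging Face Transformers BertModel; the pooling and head details vary between implementations):

```python
# A minimal sketch of a task-specific head on top of a pre-trained BERT encoder.
import torch.nn as nn
from transformers import BertModel

class BertBinaryClassifier(nn.Module):
    def __init__(self, model_name="bert-base-uncased", num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)  # pre-trained encoder
        # The task-specific layer: a single fully-connected layer over the
        # pooled sequence representation.
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.classifier(outputs.pooler_output)
```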
I’ve been itching to try the T5 (Text-To-Text Transfer Transformer) ever since it came out way, way back in October 2019 (it’s been a long couple of months). I messed around with the open-source code from Google a couple of times, but I never managed to get it to work properly. Some of it went a little over my head (Tensorflow 😫) so I figured I’d wait for Hugging Face to ride to the rescue! As always, the Transformers implementation is much easier to work with, and I adapted it for use with Simple Transformers.
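For the curious, here is roughly what the adaptation looks like in use, assuming the Simple Transformers T5Model API and the public t5-base checkpoint. The binary classification task and the data are invented for illustration.

```python
# A sketch of fine-tuning T5 with Simple Transformers on an invented task.
import pandas as pd
from simpletransformers.t5 import T5Model, T5Args

train_df = pd.DataFrame(
    [
        ["binary classification", "The battery died within a week.", "0"],
        ["binary classification", "Best purchase I have made this year.", "1"],
    ],
    columns=["prefix", "input_text", "target_text"],
)

model_args = T5Args(num_train_epochs=1, overwrite_output_dir=True)
model = T5Model("t5", "t5-base", args=model_args, use_cuda=False)  # use_cuda=True on a GPU

model.train_model(train_df)

# Predictions are generated as text, matching the text-to-text setup.
print(model.predict(["binary classification: The screen cracked on day one."]))
```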
One of the “secrets” behind the success of Transformer models is the technique of Transfer Learning. In Transfer Learning, a model (in our case, a Transformer model) is pre-trained on a gigantic dataset using an unsupervised pre-training objective. This same model is then fine-tuned (typically with supervised training) on the actual task at hand. The beauty of this approach is that the fine-tuning dataset can be as small as 500–1000 training samples! A number small enough to be potentially scoffed out of the room if one were to call it Deep Learning. This also means that the expensive and time-consuming part…
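As a minimal sketch of that fine-tuning step, assuming the Simple Transformers ClassificationModel API and a small, invented labelled dataset:

```python
# A minimal sketch of fine-tuning a pre-trained model on a tiny labelled dataset.
import pandas as pd
from simpletransformers.classification import ClassificationModel, ClassificationArgs

train_df = pd.DataFrame(
    [
        ["The support team resolved my issue quickly.", 1],
        ["I waited a week and never got a reply.", 0],
        # ...a few hundred labelled rows is often enough after pre-training
    ],
    columns=["text", "labels"],
)

model_args = ClassificationArgs(num_train_epochs=1, overwrite_output_dir=True)
# Start from a pre-trained checkpoint; only the (cheap) fine-tuning happens here.
model = ClassificationModel("bert", "bert-base-uncased", args=model_args, use_cuda=False)
model.train_model(train_df)

predictions, _ = model.predict(["Thanks, that fixed it!"])
print(predictions)
```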