Email Spam Detection using Pre-Trained BERT Model: Part 2 - Model Fine Tuning
Recently I have been looking into Transformer-based machine learning models for natural language tasks. The field of NLP has changed tremendously in the last few years, and I have been fascinated by the new architectures and tools that have come out along the way. Transformer models are one such architecture.
As the frameworks and tools for building transformer models keep evolving, the documentation often becomes stale and blog posts end up contradicting each other. For any one topic you may find multiple approaches, which can confuse beginners.
So, as I learn these models, I plan to document the steps for a few of the essential tasks in the simplest way possible. This should help any beginner like me pick up transformer models.
In this two-part series, I discuss how to train a simple model for email spam classification using a pre-trained BERT transformer model. This is the second post in the series, where I cover fine-tuning the model for spam detection. You can read all the posts in the series here.
Data Preparation and Tokenization
Please make sure you have gone through the first part of the series, where we discussed how to prepare our data using BERT tokenization. You can find it at the link below.
Email Spam Detection using Pre-Trained BERT Model: Part 1 - Introduction and Tokenization.
Model Fine Tuning
Once the tokenization is done, we are ready to fine-tune the model.
A pre-trained model comes with a body and a head. In most use cases, we keep the pre-trained body and train a new, task-specific head on top of it instead of training the whole model from scratch. That’s why we call it fine-tuning rather than retraining. You can read more about the head and body of a transformer model at the below link.
1. Download the Model
As we did with the tokenizer, we will download the model using the Hugging Face library.
This downloads the pre-trained model together with a new, untrained sequence classification head, which needs to be tuned with data.
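A minimal sketch of this download step is shown here. The checkpoint name `bert-base-uncased`, the helper name `load_classifier`, and `num_labels=2` (spam / not spam) are assumptions for illustration; use the same checkpoint that you used for the tokenizer in Part 1.

```python
def load_classifier(checkpoint="bert-base-uncased", num_labels=2):
    """Load a pre-trained BERT body with a fresh sequence classification head.

    Both default values are assumptions made for this sketch.
    """
    # Imported inside the helper so that defining it does not load the library.
    from transformers import AutoModelForSequenceClassification

    # The classification head is newly initialized, so the library will
    # typically warn that some weights still need to be trained.
    return AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=num_labels
    )
```

Calling `model = load_classifier()` downloads the weights on first use and caches them locally.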
2. Training Arguments
Training arguments are where you set the various options for a given training run, such as the number of epochs and the batch size. For simplicity, we are going to use the default ones.
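As a rough sketch, default training arguments could be created as follows. The helper name `build_training_args` and the output directory `spam-bert-trainer` are hypothetical; the only value passed explicitly is the directory where checkpoints are written.

```python
def build_training_args(output_dir="spam-bert-trainer"):
    """Create training arguments that keep every library default.

    The directory name is a placeholder chosen for this sketch.
    """
    # Imported inside the helper so that defining it does not load the library.
    from transformers import TrainingArguments

    # Only the checkpoint directory is set; everything else stays at its default.
    return TrainingArguments(output_dir=output_dir)
```

If the defaults do not suit your data, options such as `num_train_epochs` or `per_device_train_batch_size` can be passed to `TrainingArguments` in the same call.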
3. Evaluation Metrics
For our training, we are going to use accuracy as the evaluation metric. For this, we set up a method that calculates it from the model's predictions.
In that method, the np.argmax call converts the logits returned by the model's prediction into labels so that they can be compared with the actual labels.
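One possible version of such a metric function is sketched below. It computes accuracy directly with NumPy, which is an assumption on my part (a metrics library would work equally well), and the example logits and labels are made up purely to show the argmax step.

```python
import numpy as np


def compute_metrics(eval_preds):
    """Return accuracy for a (logits, labels) pair, as passed by the Trainer."""
    logits, labels = eval_preds
    # Pick the index of the largest logit in each row as the predicted label.
    predictions = np.argmax(logits, axis=-1)
    accuracy = float((predictions == labels).mean())
    return {"accuracy": accuracy}


# Made-up logits for four emails and two classes (0 = not spam, 1 = spam).
example_logits = np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]])
example_labels = np.array([0, 1, 1, 1])

result = compute_metrics((example_logits, example_labels))
print(result)  # {'accuracy': 0.75}
```

Here the argmax gives the predictions `[0, 1, 1, 0]`, so three of the four labels match.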
4. Create the Trainer
Let’s now create the trainer.
The Hugging Face Trainer API handles all the batching and looping needed to fine-tune the model.
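A sketch of how the pieces might be wired together is given here. The helper `build_trainer` and its parameter names are hypothetical, and the use of `DataCollatorWithPadding` assumes that the tokenized examples from Part 1 were not already padded to a fixed length.

```python
def build_trainer(model, training_args, train_dataset, eval_dataset,
                  tokenizer, compute_metrics):
    """Assemble a Trainer from the objects prepared in the earlier steps."""
    # Imported inside the helper so that defining it does not load the library.
    from transformers import DataCollatorWithPadding, Trainer

    # Assumption: pad each batch dynamically to its longest example.
    data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

    return Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        data_collator=data_collator,
        compute_metrics=compute_metrics,
    )
```

The datasets are expected to be the tokenized train and evaluation splits, each containing a label column.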
5. Run the Training
Once the trainer object is created, we can train the model by calling its train method.
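The training call itself is a single line. The small wrapper below, whose name `run_training` is hypothetical, also returns the metrics that the trainer reports once training finishes.

```python
def run_training(trainer):
    """Run fine-tuning and return the metrics reported for the run."""
    # train() runs the full loop: batching, forward and backward passes, updates.
    train_output = trainer.train()
    # The returned object carries a metrics dictionary for the finished run.
    return train_output.metrics
```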
Find Accuracy on the Test Dataset
Once the model is trained, we can find out how well it is doing by measuring accuracy on the test dataset.
We use the trainer.predict method to run predictions on our test dataset.
Then we compute the accuracy score using the same function we defined for training.
The result is about 97% accuracy, which is really good.
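The evaluation described above could be sketched as follows. The `compute_metrics` function is repeated so that the block stands on its own, its NumPy-based implementation is an assumption, and `evaluate_on_test_set` is a hypothetical helper name. The test dataset must contain labels for the comparison to work.

```python
import numpy as np


def compute_metrics(eval_preds):
    """Accuracy metric, assumed to be the same one used during training."""
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}


def evaluate_on_test_set(trainer, test_dataset):
    """Predict on the test split and score the result with compute_metrics."""
    # predict() returns the raw logits together with the true label ids.
    output = trainer.predict(test_dataset)
    return compute_metrics((output.predictions, output.label_ids))
```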
The complete code for this post is in the Google Colab notebook below.
You can also access the Python notebook on GitHub.
In this post, we saw how to fine-tune a pre-trained model using the Hugging Face API. Together, these two posts give you the end-to-end flow of fine-tuning a transformer model.