Please use this identifier to cite or link to this item: http://hdl.handle.net/10125/80263

Universal Spam Detection using Transfer Learning of BERT Model

File: 0748.pdf
Size: 1.89 MB
Format: Adobe PDF

Item Summary

Title:Universal Spam Detection using Transfer Learning of BERT Model
Authors:Tida, Vijay Srinivas
Hsu, Sonya Hy
Keywords:Machine Learning and Cyber Threat Intelligence and Analytics
bert
f1-score
finetuning
hyperparameters
performance
pre-trained model
self-attention
spam classification
transformers
Date Issued:04 Jan 2022
Abstract:Several machine learning and deep learning spam classifiers have been limited to a single dataset of spam emails or texts, which wastes valuable resources on maintaining individual models. This research targets efficient classification of ham and spam emails in real-time scenarios. Transformer-based deep learning models have become important because they train on text data using self-attention mechanisms. This manuscript demonstrates a novel universal spam detection model built by transfer learning on Google's pre-trained Bidirectional Encoder Representations from Transformers (BERT) base uncased model with multiple spam datasets. Models were first trained individually on the Enron, SpamAssassin, Lingspam, and SpamText message classification datasets, and each individual model reached an F1-score of about 0.9 on its corresponding dataset. The "universal model" was then trained on all four datasets combined, leveraging the hyperparameters from each individual model. It reached an overall accuracy of 97% and an F1-score of 0.96 across all four datasets.
Pages/Duration:9 pages
URI:http://hdl.handle.net/10125/80263
ISBN:978-0-9981331-5-7
DOI:10.24251/HICSS.2022.921
Rights:Attribution-NonCommercial-NoDerivatives 4.0 International
https://creativecommons.org/licenses/by-nc-nd/4.0/
Appears in Collections: Machine Learning and Cyber Threat Intelligence and Analytics


Please email libraryada-l@lists.hawaii.edu if you need this content in ADA-compliant format.

This item is licensed under a Creative Commons License.