Universal Spam Detection using Transfer Learning of BERT Model

Tida, Vijay Srinivas; Hsu, Sonya Hy


dc.contributor.author	Tida, Vijay Srinivas
dc.contributor.author	Hsu, Sonya Hy
dc.date.accessioned	2021-12-24T18:30:07Z
dc.date.available	2021-12-24T18:30:07Z
dc.date.issued	2022-01-04
dc.description.abstract	Several machine learning and deep learning approaches to spam detection have been limited to a single dataset of spam emails or texts, which wastes valuable resources because each dataset requires its own model. This research applies efficient classification of emails as ham or spam in real-time scenarios. Transformer-based deep learning models, trained on text data with self-attention mechanisms, have become increasingly important. This manuscript demonstrates a novel universal spam detection model that uses Google's pre-trained Bidirectional Encoder Representations from Transformers (BERT) base uncased model with multiple spam datasets. Models were first trained individually, using different methods, on the Enron, SpamAssassin, Lingspam, and Spamtext message classification datasets; each model, evaluated on its corresponding dataset, reached an F1-score of 0.9. The combined "universal model" was then trained on all four datasets and fine-tuned using the hyperparameters of each individual model. It reached an overall accuracy of 97% and an F1-score of 0.96 across all four datasets combined.
dc.format.extent	9 pages
dc.identifier.doi	10.24251/HICSS.2022.921
dc.identifier.isbn	978-0-9981331-5-7
dc.identifier.uri	http://hdl.handle.net/10125/80263
dc.language.iso	eng
dc.relation.ispartof	Proceedings of the 55th Hawaii International Conference on System Sciences
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject	Machine Learning and Cyber Threat Intelligence and Analytics
dc.subject	bert
dc.subject	f1-score
dc.subject	finetuning
dc.subject	hyperparameters
dc.subject	performance
dc.subject	pre-trained model
dc.subject	self-attention
dc.subject	spam classification
dc.subject	transformers
dc.title	Universal Spam Detection using Transfer Learning of BERT Model
dc.type.dcmi	text
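The abstract reports per-dataset F1-scores for the individual models and a single accuracy and F1-score "across all four datasets combined." The record does not say how the combined figures were aggregated, so the sketch below is only one plausible reading: spam is treated as the positive class, metrics are computed per dataset, and the combined figure pools every prediction from all datasets before computing accuracy and F1. The function names `binary_metrics` and `evaluate` and the demo records are hypothetical illustrations, not the authors' code or data.

```python
# Hypothetical sketch: per-dataset vs. pooled metrics for a binary ham/spam classifier.
# Assumes labels are 1 = spam (positive class) and 0 = ham; not the authors' implementation.
from collections import defaultdict


def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, f1) with `positive` (spam) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


def evaluate(records):
    """records: iterable of (dataset_name, true_label, predicted_label).

    Returns ({dataset_name: (accuracy, f1)}, (pooled_accuracy, pooled_f1)),
    where the pooled metrics treat all datasets as one combined test set.
    """
    by_dataset = defaultdict(lambda: ([], []))
    all_true, all_pred = [], []
    for name, t, p in records:
        by_dataset[name][0].append(t)
        by_dataset[name][1].append(p)
        all_true.append(t)
        all_pred.append(p)
    per_dataset = {name: binary_metrics(t, p) for name, (t, p) in by_dataset.items()}
    return per_dataset, binary_metrics(all_true, all_pred)


if __name__ == "__main__":
    # Toy predictions for illustration only; these are not results from the paper.
    demo = [
        ("enron", 1, 1), ("enron", 0, 0), ("enron", 1, 0),
        ("spamtext", 1, 1), ("spamtext", 0, 1), ("spamtext", 0, 0),
    ]
    per_dataset, pooled = evaluate(demo)
    print(per_dataset)
    print(pooled)
```

Pooling all predictions weights each dataset by its size, so a large corpus such as Enron would dominate the combined score; averaging the per-dataset F1-scores instead (a macro average) would weight each dataset equally. Which convention the paper uses would need to be checked against the full text.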

Files

Original bundle


Name: 0748.pdf
Size: 1.84 MB
Format: Adobe Portable Document Format


Collections

Machine Learning and Cyber Threat Intelligence and Analytics