Universal Spam Detection using Transfer Learning of BERT Model

dc.contributor.author: Tida, Vijay Srinivas
dc.contributor.author: Hsu, Sonya Hy
dc.date.accessioned: 2021-12-24T18:30:07Z
dc.date.available: 2021-12-24T18:30:07Z
dc.date.issued: 2022-01-04
dc.description.abstract: Many machine learning and deep learning approaches to spam detection have been limited to a single dataset of spam emails or texts, which wastes valuable resources because a separate model is needed for each dataset. This research applies efficient classification of ham and spam emails in real-time scenarios. Deep learning transformer models, which are trained on text data using self-attention mechanisms, have become important. This manuscript demonstrates a novel universal spam detection model that uses Google's pre-trained Bidirectional Encoder Representations from Transformers (BERT) base uncased model with multiple spam datasets. Models were first trained individually on the Enron, SpamAssassin, Lingspam, and Spamtext message classification datasets, using different methods for each. When each model was trained on its corresponding dataset, it reached an F1-score of 0.9. The "universal model" was then trained on all four datasets and fine-tuned using the hyperparameters of the individual models. It reached an overall accuracy of 97% and an F1-score of 0.96 across all four datasets combined.
dc.format.extent: 9 pages
dc.identifier.doi: 10.24251/HICSS.2022.921
dc.identifier.isbn: 978-0-9981331-5-7
dc.identifier.uri: http://hdl.handle.net/10125/80263
dc.language.iso: eng
dc.relation.ispartof: Proceedings of the 55th Hawaii International Conference on System Sciences
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Machine Learning and Cyber Threat Intelligence and Analytics
dc.subject: bert
dc.subject: f1-score
dc.subject: finetuning
dc.subject: hyperparameters
dc.subject: performance
dc.subject: pre-trained model
dc.subject: self-attention
dc.subject: spam classification
dc.subject: transformers
dc.title: Universal Spam Detection using Transfer Learning of BERT Model
dc.type.dcmi: text

Files

Original bundle
Name: 0748.pdf
Size: 1.84 MB
Format: Adobe Portable Document Format