Universal Spam Detection using Transfer Learning of BERT Model

Date
2022-01-04
Authors
Tida, Vijay Srinivas
Hsu, Sonya Hy
Abstract
Several machine learning and deep learning algorithms for spam detection have been limited to a single dataset of spam emails or texts, which wastes valuable resources because separate models must be maintained. This research targets the efficient classification of ham and spam emails in real-time scenarios. Deep learning transformer models have become important for such tasks because they are trained on text data using self-attention mechanisms. This manuscript demonstrates a novel universal spam detection model that uses Google's pre-trained Bidirectional Encoder Representations from Transformers (BERT) base uncased model with multiple spam datasets. Models were first trained individually on the Enron, SpamAssassin, LingSpam, and SpamText message classification datasets. The combined model was then fine-tuned with the hyperparameters of each individual model. When each individual model was used with its corresponding dataset, it achieved an F1-score of 0.9 with this model architecture. The "universal model", trained on all four datasets and leveraging hyperparameters from each individual model, reached an overall accuracy of 97% with an F1-score of 0.96 across the combined datasets.
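For illustration, the following is a minimal sketch of how such a BERT fine-tuning setup might look using the HuggingFace transformers library. The toy texts, learning rate, epoch count, and batch size are assumptions for demonstration, not the hyperparameters or data pipeline reported in the paper.

```python
# Minimal sketch: fine-tuning bert-base-uncased for binary spam/ham
# classification, in the spirit of the approach described in the abstract.
# All hyperparameter values and example texts below are illustrative
# assumptions, not the authors' exact configuration.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import BertTokenizerFast, BertForSequenceClassification

class SpamDataset(Dataset):
    """Wraps (text, label) pairs; label 1 = spam, 0 = ham."""
    def __init__(self, texts, labels, tokenizer, max_len=128):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Toy examples stand in for the Enron/SpamAssassin/LingSpam/SpamText corpora.
texts = ["win a free prize now", "meeting moved to 3pm"]
labels = [1, 0]
loader = DataLoader(SpamDataset(texts, labels, tokenizer),
                    batch_size=2, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # assumed LR
model.train()
for epoch in range(3):  # assumed epoch count
    for batch in loader:
        optimizer.zero_grad()
        out = model(**batch)  # cross-entropy loss computed internally
        out.loss.backward()
        optimizer.step()
```

In this sketch the classification head on top of BERT's pooled output handles the ham/spam decision; a universal model along the lines described above would train one such classifier over the concatenation of all four datasets.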
Keywords
Machine Learning and Cyber Threat Intelligence and Analytics, bert, f1-score, finetuning, hyperparameters, performance, pre-trained model, self-attention, spam classification, transformers
Extent
9 pages
Related To
Proceedings of the 55th Hawaii International Conference on System Sciences
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International