Phishing Sites Detection from a Web Developer’s Perspective Using Machine Learning

Zhou, Xin; Verma, Rakesh

Files

0641.pdf (689.61 KB)

Date

2020-01-07

Authors

Zhou, Xin

Verma, Rakesh

Abstract

The Internet has enabled unprecedented communication and new technologies. Concomitantly, it has brought the bane of phishing and exacerbated vulnerabilities. In this paper, we propose a model to detect phishing webpages from a web developer’s perspective. From this standpoint, we design 120 novel content-based features, four novel time-based features, and two novel search-based features; we also use 34 other content-based features and 11 heuristic features to optimize the model. We select Random Committee (base learner: Random Tree) for our framework because it performed best in a comparison with six other algorithms: Hellinger Distance Decision Tree, SVM, Logistic Regression, J48, Naive Bayes, and Random Forest. In real-time experiments, the model achieved 99.4% precision and 98.3% MCC with a 0.1% false positive rate in 5-fold cross-validation using the realistic scenario of an unbalanced dataset.
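
For readers unfamiliar with the metrics quoted in the abstract, the following minimal sketch shows how precision, false positive rate, and the Matthews correlation coefficient (MCC) are conventionally computed from confusion-matrix counts. The function name and the example counts are hypothetical and chosen only for illustration; they are not the paper's data or the authors' evaluation code.

```python
import math


def phishing_metrics(tp, fp, tn, fn):
    """Compute precision, false positive rate, and MCC from raw counts.

    tp/fn: phishing pages classified correctly/incorrectly.
    tn/fp: legitimate pages classified correctly/incorrectly.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # MCC uses all four cells, so it stays informative on unbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "fpr": fpr, "mcc": mcc}


# Hypothetical unbalanced test fold (assumed counts, not from the paper):
# 1,000 phishing pages and 10,000 legitimate pages.
metrics = phishing_metrics(tp=980, fp=10, tn=9990, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because MCC accounts for true negatives as well as true positives, it is a common complement to precision when legitimate pages far outnumber phishing pages.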

Keywords

Machine Learning and Cyber Threat Intelligence and Analytics, machine learning, phishing website, random committee

URI

http://hdl.handle.net/10125/64536

Extent

10 pages

Related To

Proceedings of the 53rd Hawaii International Conference on System Sciences

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International

Collections

Machine Learning and Cyber Threat Intelligence and Analytics
