Title

Phishing websites detection via CNN and multi-head self-attention on imbalanced datasets

Authors
Corresponding author: Hu, Guangwu
Publication date
2021-09-01
DOI
Journal
ISSN
0167-4048
EISSN
1872-6208
Volume: 108
Abstract
Phishing websites are a form of social engineering attack in which perpetrators imitate legitimate websites to lure visitors and illegally acquire their identities, passwords, private information, and even property. This attack poses a serious threat and is becoming increasingly severe. Many approaches to identifying phishing websites have shown their merits. For example, the classical CNN-LSTM approach achieves very high precision by combining a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). However, although CNNs have achieved great success in AI, LSTM still suffers from a bias problem, since it always treats later features as much more important than earlier ones. Meanwhile, because the self-attention mechanism can discover the inner dependency relationships of a text, it has been widely applied to various deep learning-based Natural Language Processing (NLP) tasks. If a URL is treated as a text string, this mechanism can learn comprehensive URL representations. To further improve the accuracy of phishing website detection, this paper proposes a novel CNN with self-attention, named self-attention CNN, for identifying phishing Uniform Resource Locators (URLs). Specifically, self-attention CNN first uses a Generative Adversarial Network (GAN) to generate phishing URLs so as to balance the datasets of legitimate and phishing URLs. It then uses CNN and multi-head self-attention to construct a new classifier comprising four blocks: the input block, the attention block, the feature block, and the output block. Finally, the trained classifier gives a high-accuracy result for an unknown website URL. Thorough experiments indicate that self-attention CNN achieves 95.6% accuracy, outperforming CNN-LSTM, single CNN, and single LSTM by 1.4%, 4.6%, and 2.1%, respectively.
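To make the idea of treating a URL as a text string and applying multi-head self-attention more concrete, the following is a minimal, standard-library-only sketch. It is not the authors' architecture: the character encoding scheme, the helper names (`encode_url`, `random_matrix`, `multi_head_self_attention`), the dimensions, and the randomly initialized (untrained) weights are all assumptions made for illustration. It omits the GAN-based balancing, the CNN feature block, padding masks, and the output projection.

```python
import math
import random


def encode_url(url, max_len=32):
    """Map each character to an integer id (0 is padding); truncate or pad to max_len.
    Hypothetical encoding: code points above 127 are clamped to 127."""
    ids = [min(ord(c), 127) + 1 for c in url[:max_len]]
    return ids + [0] * (max_len - len(ids))


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Untrained stand-in for learned weights (assumption for illustration)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(x, num_heads, rng):
    """Scaled dot-product self-attention per head, with head outputs concatenated."""
    d_model = len(x[0])
    assert d_model % num_heads == 0
    d_head = d_model // num_heads
    head_outputs = []
    for _ in range(num_heads):
        wq, wk, wv = (random_matrix(d_model, d_head, rng) for _ in range(3))
        q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
        k_t = [list(col) for col in zip(*k)]
        scores = matmul(q, k_t)
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v))
    # Concatenate the per-head vectors position by position.
    return [sum((head[i] for head in head_outputs), []) for i in range(len(x))]


rng = random.Random(0)
ids = encode_url("http://example.com/login", max_len=16)
embedding = random_matrix(129, 8, rng)  # ids range over 0..128
x = [embedding[i] for i in ids]
out = multi_head_self_attention(x, num_heads=2, rng=rng)
print(len(out), len(out[0]))  # prints "16 8"
```

In a trained classifier the embedding and projection matrices would be learned, and the attention output would feed later layers rather than being printed.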
Keywords
Indexed by
SCI ; EI
Language
English
University attribution
Other
Funding
National Key Research and Development Program of China["2018YFB1800204","2018YFB1800601"] ; National Natural Science Foundation of China[61972219,61771273] ; Natural Science Foundation of Guangdong Province[2021A1515012640] ; R&D Program of Shenzhen["JCYJ20190813174403598","SGDX20190918101201696","JCYJ20190813165003837"]
WOS research area
Computer Science
WOS category
Computer Science, Information Systems
WOS accession number
WOS:000677639500010
Publisher
EI accession number
20212710578501
EI subject terms
Computer crime ; Convolution ; Long short-term memory ; Natural language processing systems
EI classification codes
Information Theory and Signal Processing:716.1 ; Data Processing and Image Processing:723.2
ESI research field
COMPUTER SCIENCE
Scopus record ID
2-s2.0-85108874331
Source database
Scopus
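Citation statistics
Times cited [WOS]: 29
Document type: Journal article
Item identifier: http://kc.sustech.edu.cn/handle/2SGJ60CL/230141
Collection: Southern University of Science and Technology
College of Engineering, Department of Computer Science and Engineering
Author affiliations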
1.Shenzhen International Graduate School,Tsinghua University,China
2.Peng Cheng Laboratory,Shenzhen,China
3.School of Computer Science,Shenzhen Institute of Information Technology,Shenzhen,China
4.Southern University of Science and Technology,Shenzhen,China
Recommended citation
GB/T 7714
Xiao,Xi,Xiao,Wentao,Zhang,Dianyan,et al. Phishing websites detection via CNN and multi-head self-attention on imbalanced datasets[J]. COMPUTERS & SECURITY,2021,108.
APA
Xiao,Xi.,Xiao,Wentao.,Zhang,Dianyan.,Zhang,Bin.,Hu,Guangwu.,...&Xia,Shutao.(2021).Phishing websites detection via CNN and multi-head self-attention on imbalanced datasets.COMPUTERS & SECURITY,108.
MLA
Xiao,Xi,et al."Phishing websites detection via CNN and multi-head self-attention on imbalanced datasets".COMPUTERS & SECURITY 108(2021).

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.