EMAIL PHISHING: TEXT CLASSIFICATION USING NATURAL LANGUAGE PROCESSING

Priyanka Verma, Anjali Goyal, Yogita Gigras

Abstract


Phishing is a form of online theft in which the phisher's main motive is to steal a person's private information, such as financial details (account numbers, credit card details), login information, and payment information, by creating a fake page or website that looks completely authentic and genuine. Email phishing has become a major threat and is growing steadily, which has made the detection of phishing emails an important research issue. Various techniques have been introduced and applied to deal with this problem. The major objective of this paper is to give a detailed description of the classification of phishing emails using natural language processing (NLP) concepts. NLP concepts are applied to classify emails, and the accuracy of various classifiers is calculated. The paper is presented in four sections. The first section introduces phishing: its types, history, statistics, life cycle, the motivation of phishers, and how email phishing works. The second section covers the various technologies involved in email phishing and describes the evaluation metrics. The third section presents a literature review of the solutions proposed and the work done by researchers in this field. The fourth section defines the solution approach and the results obtained, with a detailed description of the NLP concepts and the working procedure.
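As a rough illustration of the kind of pipeline the abstract describes (tokenize the email text, train a classifier, report its accuracy), the following minimal sketch uses only the Python standard library. The abstract does not name the classifiers or the preprocessing steps used in the paper, so the multinomial Naive Bayes model with add-one smoothing, the function names, and the tiny hand-written emails below are all assumptions made for illustration and are not the authors' method or data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(samples):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    doc_counts = Counter()   # number of training emails per label
    word_counts = {}         # label -> Counter of token frequencies
    vocabulary = set()
    for text, label in samples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocabulary.add(token)
    return doc_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(doc_counts):
        # Start from the log prior of the class.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][token] + 1) / (
                total_words + len(vocabulary)
            )
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, samples):
    """Fraction of (text, label) pairs that the model labels correctly."""
    correct = sum(1 for text, label in samples if predict(model, text) == label)
    return correct / len(samples)


# Hypothetical toy emails written for this sketch, not taken from the paper.
TRAIN = [
    ("Verify your account now or it will be suspended", "phishing"),
    ("Urgent: confirm your credit card details to avoid suspension", "phishing"),
    ("Your login information has expired, click here to update your password", "phishing"),
    ("Meeting moved to Thursday, agenda attached", "legitimate"),
    ("Here are the lecture notes from today", "legitimate"),
    ("Lunch on Friday? Let me know what time works", "legitimate"),
]
TEST = [
    ("Please confirm your account password now", "phishing"),
    ("Notes and agenda for the Thursday meeting", "legitimate"),
]

model = train_naive_bayes(TRAIN)
print("accuracy:", accuracy(model, TEST))
```

A realistic evaluation would use a labelled corpus with a proper train/test split and would compare several classifiers on the same split; the handful of sentences here only shows the flow from raw text to an accuracy figure.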






DOI: https://doi.org/10.11591/APTIKOM.J.CSIT.87



Copyright (c) 2020 APTIKOM Journal on Computer Science and Information Technologies

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

ISSN: 2528-2417, e-ISSN: 2528-2425
