PhishCoder: Efficient Extraction of Contextual Information from Phishing Emails

Abstract

Phishing has emerged as one of the most widespread and expensive types of cyber threats, posing significant challenges to individuals and organizations worldwide. In response to this evolving threat, security practitioners are integrating AI and ML algorithms to mitigate phishing attacks automatically. Although email text is a crucial element in identifying phishing attacks, traditional text-embedding techniques struggle with variations in text length and structure and fail to capture context effectively. In this paper, we introduce PhishCoder, a novel framework designed to extract contextual information from phishing emails. We focus on human-centric features, that is, the features users notice when evaluating a potentially suspicious email, which traditional approaches often overlook. By fine-tuning four transformer-based models, we accurately extract seven descriptive features from phishing email texts. Our findings indicate that language models provide a promising method for extracting contextual information from phishing emails. This approach offers researchers and security practitioners a new set of features that could be valuable in combating phishing attacks and developing effective mitigation tools.

Publication
In Proceedings of the Workshop on Security and Artificial Intelligence (SECAI 2024)