Phishing refers to a family of online frauds in which an Internet user is lured into submitting sensitive data, which is then used for malicious purposes.
Your goal is to construct a decision tree model that accurately decides whether a website is a phishing site or not.
Load the data set.
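One possible way to load the file is sketched below with pandas. It assumes the CSV has a header row and a label column named `Result` (the column name is taken from the note in this exercise). The helper `load_phishing_data`, the path `phishing.csv`, and the feature names in the toy sample are placeholders, not part of the data set.

```python
import io

import pandas as pd


def load_phishing_data(source):
    """Read the data set from a file path or a file-like object.

    Assumes a header row and a label column named "Result".
    """
    df = pd.read_csv(source)
    if "Result" not in df.columns:
        raise ValueError("expected a 'Result' column with the class labels")
    return df


# Tiny made-up sample so the sketch runs without the real file.
# The feature names are placeholders, not the real column names.
sample_csv = io.StringIO(
    "feature_a,feature_b,Result\n"
    "1,-1,1\n"
    "-1,1,-1\n"
    "1,1,1\n"
)
df = load_phishing_data(sample_csv)
print(df.shape)

# For the real data (hypothetical path):
# df = load_phishing_data("phishing.csv")
```

Checking for the label column right after loading catches a wrong file or a missing header early, before any model is trained.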
Material | Link | Reference |
Data set | csv | Mohammad, Rami, Thabtah, Fadi Abdeljaber and McCluskey, T.L. (2015) Phishing Websites Dataset. Downloadable via Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science. |
Documentation |
Study the contents of the data set.
Note: The documentation does not seem to explain the -1’s and 1’s in the Result column, so it may be helpful to know that a 1 corresponds to a phishing site and a -1 to a legitimate site.
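A first look at the data could cover the class balance, the distinct values each column takes, and missing entries. The sketch below uses a small made-up DataFrame as a stand-in; the feature names are placeholders, and the label mapping follows the note above.

```python
import pandas as pd

# Made-up stand-in rows; replace with the DataFrame loaded from the real CSV.
df = pd.DataFrame(
    {
        "feature_a": [1, -1, 1, 0, -1, 1],
        "feature_b": [-1, -1, 1, 1, 0, 1],
        "Result": [1, -1, 1, -1, -1, 1],
    }
)

# Per the note above: 1 = phishing, -1 = legitimate.
label_names = {1: "phishing", -1: "legitimate"}
class_counts = df["Result"].map(label_names).value_counts()
print(class_counts)

# Which distinct values does each column take, and is anything missing?
distinct_values = {col: sorted(df[col].unique().tolist()) for col in df.columns}
missing_per_column = df.isna().sum()
print(distinct_values)
print(missing_per_column)
```

A strongly unbalanced class distribution would be a reason to look beyond plain accuracy when evaluating the classifier later.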
Construct a decision tree that classifies the websites into phishing sites and legitimate sites.
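The exercise does not prescribe a library; the sketch below assumes scikit-learn and its `DecisionTreeClassifier`. To stay runnable without the real file, it trains on synthetic data with a made-up labelling rule, so the accuracy it prints says nothing about the phishing data set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 200 rows, 5 features with values in {-1, 0, 1}.
rng = np.random.default_rng(0)
X = rng.integers(-1, 2, size=(200, 5))
# Made-up labelling rule so the tree has something to learn:
# 1 (phishing) when the first two features sum to a positive value, else -1.
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

# Hold out a quarter of the rows, keeping the class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)
test_accuracy = tree.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

With the real data, `X` would be all columns except `Result` and `y` the `Result` column itself.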
Get an estimate of the classifier's performance by cross-validating.
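Cross-validation could be done along the following lines, again assuming scikit-learn and using synthetic stand-in data. The choice of 10 stratified folds is an assumption made for illustration, not a requirement of the exercise.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with a made-up labelling rule (not the real data set).
rng = np.random.default_rng(0)
X = rng.integers(-1, 2, size=(300, 5))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

# Stratified folds keep the class proportions similar in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(
    DecisionTreeClassifier(random_state=0), X, y, cv=cv, scoring="accuracy"
)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the spread across folds alongside the mean gives a rough sense of how stable the estimate is.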
Play with parameters controlling the tree size. Try to make a tree that is:
Interpret the resulting tree. How would you instruct an internet analyst to detect a phishing website, based on your decision tree?
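Tree size and interpretation can be explored together. The sketch below assumes scikit-learn: it compares a few `max_depth` settings by cross-validation and then prints a shallow tree as text rules with `export_text`. The data is synthetic and the feature names are placeholders. Other size-related parameters such as `min_samples_leaf`, `max_leaf_nodes`, or `ccp_alpha` could be varied in the same way.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data with a made-up labelling rule (not the real data set).
rng = np.random.default_rng(0)
feature_names = [f"feature_{i}" for i in range(5)]  # placeholder names
X = rng.integers(-1, 2, size=(300, 5))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

# Compare trees of different maximum depth; None means no depth limit.
results = {}
for depth in (1, 2, 3, None):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    results[depth] = cross_val_score(model, X, y, cv=5).mean()
    print(f"max_depth={depth}: mean CV accuracy {results[depth]:.3f}")

# A shallow tree is easier to read as a set of if/else rules.
small_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = export_text(small_tree, feature_names=feature_names)
print(rules)
```

Each root-to-leaf path in the printed output reads as a chain of conditions ending in a predicted class, which is a convenient starting point for written instructions to an analyst.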