Fundamentals of Data Analytics



The purpose of this assignment is to implement decision trees using
different split point evaluation measures and Naïve Bayes classification,
and to evaluate the performance metrics of a machine learning model.
Question 1: (30 points)
Given the following table, construct a decision tree using a purity
threshold of 100%. Use information gain as the split point evaluation
measure. Present all the steps.
Next, classify the following points in terms of Risk:
a) (Age=27, Car=Vintage) - 15 points
b) (Age=50, Car=Sports) - 15 points
Point  Age  Car      Risk
X1     25   Sports   Low
X2     20   Vintage  High
X3     25   Sports   Low
X4     45   SUV      High
X5     20   Sports   High
X6     25   SUV      High
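As a starting point for the first step, the information gain of each candidate split at the root can be checked with a short script. This is a minimal sketch under stated assumptions, not the required solution method: it assumes binary splits only, uses midpoints between consecutive distinct Age values as numeric thresholds, and tests each Car value against the rest. The names `DATA`, `entropy`, `information_gain`, and `candidate_splits` are illustrative.

```python
from collections import Counter
from math import log2

# Training points from the table above: (Age, Car, Risk).
DATA = [
    (25, "Sports", "Low"),    # X1
    (20, "Vintage", "High"),  # X2
    (25, "Sports", "Low"),    # X3
    (45, "SUV", "High"),      # X4
    (20, "Sports", "High"),   # X5
    (25, "SUV", "High"),      # X6
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, goes_left):
    """Entropy of `rows` minus the weighted entropy of the two partitions."""
    labels = [risk for _, _, risk in rows]
    left = [r[2] for r in rows if goes_left(r)]
    right = [r[2] for r in rows if not goes_left(r)]
    weighted = sum(len(part) / len(rows) * entropy(part) for part in (left, right))
    return entropy(labels) - weighted


def candidate_splits(rows):
    """Yield (description, predicate) pairs for every binary split considered."""
    # Assumption: numeric thresholds are midpoints between distinct Age values.
    ages = sorted({age for age, _, _ in rows})
    for low, high in zip(ages, ages[1:]):
        mid = (low + high) / 2
        yield f"Age <= {mid}", (lambda r, mid=mid: r[0] <= mid)
    # Assumption: categorical splits test one Car value against all others.
    for car in sorted({car for _, car, _ in rows}):
        yield f"Car = {car}", (lambda r, car=car: r[1] == car)


gains = {name: information_gain(DATA, test) for name, test in candidate_splits(DATA)}
best_split = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name:14s} gain = {gain:.4f}")
print("Best root split:", best_split)
```

The script evaluates the root node only. Building the full tree means repeating the same evaluation on each impure partition until every leaf reaches the 100% purity threshold.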
Question 2: (30 points)
Given the following table, you need to predict the class label of a
tuple using Naïve Bayes classification. Present all the probabilistic
calculations.
Customer  Internet Service  Contract  Payment Method  Churn
C1        DSL               Monthly   Cash            No
C2        DSL               Monthly   Credit          No
C3        Broadband         Monthly   Cash            Yes
C4        Fiber Optics      Monthly   Cash            Yes
C5        Fiber Optics      Yearly    Cash            Yes
C6        Fiber Optics      Yearly    Credit          No
C7        Broadband         Yearly    Credit          Yes
C8        DSL               Monthly   Cash            No
C9        DSL               Yearly    Cash            Yes
C10       Fiber Optics      Yearly    Cash            Yes
C11       DSL               Yearly    Credit          Yes
C12       Broadband         Monthly   Credit          Yes
C13       Broadband         Yearly    Cash            Yes
C14       DSL               Monthly   Credit          No
C15       DSL               Monthly   Credit          ?
Next, predict whether the following customer will churn:
C15 = (DSL, Monthly, Credit) - 30 points
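The hand calculation can be cross-checked with a small script that multiplies the class prior by the per-attribute conditional probabilities. This is a minimal sketch that assumes plain relative-frequency estimates with no Laplace smoothing, which the assignment does not specify. The names `TRAIN`, `QUERY`, and `naive_bayes_scores` are illustrative.

```python
from collections import Counter

# Labeled customers C1..C14: (Internet Service, Contract, Payment Method, Churn).
TRAIN = [
    ("DSL", "Monthly", "Cash", "No"),            # C1
    ("DSL", "Monthly", "Credit", "No"),          # C2
    ("Broadband", "Monthly", "Cash", "Yes"),     # C3
    ("Fiber Optics", "Monthly", "Cash", "Yes"),  # C4
    ("Fiber Optics", "Yearly", "Cash", "Yes"),   # C5
    ("Fiber Optics", "Yearly", "Credit", "No"),  # C6
    ("Broadband", "Yearly", "Credit", "Yes"),    # C7
    ("DSL", "Monthly", "Cash", "No"),            # C8
    ("DSL", "Yearly", "Cash", "Yes"),            # C9
    ("Fiber Optics", "Yearly", "Cash", "Yes"),   # C10
    ("DSL", "Yearly", "Credit", "Yes"),          # C11
    ("Broadband", "Monthly", "Credit", "Yes"),   # C12
    ("Broadband", "Yearly", "Cash", "Yes"),      # C13
    ("DSL", "Monthly", "Credit", "No"),          # C14
]
QUERY = ("DSL", "Monthly", "Credit")  # C15, class label unknown


def naive_bayes_scores(train, query):
    """Return the unnormalized score P(class) * prod_i P(x_i | class) per class."""
    class_counts = Counter(row[-1] for row in train)
    scores = {}
    for label, count in class_counts.items():
        score = count / len(train)  # prior P(class)
        for i, value in enumerate(query):
            matches = sum(1 for row in train if row[-1] == label and row[i] == value)
            # Likelihood P(x_i | class) as a relative frequency (assumption: no smoothing).
            score *= matches / count
        scores[label] = score
    return scores


scores = naive_bayes_scores(TRAIN, QUERY)
prediction = max(scores, key=scores.get)

for label, score in scores.items():
    print(f"Score for Churn={label}: {score:.6f}")
print("Predicted Churn for C15:", prediction)
```

The scores are unnormalized, so they do not sum to 1. Dividing each by their sum gives the posterior probabilities if those are needed for the write-up.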
Question 3: (20 points)
Consider the following confusion matrix. The numbers indicate whether an
employee will leave his/her organization or not.
              Predicted Leave
Actual Leave  No     Yes    Total
No            2326   15     2341
Yes           50     634    684
Total         2376   649    3025
Calculate the following metrics:
a) Accuracy
b) Sensitivity
c) Specificity
d) Precision
e) Recall
After the calculations, discuss the results and explain what they
indicate about the performance of the machine learning model
selected.
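The five metrics follow directly from the four cells of the matrix. The sketch below assumes that "Yes" (the employee leaves) is the positive class, which the assignment does not state explicitly, and uses the usual definitions in which sensitivity and recall are the same quantity. The variable names are illustrative.

```python
# Cell counts from the confusion matrix above.
# Assumption: "Yes" (employee leaves) is treated as the positive class.
TN, FP = 2326, 15   # actual No:  predicted No, predicted Yes
FN, TP = 50, 634    # actual Yes: predicted No, predicted Yes

total = TN + FP + FN + TP

metrics = {
    "accuracy": (TP + TN) / total,   # share of all predictions that are correct
    "sensitivity": TP / (TP + FN),   # true positive rate
    "specificity": TN / (TN + FP),   # true negative rate
    "precision": TP / (TP + FP),     # share of predicted positives that are correct
    "recall": TP / (TP + FN),        # same quantity as sensitivity
}

for name, value in metrics.items():
    print(f"{name:12s} {value:.4f}")
```

If "No" were taken as the positive class instead, sensitivity and specificity would swap and precision would be computed from the "No" column.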
Question 4: (20 points)
Define the meaning of the following terms:
- Information Gain - 5 pts
- Gini Index - 5 pts
- Confusion Matrix - 5 pts
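For reference, the first two measures are commonly written in the standard textbook forms below. The notation is an assumption of this sketch rather than something fixed by the assignment: $D$ is a set of $n$ labeled points, $p_i$ is the fraction of points in $D$ that belong to class $i$ out of $k$ classes, and $D_1, \dots, D_m$ are the partitions produced by a split, with $n_j$ points in $D_j$.

```latex
\begin{align*}
H(D) &= -\sum_{i=1}^{k} p_i \log_2 p_i
  && \text{(entropy)} \\
\mathrm{Gain}(D; D_1, \dots, D_m) &= H(D) - \sum_{j=1}^{m} \frac{n_j}{n}\, H(D_j)
  && \text{(information gain)} \\
\mathrm{Gini}(D) &= 1 - \sum_{i=1}^{k} p_i^{2}
  && \text{(Gini index)}
\end{align*}
```

A higher information gain means a split reduces class uncertainty more. A Gini index of 0 means the set is pure.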

