Assessment 2 – Business Analytics Research Project

(Due on 15th November 2020, 11.59pm)
Part 1: Data Mining, Machine Learning and Text Analytics (50%)
Exercise 1: Data Mining and Machine Learning (30%)
You work with the PVA_PARTITION data set in this Exercise. It contains data that represent
charitable donations made to a veterans’ organization. The data represent the results of a mail
campaign to solicit donations. Solicitations involve sending a small gift to an individual and
include a request for a donation. The data set contains the following information:
• a flag to indicate respondents to the appeal (Target Gift Flag) and the dollar amount of their
donations (Target Gift Amount)
• respondents’ PVA promotion and giving history
• demographic data of the respondents
1. Using SAS Visual Analytics
a. Sign in to SAS Visual Analytics.
b. Select Explore and Visualize Data to begin accessing and exploring the data.
c. Select the PVA_PARTITION data source.
d. Select the Data pane on the left of the canvas (if it is not open).
1) Which level of the Status Category 96NK variable has the highest count?
_____________
2) Does the variable Age contain any missing values? If so, how many?
____________________________
3) What is the average of Target Gift Amount?
_________________________________
e. Change Target Gift Flag from a measure to a category. It is a binary indicator that
represents a response to the mailing, where 1 indicates that the customer responded.
1) How are responders and non-responders distributed in the data? __________________
2) How many females responded to the campaign?________________
f. Save the report. Click (Menu) and select Save As. Save the report in My Folder >
Analytics Toolbox with the name Exercise 1. Click Save.
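The questions in steps d and e are frequency, missing-value and mean calculations. The sketch below reproduces them in Python on a few made-up rows so the logic behind each answer is explicit. The column names, the gender codes, and the assumption that non-responders have a missing Target Gift Amount (suggested by the separate Target Gift Amount with Zero variable) are all illustrative; the exercise itself asks you to read these values in SAS Visual Analytics.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy rows standing in for PVA_PARTITION (names and codes are assumptions).
# None marks a missing value; non-responders are assumed to have no Target Gift Amount.
rows = [
    {"status_cat_96nk": "A", "age": 62, "gender": "F", "gift_flag": 1, "gift_amount": 15.0},
    {"status_cat_96nk": "S", "age": None, "gender": "M", "gift_flag": 0, "gift_amount": None},
    {"status_cat_96nk": "A", "age": 48, "gender": "F", "gift_flag": 0, "gift_amount": None},
    {"status_cat_96nk": "N", "age": None, "gender": "F", "gift_flag": 1, "gift_amount": 10.0},
    {"status_cat_96nk": "A", "age": 55, "gender": "M", "gift_flag": 1, "gift_amount": 20.0},
]

# d.1) Category level with the highest count (the mode).
top_level, top_count = Counter(r["status_cat_96nk"] for r in rows).most_common(1)[0]

# d.2) Number of missing Age values.
missing_age = sum(r["age"] is None for r in rows)

# d.3) Average Target Gift Amount, computed over non-missing values only.
avg_gift = mean(r["gift_amount"] for r in rows if r["gift_amount"] is not None)

# e.1) Distribution of responders (1) and non-responders (0).
flag_counts = Counter(r["gift_flag"] for r in rows)

# e.2) Number of female responders.
female_responders = sum(r["gender"] == "F" and r["gift_flag"] == 1 for r in rows)
```

Whether the average should include zeros for non-responders depends on which variable you summarize: Target Gift Amount with Zero would pull the mean down, because every non-responder contributes a zero.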
Continue to work with the PVA_PARTITION data set to train a neural network model. The model
aims to classify those customers who made a donation.
2. Training a Neural Network Model in SAS Visual Data Mining and Machine Learning
a. Open your saved report, Exercise 1, which you created in step 1 above.
b. Select the Data pane on the left of the canvas and open the PVA_PARTITION data
source.
If you have not done so already, in the Measure column, right-click Target Gift Flag and
select Category.
c. Create a new page.
d. Add a neural network to the canvas.
e. Disable auto-refresh on the menu bar (if not done already).
f. Add Target Gift Flag as the response.
g. Under Predictors, click Add. In the Add Data Items window, select all predictor variables
except for these five:
• Control Number
• Demographic Cluster
• Partition
• Target Gift Amount
• Target Gift Amount with Zero
(In all, you add 24 predictors.)
h. Create the neural network model by clicking Refresh or enabling auto-refresh.
• How many observations are used by the algorithm?
• Why are some observations not used by the algorithm?
• What is the misclassification rate for the model created with the default settings?
i. Select the Options pane on the right and change Optimization Method to SGD. Do you
see any improvement in the misclassification rate?
j. Perform honest assessment and examine the results.
1) Select the Data pane on the left of the canvas and set the Partition variable as a
new partition.
2) Select the Roles pane on the right of the canvas and assign the Partition variable
under the Partition ID role. Refresh the model and note the validation
misclassification rate.
3) Select the Options pane and change the L2 regularization parameter value to 0.001.
Under Hidden Layers, change the Number of Hidden Layers property to 2. Do these
changes improve the validation misclassification rate?
4) Examine the validation cumulative lift chart. What can you determine about the top
10% (percentile) of the data? How does this model compare to the Best model?
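The statistics used in steps h and j can be sketched as follows. The misclassification rate is the share of observations whose predicted class differs from the actual class. One common reason a model uses fewer observations than the table holds is that rows with a missing predictor value are dropped (complete-case analysis); whether SAS applies exactly this rule to your model is an assumption to confirm in its results. Honest assessment means computing these statistics on the validation partition rather than the training data. Cumulative lift at the top 10% is the response rate among the 10% of observations with the highest predicted probability, divided by the overall response rate, and the Best curve corresponds to ranking observations by their true flag. All data, the partition coding and the 0.5 cutoff below are hypothetical, and SAS may bin percentiles differently.

```python
def complete_cases(rows, predictors):
    """Keep only rows with no missing (None) value in any predictor."""
    return [r for r in rows if all(r[p] is not None for p in predictors)]


def misclassification_rate(actual, predicted):
    """Share of observations whose predicted class differs from the actual class."""
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)


def cumulative_lift(actual, scores, depth):
    """Response rate among the top `depth` share of observations (ranked by score,
    highest first) divided by the overall response rate."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(depth * len(ranked)))
    top_rate = sum(a for _, a in ranked[:n_top]) / n_top
    overall_rate = sum(actual) / len(actual)
    return top_rate / overall_rate


# Hypothetical rows; None marks a missing predictor value.
# Assumption: partition 1 marks training rows and 0 marks validation rows.
rows = [
    {"age": 62, "income": 3, "partition": 1},
    {"age": None, "income": 5, "partition": 1},
    {"age": 48, "income": None, "partition": 0},
    {"age": 35, "income": 2, "partition": 0},
]
used = complete_cases(rows, ["age", "income"])          # rows a model could score
validation = [r for r in used if r["partition"] == 0]   # rows held out for honest assessment

# Hypothetical validation results: actual flags and predicted probabilities.
actual = [1, 1, 0, 1, 1] + [0] * 15
scores = [0.9, 0.6, 0.8, 0.7, 0.5] + [0.1] * 15
predicted = [1 if s >= 0.5 else 0 for s in scores]      # 0.5 cutoff is an assumption

val_misclassification = misclassification_rate(actual, predicted)
model_lift_10 = cumulative_lift(actual, scores, 0.10)
best_lift_10 = cumulative_lift(actual, actual, 0.10)    # perfect ranking = "Best" curve
```

In this toy data the model's top 10% (two observations) contains one responder, while a perfect ranking puts responders in both slots, so the model's lift is half of the Best lift.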
3. Performing Autotuning to Determine Optimal Model Parameters
a. Duplicate the neural network object page.
Note: This creates a copy of a previously trained neural network model.
b. In the Options pane, select the Autotune property and use the default autotune
hyperparameters. Update the model (processing might take several minutes).
c. Examine the optimal values selected for the following:
• Number of hidden layers
• Number of neurons
• L1 and L2 regularization
d. Did you notice any improvement in the validation misclassification rate compared to the
previous model?
e. Use a validation cumulative lift chart to compare this model with the previous model at the
top 10% (percentile) of the data. What have you discovered?
f. Save the report. Click (Menu) and select Save.
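Autotuning tries hyperparameter combinations and keeps the one with the best validation score. The sketch below uses an exhaustive grid search, which is simpler than, and not necessarily the same as, the search method SAS uses by default. The function `validation_error` is a hypothetical toy stand-in for "train the network and score the validation partition"; its formula and values are invented purely so the search has a known minimum.

```python
from itertools import product


def validation_error(hidden_layers, neurons, l1, l2):
    # Hypothetical toy surface; a real search would train and score a model here.
    return (0.30 + 0.02 * abs(hidden_layers - 1) + 0.001 * abs(neurons - 20)
            + 2 * l1 + 5 * abs(l2 - 0.001))


# Candidate values for each hyperparameter (illustrative ranges only).
grid = {
    "hidden_layers": [1, 2],
    "neurons": [10, 20, 50],
    "l1": [0.0, 0.01],
    "l2": [0.0, 0.001, 0.01],
}

# Evaluate every combination and keep the one with the lowest validation error.
best = min(product(*grid.values()), key=lambda combo: validation_error(*combo))
best_params = dict(zip(grid.keys(), best))
```

A full grid grows multiplicatively with every hyperparameter added, which is why tuning tools often sample the space instead of enumerating it.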
Exercise 2: Text Analytics (20%)
You work with the MOVIES_PLUS data set in this Exercise. It contains seven variables. The
text variable is overview. The variable Made_Money can be used as a category variable. Set
the variable title as a display variable.
Create a Text Analytics project as indicated in the following New Project window:
In this Exercise, you need to identify at least three movies that are categorized as comedies.
You may find the file ComedyConcept.txt to be useful in this exercise.
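A concept in a Text Analytics project tags documents whose text matches its rules. As a rough analogue only, the sketch below flags a movie as a comedy when its overview contains any term from a keyword list. The terms, titles and overviews are made up; the real ComedyConcept.txt may contain different terms or SAS rule syntax that a plain regular expression does not reproduce.

```python
import re

# Hypothetical concept terms; the real ComedyConcept.txt may differ.
comedy_terms = ["comedy", "comedic", "hilarious", "funny", "laugh"]

# Match any term at the start of a word, allowing suffixes such as "laughing", ignoring case.
pattern = re.compile(r"\b(" + "|".join(map(re.escape, comedy_terms)) + r")\w*\b",
                     re.IGNORECASE)

# Hypothetical MOVIES_PLUS rows: (title, overview).
movies = [
    ("Movie A", "A hilarious road trip goes wrong."),
    ("Movie B", "A detective hunts a serial killer."),
    ("Movie C", "This romantic comedy follows two rivals."),
    ("Movie D", "Friends laugh their way through a wedding."),
]

# Keep the titles whose overview matches the concept.
comedies = [title for title, overview in movies if pattern.search(overview)]
```

Keyword matching ignores context, so an overview such as "not funny at all" would still be flagged; concept rules in a text analytics tool can be written to be more selective.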
Part 2: Report/Essay (30%) & Presentation (20%)
You are to write a short report/essay (maximum 2000 words). In addition, you will make a short
presentation (10 minutes) highlighting the essence of this report and present it to the lecturer on
Monday 16th November 2020.
“Write a report/essay on how you as a Business Analyst would use Business Analytics
knowledge and skills to evaluate the effectiveness and applicability of these latest innovative
business analytics techniques and tools in complex big data business problems for an
organisation.”
Write the above report/essay on one of the following techniques or tools:
• Structured Data Mining
• Unstructured Data Mining
• Machine Learning – Supervised
• Machine Learning – Unsupervised
• Pattern Matching
• Internet of Things
• Heuristics
• Neural Networks
• Support Vector Machines
• Forests
• Gradient Boosting
• Bayesian Networks
• Nearest Neighbour
• Decision Trees
• Fuzzy Networks
• Factorisation Machines
You can also propose a new innovative technique or tool not listed above.