DATA4200 Subject Name Data Acquisition and Management

I just want part B & C only.
Assessment 1 Information
Subject Code: DATA4200
Subject Name: Data Acquisition and Management
Assessment Title: Sampling and data mining project
Assessment Type: Report
Word Count: 800 Words (+/-10%)
Weighting: 35% (15% group work, 20% individual work)
Total Marks: 35
Submission: - Group software files and individual drafts: submitted via MyKBS in class and by email.
- Final individual report: due 72 hours after class via Turnitin.
Due Date: Week 6
Your Task
Complete Parts A and B below in class in Week 6. Part C must be completed within 72 hours after your class
finishes. Consider the rubric at the end of the assignment for guidance on structure and content.
Submit results as below:
• In class: Group sampling work is to be submitted as a software file (e.g. Excel, Power BI) via MyKBS at 1.5
hours into class time. You do not have to submit the Python script that you will be given in addition to the
other resources.
• In class: Individual draft of data mining report is to be emailed to your lecturer at the end of class.
• Via Turnitin 72 hours after your class finishes: Final individual report of data mining
Assessment Description
• Business Problem: Suppose that you are a data analyst for the International Federation of Association
Football (FIFA). You want to report on the statistics of world football players, given some of the many
variables that are collected on them. You don’t, however, want to use the entire data set, so you decide to
clean the data, take a sample and report on that.
• Data sets: your teacher will provide you with data on the day.
• Learning outcomes: LO2, LO3, LO4
Assessment Instructions
Part A: Group component (1.5 hours, 15 Marks) In Microsoft Excel or Power BI.
1. Open the data file. Perform some basic data cleansing (removing missing or incorrect values, e.g. Age = 0,
missing country of origin, and other errors). (3 marks)
2. Recall the sampling methods below that you have learnt about in lectures.
You will be given a script that enables you to run the quota, systematic, simple random and stratified
sampling methods on the data.
Apply one of the following sampling methods (quota, systematic, or simple random) to obtain a subgroup of
5000 rows.
3. In a paragraph of approx. 200 words,
a) Explain how this type of sample relates to the larger dataset. (2 marks)
b) provide a simple summary (statistical or visual) of three variables using the sampled data set. (3 marks)
4. Apply the stratified sampling method to the data using the script provided by your teacher. In a paragraph
of approx. 200 words,
a) Provide a simple summary (statistical or visual) of the same three variables that you chose in 3b), given
this new sample. (3 marks)
b) Interpret and compare the results of parts 3 and 4. Also explain the limitations of the method chosen. (4 marks)
********************Submit your software file with the visualisation*********************
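The cleansing and sampling steps in Part A can be sketched roughly as follows. This is only an illustration under stated assumptions, not the script supplied by your teacher: the toy records, the column names (`id`, `age`, `country`) and all function names are hypothetical, and the sample size is 100 rather than 5000 to keep the example small.

```python
import random
from collections import defaultdict


def make_toy_players(n, seed=0):
    """Build made-up player records (hypothetical columns, not the real data)."""
    rng = random.Random(seed)
    countries = ["Brazil", "Spain", "Japan", ""]  # "" stands for a missing country
    return [
        {"id": i, "age": rng.choice([0, 19, 24, 28, 33]), "country": rng.choice(countries)}
        for i in range(n)
    ]


def clean(rows):
    """Drop rows with an impossible age (0) or a missing country of origin."""
    return [r for r in rows if r["age"] > 0 and r["country"]]


def simple_random_sample(rows, k, seed=0):
    """Every row has the same chance of being selected."""
    return random.Random(seed).sample(rows, k)


def systematic_sample(rows, k, seed=0):
    """Choose a random start, then take every step-th row after it."""
    step = len(rows) // k
    start = random.Random(seed).randrange(step)
    return [rows[start + i * step] for i in range(k)]


def stratified_sample(rows, k, key, seed=0):
    """Sample each stratum in proportion to its share of the cleaned data.

    Rounding each stratum's share means the total can differ from k
    by a row or so.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in rows:
        strata[r[key]].append(r)
    sample = []
    for group in strata.values():
        share = round(k * len(group) / len(rows))
        sample.extend(rng.sample(group, min(share, len(group))))
    return sample


players = clean(make_toy_players(2000))
srs = simple_random_sample(players, 100)
systematic = systematic_sample(players, 100)
stratified = stratified_sample(players, 100, key="country")
```

Fixing the seed makes each sample reproducible, which helps when the group needs to compare summaries of the same three variables across methods.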
Individual Components (20 Marks)
Part B: Individual draft (1.5 hours (in class), 10 Marks) Use one of the sampled datasets from Part A for this
a) Select six diverse variables from the data. These variables must differ from those that the group chose in
part A.
b) Start to create visualisations and summary statistics.
c) Submit a draft of your work at the end of class, as you will continue this exercise in Part C below.
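As a starting point for step b), summary statistics for one numeric and one categorical variable might be computed as in the sketch below. The values and the variable names (age, preferred foot) are made up for illustration and are not taken from the class data set.

```python
import statistics
from collections import Counter


def summarise_numeric(values):
    """Return basic descriptive statistics for a numeric variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def summarise_categorical(values):
    """Return the proportion of rows in each category, most common first."""
    counts = Counter(values)
    total = len(values)
    return {label: count / total for label, count in counts.most_common()}


# Hypothetical sampled values, for illustration only.
ages = [19, 24, 24, 28, 33]
preferred_foot = ["Right", "Right", "Left", "Right"]

age_summary = summarise_numeric(ages)
foot_summary = summarise_categorical(preferred_foot)
```

Numeric variables suit measures of centre and spread, while categorical variables are usually better described by counts or proportions, which is one way to show that the chosen variables are diverse.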
Part C: Individual final report (72 hours outside of class, approximately 400 words, 10 Marks)
a) In a paragraph of approximately 300 words, interpret your results and visualisations. [6 marks]
b) List an advantage and possible disadvantage of the sampling method that you chose for this exercise. [2 marks]
c) Explain the difference between non-probability and probability sampling. [2 marks]
d) Submit via Turnitin within 72 hours of the class
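The distinction in item c) can be seen in a small sketch. It assumes one simple form of quota sampling, in which each quota is filled with the first matching rows encountered, and contrasts it with a simple random sample. The records, the `position` column and the function names are hypothetical.

```python
import random


def quota_sample(rows, quotas, key):
    """Non-probability sampling: fill each group's quota with the first rows met.

    Rows that appear after a quota is full can never be selected, so their
    chance of selection is not known in advance and may be zero.
    """
    remaining = dict(quotas)
    picked = []
    for r in rows:
        group = r[key]
        if remaining.get(group, 0) > 0:
            picked.append(r)
            remaining[group] -= 1
    return picked


def random_sample(rows, k, seed=0):
    """Probability sampling: every row has the same known chance, k / len(rows)."""
    return random.Random(seed).sample(rows, k)


# Hypothetical records: every tenth player is a goalkeeper.
rows = [{"id": i, "position": "GK" if i % 10 == 0 else "Outfield"} for i in range(100)]

quota = quota_sample(rows, {"GK": 2, "Outfield": 8}, key="position")
prob = random_sample(rows, 10)
```

In this toy data the quota sample is drawn entirely from the top of the file, whereas the random sample can include any of the 100 rows.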
Important Study Information
Academic Integrity Policy
KBS values academic integrity. All students must understand the meaning and consequences of cheating,
plagiarism and other academic offences under the Academic Integrity and Conduct Policy.
What is academic integrity and misconduct?
What are the penalties for academic misconduct?
What are the late penalties?
How can I appeal my grade?
Click here for answers to these questions:
Word Limits for Written Assessments
Submissions that exceed the word limit by more than 10% will cease to be marked from the point at which
that limit is exceeded.
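For the 800-word report with a tolerance of +/-10%, the acceptable range can be worked out as in the sketch below. Counting words by splitting on whitespace is an assumption; the tool used for marking may count slightly differently.

```python
def word_count_bounds(target, tolerance_pct=10):
    """Return the (lower, upper) word counts allowed around the target."""
    lower = target * (100 - tolerance_pct) // 100
    upper = target * (100 + tolerance_pct) // 100
    return lower, upper


def within_word_limit(text, target, tolerance_pct=10):
    """Check a draft against the allowed range using a whitespace word count."""
    lower, upper = word_count_bounds(target, tolerance_pct)
    return lower <= len(text.split()) <= upper


bounds = word_count_bounds(800)  # (720, 880)
```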
Study Assistance
Students may seek study assistance from their local Academic Learning Advisor or refer to the resources on
the MyKBS Academic Success Centre page. Click here for this information.
Assessment Marking Guide
Grade bands: NN (Fail) 0%-49%; P (Pass) 50%-64%; CR (Credit) 65%-74%; DN (Distinction) 75%-84%; HD (High Distinction)
Part A: (15 Marks)
Criteria: Sampling methods (group work)
• Data cleaning (3 marks)
• Students will produce a stratified sample and one of a quota, systematic or simple random sample, using a supplied Python script
• Students explain how their sample relates to the larger dataset (2 marks)
• Students will provide a summary for the same three variables after each sampling method has been applied and compare results (10 marks)
NN (Fail): Incorrect sampling. Parts missing. Little or no explanation of results or method.
P (Pass): Basic requirements met. Summary brief and general. May be a poor comparison.
CR (Credit): All parts present, with relevant explanation, and well summarized. Good comparison. May lack some detail.
DN (Distinction): All parts present and detail provided on methods and summary. Solid and relevant. Well thought out summary and comparisons.
HD (High Distinction): All parts present and well integrated group work. Deep detail provided on methods and summary. Novel and engaging summary.
Part B: (10 Marks) (Individual component in class)
Criteria: Students must select five diverse variables (which differ from group ones) and start to visualize and summarise the data. A draft report must be emailed at the end of class.
NN (Fail): Nothing started in class. Variables not different or diverse. Parts missing.
P (Pass): Basic requirements met with five new variables and visualisations started. May not be that diverse.
CR (Credit): Five new variables and visualisations started. Diversity evident.
DN (Distinction): Five new variables and visualisations started. Diversity evident. Start of good interpretation.
HD (High Distinction): Five very diverse variables chosen and visualisations started. Diversity evident. Start of well integrated interpretation.
Page 4 Kaplan Business School Assessment Outline
Part C: (10 Marks) (Individual component at home)
Criteria: A well polished report, consistent with the class work, must be done (6 marks). Answers to theoretical sections should be evident (2 + 2 = 4 marks).
NN (Fail): Parts missing. Final report different from draft and too general. No theory discussed. No effort at home.
P (Pass): Basic report on five new variables and brief explanations of theory done. All consistent with class work. May be small improvement to class work.
CR (Credit): Good report on five new variables and full and relevant explanations of theory done. All consistent with class work and clearly tried to improve and complete most of the class work done.
DN (Distinction): Good report on five new diverse variables and detailed relevant explanations of theory. All consistent with class work. Evidence of extra work done to improve class work, make the report flow well and complete.
HD (High Distinction): Engaging report on five new diverse variables and detailed, well integrated, relevant explanations of theory done. Excellent flow of report. All consistent with class work, complete and evidence of extra work to provide a novel approach.
Assignment Submission
Students must submit their final report 72 hours after their class ends, via Turnitin in week 6.
Late assignment submission penalties
Penalties will be imposed on late assignment submissions in accordance with Kaplan Business School’s
Assessment Policy.
Number of days: Penalty
1* – 9 days: 5% per day for each calendar day late deducted from the student’s total marks.
10 – 14 days: 50% deducted from the student’s total marks.
After 14 days: Assignments that are submitted more than 14 calendar days after the due date will not be
accepted and the student will receive a mark of zero for the assignment(s).
Note: Notwithstanding the above penalty rules, assignments will also be given a mark of zero if they are
submitted after assignments have been returned to students.
*Assignments submitted at any stage within the first 24 hours after deadline will be considered to be one day
late and therefore subject to the associated penalty.
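The penalty schedule above can be expressed as a short calculation. The sketch assumes that each percentage is taken from the 35 marks available for this assessment; the policy wording could also be read as a percentage of the mark the student earned, so check the KBS Assessment Policy before relying on it. The function name is hypothetical, and the rule about work submitted after assignments have been returned is not modelled.

```python
import math


def mark_after_late_penalty(raw_mark, hours_late, total_marks=35):
    """Apply the late penalty schedule to a raw mark.

    Assumption: percentages are deducted from the assessment's total
    available marks, not from the student's earned mark.
    """
    if hours_late <= 0:
        return float(raw_mark)
    # Any part of a 24-hour period counts as a full day late.
    days_late = math.ceil(hours_late / 24)
    if days_late > 14:
        return 0.0  # not accepted: mark of zero
    if days_late >= 10:
        penalty = total_marks * 50 / 100
    else:
        penalty = total_marks * 5 * days_late / 100
    return max(0.0, raw_mark - penalty)


two_hours_late = mark_after_late_penalty(28, 2)        # counted as one day late
ten_days_late = mark_after_late_penalty(28, 10 * 24)   # flat 50% deduction
```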
If you are unable to complete this assessment by the due date/time, please refer to the Special Consideration
Application Form, which is available at the end of the KBS Assessment Policy: