STAT3175-STAT6175 Linear Models
Assignment 3, 2022
Due date: Friday 27 May, 5 pm
Instructions
- You should submit the assignment using the Turnitin tool on iLearn.
- For all hypothesis tests, you should state the null and alternative hypotheses, the test statistic, the distribution of the test statistic under H0, the p-value and your conclusion.
Question 1 *** Histograms amended 25/5
The Andoan Free Colonies are a former colony of the planet Ando. The population consists of two major subpopulations: the indigenous people and the colonising Andoans. As is common in many former colonies, the indigenous population has lower life expectancy and generally worse health and socio-economic outcomes than the colonising population.
A health survey was conducted on a random sample of the population. One of the research questions was whether the prevalence[1] of measles, a common childhood disease, was different in the two ethnic groups. Specifically, it was thought that measles was more prevalent in the indigenous population.
The following was the age distribution of the sample:

[Figure: histograms of age (horizontal axis, 0 to 80) for the Indigenous and Andoan groups, one panel per group; vertical axis: percent of total (0 to 15).]
The median age for the indigenous people in the sample was 39 years; for the Andoans 46 years; and for the overall sample 42 years. Simple analysis revealed the following:
-
[1] “Prevalence is the proportion of a population who have (or had) a specific characteristic in a given time period – in medicine, typically an illness, a condition, or a risk factor such as depression or smoking.”
- On the basis of this analysis, what would you conclude about the prevalence of measles in the indigenous population, compared with the Andoan population? Use an appropriate statistical test.
- Explain carefully why this simple analysis is flawed. You may use a diagram to aid in your explanation. Give some examples of the statements that could be made following a more correct analysis.
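As an illustration of the kind of test the first part asks for, the following is a minimal sketch of a one-sided pooled z-test for two proportions. It is one possible choice of "appropriate statistical test", not necessarily the intended one. The function name `two_proportion_z_test` and the counts passed to it are hypothetical; they are not the survey results.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """One-sided test of H0: p1 = p2 against H1: p1 > p2 (pooled z statistic)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)              # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se                          # approximately N(0, 1) under H0
    p_value = 0.5 * math.erfc(z / math.sqrt(2)) # upper-tail probability P(Z >= z)
    return z, p_value

# Hypothetical counts (cases, sample size) for each group; NOT the survey data.
z, p = two_proportion_z_test(x1=60, n1=200, x2=45, n2=200)
print(f"z = {z:.3f}, one-sided p-value = {p:.4f}")
```

The normal approximation assumes reasonably large expected counts in each cell; with small counts an exact test would be a safer choice.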
Question 2
Thompson et al. (2013) used the analysis of video game telemetry data from real-time strategy (RTS) games to explore the development of expertise in humans. Thompson et al. (2013):

RTS games, in which players develop game pieces called units with the ultimate goal to destroy their opponent's headquarters, have three relevant differences from traditional strategy games such as chess. First, the games have an economic component such that players must spend resources to produce military units. Many of a player's strategic decisions are related to balancing spending on military and economic strength. Second, the game board, called a map, is much larger than what that player can see at any one time. The resulting uncertainty about the game state leads to a variety of information gathering strategies, and requires vigilance and highly developed attentional processes. Third, in RTS games players do not have to wait for their opponent to play their turn. Players that can execute strategic goals more efficiently have an enormous advantage. Consequently, motor skills with a keyboard and mouse are an integral component of the game. Each game produces lots of behavioral data: an average game of chess consists of 40 moves per player, while the average RTS game in our study consists of 1635 moves per player.

The authors collected telemetric data from 3,360 RTS game players from 7 levels of expertise, ranging from novices to full-time professionals. They posted a call for StarCraft 2 players through online gaming communities and social media. From each respondent they gathered a replay file (a recording of all the commands issued in the game), demographic information, and a player identification code that allowed them to verify the player's level of expertise. This information was distilled into over 20 cognitive-motor, attentional, and perceptual processing variables to study expertise. The data set can be found in SkillCraft assn3.sav.
In this assignment we will only consider five variables:
Variable      Description
Age           Age of player in years (integer)
TotalHours    Reported total number of hours spent playing StarCraft 2 (integer)
WorkersMade   The number of workers (SCVs, drones, and probes) trained per minute (continuous)
LeagueIndex   Level of the game coded in increasing order of difficulty, where 1 = Bronze, 2 = Silver, 3 = Gold, 4 = Platinum, 5 = Diamond, 6 = Master, 7 = Grand Master, 8 = Professional leagues
APM           Actions per minute
- Examine the frequencies of each level of LeagueIndex, and if necessary perform a suitable recoding of the levels for inclusion as a covariate. Give the frequencies of your recoded variable.
- Examine the distribution of APM, and if necessary obtain a suitable transformation.
- Consider the inclusion of LeagueIndex and APM in the model for log(WorkersMade). You should consider including the above covariates and their interaction terms. (It is not necessary to perform diagnostic checking.) Write down your final model.
- Write down the fitted model equations for (i) League Index 2 (Silver), and (ii) League Index 5 (Diamond). Interpret the model coefficients. How does Diamond compare with Silver?
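To illustrate how level-specific equations follow from a model with a main effect and an interaction, here is a minimal sketch. It assumes LeagueIndex enters as a factor with reference-level coding and that the (possibly transformed) APM variable is the continuous covariate; both are assumptions, as are the function name `league_equation` and every coefficient value, which are hypothetical and not estimates from the SkillCraft data.

```python
# Hypothetical coefficients for illustration only; not fitted to the SkillCraft data.
coef = {
    "intercept": -7.0,                  # intercept for the reference league (assumed)
    "x": 0.50,                          # slope of the continuous covariate, reference league
    "league": {2: 0.10, 5: 0.40},       # intercept shifts relative to the reference league
    "league_x": {2: -0.02, 5: -0.10},   # slope shifts (interaction) relative to the reference
}

def league_equation(coef, league):
    """Return (intercept, slope) of the fitted line for one league level."""
    intercept = coef["intercept"] + coef["league"].get(league, 0.0)
    slope = coef["x"] + coef["league_x"].get(league, 0.0)
    return intercept, slope

silver = league_equation(coef, 2)
diamond = league_equation(coef, 5)
for name, (a, b) in [("Silver", silver), ("Diamond", diamond)]:
    print(f"{name}: log(WorkersMade) = {a:.2f} + {b:.2f} * x")
```

Under this coding, the difference between two leagues is the difference of their intercept shifts plus the difference of their slope shifts multiplied by the covariate, so the comparison depends on the covariate value whenever the interaction is retained.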
Question 3
Continuing on from Question 2, players have been classified as follows:
WorkersMade   Skill
< 0.0015      0 (standard)
≥ 0.0015      1 (star player)
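The classification rule above can be sketched as follows. This is a minimal illustration that assumes the second row of the table reads "greater than or equal to 0.0015"; the function name `skill` and the WorkersMade values are hypothetical and are not taken from the data set.

```python
from collections import Counter

THRESHOLD = 0.0015  # cut-off from the classification table

def skill(workers_made):
    """Return 1 (star player) if WorkersMade >= THRESHOLD, otherwise 0 (standard)."""
    return 1 if workers_made >= THRESHOLD else 0

# Hypothetical WorkersMade values, for illustration only.
workers_made = [0.0004, 0.0011, 0.0015, 0.0021, 0.0009, 0.0030]
skills = [skill(w) for w in workers_made]

# Frequency table of the derived variable.
freq = Counter(skills)
for level in sorted(freq):
    print(level, freq[level])
```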
- Compute the variable Skill and give its frequency table.
- Model carefully the probability of a player being a star player, as a function of the covariates in Question 2 above. You should
  - perform an initial exploratory analysis. Do not repeat analyses already performed in Question 2 above;
  - using a model selection criterion, develop an appropriate model; give the table of parameter estimates for your final model; and
  - choose one of the covariates in your final model and provide an explanation of its parameter estimates. Do these agree with your model developed in Question 2?
- Using your final model, compute the fitted probability of a player with the following profile being a star player:

  Age           25 years
  TotalHours    2000
  APM           100
  LeagueIndex   5 (Diamond)

  Show your working.
- Give the classification table, using a cut-off of the prior probability of being a star player. What are the sensitivity and specificity?
- Construct the ROC curve and give the area under the curve. What does the area under the curve indicate in this case?
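The mechanics behind the last three parts (a fitted probability, a classification table at a prior-probability cut-off, and the area under the ROC curve) can be sketched as below. This is a minimal sketch that assumes a logistic regression with untransformed covariates, which may differ from the final model. All coefficients, outcomes and fitted probabilities are hypothetical, and the helper names (`inv_logit`, `classify`, `sens_spec`, `auc`) are illustrative rather than part of any package.

```python
import math

def inv_logit(eta):
    """Convert a linear predictor (log-odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def classify(probs, cutoff):
    """Predict 1 when the fitted probability is at least the cut-off."""
    return [1 if p >= cutoff else 0 for p in probs]

def sens_spec(y, yhat):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for a, b in zip(y, yhat) if a == 1 and b == 1)
    fn = sum(1 for a, b in zip(y, yhat) if a == 1 and b == 0)
    tn = sum(1 for a, b in zip(y, yhat) if a == 0 and b == 0)
    fp = sum(1 for a, b in zip(y, yhat) if a == 0 and b == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y, probs):
    """Area under the ROC curve as the proportion of (star, standard) pairs
    in which the star player has the higher fitted probability (ties count 1/2)."""
    pos = [p for a, p in zip(y, probs) if a == 1]
    neg = [p for a, p in zip(y, probs) if a == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

# Fitted probability for the stated profile, using HYPOTHETICAL coefficients.
b = {"intercept": -4.0, "Age": -0.02, "TotalHours": 0.0002, "APM": 0.03, "League5": 0.5}
eta = (b["intercept"] + b["Age"] * 25 + b["TotalHours"] * 2000
       + b["APM"] * 100 + b["League5"])
prob = inv_logit(eta)
print(f"linear predictor = {eta:.2f}, fitted probability = {prob:.3f}")

# HYPOTHETICAL outcomes (1 = star player) and fitted probabilities.
y = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.80, 0.55, 0.30, 0.60, 0.35, 0.20, 0.10, 0.05]

prior = sum(y) / len(y)            # sample proportion of star players, used as cut-off
yhat = classify(probs, prior)
sensitivity, specificity = sens_spec(y, yhat)
area = auc(y, probs)
print(f"cut-off = {prior:.3f}, sensitivity = {sensitivity:.3f}, "
      f"specificity = {specificity:.3f}, AUC = {area:.3f}")
```

The pairwise definition of the area used here is equivalent to the trapezoidal area under the empirical ROC curve, which is why it can be read as the probability that a randomly chosen star player receives a higher fitted probability than a randomly chosen standard player.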