A Walden student surveyed three sections of the AMDS 8437 class last spring, collecting demographic data, scores on the first test, and final grade in the course. The student plans to build a multiple regression model based on these data. The data are provided in the accompanying spreadsheet.
a. What would the basic model be? You do not have to do the regression; just give the general equation that you would use. Make sure that you clearly identify the variables in your model.
b. What level of data is each of the variables?
c. Do these data meet the assumptions that underlie multiple regression?
d. Comment on the validity of such a model.
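Solution: a. We are looking for a multiple linear regression of the form
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon,$$
where, taking the final grade in the course as the dependent variable $Y$, the independent variables $X_1, \ldots, X_k$ are the score on the first test and the demographic variables in the spreadsheet, $\varepsilon$ is the error term, and $\beta_0, \beta_1, \ldots, \beta_k$ are the coefficients of the regression.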
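As an illustration of what estimating such a model involves, here is a minimal sketch that fits a linear model with an intercept by ordinary least squares using NumPy. The data are synthetic and the variable names (`test1`, `age`, `female`, `final`) are hypothetical stand-ins; they are not taken from the spreadsheet.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    Returns the coefficient vector with the intercept first.
    """
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

# Synthetic data (assumption: made up for illustration only).
rng = np.random.default_rng(0)
n = 60
test1 = rng.uniform(50, 100, n)                # first-test score
age = rng.uniform(25, 60, n)                   # a demographic variable
female = rng.integers(0, 2, n).astype(float)   # a dummy-coded variable
final = 10 + 0.8 * test1 + 0.1 * age + 2.0 * female + rng.normal(0, 3, n)

beta = fit_ols(np.column_stack([test1, age, female]), final)
print("intercept and slopes:", beta)
```

The dummy variable enters the design matrix like any other column, which is why dichotomous predictors are acceptable in this kind of model.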
b. We have that
c. The assumptions for a multiple regression are basically that the dependent variable is measured at the interval level and is modeled as a linear combination of interval, dichotomous, or dummy-coded independent variables; that the relationships are linear; and that the spread of the errors is the same throughout the range of the independent variables ("homoscedasticity").
The variables in our model satisfy the level-of-measurement assumptions, but the linearity of the relationships and the homoscedasticity still need to be checked with further analysis, for example by examining the residuals.
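One rough way to carry out such a further check is to fit the model, sort the observations by fitted value, and compare the residual variance in the upper half with that in the lower half. The sketch below does this on synthetic data; the function name `residual_variance_ratio` and the data are assumptions for illustration, and this split-sample ratio is an informal diagnostic rather than a formal test.

```python
import numpy as np

def residual_variance_ratio(X, y):
    """Fit OLS, then return var(residuals, high fitted) / var(residuals, low fitted).

    A ratio near 1 is consistent with homoscedasticity; a ratio far
    from 1 suggests the error spread changes with the fitted value.
    """
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    resid = y - fitted
    order = np.argsort(fitted)          # indices from lowest to highest fitted value
    half = len(y) // 2
    low = resid[order[:half]]
    high = resid[order[half:]]
    return np.var(high, ddof=1) / np.var(low, ddof=1)

# Synthetic data (assumption: made up for illustration only).
rng = np.random.default_rng(1)
n = 400
x = rng.uniform(1, 10, n)
y_const = 2 + 3 * x + rng.normal(0, 2, n)            # constant error spread
y_grow = 2 + 3 * x + rng.normal(0, 1, n) * 0.5 * x   # spread grows with x

ratio_const = residual_variance_ratio(x, y_const)
ratio_grow = residual_variance_ratio(x, y_grow)
print("constant spread:", ratio_const)
print("growing spread: ", ratio_grow)
```

A plot of residuals against fitted values conveys the same information visually and would also reveal curvature, which bears on the linearity assumption.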
d. The idea of a multiple regression is to determine which variables actually act as predictors of the dependent variable. But we cannot be sure of the significance of any of the variables without first applying some tests (such as the Wald test). Also, we cannot disregard the possibility that interactions between the variables affect the dependent variable.
The bottom line is that our model might look reasonable, but we cannot guarantee validity without running some tests first.
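To make the kind of test mentioned in part d concrete, the sketch below computes a Wald statistic for each coefficient, including an interaction term, from an ordinary least-squares fit. For a single coefficient the statistic is the squared ratio of the estimate to its standard error, which is commonly compared with 3.84, the 5% critical value of a chi-square distribution with one degree of freedom. The data are synthetic and the names (`test1`, `female`, `final`, `wald_statistics`) are hypothetical; nothing here comes from the spreadsheet.

```python
import numpy as np

def wald_statistics(design, y):
    """Return (beta, se, wald) for an OLS fit, where wald = (beta / se) ** 2."""
    n, p = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - p)                 # unbiased error-variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)  # estimated covariance of beta
    se = np.sqrt(np.diag(cov))
    return beta, se, (beta / se) ** 2

# Synthetic data (assumption: made up for illustration only).
rng = np.random.default_rng(2)
n = 80
test1 = rng.uniform(50, 100, n)
female = rng.integers(0, 2, n).astype(float)
final = 10 + 0.8 * test1 + rng.normal(0, 3, n)   # only test1 truly matters here

# Columns: intercept, test1, female, and the test1-by-female interaction.
design = np.column_stack([np.ones(n), test1, female, test1 * female])
beta, se, wald = wald_statistics(design, final)

for name, w in zip(["intercept", "test1", "female", "test1:female"], wald):
    print(f"{name:>13}: Wald = {w:8.2f}  significant at 5%? {w > 3.84}")
```

Including the product column is what lets the model detect an interaction; if its Wald statistic is small, the simpler additive model may be adequate.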