I am spending this weekend working on the results section of my study. I spent 2 days on the descriptive statistics (all 134 variables) and probably still need to add more discussion on some of them. Those sections included t-tests and chi-square tests against my key outcome variable to see which items matter. I even limited which variables I would move into the logistic regression based on their effect sizes.
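For anyone wondering what "screening by effect size" can look like in practice, here is a minimal sketch. It assumes (this is my assumption, not something stated above) that the effect sizes were Cohen's d for the t-tests and Cramér's V for the chi-square tests. The function names are hypothetical, and the code is plain Python rather than SPSS syntax.

```python
import math
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Cohen's d for two groups, using the pooled sample standard deviation.

    Assumption: this is the effect size paired with each t-test.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def cramers_v(table):
    """Cramér's V for an r x c contingency table (a list of rows of counts).

    Assumption: this is the effect size paired with each chi-square test.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Pearson chi-square statistic: sum of (observed - expected)^2 / expected
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))
```

Variables whose d or V clears whatever cutoff you choose would then be the candidates that move on to the regression; the cutoff itself is a judgment call and is not specified here.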
So I’m trying to run the logistic regressions today. First I entered each of the most promising variables individually. When you run a logistic regression in SPSS, its base model (the constant-only model) looks only at the distribution of the outcome variable. If more than 50% of cases have the outcome, it guesses that everyone has it. If fewer do, it guesses that no one has it. For me this blind guess is about 70% correct.
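The base-model logic above boils down to a few lines of arithmetic. This is a minimal sketch in plain Python, not SPSS's own code: the 0.5 cut value matches the "more than 50%" rule described above, while the function name and the example data (30 of 100 students needing remedial math) are hypothetical, and how an exact 50/50 split is handled is my assumption.

```python
def null_model_accuracy(outcomes, cut_value=0.5):
    """Percent-correct of a constant-only model for a 0/1 outcome.

    The constant-only model assigns every case the same predicted
    probability: the observed share of 1s. Every case is then classified
    the same way, so accuracy equals the size of the majority group.
    """
    p = sum(outcomes) / len(outcomes)  # same predicted probability for everyone
    # Assumption: an exact tie at the cut value is classified as 0.
    guess = 1 if p > cut_value else 0
    correct = sum(1 for y in outcomes if y == guess)
    return guess, correct / len(outcomes)


# Hypothetical data: 30 of 100 students placed into remedial math.
outcomes = [1] * 30 + [0] * 70
print(null_model_accuracy(outcomes))  # guesses 0 for everyone, about 70% correct
```

This is why a roughly 70% blind guess is the bar every added predictor has to beat.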
The ONLY version of the model that adds anything substantive (around 10% better prediction) uses the two variables that become available latest in the process, at roughly the same time as the outcome variable itself. Specifically, you can make a better guess as to whether someone will need placement into remedial math if you know whether they needed placement into remedial reading or writing. Unfortunately, by then it’s a bit too late to do anything about the situation.
None of the other potentially predictive variables, including high school GPA, math test scores at two different points, math self-efficacy, and demographics, improves the model’s prediction by more than 0.1%. Worse, some of them actually reduce it by up to 0.7%.
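Comparing models this way means comparing classification tables: count how many cases each model classifies correctly at the cut value and take the difference. Here is a minimal sketch in plain Python under that assumption. The function names and all of the numbers are hypothetical, and the predicted probabilities would in practice come from the fitted SPSS models.

```python
def percent_correct(outcomes, probabilities, cut_value=0.5):
    """Percent of cases classified correctly, where p > cut_value means 'predict 1'."""
    hits = sum(
        1 for y, p in zip(outcomes, probabilities) if (p > cut_value) == (y == 1)
    )
    return 100.0 * hits / len(outcomes)


def improvement(outcomes, base_probs, new_probs, cut_value=0.5):
    """Percentage-point change in percent correct from the base model to the new one."""
    return percent_correct(outcomes, new_probs, cut_value) - percent_correct(
        outcomes, base_probs, cut_value
    )


# Hypothetical data: 1 of 5 students needs remedial math.
outcomes = [1, 0, 0, 0, 0]
base_probs = [0.2] * 5  # constant-only model: observed share of 1s for everyone
# A predictor that moves probabilities toward the right answer
# without pushing any case across the 0.5 cut value...
new_probs = [0.45, 0.1, 0.15, 0.3, 0.2]
# ...versus one that moves the single positive case across the cut.
better_probs = [0.6, 0.1, 0.15, 0.3, 0.2]

print(improvement(outcomes, base_probs, new_probs))     # no change
print(improvement(outcomes, base_probs, better_probs))  # one extra correct case
```

In this toy example, percent correct only moves when a case's predicted probability crosses the cut value, so a predictor can shift probabilities without changing the classification table at all.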
I don’t know what I’m doing wrong here... I mean, I didn’t expect a Nobel Prize for this thing, but I expected to find something. People do this all the time. I have to assume I’m doing something wrong... I just don’t know what.