I am spending this weekend working on the results section of my study.  I spent 2 days on the descriptive statistics (all 134 variables) and probably still need to add more discussion on some of them.  Those sections included t-tests and chi-square tests against my key outcome variable to see which items matter.  I even limited the ones I would move into the logistic regression based on effect sizes.

So I’m trying to run the logistic regressions today.  First I put each of the most promising sets of variables in individually.  When you do a logistic regression in SPSS, its base model just looks at the distribution of the outcome variable.  If more than 50% of cases have the outcome, it guesses that all have it.  If fewer, it guesses that none have it.  For me this makes a blind guess around 70% correct.
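
To make that baseline concrete, here is a rough sketch in Python (my choice for illustration; it is not what SPSS runs internally).  The data are made up to match the roughly 70% figure, and which outcome is the majority, the function names, and the example predictions are all assumptions.

```python
from collections import Counter


def majority_baseline(outcomes):
    """Always guess the most common outcome; return (guess, accuracy)."""
    counts = Counter(outcomes)
    guess, n_majority = counts.most_common(1)[0]
    return guess, n_majority / len(outcomes)


def accuracy(predictions, outcomes):
    """Share of cases where the prediction matches the observed outcome."""
    hits = sum(p == y for p, y in zip(predictions, outcomes))
    return hits / len(outcomes)


# Hypothetical outcome data: 7 of 10 students share the same outcome (coded 1).
outcomes = [1] * 7 + [0] * 3
guess, base_acc = majority_baseline(outcomes)

# Hypothetical predictions from a fitted model, to compare against the blind guess.
model_preds = [1, 1, 1, 1, 1, 1, 0, 1, 0, 0]
model_acc = accuracy(model_preds, outcomes)
gain = model_acc - base_acc

print(f"baseline guess={guess}, baseline accuracy={base_acc:.1%}")
print(f"model accuracy={model_acc:.1%}, gain over baseline={gain:+.1%}")
```

The point of the comparison is that a predictor only "adds" something in this table if it flips enough individual guesses away from the majority class and gets them right.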

The ONLY version of the model that adds anything substantive (around 10% better prediction) uses the two variables that you get latest in the process – like at the same time you get the outcome variable.  Specifically you can make a better guess as to whether someone will need placement into remedial math if you know whether they needed to be placed in remedial reading or writing.  Unfortunately that’s a bit too late to do anything about the situation.

All of the other potentially predictive variables, including high school GPA, math test scores at two different points, math self-efficacy, and demographic stuff, fail to add more than 0.1% to the predictiveness of the model.  Worse, some of these actually reduce it by up to 0.7%.

I don’t know what I’m doing wrong here… I mean, I didn’t expect a Nobel Prize for this thing, but I expected to find something.  People do it all the time.  I have to assume I’m doing something wrong… I just don’t know what…