st: running a backward stepwise multivariate analysis on my specific dataset


Scott Gilmore
Hi I'm new here and I could really use some help:

I have a dataset of N=50 patients and 20 variables.  13 are categorical and 7 are continuous variables.  I have no missing values.  My outcome is one of the variables and it is a binary outcome, 1= yes disease, 0= no disease.
 
I want to identify multivariate predictors of disease, using logistic regression with backward stepwise selection.
 
I used the commands:
sw logistic var1 var2 var3 var4 var5, pr(0.5)
 
where var1=the outcome variable (binary)
var2= categorical (1,2,3,4,5)
var3= continuous
var4= categorical
and so forth.
 
I am not sure how to interpret the Odds Ratio and don't quite understand what I am getting.
 
Can someone please explain how to run a backward stepwise multivariate analysis on my specific dataset?

Thank you!



     
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
st: RE: running a backward stepwise multivariate analysis on my specific dataset

Nick Cox
It seems to me that you already found out how to run a stepwise
analysis. So what is the question? Finding out more about the odds ratio
seems best addressed by further study of logit/logistic modelling texts
and papers in your field.
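
As a starting point on the odds ratio, here is a minimal sketch of how the output of -logit- and -logistic- relate. It uses the auto dataset shipped with Stata purely as a stand-in (foreign, mpg and weight are that dataset's variables, not the poster's), so the numbers say nothing about the disease data.

```stata
* Stand-in data shipped with Stata; NOT the poster's dataset
sysuse auto, clear

* -logit- reports coefficients on the log-odds scale
logit foreign mpg weight

* The odds ratio for mpg is the exponentiated coefficient
display exp(_b[mpg])

* -logistic- fits the same model but reports odds ratios directly
logistic foreign mpg weight
```

An odds ratio above 1 means the odds of the outcome being 1 are multiplied by that factor for each one-unit increase in the predictor, holding the other predictors fixed; below 1 means the odds fall.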

On a point of detail: I think that these days logit models of the kind
discussed here wouldn't qualify as "multivariate". The meaning has
shifted from the classic "many variables" to the modern "many responses"
-- and you have one response.

See Frank Harrell's book "Regression Modeling Strategies" (Springer,
New York, 2001) for a trenchant critique of the stepwise strategy.

Nick

Re: st: running a backward stepwise multivariate analysis on my specific dataset

Ronan Conroy
In reply to this post by Scott Gilmore
On 3 Dec 2008, at 16:32, Scott Gilmore wrote:

> I have a dataset of N=50 patients and 20 variables.  13 are  
> categorical and 7 are continuous variables.  I have no missing  
> values.  My outcome is one of the variables and it is a binary  
> outcome, 1= yes disease, 0= no disease.

Looks like a nightmare to me - I have had someone in my office with  
almost exactly the same ratio of patients to variables. The trouble is  
that you don't have enough data. And a stepwise model will shrink  
badly when applied to new data, so the clinical validity of the  
exercise is very doubtful.
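
To put a rough number on "not enough data", the sketch below counts the smaller outcome group and divides by the number of candidate predictors. It assumes var1 is the 0/1 outcome, as in the original post, and that the other 19 variables are all candidates. Each categorical variable with more than two levels would really count as several parameters, so the figure is optimistic.

```stata
* Assumes the poster's data are in memory and var1 is the 0/1 outcome
quietly count if var1 == 1
local events = r(N)
quietly count if var1 == 0
local nonevents = r(N)

* The limiting sample size is the smaller of the two outcome groups
local limiting = min(`events', `nonevents')

* 19 candidate predictors: 20 variables minus the outcome
display "Limiting sample size per candidate predictor: " %4.2f `limiting'/19
```

With N=50 the smaller group can hold at most 25 patients, so the figure cannot exceed 25/19, about 1.3 per candidate predictor. Rules of thumb commonly quoted for logistic regression ask for something on the order of 10 or more.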

I might recommend -mrgraph- for inspecting the binary variables - with
the -tab- option you get a nice 'northern blot'.

Try also clustering routines to see if you can make any sense of the  
predictors.

But avoid statistical significance tests for the moment. Your chances  
of false negative results are very high given the sample size, and  
stepwise methods will only confuse the issue by capitalising on  
unreproducible features of your data.
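
One way to see this for yourself is to run the same stepwise procedure on pure noise. The sketch below is an illustration under assumptions, not an analysis of the real data: it simulates 50 observations, an outcome y unrelated to anything, and 10 noise predictors (fewer than the 19 in the post, only to keep the fit stable), then applies backward elimination with the same pr(0.5) threshold. The variable names and the seed are arbitrary.

```stata
* Simulated data: y is generated independently of every predictor
clear
set seed 12345
set obs 50
generate byte y = runiform() < 0.5
forvalues j = 1/10 {
    generate x`j' = rnormal()
}

* Backward elimination with the poster's removal threshold
stepwise, pr(0.5): logistic y x1-x10
```

Any predictors retained here are retained by chance alone, since y was generated independently of them; changing the seed changes which ones survive.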


Ronan Conroy
=================================

Royal College of Surgeons in Ireland
Epidemiology Department,
Beaux Lane House, Dublin 2, Ireland
+353 (0)1 402 2431
+353 (0)87 799 97 95
+353 (0)1 402 2764 (Fax - remember them?)
http://rcsi.academia.edu/RonanConroy
