Factor Analysis and Multiple Imputation


Factor Analysis and Multiple Imputation

gregor.hochschild
Hi,

I would like to run a couple of regressions using the factor score from an exploratory factor analysis as the dependent variable, but I am not sure how I should handle missing data. In particular, I want to
a) construct the dependent variable from 8 items using exploratory factor analysis
b) run some regressions using the factor score as the dependent variable

There are missing values for nearly all the variables, including the 8 items as well as the independent variables in the regression. What is the best approach to handling the missing data? What is the right imputation procedure in this case?
Should I first use all available information in the data to recover the missing data across all the variables, and then run the factor analysis? But how do I do this in Stata, given that mi does not support factor analysis?

Thanks,
Greg
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Re: Factor Analysis and Multiple Imputation

Maarten buis
--- On Thu, 22/7/10, [hidden email] wrote:

> Should I first use all available information in the data to
> recover the missing data across all the variables, and then
> run the factor analysis? But how do I do this in Stata given
> that mi does not support factor analysis?

The aim of an imputation model is to carry the patterns in the observed
data over to the missing values. You need to make sure that you
reproduce the patterns relevant to your model of interest, but that
does not mean that the imputation model has to be the same model as
the one you intend to use in your final analysis. The factor score is
just a linear combination of your observed items, so for the regression
part of your model it is enough to reproduce the associations between
the observed items and your explanatory variables. Factor analysis uses
only the correlations between the observed items, so as long as your
imputation model reproduces those correlations you are ok for the
factor analysis part.

So, taking the two together: as long as your imputation model reproduces
the patterns among all the directly observed variables (items and
explanatory variables) you are ok, and your imputation model does not
need to include the factor scores. You can use either the official Stata
-mi- commands for that or -ice- (see -findit ice- for several articles
on it; to download the software from SSC, type -ssc install ice- in
Stata).
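
The linear-combination argument can be checked numerically. The sketch below is in Python rather than Stata and uses only the standard library; the simulated data, the three items, and the scoring weights are all hypothetical and are not estimated by any factor analysis. It only illustrates that the covariance between a weighted score and an explanatory variable is fully determined by the covariances between the individual items and that variable.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


random.seed(1)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]              # explanatory variable
latent = [0.5 * xi + random.gauss(0, 1) for xi in x]    # unobserved factor
# Three observed items, each a noisy function of the latent factor
items = [[loading * f + random.gauss(0, 1) for f in latent]
         for loading in (0.9, 0.7, 0.5)]

# Hypothetical scoring weights (assumed for illustration, not estimated)
weights = [0.5, 0.3, 0.2]
score = [sum(w * item[i] for w, item in zip(weights, items)) for i in range(n)]

# Covariance of the score with x, computed directly and from the items
direct = cov(score, x)
from_items = sum(w * cov(item, x) for w, item in zip(weights, items))
print(abs(direct - from_items) < 1e-9)
```

Because the two quantities agree, an imputation model that preserves the item-by-x covariances also preserves the score-by-x covariance for any fixed set of weights.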

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------


     


Re: Factor Analysis and Multiple Imputation

gregor.hochschild

-------- Original Message --------
> Date: Thu, 22 Jul 2010 23:38:21 -0700 (PDT)
> From: Maarten buis <[hidden email]>
> To: [hidden email]
> Subject: Re: st: Factor Analysis and Multiple Imputation


Thanks for your quick reply Maarten!!
I guess the step I am not sure about is the pooling for the factor analysis. Is it correct that I do not pool the factor analysis itself? First, I do the imputation step. Second, I run the factor analysis separately on each imputed dataset and create the factor scores (as a result, each imputed dataset m has its own factor score variable, and the values for some observations differ across datasets). Third, I run my regression on each imputed dataset, and the results are pooled to obtain a single multiple-imputation result.

Is that correct?
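
For the third step, the pooling is usually done with Rubin's combining rules. A minimal sketch in Python (standard library only) follows; the function name pool_rubin and the five slope estimates and squared standard errors are hypothetical, made up purely to show the arithmetic.

```python
import math


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and squared standard errors (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = sum(estimates) / m                    # pooled point estimate
    within = sum(variances) / m                   # average within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between        # total variance
    return q_bar, math.sqrt(total)


# Hypothetical regression slopes and squared SEs from m = 5 imputed datasets
slopes = [0.50, 0.54, 0.47, 0.52, 0.49]
squared_ses = [0.010, 0.011, 0.009, 0.010, 0.010]

estimate, se = pool_rubin(slopes, squared_ses)
print(estimate, se)
```

The pooled standard error is larger than the square root of the average within-imputation variance, because the between-imputation term carries the extra uncertainty due to the missing data.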

Thanks again!

greg



Re: Factor Analysis and Multiple Imputation

Stas Kolenikov
In reply to this post by gregor.hochschild
On Thu, Jul 22, 2010 at 4:35 PM,  <[hidden email]> wrote:
> Should I first use all available information in the data to recover the missing data across all the variables, and then run the factor analysis? But how do I do this in Stata given that mi does not support factor analysis?

Whatever goes after the -mi estimate:- prefix (or -mim:-, for data
imputed with -ice-) must be a single command. You would need to write
a small wrapper that combines the factor analysis and regression
commands.
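
The wrapper idea, sketched outside Stata: one callable performs both steps on a single completed dataset, so it can be applied to each imputation in turn. The Python below (standard library only) is an illustration under stated assumptions. The names ols_slope and score_then_regress and the two tiny datasets are hypothetical, and the unit-weighted mean of the items is only a stand-in for a factor score, not an exploratory factor analysis.

```python
def ols_slope(y, x):
    """Slope and its squared standard error from a simple regression of y on x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, rss / (n - 2) / sxx


def score_then_regress(dataset):
    """Wrapper: build the score, then regress it on x, for one completed dataset."""
    items, x = dataset["items"], dataset["x"]
    # Stand-in scoring step: unit-weighted mean of the items (NOT an EFA score)
    score = [sum(row) / len(row) for row in zip(*items)]
    return ols_slope(score, x)


# Two tiny hypothetical completed datasets (two items, four observations each)
imputations = [
    {"items": [[1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 4.0, 4.0]], "x": [0.0, 1.0, 2.0, 3.0]},
    {"items": [[1.0, 2.0, 3.0, 5.0], [2.0, 3.0, 4.0, 4.0]], "x": [0.0, 1.0, 2.0, 3.0]},
]

# One (slope, squared SE) pair per imputation, ready to be pooled
results = [score_then_regress(d) for d in imputations]
print(results)
```

Keeping the scoring and the regression inside one function guarantees that the score is recomputed within every imputed dataset before the regression is run.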

I personally would have little trust in your analysis, missing data or
not, as I have little trust in EFA to begin with. For one thing, the
two-step procedure you describe will likely underestimate the standard
errors, since the regression step does not know that your dependent
variable is an estimate that carries extra measurement error.
Multiple imputation will not fix this, as it still pools the wrong
standard errors from the regression step. For another thing, there are
just too many ways of computing the scores, and the justifications for
them are ad hoc at best. If I were in your shoes, I would wrap
everything up into a MIMIC (multiple indicators, multiple causes) model
and estimate it with -gllamm-.

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.


Re: Factor Analysis and Multiple Imputation

mihir
This post has NOT been accepted by the mailing list yet.
Thanks guys. This old post is still useful. I am facing a similar problem right now and would appreciate your guidance.

I have generated 5 imputed data sets by multiple imputation with -ice-. Now I want to create a Bartlett score, using principal components, from the imputed variables and use it as a predictor in several regression analyses.

I don't know how to create the Bartlett score from the 5 imputed data sets. Should I calculate it separately within each of the 5 imputed data sets, OR should I combine the 5 data sets and then calculate it?

Afterwards, I will use the mi estimate command to run the regression analyses on the 5 imputed data sets (with the additional variable, the Bartlett score).

Thanks in advance.

Mihir