Dear Statalist,
although this is not a particularly Stata-specific question, I am hoping to get advice on the following (basic?) question. I am using the following commands to get a correlation matrix:

quietly estpost correlate `vars', matrix
esttab using correlations.csv, not unstack compress noobs star(* 0.10 ** 0.05 *** 0.01) long b(%9.2f) replace

`vars' contains a battery of mostly metric variables. Besides the metric variables, there are also three dummy variables.

I am now wondering whether the reported (relatively high) correlation coefficients among the dummy variables, and between some of the metric variables and the dummy variables, are actually meaningful. How should I interpret them, and which correlation test should I use?

Thanks a lot,
Christian
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
Try Nick's http://www.stata.com/statalist/archive/2008-11/msg00933.html

HTH
Martin

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Christian Weiß
Sent: Monday, 21 September 2009 16:45
To: statalist
Subject: st: Correlation of Dummy and Metric Variables?
In reply to this post by Christian Weiss-2
Hi Martin,
Thanks a lot. I had already looked at this discussion and Nick's article, but unfortunately I can't find a clear answer to my question. First, the discussion/article is about correlating a continuous and a discrete variable; I am wondering how to deal with a dummy (0/1) variable (perhaps the same way?). Second, in his mail Nick says, if I understood him correctly, that the correlation between a discrete and a continuous variable can make sense, but that it depends (on what?). Still, I find it surprising that there is no comment or option about this in Stata's correlation commands, as I figure this is a frequently occurring issue (as Nick mentions). After all, might it be usual (and easiest?) to exclude dummy variables from a correlation matrix?

Best
Christian

On Mon, Sep 21, 2009 at 10:51 AM, Martin Weiss <[hidden email]> wrote:
> Try Nick's http://www.stata.com/statalist/archive/2008-11/msg00933.html
"First, the discussion/article is about correlating a continuous and a discrete variable; I am wondering how to deal with a dummy (0/1) variable (perhaps the same way?)."

Isn't a dummy just a special case of a discrete variable? So the methods applicable to discrete variables in general should apply to the dummy case as well...

HTH
Martin

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Christian Weiß
Sent: Monday, 21 September 2009 17:26
To: [hidden email]
Subject: Re: st: AW: Correlation of Dummy and Metric Variables?
In reply to this post by Christian Weiss-2
The correlation of a dichotomous variable and a continuous (normal?) variable is closely related to the t-test. Its p-value is exactly the same as the p-value from an (equal-variance) t-test of the continuous variable using the dichotomous variable as the 'by' variable. Its meaning depends on whether the continuous variable is close to normal.
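Tony's point can be checked numerically. The sketch below is plain Python on hypothetical data (the names `pearson_r`, `t_from_r`, `pooled_t`, `y` and `d` are illustrative only). It computes Pearson's r between a 0/1 dummy and a continuous variable, converts it to the usual test statistic t = r * sqrt((n - 2) / (1 - r^2)), and compares that with the pooled-variance two-sample t statistic. The two agree exactly, both with n - 2 degrees of freedom, which is why the p-values match. In Stata terms, this is the agreement between -pwcorr y d, sig- and -ttest y, by(d)- with its default equal-variance assumption.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_from_r(r, n):
    """t statistic for H0: rho = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

def pooled_t(y, d):
    """Equal-variance two-sample t statistic: mean of y where d == 1 minus where d == 0."""
    g1 = [v for v, g in zip(y, d) if g == 1]
    g0 = [v for v, g in zip(y, d) if g == 0]
    n1, n0 = len(g1), len(g0)
    m1, m0 = sum(g1) / n1, sum(g0) / n0
    ss_within = sum((v - m1) ** 2 for v in g1) + sum((v - m0) ** 2 for v in g0)
    sp2 = ss_within / (n1 + n0 - 2)  # pooled variance estimate
    return (m1 - m0) / math.sqrt(sp2 * (1 / n1 + 1 / n0))

# Hypothetical data: continuous outcome y and a 0/1 dummy d.
y = [2.1, 3.4, 1.9, 4.8, 5.2, 3.3, 6.1, 4.4, 2.7, 5.9]
d = [0, 0, 0, 1, 1, 0, 1, 1, 0, 1]

r = pearson_r(d, y)
print(t_from_r(r, len(y)), pooled_t(y, d))
```

The identity holds because, with a 0/1 variable, r^2 is the between-group share of the total sum of squares, so (n - 2) * r^2 / (1 - r^2) equals the squared pooled t statistic. It does not carry over to the unequal-variance (Welch) t-test.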
Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Christian Weiß
Sent: Monday, September 21, 2009 7:45 AM
To: statalist
Subject: st: Correlation of Dummy and Metric Variables?
In reply to this post by Christian Weiss-2
In psychometrics, there are the concepts of polychoric and polyserial correlations. The first is between two ordinal variables; the second is between an ordinal variable and a continuous variable. If your variables are truly nominal (like gender or geography), then the correlations are likely meaningless, although you can meaningfully ask whether the distributions of the continuous variables differ between the values of the discrete variable (answered by ANOVA, the Kruskal-Wallis test and such). Some time ago I wrote the -polychoric- package, which computes these correlations.

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
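For a truly nominal variable, Stas suggests comparing the distribution of the continuous variable across categories rather than computing a correlation. The following is a minimal one-way ANOVA sketch in plain Python; the data and the names `oneway_anova`, `income` and `region` are hypothetical. It returns the F statistic and eta-squared, the share of the variance of y explained by the categories. Eta-squared is the same quantity as the R-squared from regressing y on the K-1 category indicators, and with only two categories it reduces to the squared point-biserial correlation. In Stata the corresponding commands would be -oneway- (or -anova-) and, for the rank-based alternative, -kwallis-.

```python
def oneway_anova(y, groups):
    """One-way ANOVA of a continuous y across the categories of a nominal variable.

    Returns (F, eta_squared). F has (k - 1, n - k) degrees of freedom.
    Assumes at least two categories and some variation within groups.
    """
    n = len(y)
    grand_mean = sum(y) / n
    labels = sorted(set(groups))
    k = len(labels)
    ss_between = 0.0
    ss_within = 0.0
    for label in labels:
        vals = [v for v, g in zip(y, groups) if g == label]
        m = sum(vals) / len(vals)
        ss_between += len(vals) * (m - grand_mean) ** 2
        ss_within += sum((v - m) ** 2 for v in vals)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared

# Hypothetical example: income (continuous) by region (nominal, three categories).
income = [31.0, 28.5, 35.2, 40.1, 38.7, 44.0, 29.9, 33.3, 30.4]
region = ["north", "north", "north", "south", "south", "south", "west", "west", "west"]
f_stat, eta_sq = oneway_anova(income, region)
print(f"F = {f_stat:.3f}, eta^2 = {eta_sq:.3f}")
```

Eta-squared does not depend on how the categories are labelled or ordered, which is exactly the property a nominal variable needs and an ordinary correlation coefficient lacks.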
In reply to this post by Christian Weiss-2
Even the extensive Stata manuals do not take on the task of trying to explain all the statistical possibilities and pitfalls associated with every technique implemented. Why should they?
The literature (in this case, make that literatures) is hopelessly divided on whether

1. correlation makes full sense only for (approximately) continuous variables, or
2. you need different ideas of correlation when at least one variable is not (approximately) continuous.

Even with two dummies or indicators, there is a range of possibilities. It is better to think through what makes sense for your project than to fear that what you are doing is incorrect according to some authorities or experts.

Nick
[hidden email]
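Nick's remark that even two dummies admit a range of measures can be made concrete. The sketch below uses hypothetical 2x2 counts, and the function names are illustrative. It computes two common measures: the phi coefficient, which is exactly what Pearson's r (and hence -correlate-) reports for 0/1 codes, and Yule's Q, which depends only on the odds ratio. Two tables with the same odds ratio but different margins give the same Q but different phi, so which number is "right" depends on what you want to measure. A third option, the tetrachoric correlation (official Stata -tetrachoric-), assumes both dummies are cut from an underlying bivariate normal distribution; it is not implemented here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)

def phi(a, b, c, d):
    """Phi coefficient for the 2x2 table

             y=1  y=0
        x=1   a    b
        x=0   c    d

    Undefined if any row or column total is zero.
    """
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def yules_q(a, b, c, d):
    """Yule's Q = (ad - bc) / (ad + bc), i.e. (OR - 1) / (OR + 1)."""
    return (a * d - b * c) / (a * d + b * c)

def expand(a, b, c, d):
    """Rebuild individual 0/1 observations (x, y) from the four cell counts."""
    x = [1] * a + [1] * b + [0] * c + [0] * d
    y = [1] * a + [0] * b + [1] * c + [0] * d
    return x, y

# Two hypothetical tables with the same odds ratio (4) but different margins.
table1 = (20, 10, 10, 20)
table2 = (40, 10, 20, 20)
for t in (table1, table2):
    x, y = expand(*t)
    print(f"phi = {phi(*t):.3f}, Pearson r = {pearson_r(x, y):.3f}, Q = {yules_q(*t):.3f}")
```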
In reply to this post by Christian Weiss-2
The point-biserial correlation is a correlation between a dichotomous variable and a continuous one. http://en.wikipedia.org/wiki/Point-biserial_correlation_coefficient
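The usual textbook formula for the point-biserial correlation is r_pb = (M1 - M0) / s_n * sqrt(p * q), where M1 and M0 are the means of the continuous variable in the two groups, p is the share of ones, q = 1 - p, and s_n is the standard deviation computed with divisor n. The plain-Python sketch below (hypothetical data; names are illustrative) checks that this is nothing more than Pearson's r computed on the 0/1 coding, which is why -correlate- already reports it when a dummy is in the variable list.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(y, d):
    """Point-biserial correlation between continuous y and a 0/1 variable d."""
    n = len(y)
    ones = [v for v, g in zip(y, d) if g == 1]
    zeros = [v for v, g in zip(y, d) if g == 0]
    p = len(ones) / n          # share of ones
    q = 1 - p                  # share of zeros
    m1, m0 = sum(ones) / len(ones), sum(zeros) / len(zeros)
    mean = sum(y) / n
    s_n = math.sqrt(sum((v - mean) ** 2 for v in y) / n)  # SD with divisor n
    return (m1 - m0) / s_n * math.sqrt(p * q)

# Hypothetical data.
y = [12.0, 15.5, 9.8, 20.1, 18.4, 11.2, 22.3, 16.7]
d = [0, 0, 0, 1, 1, 0, 1, 1]
print(point_biserial(y, d), pearson_r(d, y))
```

The sqrt(p * q) factor is largest at a 50/50 split, which is one way to see why, for a given mean difference, an unbalanced dummy tends to produce a smaller correlation.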
Peter Lachenbruch noted that this ends up being the same math as the t-test. Unfortunately, skew in the dichotomous variable (a split far from 50/50) tends to reduce correlations. Thus methods such as the biserial correlation (a special case of the polyserial that Stas mentioned) "fix up" the correlation at the cost of making assumptions about the dichotomous variable that may or may not be true in practice. In essence, if you are willing to assume that the dichotomous variable was obtained by cutting an underlying normally distributed variable, you can boost the correlation. However, if you are wrong and it was not, you may end up reaching the wrong conclusion.

You can certainly define measures of dependence between a nominal and a continuous variable, but this gets tricky because a nominal variable isn't really a single variable (instead it is K-1 indicator variables, where K is the number of categories).

Jay
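The "fix-up" Jay describes can be sketched with the classical biserial formula r_b = r_pb * sqrt(p * q) / phi(z), where phi is the standard normal density and z is the normal quantile at which the assumed underlying variable is cut so that a share p falls in the "1" group. This is a plain-Python illustration (function name hypothetical, using the standard library's `statistics.NormalDist`); it is the textbook conversion, not necessarily what -polychoric- computes. The correction factor is always greater than 1 and grows as the split becomes more unbalanced, so the biserial value can even exceed 1 in absolute value when the normality assumption does not hold.

```python
import math
from statistics import NormalDist

def biserial_from_point_biserial(r_pb, p):
    """Classical conversion of a point-biserial r to a biserial r.

    Assumes the 0/1 variable came from cutting an underlying normal variable
    so that a share p of cases fall in the 1 group (0 < p < 1).
    """
    q = 1 - p
    z = NormalDist().inv_cdf(p)       # cut point on the standard normal scale
    density = NormalDist().pdf(z)     # normal density at the cut point
    return r_pb * math.sqrt(p * q) / density

# The same hypothetical point-biserial r under balanced and unbalanced splits.
for p in (0.5, 0.2, 0.05):
    print(p, round(biserial_from_point_biserial(0.30, p), 3))
```

At a 50/50 split the factor is sqrt(pi / 2), about 1.25; it rises steadily as p moves toward 0 or 1, which is exactly where the normality assumption carries the most weight.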