st: Correlation of Dummy and Metric Variables?

st: Correlation of Dummy and Metric Variables?

Christian Weiss-2
Dear Statalist,

although it's not a particularly Stata-specific question, I am hoping
to get advice on the following (basic?) question:


I am using the following command to get a correlation matrix

        quietly estpost correlate `vars', matrix
        esttab using correlations.csv, not unstack compress noobs ///
            star(* 0.10 ** 0.05 *** 0.01) long b(%9.2f) replace

`vars' contains a battery of mostly metric variables. Besides the
metric variables, there are also three dummy variables.

I am wondering now if the reported (relatively high) correlation
coefficients among the dummy variables and between some of the metric
variables and the dummy variables are actually meaningful. How should
I interpret them, and which correlation test should I use?


Thanks a lot,
Christian
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
st: AW: Correlation of Dummy and Metric Variables?

Martin Weiss-5

Try Nick's post: http://www.stata.com/statalist/archive/2008-11/msg00933.html

HTH
Martin


Re: st: AW: Correlation of Dummy and Metric Variables?

Christian Weiss-2
In reply to this post by Christian Weiss-2
Hi Martin,

thanks a lot. I had already looked at this discussion and Nick's article.
Unfortunately, I can't find a clear answer to my question. First, the
discussion / article is about correlating a continuous and a discrete
variable; I am wondering, though, how to deal with a dummy (0 / 1)
variable (perhaps the same way?).
Second, in his mail Nick says, if I understood him right, that the
correlation between a discrete and a continuous variable can make sense,
but that it depends (on what?).

Still, I find it surprising that there is no comment / option in the
correlate commands of Stata, as I figure that this is a frequently
occurring issue (as Nick mentions). After all, it might be usual (and
the easiest way?) to exclude dummy variables from a correlation matrix?

Best
Christian



AW: st: AW: Correlation of Dummy and Metric Variables?

Martin Weiss-5

"First, the discussion / article is about correlating a continuous and a
discrete variable; I am wondering, though, how to deal with a dummy (0 / 1)
variable (perhaps the same way?)."



Is a dummy not a special case of a discrete variable? So the methods
applicable to all discrete variables should be applicable to the dummy case
as well...



HTH
Martin


st: RE: Correlation of Dummy and Metric Variables?

Lachenbruch, Peter
In reply to this post by Christian Weiss-2
The correlation of a dichotomous variable and a continuous (normal?) variable is closely related to the t-test. The p-value for the correlation is exactly the same as the p-value from a (pooled-variance) t-test for the continuous variable using the dichotomous variable as the 'by' variable. Its meaning depends on whether the continuous variable is close to normal.
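
The equivalence described above can be checked with a short sketch. It assumes Stata's bundled auto dataset rather than the original poster's variables: price stands in for a metric variable and foreign for a 0/1 dummy.

```stata
* Sketch only: bundled example data, not the variables from the thread
sysuse auto, clear

* Pearson correlation of a continuous variable with a 0/1 dummy, with p-value
pwcorr price foreign, sig

* Two-sample t test on the same pair (equal variances assumed by default)
ttest price, by(foreign)

* Regressing on the dummy reports the same two-sided p-value once more
regress price foreign
```

The two-sided p-value should agree across all three commands. Adding the unequal option to ttest would break the match, since the correlation corresponds to the pooled-variance test.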

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001


Re: st: Correlation of Dummy and Metric Variables?

Stas Kolenikov
In reply to this post by Christian Weiss-2
In psychometrics, there are concepts of polychoric and polyserial
correlations. The first one is between two ordinal variables, and the
second one is between an ordinal variable and a continuous variable.
If your variables are truly nominal (like gender or geography), then
the correlations are likely meaningless, although you can meaningfully
ask whether the distributions of the continuous variables differ
between the values of the discrete variable (answered by ANOVA,
Kruskal-Wallis test and such). I wrote the -polychoric- package a while
ago; it computes these correlations.
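
As a rough sketch of the group-comparison route mentioned above, again assuming the bundled auto dataset (rep78 is used as the grouping variable purely for illustration):

```stata
* Sketch only: bundled example data, not the variables from the thread
sysuse auto, clear

* Does the distribution of price differ across the categories of rep78?
oneway price rep78, tabulate      // one-way ANOVA with group summaries
kwallis price, by(rep78)          // Kruskal-Wallis rank test

* -polychoric- is user-written and must be installed before use.
* The call below is an assumed basic form; check the package's help file.
* findit polychoric
* polychoric rep78 foreign
```

The last two lines are left commented out because the package is not part of official Stata.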

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.

RE: st: AW: Correlation of Dummy and Metric Variables?

Nick Cox
In reply to this post by Christian Weiss-2
Even the extensive Stata manuals do not take on the task of trying to explain all the statistical possibilities and pitfalls associated with every technique implemented. Why should they?

The literature -- in this case make that literatures -- is hopelessly divided on whether

1. Correlation makes full sense only for (approximately) continuous variables.

2. You need different ideas of correlation when at least one variable is not (approximately) continuous.

Even with two dummies or indicators, there is a range of possibilities.
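
To illustrate that range for two dummies, here is a minimal sketch assuming the bundled auto dataset. The variable expensive is a hypothetical second dummy created only for this example.

```stata
* Sketch only: bundled example data; "expensive" is a made-up indicator
sysuse auto, clear
generate byte expensive = price > 6000 if !missing(price)

* Pearson's r computed on two 0/1 variables is the phi coefficient
correlate foreign expensive

* Chi-squared test and Cramer's V for the same 2x2 table
tabulate foreign expensive, chi2 V

* Tetrachoric correlation: assumes latent bivariate normality behind the dummies
tetrachoric foreign expensive
```

For a 2x2 table the magnitude of Cramér's V equals that of phi, while the tetrachoric estimate is typically larger in magnitude because it rests on the stronger latent-normality assumption.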

It is better to think out what makes sense for your project than to be in fear that what you are doing is incorrect according to some authorities or experts.

Nick
[hidden email]

st: RE: Correlation of Dummy and Metric Variables?

Verkuilen, Jay
In reply to this post by Christian Weiss-2
The point-biserial correlation is a correlation between a dichotomous variable and a continuous one. http://en.wikipedia.org/wiki/Point-biserial_correlation_coefficient

Peter Lachenbruch noted that this ends up being the same math as the t-test.
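
For reference, the identity behind that remark can be written out. The notation is chosen here and does not come from the thread: n_1 and n_0 are the two group sizes, n = n_1 + n_0, the x-bars are the group means of the continuous variable, and s_n is its standard deviation computed with divisor n.

```latex
r_{pb} = \frac{\bar{x}_1 - \bar{x}_0}{s_n}\,\sqrt{\frac{n_1 n_0}{n^2}},
\qquad
t = r_{pb}\,\sqrt{\frac{n-2}{1-r_{pb}^{2}}}
```

Here t is the pooled-variance two-sample t statistic on n - 2 degrees of freedom, which is why the two p-values coincide.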

Unfortunately, skew in the dichotomous variable tends to reduce correlations. Thus methods such as the biserial correlation (a special case of the polyserial that Stas mentioned) "fix up" the correlation at the cost of making some assumptions about the dichotomous variable that may or may not be true in practice. In essence, if you are willing to assume that the dichotomous variable comes from an underlying normal distribution, you can boost the correlation. However, if you are wrong and it does not, you may end up coming to the wrong conclusion.

You can certainly define measures of dependence between a nominal and continuous variable but this is going to get tricky because a nominal variable isn't really a variable (instead it is K-1 indicator variables, where K is the number of categories).
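
One way to see the K-1 indicator point in practice is the sketch below. It assumes the bundled auto dataset and Stata 11 or later for the factor-variable prefix; rep78 is treated as nominal only for the sake of the example.

```stata
* Sketch only: bundled example data, not the variables from the thread
sysuse auto, clear

* rep78 has K = 5 categories; i.rep78 enters the model as K-1 = 4 indicators
regress price i.rep78

* Joint test that all four indicator coefficients are zero
testparm i.rep78
```

The R-squared from this regression is the squared correlation ratio (eta-squared), which is one possible measure of dependence between a nominal and a continuous variable.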

Jay



