# Re: st: difference between "Spearman" and "pwcorr / correlate"


## Re: st: difference between "Spearman" and "pwcorr / correlate"

Stas Kolenikov <[hidden email]> wrote:

> Inference for Pearson's moment correlation relies on normality of the data. Spearman rank correlation is free of any assumptions, but there is no population characteristic that it estimates, which makes interpretation and asymptotic inference somewhat weird. If one is significant and the other is not, you are making either type I or type II error somewhere.
>
> On 10/6/09, Ashwin Ananthakrishnan <[hidden email]> wrote:
>
>> Hi,
>>
>> In examining the correlation between two variables, what is the difference in utility of the Spearman correlation coefficient (Stata command `spearman`) and the Pearson correlation coefficient (Stata command `pwcorr` or `correlate`)?

In the angels-on-the-head-of-a-pin vein: of possible interest in this regard is that the Spearman coefficient is the same as the Pearson coefficient calculated on the ranked values of the variables (ties getting the average rank). I would agree that this is not a terribly interesting population parameter, but isn't it nevertheless an estimable/testable population characteristic?

Regards,

=-=-=-=-=-=-=-=-=-=-=-=-=
Mike Lacy
Fort Collins CO USA
(970) 491-6721 office

*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
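Mike's point that Spearman's rho is simply Pearson's r computed on the ranks, with tied values getting the average rank, is easy to check numerically. The following is a plain-Python sketch for illustration only: it is not how Stata's `spearman` command is implemented, and the helper names `average_ranks`, `pearson`, `spearman` and `spearman_no_ties` are hypothetical. It also shows that the familiar shortcut 1 - 6*sum(d^2)/(n(n^2 - 1)) matches the rank-based Pearson only when there are no ties.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last sorted position whose value ties with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho as Pearson's r on average ranks (handles ties)."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_no_ties(x, y):
    """Textbook shortcut 1 - 6*sum(d^2) / (n(n^2 - 1)); exact only without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


x = [1.2, 3.4, 2.2, 5.0, 4.1]
y = [2.0, 3.1, 2.5, 2.9, 7.3]
print(spearman(x, y), spearman_no_ties(x, y))  # both about 0.7 (no ties here)

# Ranks are unchanged by a strictly increasing transformation such as cubing.
print(spearman(x, [v ** 3 for v in y]))

# With a tie in x, the shortcut formula drifts away from the rank-based Pearson.
print(spearman([1, 2, 2, 3], [1, 3, 2, 4]), spearman_no_ties([1, 2, 2, 3], [1, 3, 2, 4]))  # about 0.949 vs 0.95
```

Because only the ranks enter the calculation, any strictly increasing transformation of either variable leaves rho unchanged.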

## Re: st: difference between "Spearman" and "pwcorr / correlate"

> Of possible interest in this regard is that the Spearman coefficient is the same as the Pearson calculated on the ranked values of the variables (ties getting the average rank). I would agree that this is not a terribly interesting population parameter, but isn't this nevertheless an estimable/testable population characteristic?

If you have a finite population, then of course you will have a Spearman correlation for it. However, if you want to set up any asymptotic framework, you will be trying to hit a moving target. I don't think there is a meaningful definition of Spearman correlation for infinite populations/continuous variables, although I might be mistaken. On the other hand, Kendall's tau, as Nick Cox quoted from Roger Newson, has explicit population analogues in the probabilities of concordant and discordant pairs of observations.

The question is: if the correlation estimate is 0.5, what does it say? For the Pearson moment correlation, it means that the proportion of explained variance in a bivariate regression is 0.5^2 = 0.25. For Kendall's tau, it means that for every discordant pair of observations there are three concordant pairs (i.e., Prob[concordant] = 3 Prob[discordant] = 3/4). For the Spearman rank correlation, you can only say that the variables are positively associated, but not much more.

-- Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
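Stas's readings of an estimate of 0.5 can be checked on small examples. The plain-Python sketch below (helper names are hypothetical, chosen for illustration) verifies that the squared Pearson correlation equals R^2 from a simple least-squares regression, and computes Kendall's tau-a from counts of concordant and discordant pairs. The 3:1 reading of tau = 0.5 assumes no tied pairs, so that P(concordant) + P(discordant) = 1 and therefore P(concordant) = (1 + tau)/2.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_squared_simple_regression(x, y):
    """R^2 = 1 - SSE/SST from the least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst


def concordance_counts(x, y):
    """Count concordant and discordant pairs; pairs tied in x or y count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return concordant, discordant


def kendall_tau_a(x, y):
    """tau-a = (C - D) / (n(n - 1)/2), i.e. P(concordant) - P(discordant)."""
    concordant, discordant = concordance_counts(x, y)
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def p_concordant_no_ties(tau):
    """With no ties, P(c) + P(d) = 1 and P(c) - P(d) = tau, so P(c) = (1 + tau) / 2."""
    return (1 + tau) / 2


x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
r = pearson(x, y)
print(r ** 2, r_squared_simple_regression(x, y))  # both 0.64 up to rounding

print(concordance_counts([1, 2, 3, 4], [1, 3, 2, 4]))  # (5, 1)
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))       # 4/6, about 0.667

print(p_concordant_no_ties(0.5))  # 0.75: three concordant pairs per discordant one
```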

## RE: st: difference between "Spearman" and "pwcorr / correlate"

There IS an interpretation of the Spearman correlation for continuous variables in an infinite population. In that case, if the random variables are X and Y, then the Spearman rho(X,Y) is simply the Pearson correlation of F_X(X) and F_Y(Y), where F_X(.) and F_Y(.) are the population cumulative distribution functions of X and Y respectively. And a Pearson correlation, as always, is a measure of linearity.

The two main problems with the Spearman rho are that (a) it is ONLY a measure of linearity between 2 cumulative distribution functions (with no interpretation as a difference between concordance and discordance probabilities), and that (b) the Central Limit Theorem works a lot less quickly for the sample Spearman rho than for the sample Kendall tau-a, especially under the null hypothesis of zero correlation (see Kendall and Gibbons, 1990).

Best wishes

Roger

References

Kendall, M. G., and J. D. Gibbons. 1990. Rank Correlation Methods. 5th ed. Oxford, UK: Oxford University Press.

Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: [hidden email]
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page: http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/

Opinions expressed are those of the author, not of the institution.

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Stas Kolenikov
Sent: 07 October 2009 21:27
To: [hidden email]
Subject: Re: st: difference between "Spearman" and "pwcorr / correlate"
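Roger's population definition can be checked by simulation. The sketch below assumes, purely for illustration, that X and Y are standard bivariate normal with Pearson correlation r, because then F_X = F_Y = Phi is known exactly. It compares (i) the sample Spearman rho, (ii) the sample Pearson correlation of Phi(X) and Phi(Y), and (iii) the standard bivariate-normal result rho_S = (6/pi) arcsin(r/2). The seed, sample size and helper names are arbitrary choices, not anything from the thread.

```python
import random
from math import asin, erf, pi, sqrt


def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def simple_ranks(values):
    """Ranks 1..n; adequate here because continuous draws are (almost surely) untied."""
    ranks = [0] * len(values)
    for rank, i in enumerate(sorted(range(len(values)), key=values.__getitem__), start=1):
        ranks[i] = rank
    return ranks


def draw_bivariate_normal(n, r, seed=2009):
    """n draws of (X, Y) with standard normal margins and Pearson correlation r."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(r * z1 + sqrt(1 - r * r) * z2)
    return xs, ys


r = 0.5
x, y = draw_bivariate_normal(20000, r)

# (i) Sample Spearman rho: Pearson correlation of the ranks.
sample_spearman = pearson(simple_ranks(x), simple_ranks(y))
# (ii) Population version: Pearson correlation of F_X(X) and F_Y(Y);
#      both margins are standard normal here, so F_X = F_Y = Phi.
cdf_pearson = pearson([normal_cdf(v) for v in x], [normal_cdf(v) for v in y])
# (iii) Closed form for the bivariate normal: rho_S = (6/pi) * arcsin(r/2).
theoretical = 6 / pi * asin(r / 2)

print(sample_spearman, cdf_pearson, theoretical)  # theoretical is about 0.483
```

Normality is used only to make F_X and F_Y available in closed form; the definition of rho(X,Y) as the Pearson correlation of F_X(X) and F_Y(Y) does not itself require it.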

## RE: st: difference between "Spearman" and "pwcorr / correlate"

(a) is on all fours with "the problem with pink is that it isn't blue". That is, (a) amounts to saying that the problem with Spearman's rank correlation is that it's not Kendall's tau. True, but the reverse is equally true.

That aside, I think most users of rank correlation would be happy to acknowledge the advantages and disadvantages of each such measure, and indeed to note that they should give similar results in practice. For example, given the property emphasised earlier in the thread that Spearman(x, y) = Pearson(rank(x), rank(y)), one of many possibilities for Spearman correlations is that they offer a route to a robustified PCA. (You can be sure that the eigenproperties are OK.)

Nick
[hidden email]
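Nick's remark about robustified PCA rests on the fact that a Spearman correlation matrix is literally a Pearson correlation matrix of the ranked variables, and hence positive semidefinite: its eigenvalues cannot be negative. The plain-Python sketch below (no Stata, no NumPy; all helper names and data values are hypothetical, made up for illustration) builds such a matrix for three variables, one containing a gross outlier, and extracts its eigenvalues with a cyclic Jacobi rotation routine, which is the core step of a PCA on that matrix. For contrast, it also feeds in a symmetric matrix with unit diagonal that was not computed from any data; such a matrix need not be positive semidefinite.

```python
from math import sqrt


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_matrix(columns):
    """Spearman correlation matrix = Pearson correlation matrix of the ranked columns."""
    ranked = [average_ranks(c) for c in columns]
    k = len(ranked)
    return [[pearson(ranked[i], ranked[j]) for j in range(k)] for i in range(k)]


def jacobi_eigenvalues(matrix, tol=1e-20, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + sqrt(theta * theta + 1))
                c = 1 / sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


# Hypothetical data; the last y value is a gross outlier that ranks simply absorb.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 5.9, 9.8, 12.2, 13.9, 100.0]
z = [5.0, 3.0, 4.0, 1.0, 2.0, 2.0, 0.5, 0.1]

rho = spearman_matrix([x, y, z])
eigenvalues = jacobi_eigenvalues(rho)
print([round(v, 3) for v in eigenvalues])          # all non-negative; they sum to 3
print([v / len(eigenvalues) for v in eigenvalues])  # share of rank variance per component

# A symmetric unit-diagonal matrix not derived from data can fail to be PSD.
not_a_correlation_matrix = [[1.0, 0.9, -0.9], [0.9, 1.0, 0.9], [-0.9, 0.9, 1.0]]
bad_eigenvalues = jacobi_eigenvalues(not_a_correlation_matrix)
print(bad_eigenvalues)  # about [1.9, 1.9, -0.8]
```

The Jacobi routine is a textbook method picked only to keep the sketch dependency-free; a library eigensolver would be the usual choice in practice.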

## RE: st: difference between "Spearman" and "pwcorr / correlate"

In reply to this post by Stas Kolenikov:

There's a tacit criterion here, that techniques must have simple verbal interpretations. I am as much in favour of simple verbal interpretations as the next person (nay, on average, more so), but while they're a bonus when available, insisting on them would deprive you of much that is indispensable. What's the simple verbal interpretation of (say) eigenvectors or an SVD?

Nick
[hidden email]