Stas Kolenikov <[hidden email]> wrote:
>
> Inference for Pearson's moment correlation relies on normality of the
> data. Spearman rank correlation is free of any assumptions, but there
> is no population characteristic that it estimates, which makes
> interpretation and asymptotic inference somewhat weird. If one is
> significant and the other is not, you are making either type I or type
> II error somewhere.
>
> On 10/6/09, Ashwin Ananthakrishnan <[hidden email]> wrote:
>> Hi,
>>
>> In examining the correlation between two variables, what is the
>> difference in utility of the Spearman correlation co-efficient (stata
>> command 'spearman') and the Pearson correlation co-efficient (stata
>> command "pwcorr" or "correlate")?

In the angels on the head of a pin vein:

Of possible interest in this regard is that the Spearman coefficient is the same as the Pearson calculated on the ranked values of the variables (ties getting the average rank). I would agree that this is not a terribly interesting population parameter, but isn't this nevertheless an estimable/testable population characteristic?

Regards,

Mike Lacy
Fort Collins CO USA
(970) 491-6721 office

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
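The identity Mike mentions (Spearman equals Pearson computed on the ranks, with ties getting the average rank) can be checked directly. The following is only a minimal sketch in plain Python rather than Stata; the helper names `average_ranks`, `pearson` and `spearman` are made up for this illustration and are not Stata commands.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]          # monotone in x, but not linear
print(spearman(x, y))          # 1.0
print(pearson(x, y))           # below 1.0, since the relationship is not linear
```

For data without ties this agrees with the textbook formula 1 - 6*sum(d^2)/(n(n^2-1)), where d is the difference in ranks; with ties, the rank-then-Pearson route is the one that stays well defined.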
> In the angels on the head of a pin vein:
> Of possible interest in this regard is that the Spearman coefficient is the
> same as the Pearson calculated on the ranked values of the variables (ties
> getting the average rank). I would agree that this is not a terribly
> interesting population parameter, but isn't this nevertheless an
> estimable/testable population characteristic?

If you have a finite population, then of course you will have a Spearman correlation for it. However, if you want to set up any asymptotic framework, you will be trying to hit a moving target. I don't think there is a meaningful definition of Spearman correlation for infinite populations/continuous variables, although I might be mistaken. On the other hand, Kendall's tau, as Nick Cox quoted from Roger Newson, has explicit population analogues in probabilities of concordant and discordant pairs of observations.

The question is: if the correlation estimate is 0.5, what does it say? For Pearson moment correlation, it means that the proportion of explained variance in a bivariate regression is 0.25. For Kendall's tau, it means that for every discordant pair of observations, there are three concordant pairs (i.e., in the absence of ties, Prob[ concordant ] = 3/4 = 3 Prob[ discordant ]). For Spearman rank correlation, you can only say that the variables are positively associated, but not much more.

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
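The arithmetic behind the readings of a correlation of 0.5 given above can be spelled out. This is a small sketch assuming no ties, so that Prob[concordant] + Prob[discordant] = 1 and tau is their difference; the function names are hypothetical and chosen only for this illustration.

```python
def concordance_probabilities(tau):
    """Split Kendall's tau into (Prob[concordant], Prob[discordant]).

    Assumes no ties, so the two probabilities sum to 1 and
    tau = Prob[concordant] - Prob[discordant].
    """
    if not -1.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [-1, 1]")
    p_concordant = (1.0 + tau) / 2.0
    p_discordant = (1.0 - tau) / 2.0
    return p_concordant, p_discordant


def explained_variance(r):
    """Proportion of variance explained in a bivariate regression with correlation r."""
    return r * r


p_c, p_d = concordance_probabilities(0.5)
print(p_c, p_d, p_c / p_d)        # 0.75 0.25 3.0
print(explained_variance(0.5))    # 0.25
```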
There IS an interpretation of the Spearman correlation for continuous variables in an infinite population. In that case, if the random variables are X and Y, then the Spearman rho(X,Y) is simply the Pearson correlation of F_X(X) and F_Y(Y), where F_X(.) and F_Y(.) are the population cumulative distribution functions of X and Y respectively. And a Pearson correlation, as always, is a measure of linearity.
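To see this population definition at work, one can simulate a case where the CDFs are known. The sketch below assumes a bivariate standard normal pair with correlation rho, for which the population Spearman rho has the standard closed form (6/pi) * arcsin(rho/2); it applies the normal CDF to each variable and takes the ordinary Pearson correlation of the results. The function names, the sample size and the seed are arbitrary choices made for this illustration.

```python
import math
import random
from statistics import NormalDist


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulated_cdf_correlation(rho, n=20000, seed=12345):
    """Pearson correlation of F_X(X) and F_Y(Y) for simulated bivariate normal data."""
    rng = random.Random(seed)
    cdf = NormalDist().cdf  # population CDF of a standard normal variable
    u, v = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = z1
        y = rho * z1 + math.sqrt(1.0 - rho * rho) * z2  # standard normal, corr(x, y) = rho
        u.append(cdf(x))
        v.append(cdf(y))
    return pearson(u, v)


def normal_theory_spearman(rho):
    """Population Spearman rho for a bivariate normal with Pearson correlation rho."""
    return 6.0 / math.pi * math.asin(rho / 2.0)


print(simulated_cdf_correlation(0.5))   # simulation estimate
print(normal_theory_spearman(0.5))      # closed-form value it should be close to
```

In real data the population CDFs are unknown, and the sample ranks divided by n play the role of F_X(X) and F_Y(Y), which is why the sample Spearman coefficient estimates this quantity.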
The two main problems with the Spearman rho are that (a) it is ONLY a measure of linearity between 2 cumulative distribution functions (with no interpretation as a difference between concordance and discordance probabilities), and that (b) the Central Limit Theorem works a lot less quickly for the sample Spearman rho than for the sample Kendall tau-a, especially under the null hypothesis of zero correlation (see Kendall and Gibbons, 1990).

Best wishes

Roger

References

Kendall, M. G., and J. D. Gibbons. 1990. Rank Correlation Methods. 5th ed. Oxford, UK: Oxford University Press.

Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: [hidden email]
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page: http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/

Opinions expressed are those of the author, not of the institution.

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Stas Kolenikov
Sent: 07 October 2009 21:27
To: [hidden email]
Subject: Re: st: difference between "Spearman" and "pwcorr / correlate"

> If you have a finite population, then of course you will have Spearman
> correlation for it. Although if you want to set up any asymptotic
> framework, you will be trying to hit a moving target. I don't think there
> is a meaningful definition of Spearman correlation for infinite
> populations/continuous variables, although I might be mistaken. On the
> other hand, Kendall's tau, as Nick Cox quoted from Roger Newson, has
> explicit population analogues in probabilities of concordant and
> discordant pairs of observations.
>
> The question is: if the correlation estimate is 0.5, what does it say?
> For Pearson moment correlation, it means that the proportion of explained
> variance in a bivariate regression is 0.25. For Kendall's tau, it means
> that for every discordant pair of observations, there are three
> concordant pairs (i.e., Prob[ concordant ] = 3 Prob[ discordant ] = 3/4 ).
> For Spearman rank correlation, you can only say that the variables are
> positively associated, but not much more.
(a) is on all fours with "the problem with pink is that it isn't blue".
That is, (a) amounts to saying that the problem with Spearman's rank is that it's not Kendall's tau. True, but the reverse is equally true.

That aside, I think most users of rank correlation would be happy to acknowledge advantages and disadvantages of each such measure, and indeed to note that they should give similar results in practice.

For example, given the property emphasised earlier in the thread that

Spearman(x, y) = Pearson(rank(x), rank(y))

one of many possibilities for Spearman correlations is that they offer a route to a robustified PCA. (You can be sure that the eigenproperties are OK.)

Nick
[hidden email]
In reply to this post by Stas Kolenikov
There's a tacit criterion here, that techniques must have simple verbal interpretations. I am as much in favour of simple verbal interpretations as the next person -- nay, on average, more so -- but while they're a bonus when available insisting on them would deprive you of much that is indispensable.

What's the simple verbal interpretation of (say) eigenvectors or an SVD?

Nick
[hidden email]
On Thu, Oct 8, 2009 at 11:33 AM, Nick Cox <[hidden email]> wrote:
> There's a tacit criterion here, that techniques must have simple verbal
> interpretations. I am as much in favour of simple verbal interpretations
> as the next person -- nay, on average, more so -- but while they're a
> bonus when available insisting on them would deprive you of much that is
> indispensable.
>
> What's the simple verbal interpretation of (say) eigenvectors or an SVD?

The eigenproblems are very visual. The eigenvectors are the specific directions in which the matrix stretches a vector without any rotation, and the eigenvalues tell you by how much a unit vector along each of those directions changes its length. If we talk about an eigenproblem for a covariance matrix, then the eigenvalues are the (squared) "radii" of the rugby ball/American football formed by the cloud of points in multivariate space, and the eigenvectors are again the directions that give the orientation of that rugby ball relative to the "official" axes. SVDs can be explained by the -biplot-s, although with greater effort.

I usually want to know what I am estimating. Then I can eyeball something along the lines of "the difference between the unknown population distribution function and the sample distribution is such and such, and hence by an appropriate version of the influence function expansions and/or the delta-method, the difference between the unknown parameter and the estimate at hand must be of such and such order." Thanks to Roger, I now have a better clue of what I am estimating with Spearman correlation. And there are probably a dozen other rank-type correlations that would make at least as much sense as (linear) correlation of the cdfs.

One other comparison can be made regarding the computational requirements. Spearman's rho is O( n log(n) ) due to sorting, while Kendall's tau is O( n^2 ) for the pairwise comparisons. Of course Pearson's moment correlation is O( n ), since it is just manipulation of sums. One would only see differences in timing of Pearson and Spearman with sample sizes such that -sort- takes a noticeable amount of time, while Kendall's tau is slow with more than 100 observations.

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
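The O( n^2 ) cost quoted for Kendall's tau comes from visiting every pair of observations. A minimal sketch of that direct computation of tau-a follows; it is plain Python written for illustration (the name `kendall_tau_a` is hypothetical), not a description of how Stata's own command is implemented.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a by direct comparison of all n(n-1)/2 pairs of observations."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1      # the pair is ordered the same way on x and y
        elif s < 0:
            discordant += 1      # the pair is ordered in opposite ways
        # s == 0 is a tie on x or y; tau-a counts it as neither
    n = len(x)
    n_pairs = n * (n - 1) // 2
    return (concordant - discordant) / n_pairs


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))   # (5 - 1) / 6, about 0.667
```

The double loop hidden inside `combinations` is what makes the work grow quadratically with n. Sort-based algorithms that bring this down to O( n log(n) ) do exist, but the pairwise version is the one the complexity comparison above describes.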
My point needs rephrasing. I draw a distinction between verbal definitions or characterisations on the one hand and verbal analogies on the other. The difference lies in whether you can take the verbal statements and reconstruct the formula or method from them; with mere analogies you can't do that.

However, Pearson correlations are pretty much defined by their square being the fraction of variance explained by the corresponding regression, modulo sign of course. In contrast, if I explain Spearman correlation in verbal terms as a measure of monotonicity that does not imply the particular formula used.

Nick
[hidden email]
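The characterisation of the Pearson correlation by its square can be verified numerically: fit the least-squares line of y on x, compute the fraction of variance it explains, and compare with the squared correlation. This is a small sketch with made-up data and hypothetical function names, using only plain Python.

```python
def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def regression_r_squared(x, y):
    """Fraction of the variance of y explained by the least-squares line on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))  # residual SS
    sst = sum((b - my) ** 2 for b in y)                                  # total SS
    return 1.0 - sse / sst


# Illustrative data only, not taken from the thread.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(pearson(x, y) ** 2)
print(regression_r_squared(x, y))   # agrees with the line above up to rounding
```

The sign of the correlation is lost in the squaring, which is the "modulo sign" caveat; it can be recovered from the sign of the fitted slope.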