# Chi-squared test for independence of observed and expected frequencies


## Chi-squared test for independence of observed and expected frequencies

Dear all,

I am trying to copy the approach of Dittmar/Thakor (2007), "Why do firms issue equity?", p. 27. The authors divide their sample of debt and equity issuers into quartiles based on two explanatory variables, i.e. building a matrix. Specifically, they examine the observed number of firms that fall into one of the four categories and compare them to the expected frequencies. After that, they apply a chi-squared test for independence to determine if there are more or fewer firms than expected in each category. Untabulated results show that each of these frequencies is significant.

I have managed to build the 4x3 matrix of observed and expected frequencies using the user-written program ". tabchi [1. Dimension] [2. Dimension]". The tabulated statistics include:

* Pearson chi2(6) = 15.0080, Pr = 0.020
* likelihood-ratio chi2(6) = 15.4736, Pr = 0.017

However, I struggle to conduct this chi-squared test for independence to determine if there are more or fewer firms than expected in each category. I have tried the user-written program ". chitesti" (part of the package tab_chi), plugging into it the expected and observed frequencies. This gives me:

* Pearson chi2(11) = 15.0257, Pr = 0.181
* likelihood-ratio chi2(11) = 15.6908, Pr = 0.153

But this does not allow me to test the frequencies of each (!) category. What am I doing wrong? What is the correct and straightforward approach in Stata for this type of problem?

Many thanks for considering this posting.

Regards
Marc

For searches and help try:

* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
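The two sets of results in the post above are nearly the same statistic evaluated against different reference distributions. A test of independence on a 4x3 table has (4-1)(3-1) = 6 degrees of freedom, while treating the 12 cells as a one-way goodness-of-fit problem gives 12-1 = 11, which is presumably what happened when the 12 observed and expected values were passed to -chitesti- without further adjustment. A minimal sketch, using only the built-in chi2tail() function and the statistics reported in the post:

```stata
* Roughly the same Pearson statistic, two different degrees of freedom
display chi2tail(6, 15.0080)    // df = (4-1)*(3-1), as reported by -tabchi-
display chi2tail(11, 15.0257)   // df = 12-1, as reported by -chitesti-
```

The small difference between 15.0080 and 15.0257 would be consistent with rounded expected frequencies having been typed in by hand, although the post does not say so.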

## Re: Chi-squared test for independence of observed and expected frequencies

--- On Thu, 15/7/10, Marc Michelsen wrote:

> I am trying to copy the approach of Dittmar/Thakor (2007) "Why do firms issue equity?"

Please report a complete reference.

> p. 27: The authors divide their sample of debt and equity issuers into quartiles based on two explanatory variables,

That is a horrible idea: you are throwing away huge amounts of information. Just use your favourite regression-like model with your dependent variable and your independent variables. If you worry about functional form, use splines.

> I have managed to build the 4x3 matrix of observed and expected frequencies using the user-written program ". tabchi [1. Dimension] [2. Dimension]". However, I struggle to conduct this chi-squared test for independence to determine if there are more or fewer firms than expected in each category.

You are already done: the chi-square test will only give you this overall measure of whether your table deviates from independence; it will not give you a cell-by-cell test. If you want to model patterns in your table you will have to use what sociologists call loglinear models; see Hout (1983) for an introduction. You don't want to go there unless you really need to. You don't need to, since you should not do this anyhow, as you should not waste the valuable information you have by only using the quartiles.

Hope this helps,
Maarten

Mike Hout (1983) "Mobility Tables". Quantitative Applications in the Social Sciences, nr. 31. Thousand Oaks: Sage.

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany
http://www.maartenbuis.nl
--------------------------
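As an illustration of the "overall measure" described above, the built-in -tabulate- command reports the same kind of output as -tabchi-. The sketch below uses made-up cell counts (they are not the counts from the original poster's data or from the paper), and the variable names quartile, outlook and n are hypothetical. The cchi2 option lists each cell's contribution to the Pearson statistic, which is descriptive only and is not a cell-by-cell significance test.

```stata
* Hypothetical 4x3 table: return quartile (1-4) by rating outlook (1-3)
clear
input byte quartile byte outlook int n
1 1 30
1 2 50
1 3 20
2 1 25
2 2 55
2 3 20
3 1 20
3 2 60
3 3 20
4 1 15
4 2 50
4 3 35
end

* Overall Pearson and likelihood-ratio tests of independence,
* with expected frequencies and per-cell chi2 contributions
tabulate quartile outlook [fweight = n], chi2 lrchi2 expected cchi2
```

The contributions add up to the Pearson chi2, so they show where the departure from independence is concentrated without attaching a p-value to any single cell.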

## Re: Chi-squared test for independence of observed and expected frequencies

In reply to this post by Marc Michelsen

On Thu, Jul 15, 2010 at 10:33 AM, Marc Michelsen <[hidden email]> wrote:

> I am trying to copy the approach of Dittmar/Thakor (2007) "Why do firms issue equity?" p. 27: The authors divide their sample of debt and equity issuers into quartiles based on two explanatory variables, i.e. building a matrix. Specifically, they examine the observed number of firms that fall into one of the four categories and compare them to the expected frequencies. After that, they apply a chi-squared test for independence to determine if there are more or fewer firms than expected in each category. Untabulated results show that each of these frequencies is significant.

I agree with Maarten: that's a strange approach. Not that it is totally inappropriate... but it smells like the 1960s, when computations were essentially restricted to how much handwriting you could fit onto two sheets of paper. Propagating strange approaches does not do a good service to whatever discipline you are in (finance?).

If those are continuous variables, you can use two-sample Kolmogorov-Smirnov tests to compare the distributions. I am pretty sure that bivariate versions of K-S tests exist, but they are not implemented in Stata. If the explanatory variables are categorical, you can compare the samples using -tabulate variable debt_vs_equity- as they are. If you want a fancier analysis, you can run -qreg- (or rather -sqreg-) over a set of quantiles, with debt/equity as the explanatory variable, to gauge whether the distributions of the continuous variables are the same for the two types of firms.

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
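The two suggestions for continuous variables can be sketched as follows. The data are simulated purely for illustration, and the names equity (1 = equity issuer, 0 = debt issuer) and prior_return are hypothetical stand-ins for the original poster's variables.

```stata
* Simulated data: 500 issuers, equity issuers with slightly higher prior returns
clear
set seed 12345
set obs 500
generate byte equity = runiform() < 0.5
generate prior_return = rnormal(0.05 + 0.10*equity, 0.30)

* Two-sample Kolmogorov-Smirnov test of equal distributions
ksmirnov prior_return, by(equity)

* Simultaneous quantile regression at the quartiles, bootstrap standard errors
sqreg prior_return equity, quantiles(0.25 0.50 0.75) reps(50)
```

If the coefficient on equity differs across quantiles, the two distributions differ in shape and not only in location. The number of bootstrap replications here is kept small for speed and would normally be larger.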

## AW: st: Chi-squared test for independence of observed and expected frequencies

Stas, Maarten,

many thanks for your comments. The complete reference is: Dittmar, A., and A. Thakor. "Why do firms issue equity?" Journal of Finance 62 (2007), 1-54.

You are totally right, the authors use this analysis only as an add-on / robustness test. The main body of the paper consists of multivariate analyses. Nevertheless, it would be quite helpful to determine the relative importance of the two explanatory variables (dimensions), i.e. prior stock return (divided into quartiles) and credit rating outlook (positive, negative, stable).

Do you have any idea how the authors have tested the significance of each of the frequencies?

I will have a look at your three proposed alternatives and see how fancy they are.

Regards
Marc

## Re: AW: st: Chi-squared test for independence of observed and expected frequencies

--- On Fri, 16/7/10, Marc Michelsen wrote:

> Do you have any idea how the authors have tested the significance of each of the frequencies?

I don't even know what the null hypothesis should be: independence refers to the whole set of frequencies that make up a cross tabulation. A test on individual frequencies then just does not make sense within this context.

-- Maarten

## Re: AW: st: Chi-squared test for independence of observed and expected frequencies

In reply to this post by Marc Michelsen

--- On Fri, 16/7/10, Marc Michelsen wrote:

> The complete reference is: Dittmar, A., and A. Thakor. "Why do firms issue equity?" Journal of Finance 62 (2007), 1-54.

OK, I had a chance to look at this article, but there is no table on page 27; there is a reference to a Table IV on the next page. Are you referring to that? If that is the case, then that has absolutely nothing to do with a chi-square test of independence: it is just a collection of t-tests comparing the averages of two groups on a set of variables.

Hope this helps,
Maarten

## AW: AW: st: Chi-squared test for independence of observed and expected frequencies

Maarten,

many thanks for your efforts. Indeed, the results for this analysis are untabulated. It just says in the text at the top of page 27 (re. Prediction 2): "Using a chi-squared test for independence to determine if there are more or fewer firms than expected in each category, we show that each of these frequencies is significant."

Marc

## Re: AW: AW: st: Chi-squared test for independence of observed and expected frequencies

--- On Fri, 16/7/10, Marc Michelsen wrote:

> Indeed, the results for this analysis are untabulated. It just says in the text at the top of page 27 (re. Prediction 2): "Using a chi-squared test for independence to determine if there are more or fewer firms than expected in each category, we show that each of these frequencies is significant."

OK, I see. I would recommend just forgetting about that test. As I mentioned before, a test on the individual frequencies just does not make sense to me: independence is a characteristic of the entire table, not a characteristic of individual frequencies. To do such a test right you'd have to specify a specific hypothesis on the structure of the counts, and then fit a log-linear model. This is just not worth the effort, given that breaking up your continuous variable into quartiles is a bad idea to begin with.

Hope this helps,
Maarten
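For readers who do want to see what "specify a hypothesis and fit a log-linear model" could look like, here is a minimal sketch that fits the log-linear model as a Poisson regression on the cell counts. The counts are made up, the variable names are hypothetical, and the extra parameter for one cell (the indicator called hot) is just one example of a structural hypothesis, not the one used in the paper.

```stata
* Hypothetical 4x3 table of cell counts
clear
input byte quartile byte outlook int n
1 1 30
1 2 50
1 3 20
2 1 25
2 2 55
2 3 20
3 1 20
3 2 60
3 3 20
4 1 15
4 2 50
4 3 35
end

* Independence model: main effects only. Its deviance and Pearson
* goodness-of-fit statistics correspond to the LR and Pearson chi2
* tests of independence for the table, on (4-1)*(3-1) = 6 df.
poisson n i.quartile i.outlook
estat gof

* Example hypothesis: the top-quartile / outlook-3 cell departs from
* independence while the remaining cells follow it.
generate byte hot = (quartile == 4 & outlook == 3)
poisson n i.quartile i.outlook hot
```

The z-test on hot in the second model is a test about one cell, but it is only meaningful because the cell was named in advance; picking the cell after looking at the table would invalidate the p-value.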

## AW: AW: AW: st: Chi-squared test for independence of observed and expected frequencies

Agree. Thanks.

Marc

## RE: AW: st: Chi-squared test for independence of observed and expected frequencies

In reply to this post by Maarten buis

However, one may want to test subtables when the overall hypothesis is one of homogeneity of various populations. The second test is one of independence. For a full table, the two tests are identical. When one is looking at subtables, one is in multiple-testing mode. The way to do this is to look at the likelihood-ratio chi-square for the subtable and compare it to the critical value for the full table, i.e. the one with (r-1)(c-1) degrees of freedom, even if one is looking at a 2x2 subtable.

I don't have the exact reference, but it is fairly old: either something by Novick and Grizzle in JASA or Gabriel in Annals of Statistics. It is before 1980. If there's demand for this, I can look it up next week.
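The procedure described above can be sketched in Stata as follows, again with made-up counts and hypothetical variable names. The subtable chosen here (quartiles 1 and 4, outlooks 1 and 3) is arbitrary; the point is only that its likelihood-ratio statistic is compared with the critical value for the full 4x3 table, which has 6 degrees of freedom.

```stata
* Hypothetical 4x3 table of cell counts
clear
input byte quartile byte outlook int n
1 1 30
1 2 50
1 3 20
2 1 25
2 2 55
2 3 20
3 1 20
3 2 60
3 3 20
4 1 15
4 2 50
4 3 35
end

* 5% critical value for the full table, df = (4-1)*(3-1) = 6
scalar crit = invchi2tail(6, 0.05)
display crit

* Likelihood-ratio chi2 for a 2x2 subtable
tabulate quartile outlook if inlist(quartile, 1, 4) & inlist(outlook, 1, 3) [fweight = n], lrchi2

* 1 if the subtable statistic exceeds the full-table critical value
display r(chi2_lr) > crit
```

The p-value that -tabulate- prints for the subtable uses 1 degree of freedom and should be ignored under this procedure; only the comparison with the full-table critical value matters.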

## Fixed effects logit model

In reply to this post by Marc Michelsen

Dear Statalist-users,

I am estimating a logit model for a panel-style data set. In order to obtain reliable standard errors, I have clustered by company, industry and/or offer year (per Petersen, 2009). For my linear regressions I have had good experiences with fixed-effects models. Their application to binary outcome models is not as straightforward because the models rely solely on within-variance.

Running a fixed-effects logit model (-xtlogit, fe-) shows highly significant coefficients on my key variables, which would be very beneficial for my study. However, more than 50% of my observations get lost in the regression because of zero within-variance.

Is it consistent to also show a fixed-effects logit model beside standard logit models clustered by the above-mentioned characteristics? What do I have to keep in mind when interpreting the results (especially relative to the other ML models)? Is it possible to calculate marginal effects for such a fixed-effects model (similar to Cameron/Trivedi, 2009, p. 516)?

Thank you for considering this posting.

Regards
Marc

Cameron, A. C., and P. K. Trivedi. Microeconometrics Using Stata. Stata Press (2009).

Petersen, M. A. "Estimating standard errors in finance panel data sets: Comparing approaches." Review of Financial Studies 22 (2009), 435.
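The comparison described in this post can be sketched on simulated data. Everything below is hypothetical (variable names, sample size, coefficients); it only illustrates the mechanics of a pooled logit with cluster-robust standard errors next to a conditional fixed-effects logit, and one way to request marginal effects after the latter.

```stata
* Simulated panel: 200 firms observed for 6 years each
clear
set seed 2010
set obs 200
generate firm = _n
generate alpha = rnormal()
expand 6
bysort firm: generate year = 2000 + _n
generate x = rnormal() + 0.5*alpha
generate byte y = runiform() < invlogit(-0.5 + 0.8*x + alpha)
xtset firm year

* Pooled logit, standard errors clustered by firm
logit y x, vce(cluster firm)

* Conditional fixed-effects logit: firms whose outcome never changes
* are dropped, so e(N) is smaller than the full sample
xtlogit y x, fe
display e(N)

* Marginal effect of x on Pr(y = 1), evaluated with the fixed effect set to zero
margins, dydx(x) predict(pu0)
```

The fixed effects themselves are not estimated by the conditional logit, so the marginal effects above rest on the strong assumption that each firm's fixed effect is zero. Whether that is acceptable depends on the application, and it is one reason the coefficients (or odds ratios) are often reported instead.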

## AW: st: Chi-squared test for independence of observed and expected frequencies


## Re: Fixed effects logit model


## AW: st: Fixed effects logit model


## Re: Fixed effects logit model


## Testing for differences

In reply to this post by Marc Michelsen

Dear Statalist-Users,

I am looking for the appropriate statistic to test for differences in a sample of rated companies (panel data). Besides company financials, the dataset comprises the rating category (e.g. AA, A, BBB) and the rating outlook (positive, negative, stable) per year. In summary, I've got firm-years per rating category that are subdivided by the prevailing rating outlook.

I want to test for differences in company characteristics between firms with a positive, negative or stable outlook. However, the comparison has to be done within a rating category. So BBB companies with a negative outlook have to be compared with companies also rated BBB but with a different outlook. But I want to test the whole sample at once and not just sub-samples.

How can I replicate this in Stata? Or, more generally, what is this type of problem called in econometric terms? "Clustered"?

Thanks for considering my post.

Marc

## Re: Testing for differences

--- On Thu, 28/10/10, Marc Michelsen wrote:

> I looking for the appropriate statistic to test for differences in a rated company sample (panel data). The dataset comprises beside company financials, the rating category (e.g. AA, A, BBB) and the rating outlook (positive, negative, stable) per year.
>
> I want to test for differences in company characteristics between firms with a positive, negative, stable outlook. However, the comparison has to be done within a rating category.
>
> How can I replicate this in Stata?

Sounds to me like a regression of a characteristic on rating and outlook. Rating would then be a control variable. You might consider adding interaction terms between rating and outlook.

> Or more general, what is this type of problem called in econometric terms?

Just regression.

Hope this helps,
Maarten
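The regression suggested above might be set up as in the following sketch. The data are simulated, and the variable names (leverage as the company characteristic, rating, outlook, firm) and their numeric codings are hypothetical. Clustering the standard errors by firm is an added assumption to reflect the panel structure mentioned in the question.

```stata
* Simulated firm-years: 100 firms, 6 years each
clear
set seed 42
set obs 600
generate firm = ceil(_n/6)
generate byte rating = 1 + mod(firm, 3)       // 1 = AA, 2 = A, 3 = BBB (hypothetical coding)
generate byte outlook = ceil(3*runiform())    // 1 = negative, 2 = stable, 3 = positive
generate leverage = 0.3 + 0.05*rating - 0.02*outlook + rnormal(0, 0.1)

* Characteristic regressed on rating, outlook and their interaction
regress leverage i.rating##i.outlook, vce(cluster firm)

* Outlook differences (relative to the base outlook) within each rating category
margins rating, dydx(outlook)

* Joint test that outlook does not matter in any rating category
testparm i.outlook i.rating#i.outlook
```

The joint test uses the whole sample at once, while the -margins- output gives the within-rating comparisons that the question asks for.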

## AW: st: Testing for differences

Maarten,

thanks for this. Would that be a multinomial logistic regression, as the rating outlook has the three values "positive, negative, stable"?

Regards
Marc