Chi-squared test for independence of observed and expected frequencies

Chi-squared test for independence of observed and expected frequencies

Marc Michelsen
Dear all,

I am trying to copy the approach of Dittmar/Thakor (2007) "Why do firms
issue equity?" p. 27: The authors divide their sample of debt and equity
issuers into quartiles based on two explanatory variables, i.e. building a
matrix. Specifically, they examine the observed number of firms that fall
into one of the four categories and compare them to the expected
frequencies. After that, they apply a chi-squared test for independence to
determine if there are more or fewer firms than expected in each category.
Untabulated results show that each of these frequencies is significant.

I have managed to build the 4x3 matrix of observed and expected frequencies
using the user-written program ". tabchi [1. Dimension] [2. Dimension]". The
tabulated statistics include Pearson chi2(6) =  15.0080   Pr = 0.020 and
likelihood-ratio chi2(6) =  15.4736   Pr = 0.017. However, I struggle to
conduct this chi-squared test for independence to determine if there are
more or fewer firms than expected in each category.
I have also tried the user-written program ". chitesti" (part of the
tab_chi package), plugging in the expected and observed frequencies. This
gives me Pearson chi2(11) =  15.0257   Pr =  0.181 and likelihood-ratio
chi2(11) = 15.6908   Pr =  0.153. But this does not allow me to test the
frequency of each (!) category.

What am I doing wrong? What is the correct and straightforward approach in
Stata for this type of problem?

Many thanks for considering this posting.

Regards
Marc


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
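
A note on the two sets of p-values quoted in the post above: they appear to differ mainly through their degrees of freedom. A test of independence on a 4x3 table has (4-1)(3-1) = 6 degrees of freedom, while treating the 12 cells as one list of observed and expected counts gives 12-1 = 11. The following is a minimal Python sketch (not Stata; the helper name chi2_sf is made up for this illustration) that recomputes the upper-tail probabilities from the two Pearson statistics reported above. Under those assumptions the results should come out close to the reported 0.020 and 0.181.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function.
    """
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

rows, cols = 4, 3
df_independence = (rows - 1) * (cols - 1)  # test of independence on the 4x3 table
df_cell_list = rows * cols - 1             # 12 cells treated as one goodness-of-fit list

p_independence = chi2_sf(15.0080, df_independence)
p_cell_list = chi2_sf(15.0257, df_cell_list)
print(df_independence, round(p_independence, 3))
print(df_cell_list, round(p_cell_list, 3))
```

The statistics themselves are almost identical (15.0080 versus 15.0257); only the reference distribution changes.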

Re: Chi-squared test for independence of observed and expected frequencies

Maarten buis
--- On Thu, 15/7/10, Marc Michelsen wrote:
> I am trying to copy the approach of Dittmar/Thakor (2007)
> "Why do firms issue equity?"

Please report a complete reference.

> p. 27: The authors divide their sample of debt and equity
> issuers into quartiles based on two explanatory variables,

That is a horrible idea: you are throwing away huge amounts
of information. Just use your favourite regression-like model
with your dependent variable and your independent variables. If
you worry about functional form, use splines.

> I have managed to build the 4x3 matrix of observed and
> expected frequencies using the user-written program ".
> tabchi [1. Dimension] [2. Dimension]". However, I struggle
> to conduct this chi-squared test for independence to
> determine if there are more or fewer firms than expected
> in each category.

You are already done: the chi-square test only gives you
an overall measure of whether your table deviates from
independence; it will not give you a cell-by-cell test.
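
To make the "overall measure" concrete, here is a small Python sketch (not Stata) of what the test of independence computes. The 4x3 counts are hypothetical, invented for this illustration; they are not Marc's data. Every cell contributes to one Pearson statistic and one likelihood-ratio statistic, each referred to a chi-squared distribution with (r-1)(c-1) degrees of freedom.

```python
import math

# Hypothetical 4x3 table: rows = stock-return quartiles, columns = rating outlook.
observed = [
    [30, 20, 10],
    [25, 25, 15],
    [20, 25, 20],
    [10, 20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

pairs = [(o, e) for obs_row, exp_row in zip(observed, expected)
         for o, e in zip(obs_row, exp_row)]
pearson = sum((o - e) ** 2 / e for o, e in pairs)
likelihood_ratio = 2 * sum(o * math.log(o / e) for o, e in pairs if o > 0)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print("Pearson chi2(%d) = %.4f" % (df, pearson))
print("LR chi2(%d)      = %.4f" % (df, likelihood_ratio))
```

Both numbers summarize the whole table; neither says which cells drive the deviation.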

If you want to model patterns in your table you will
have to use what sociologists call loglinear models,
see (Hout 1983) for an introduction. You don't want to
go there unless you really need to. You don't need to,
since you should not do this anyhow, as you should not
waste the valuable information you have by only using
the quartiles.

Hope this helps,
Maarten

Mike Hout (1983) "Mobility Tables". Quantitative
Applications in the Social Sciences, nr. 31. Thousand
Oaks: Sage.


--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------


     


Re: Chi-squared test for independence of observed and expected frequencies

Stas Kolenikov
In reply to this post by Marc Michelsen
On Thu, Jul 15, 2010 at 10:33 AM, Marc Michelsen
<[hidden email]> wrote:
> I am trying to copy the approach of Dittmar/Thakor (2007) "Why do firms
> issue equity?" p. 27: The authors divide their sample of debt and equity
> issuers into quartiles based on two explanatory variables, i.e. building a
> matrix. Specifically, they examine the observed number of firms that fall
> into one of the four categories and compare them to the expected
> frequencies. After that, they apply a chi-squared test for independence to
> determine if there are more or fewer firms than expected in each category.
> Untabulated results show that each of these frequencies is significant.

I agree with Maarten: that's a strange approach. Not that it is
totally inappropriate... but it smells like the 1960s, when computations
were essentially restricted to how much handwriting you could fit onto
two sheets of paper. Propagating strange approaches does not do a good
service to whatever discipline you are in (finance?).

If those are continuous variables, you can use two-sample
Kolmogorov-Smirnov tests to compare the distributions. I am pretty
sure that bivariate versions of K-S tests exist, but they are not
implemented in Stata. If the explanatory variables are categorical,
you can compare the samples using -tabulate variable debt_vs_equity-
as they are.
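
For readers unfamiliar with the two-sample Kolmogorov-Smirnov test mentioned here, a minimal Python sketch (not Stata) of its test statistic follows: the largest vertical gap between the two empirical distribution functions. The function name and the return figures for equity and debt issuers are hypothetical, and the sketch stops at the statistic without computing a p-value.

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        # Share of each sample that is <= x.
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Hypothetical prior stock returns for the two types of issuers.
equity_issuers = [0.12, 0.30, 0.45, 0.50, 0.80]
debt_issuers = [-0.10, 0.05, 0.10, 0.20, 0.35]

print(ks_statistic(equity_issuers, debt_issuers))
```

Because the comparison uses every observation, nothing is lost to binning, which is the point of preferring it to a table of quartiles.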

If you want a fancier analysis, you can run -qreg- (or rather -sqreg-)
over a set of quantiles, with debt/equity as the explanatory
variable, to gauge whether the distributions of the continuous
variables are the same for the two types of firms.

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.

AW: st: Chi-squared test for independence of observed and expected frequencies

Marc Michelsen
Stas, Maarten,

many thanks for your comments.

The complete reference is: Dittmar, A., and A. Thakor. "Why do firms issue
equity?" Journal of Finance 62 (2007), 1-54.

You are totally right, the authors use this analysis only as an add-on /
robustness test. The main body of the paper consists of multivariate analyses.
Nevertheless, it would be quite helpful to determine the relative importance
of the two explanatory variables (dimensions), i.e. prior stock return
(divided into quartiles) and credit rating outlook (positive, negative,
stable). Do you have any idea how the authors tested the significance
of each of the frequencies?

I will have a look at your three proposed alternatives and see how fancy
they are.

Regards
Marc




Re: AW: st: Chi-squared test for independence of observed and expected frequencies

Maarten buis
--- On Fri, 16/7/10, Marc Michelsen wrote:
> Do you have any idea how the authors have tested
> the significance of each of the frequencies?

I don't even know what the null hypothesis should be:
independence refers to the whole set of frequencies
that make up a cross tabulation. A test on individual
frequencies then just does not make sense within
this context.

-- Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------

 


     


Re: AW: st: Chi-squared test for independence of observed and expected frequencies

Maarten buis
In reply to this post by Marc Michelsen
--- On Fri, 16/7/10, Marc Michelsen wrote:
> The complete reference is: Dittmar, A., and A. Thakor. "Why
> do firms issue equity?" Journal of Finance 62 (2007), 1-54.

OK, I had a chance to look at this article, but there is no
table on page 27; there is a reference to a Table IV on the
next page. Are you referring to that? If so, that has
absolutely nothing to do with a chi-square test of
independence: it is just a collection of t-tests comparing
the averages of two groups on a set of variables.

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------



     


AW: AW: st: Chi-squared test for independence of observed and expected frequencies

Marc Michelsen
Maarten,

many thanks for your efforts.

Indeed, the results for this analysis are untabulated. It just says in the text at the top of page 27 (re. Prediction 2): "Using a chi-squared test for independence to determine if there are more or fewer firms than expected in each category, we show that each of these frequencies is significant."

Marc


Re: AW: AW: st: Chi-squared test for independence of observed and expected frequencies

Maarten buis
--- On Fri, 16/7/10, Marc Michelsen wrote:
> Indeed, the results for this analysis are untabulated. It
> just says in the text at the top of page 27 (re. Prediction
> 2): "Using a chi-squared test for independence to determine
> if there are more or fewer firms than expected in each
> category, we show that each of these frequencies is
> significant."

OK, I see. I would recommend just forgetting about that test.
As I mentioned before, a test on the individual frequencies
just does not make sense to me: independence is a
characteristic of the entire table, not of individual
frequencies. To do such a test right you'd have to specify
a specific hypothesis on the structure of the counts, and
then fit a log-linear model. This is just not worth the
effort, given that breaking up your continuous variable into
quartiles is a bad idea to begin with.

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------


     


AW: AW: AW: st: Chi-squared test for independence of observed and expected frequencies

Marc Michelsen
Agree. Thanks.

Marc


RE: AW: st: Chi-squared test for independence of observed and expected frequencies

Lachenbruch, Peter
In reply to this post by Maarten buis
However, one may want to test subtables when the overall hypothesis is one of homogeneity of various populations; the alternative framing is a test of independence. For a full table, the two tests are identical. When one looks at subtables, one is in multiple-testing mode. The way to handle this is to compute the likelihood-ratio chi-square for the subtable and compare it with the critical value for the full table (i.e. with (r-1)(c-1) degrees of freedom), even if one is looking at a 2x2 subtable.
I don't have the exact reference, but it is fairly old: either something by Novick and Grizzle in JASA or Gabriel in the Annals of Statistics. It is from before 1980. If there's demand for this, I can look it up next week.
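
As a rough illustration of the procedure described above, the following Python sketch (not Stata) computes the likelihood-ratio chi-square for a 2x2 subtable and compares it with two 5% critical values: 3.841 for 1 degree of freedom (what a naive stand-alone test would use) and 12.592 for the 6 degrees of freedom of a full 4x3 table. The counts and the function name are hypothetical.

```python
import math

def lr_chi2(table):
    """Likelihood-ratio chi-square (G^2) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:
                expected = row_totals[i] * col_totals[j] / n
                g2 += 2 * obs * math.log(obs / expected)
    return g2

# Hypothetical 2x2 subtable taken from a 4x3 table.
subtable = [[26, 14],
            [14, 26]]

CRITICAL_1_DF = 3.841    # 5% critical value, chi-squared with 1 df
CRITICAL_6_DF = 12.592   # 5% critical value, chi-squared with (4-1)(3-1) = 6 df

g2 = lr_chi2(subtable)
print("G2 = %.3f" % g2)
print("significant as a stand-alone 2x2 test:", g2 > CRITICAL_1_DF)
print("significant against the full-table cutoff:", g2 > CRITICAL_6_DF)
```

With these made-up counts the subtable clears the 1-df cutoff but not the full-table one, which is the kind of case the multiple-testing adjustment is meant to catch.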


Re: Chi-squared test for independence of observed and expected frequencies

Steven Samuels
In reply to this post by Marc Michelsen
CONTENTS DELETED
The author has deleted this message.

Fixed effects logit model

Marc Michelsen
In reply to this post by Marc Michelsen
Dear Statalist-users,

I am estimating a logit model for a panel-style data set. In order to
guarantee unbiased estimation, I have used company, industry and/or offer
year clusters (per Petersen, 2009). For my linear regressions I have had
good experience with fixed-effects models. Their application to binary
outcome models is not as straightforward because the models rely solely on
within-variance.

Running a fixed-effects logit model (-xtlogit, fe-) shows highly significant
coefficients for my key variables, which would be very beneficial for my
study. However, more than 50% of my observations are dropped from the
regression because of zero within-variance.
Is it consistent to also show a fixed-effects logit model beside standard
logit models clustered by the above-mentioned characteristics? What do I
have to keep in mind when interpreting the results (especially relative to
the other ML models)? Is it possible to calculate marginal effects for such
a fixed-effects model (similar to Cameron/Trivedi, 2009, p. 516)?

Thank you for considering this posting.

Regards
Marc  

Cameron, A. C., and P. K. Trivedi. Microeconometrics Using Stata. Stata
Press (2009).
Petersen, M. A. "Estimating standard errors in finance panel data sets:
Comparing approaches." Review of Financial Studies 22 (2009), 435.




AW: st: Chi-squared test for independence of observed and expected frequencies

Marc Michelsen
In reply to this post by Steven Samuels
Thank you very much for all the valuable comments. Having read all that, I
will probably skip the analysis.

-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of Steve Samuels
Sent: Saturday, 17 July 2010 00:24
To: [hidden email]
Subject: Re: st: Chi-squared test for independence of observed and expected
frequencies

Marc Michelsen  wants to use the Chi Square test of independence in a
contingency table of two of his predictor variables, because the test
occurs in a reference. Stas  and Maarten suggested alternatives.  But
he wants to test the "significance of each of the frequencies".

This would have been ill-advised but possible.  See the section
> Nevertheless, it would be quite helpful to determine the relative
> importance of the two explanatory variables (dimensions), i.e. prior
> stock return (divided into quartiles) and credit rating outlook
> (positive, negative, stable).

Agreed, but the chi-square test for independence of the two
explanatory variables says _nothing_ about their relative importance
as predictors. The same logic applies to a test for the correlation
of two continuous predictors in ordinary regression. Correlation
(multicollinearity) will make it difficult to disentangle the effects
of the predictors involved, but it says nothing about the relative
importance of any of them.

The authors of Marc's reference might have had other reasons for
studying the association of the two predictors.  They might have also
tested a single cell with the residual shown on page 81 of A. Agresti,
2002, "Categorical Data Analysis", Wiley Books.
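
A common cell-level diagnostic of this kind is the standardized (adjusted) Pearson residual: (observed - expected) / sqrt(expected * (1 - row share) * (1 - column share)). Whether this is exactly the residual meant on the cited page is an assumption here. The Python sketch below (not Stata) computes it for a hypothetical 4x3 table; the counts and the function name are invented for the illustration.

```python
import math

def adjusted_residuals(table):
    """Standardized Pearson residual for every cell of a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out_row.append((obs - expected) / math.sqrt(variance))
        result.append(out_row)
    return result

# Hypothetical counts: rows = return quartiles, columns = rating outlook.
observed = [
    [30, 20, 10],
    [25, 25, 15],
    [20, 25, 20],
    [10, 20, 30],
]

for row in adjusted_residuals(observed):
    print("  ".join("%6.2f" % r for r in row))
```

Under independence each residual is approximately standard normal, so absolute values well above 2 point to cells with more or fewer firms than expected. With 12 cells this is 12 looks at the same table, so the multiple-testing caveat raised earlier in the thread still applies. The -tabchi- help file is the place to check whether the command can report such residuals directly.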

Steve





--
Steven Samuels
[hidden email]
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax:    206-202-4783


Re: Fixed effects logit model

Maarten buis
In reply to this post by Marc Michelsen
--- On Mon, 19/7/10, Marc Michelsen wrote:

> I am estimating a logit model for a panel style data set.
> In order to guarantee unbiased estimation, I have used company,
> industry and/or offer year clusters (per Petersen, 2009). For
> my linear regressions I have made positive experience with
> fixed-effects models. Their application for binary outcome
> models is not as straightforward because the models rely solely
> on within-variance.
>
> more than 50% of my observations get lost in the regression
> because of zero within variance.  Is it consistent to show also
> a fixed effects logit model beside standard logit models
> clustered by the above mentioned characteristics.

I would not do that. These two estimators just measure different
things: the fixed-effects estimator controls for every
characteristic that remains constant, while your model with
clustered standard errors does not. I don't see how you can
compare the results of these two models. The point of presenting
two models side by side is that (it implies that) you can
compare them. If you can't compare those models, then
presenting them side by side will just result in confusion.

The problem with a large proportion of dropped observations is
that you may need to think again about which population you
are trying to generalize to. For that reason I would look at
whether those that drop out of your analysis are in
some sense different from those that are in the analysis in
terms of your observed variables. If you are lucky there isn't
much difference, and you can, with some arm waving, argue that
it doesn't matter. If there are considerable differences, then
I would just mention that, and at the very end of your paper
discuss some hypotheses of how this may influence your estimates.
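
The check suggested in the paragraph above can be sketched as follows, in Python rather than Stata and on a made-up panel. A firm contributes to the fixed-effects (conditional) logit only if its binary outcome varies over time, so the sketch flags firms with a constant outcome and compares an observed covariate between the retained and the dropped observations. All names and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical panel rows: (firm id, binary outcome, observed covariate).
panel = [
    ("A", 0, 1.2), ("A", 1, 1.5), ("A", 0, 1.1),
    ("B", 1, 3.0), ("B", 1, 3.4),
    ("C", 0, 0.4), ("C", 0, 0.6),
    ("D", 1, 2.0), ("D", 0, 1.8),
]

# Collect the distinct outcome values seen for each firm.
outcomes_by_firm = defaultdict(set)
for firm, outcome, _ in panel:
    outcomes_by_firm[firm].add(outcome)

# Firms whose outcome never changes have zero within-variance.
kept_firms = {firm for firm, values in outcomes_by_firm.items() if len(values) > 1}

kept = [x for firm, _, x in panel if firm in kept_firms]
dropped = [x for firm, _, x in panel if firm not in kept_firms]

print("firms kept:", sorted(kept_firms))
print("share of observations dropped: %.2f" % (len(dropped) / len(panel)))
print("covariate mean, kept: %.2f  dropped: %.2f" % (mean(kept), mean(dropped)))
```

A large gap between the two means would be one sign that the estimation sample no longer represents the population one set out to describe.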

Remember that you are trying to do something that is by
definition impossible: get an empirical estimate of an effect
while controlling for stuff that you haven't seen. So do not
expect to get the right answer. What you should aim at is to
look at your data as containing some information on the effect
that you are interested in; it is not enough, but it is not
zero either. There are now a variety of strategies you can
follow to extract that information. Pick one, and do that
one right. There are two reasons for that. First, using these
strategies right is hard (not surprising as they try to solve
an unsolvable problem...), so it really pays to focus on one
strategy. Second, it is much easier this way to write your
paper so that it helps the reader follow what data you
have used and what information they contain that helps you get
an idea of what the effect of interest is (and what "information"
comes from the (untestable) assumptions underlying your strategy).
Others (or you in a different paper) can later use other
strategies. After a sufficient body of literature has been
assembled on this question, someone can try to summarize the
different findings.

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------


     


AW: st: Fixed effects logit model

Marc Michelsen
Maarten,

thanks for these very thoughtful comments. I will think this through another time.

Marc

-----Ursprüngliche Nachricht-----
Von: [hidden email] [mailto:[hidden email]] Im Auftrag von Maarten buis
Gesendet: Montag, 19. Juli 2010 15:50
An: [hidden email]
Betreff: Re: st: Fixed effects logit model

--- On Mon, 19/7/10, Marc Michelsen wrote:

> I am estimating a logit model for a panel style data set.
> In order to guarantee unbiased estimation, I have used company,
> industry and/or offer year clusters (per Petersen, 2009). For
> my linear regressions I have made positive experience with
> fixed-effects models. Their application for binary outcome
> models is not as straightforward because the models rely solely
> on within-variance.
>
> more than 50% of my observations get lost in the regression
> because of zero within variance.  Is it consistent to show also
> a fixed effects logit model beside standard logit models
> clustered by the above mentioned characteristics.

I would not do that, these two estimators just measure different
things, the fixed effects estimator controls for every
characteristic that remains constant, while your model with
clustered standard errors does not. I don't see how you can
compare the results of these two models. The point of presenting
two models side by side is that (it implies that) you can
compare models. If you can't compare those models, than
presenting the models side by side will just result in confusion.

The problem with a large proportion of dropped observations is
that you may need to think again about to what population you
are trying to generalize. For that reason I would look at
wether those that drop out of your analysis analysis are in
some sense different from those that are in the analysis in
terms of your observed variables. If you are lucky there isn't
much difference, and you can, with some arm waving, argue that
it doesn't matter. If there are considerable differences, than
I would just mention that, and at the very end of your paper
discuss some hypotheses of how this may influence your estimates.

Remember that you are trying to do something that is by
definition impossible: get an empirical estimate of an effect
while controlling for stuff that you haven't seen. So do not
expect to get the right answer. What you should aim at is to
look at your data as containing some information on the effect
that you are interested in; it is not enough, but it is not
zero either. There are now a variety of strategies you can
follow to extract that information. Pick one, and do that
one right. There are two reasons for that. First, using these
strategies right is hard (not surprising as they try to solve
an unsolvable problem...), so it really pays to focus on one
strategy. Second, it is much easier this way to write your
paper in a way that helps the reader to follow what data you
have used and what information it contains that helps you get
an idea of what the effect of interest is (and what "information"
comes from the (untestable) assumptions underlying your strategy).
Others (or you in a different paper) can later use other
strategies. After a sufficient body of literature has been
assembled on this question, someone can try to summarize the
different findings.

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------

Re: Fixed effects logit model

Abhimanyu Arora-2
In reply to this post by Maarten buis
Thanks Maarten, for a very useful and practical response.

Testing for differences

Marc Michelsen
In reply to this post by Marc Michelsen
Dear Statalist-Users,

I am looking for the appropriate statistic to test for differences in a rated
company sample (panel data). The dataset comprises, besides company
financials, the rating category (e.g. AA, A, BBB) and the rating outlook
(positive, negative, stable) per year. In summary, I've got firm-years per
rating category that are subdivided by the prevailing rating outlook.

I want to test for differences in company characteristics between firms with
a positive, negative, or stable outlook. However, the comparison has to be
done within a rating category. So BBB companies with a negative outlook have
to be compared with companies also rated BBB but with a different outlook. But
I want to test the whole sample at once and not just sub-samples.

How can I replicate this in Stata? Or, more generally, what is this type of
problem called in econometric terms? "Clustered"?

Thanks for considering my post.

Marc

Re: Testing for differences

Maarten buis
--- On Thu, 28/10/10, Marc Michelsen wrote:

> I am looking for the appropriate statistic to test for
> differences in a rated company sample (panel data). The
> dataset comprises, besides company financials, the rating
> category (e.g. AA, A, BBB) and the rating outlook
> (positive, negative, stable) per year.
>
> I want to test for differences in company characteristics
> between firms with a positive, negative, or stable outlook.
> However, the comparison has to be done within a rating
> category.
>
> How can I replicate this in Stata?

Sounds to me like a regression of a characteristic on rating
and outlook. Rating would then be a control variable. You
might consider adding interaction terms between rating and
outlook.
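
A minimal sketch of such a regression in Stata (version 11 or later
for the factor-variable notation), assuming hypothetical variable
names: leverage as the characteristic, rating and outlook as numeric
categorical codes, and firm as the panel identifier. Clustering the
standard errors by firm is an added assumption here, made because the
data are repeated firm-years.

```stata
* Hypothetical variables: leverage (firm characteristic), rating and
* outlook (numeric category codes), firm (panel identifier).
* If rating is stored as a string, create a numeric version first:
*   encode rating, generate(rating_n)

* Outlook differences, holding the rating category constant.
regress leverage i.rating i.outlook, vce(cluster firm)
testparm i.outlook

* Allow the outlook differences to vary across rating categories.
regress leverage i.rating##i.outlook, vce(cluster firm)
testparm i.rating#i.outlook
```

The first testparm is a joint test of whether the outlook groups
differ at all; the second tests whether those differences are the
same in every rating category.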

> Or, more generally, what is this type of problem called in
> econometric terms?

Just regression.

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------

AW: st: Testing for differences

Marc Michelsen
Maarten,

thanks for this. Would that be a multinomial logistic regression as the rating outlook has the three values "positive, negative, stable"?

Regards
Marc

Re: AW: st: Testing for differences

Maarten buis
--- On Thu, 28/10/10, Marc Michelsen wrote:
> thanks for this. Would that be a multinomial logistic
> regression as the rating outlook has the three values
> "positive, negative, stable"?

No, the rating and the outlook are both explanatory/
independent/predictor/right-hand-side/x-variables.
The characteristic (whatever that may be) is your
explained/dependent/left-hand-side/y-variable. So
the type of regression depends on the type of
firm characteristic you want to investigate.
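
As a rough illustration of that last point, the same right-hand side
can be paired with different commands depending on the dependent
variable. All variable names below are hypothetical.

```stata
* Hypothetical dependent variables: leverage (continuous),
* payer (0/1 indicator), n_issues (non-negative count).
* rating and outlook are numeric category codes in every model.
regress leverage i.rating i.outlook    // continuous characteristic
logit   payer    i.rating i.outlook    // binary characteristic
poisson n_issues i.rating i.outlook    // count characteristic
```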

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------