I am analyzing survival times with no time-varying covariates. At the moment, I am using a Cox proportional hazards model estimated with Stata's stcox.
The data are structured as follows. For every observation in the sample it is known when the individual joined the risk pool, i.e., starting dates are known for all observations. Spells can be terminated by two different outcomes, A and B. Unfortunately, I only observe one of those two outcomes, A. For those cases I also know when A happened, so I can compute the duration of spells ending in A as (date of A minus entry date).

For the remaining observations it is impossible to determine whether the spell has already been terminated by event B or whether the observation is still at risk. Given this data structure, it seems unreasonable to treat observations that did not end in A as censored, because I cannot know whether they are still in the risk pool (in which case duration would be today's date minus entry date) or whether they left the risk pool to destination B (in which case duration would be date of B minus entry date).

Currently, I estimate the Cox model only for observations that ended in A, excluding all other observations from the estimation. As a robustness check, I also estimate a Heckman selection model in which selection is defined over (spell ended in A: yes/no) and duration is the dependent variable in stage 2. The results of both exercises are comparable.

Is anyone aware of a better way to deal with this problem, or of literature on the potential biases from excluding observations with unknown spell endings?

Thanks for your support!

Stefan

Stefan Wagner
INNO-tec
Institut für Innovationsforschung, Technologiemanagement und Entrepreneurship
Ludwig-Maximilians-Universität München
Kaulbachstr. 45/III
80539 München
Tel.: ++49/89/2180-2877
Fax: ++49/89/2180-6284
http://www.inno-tec.de/personen/mitarbeiter/wagner/index.html
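To make the concern about excluding non-A spells concrete, here is a small simulation sketch in Python. It is only an illustration under assumptions that are not taken from the data described above: latent times to A and B are independent exponentials with made-up rates (`rate_a`, `rate_b`), the observation window (`window`) is fixed, and the function name `simulate_a_only_mean` is hypothetical. It compares the mean duration among spells observed to end in A with the mean the time to A would have if B and the window played no role.

```python
import random


def simulate_a_only_mean(n=20000, rate_a=0.10, rate_b=0.05, window=20.0, seed=1):
    """Return (mean duration among observed A-spells, share of spells ending in A).

    All parameters are illustrative assumptions, not estimates from real data.
    """
    rng = random.Random(seed)
    durations_a = []
    for _ in range(n):
        t_a = rng.expovariate(rate_a)  # latent time to outcome A
        t_b = rng.expovariate(rate_b)  # latent time to outcome B (never observed)
        # A is recorded only if it happens before B and before the window closes.
        if t_a < t_b and t_a < window:
            durations_a.append(t_a)
    return sum(durations_a) / len(durations_a), len(durations_a) / n


mean_a_only, share_a = simulate_a_only_mean()
latent_mean_a = 1 / 0.10  # mean time to A in the absence of B and of the window

print(f"share of spells ending in A: {share_a:.2f}")
print(f"mean duration, A-only sample: {mean_a_only:.2f}")
print(f"latent mean time to A:        {latent_mean_a:.2f}")
```

In this stylized setup the A-only sample contains disproportionately short spells, because long spells are more likely to be cut off by B or by the end of the window. Whether anything similar holds in the real data depends on how exit to B relates to the covariates, which the simulation does not address.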
Dear Mr. Wagner,
All dates in my data are known (1988-2008). However, some firms have no failures in the period of analysis, while other firms have just one observation and fail. My doubt concerns the duration for the observations that do not fail. In Stata, firms with no failure have no value (missing). However, when I run stset time, failure(dep.var) id(idfirm), the message "PROBABLE ERROR" appears.

My goal is to study the second entry into the market. Firms that have just one observation and did not fail have NO value assigned, firms whose first event is a failure have the value "zero", and a second event that is a failure has the value "1".

Should I run other commands to avoid probable errors? Am I using the right values for the circumstances I mentioned?

Thank you very much.

Leo Quadros
Universidad Autonoma de Barcelona
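One possible source of the "PROBABLE ERROR" note is that records for firms without a failure carry a missing time or failure value. A minimal sketch in Python of the usual survival-data layout follows. It assumes that firms which never fail should be treated as censored at the end of the observation window (2008), which may or may not match the intended design. The names `spell` and `WINDOW_END`, the firm ids, and the years are all hypothetical.

```python
from typing import Optional, Tuple

WINDOW_END = 2008  # last year of the observation window mentioned in the post


def spell(entry_year: int, fail_year: Optional[int],
          window_end: int = WINDOW_END) -> Tuple[int, int]:
    """Return (duration, failed) for one firm.

    failed = 1 if a failure is observed, 0 if the firm is censored at window_end.
    Every firm gets a non-missing duration, including firms that never fail.
    """
    if fail_year is not None:
        return fail_year - entry_year, 1
    return window_end - entry_year, 0


# Hypothetical firms: id -> (entry year, failure year or None if no failure seen)
firms = {101: (1990, 1995), 102: (1998, None)}
records = {firm_id: spell(*years) for firm_id, years in firms.items()}
print(records)
```

With this layout the duration would play the role of the time variable and the 0/1 indicator the role of the failure variable in stset time, failure(...) id(idfirm), so that non-failing firms enter as censored rather than missing.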
