PSMATCH with 2 conditions

PSMATCH with 2 conditions

Dear Statalist users,

I am writing a study on the performance of Private Equity (PE) vs. Non-Private Equity (NPE) backed IPOs. To address the endogeneity of being PE-backed, I want to perform propensity score matching using both local linear regression and k-nearest-neighbors methods.

My depvar is PE_backed.
My [indepvars] are balance sheet and income statement figures such as sales, total assets, operating income, etc.
My outcome is the buy-and-hold return 4 years after the IPO.

In my dataset I created a variable that takes values from 1 to 15, each number corresponding to the industry the PE- or NPE-backed firm belongs to. I also have a variable that indicates the year in which the IPO occurred.

Now my question: I want to match firms with each other ONLY if they belong to the same industry and the IPO occurred in the same YEAR. In other words, match firm A (PE-backed) with the firm B (NPE-backed) for which the distance between their propensity scores is the smallest AND which belongs to the same industry and had its IPO in the same year.

How can I do this with psmatch2?

Thanks
Mihai

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

re: PSMATCH with 2 conditions

See the following post for an example of forcing a match:
http://www.stata.com/statalist/archive/2010-09/msg00073.html

In your case, say you have a 4-digit industry code (e.g. SIC). You can first compute propensity scores, then make a new variable pscore2 that looks like:

    pscore2 = year*100000 + industry*10 + pscore

If you have year 1999, industry 1234 and p-score 0.25, you get:

    pscore2 = 199912340.25

If you put a caliper of, say, 0.5, it is impossible to match to a firm in a different industry/year.
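The composite-score idea can be sketched end to end on a public dataset. This is only an illustration under stated assumptions, not the poster's own code: it uses `webuse nlswork`, treats `union` as the treatment, and uses `ind_code` as a stand-in for the industry variable. The multipliers are chosen for that dataset (two-digit `year`, `ind_code` expected to be a small positive integer, which the `summarize` line lets you check), and the key is stored as a `double` because a value such as 199912340.25 has more significant digits than a `float` can hold. The `_id` and `_n1` variables are the ones psmatch2 creates for nearest-neighbour matching.

```stata
* Requires the user-written psmatch2 (ssc install psmatch2).
webuse nlswork, clear

* Step 1: estimate the propensity score.
logit union collgrad age tenure not_smsa c_city south nev_mar
predict double pscore if e(sample), pr

* Step 2: check the ranges, then build the composite key in double precision.
* Assumption: ind_code*10 stays below 1000, so each year/industry cell
* starts at least 9 units after the previous cell ends (p-scores lie in [0,1]).
summarize year ind_code
gen double pscore2 = year*1000 + ind_code*10 + pscore

* Step 3: nearest-neighbour match with a caliper smaller than the cell gap.
psmatch2 union, pscore(pscore2) outcome(ln_wage) caliper(0.5)

* Step 4: count matched pairs that do NOT share year and industry.
sort _id
gen match_year = year[_n1]
gen match_ind  = ind_code[_n1]
count if !missing(_n1) & (match_year != year | match_ind != ind_code)
```

If the multipliers and the caliper are chosen correctly, the final `count` should report zero. Treated observations with no control in their own year/industry cell fall outside the caliper and are left unmatched.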

re: PSMATCH with 2 conditions

In reply to this post by Mihai-Andrei Popescu-Greaca

I would suggest another path here. cem (Coarsened Exact Matching) is a user-written program by Gary King and coauthors (http://gking.harvard.edu/cem/) that allows you to force a match on a specific level of a variable (for example, force gender to be exact). In this case, I would match some variables within calipers and require exact matches where necessary on other variables.

I hope this helps

Ariel

[Quoted: message from "Caskey, Judson" <[hidden email]>, Tue, 7 Sep 2010 11:07:37 -0700, Subject: re: st: PSMATCH with 2 conditions]

AW: st: PSMATCH with 2 conditions

In reply to this post by Caskey, Judson

Hi Judson,

thanks for the great tip, and sorry for the late reply; I've been pretty busy lately.

One thing though: my industry code is only 2 digits, so if I only want to match by industry, I just multiply industry by 10 and then add the pscore, as follows:

    gen pscore2 = industry*10 + pscore

And then psmatch2 as indicated by you in the link.

THE PROBLEM occurred when I used 1000 instead of 10, and then 100000 instead of 1000: all 3 yielded different results (T and Z scores when bootstrapping SEs).

Do you have an explanation for the different results?

Regards,
Mihai

re: AW: st: PSMATCH with 2 conditions

Mihai,

I was unable to replicate the problem. Can you try replicating it on one of the public datasets and send an example? Here is what I tried:

    webuse nlswork
    logit union collgrad age tenure not_smsa c_city south nev_mar
    predict pscore if e(sample), pr
    gen pscore2=year*10+pscore
    gen pscore3=year*1000+pscore
    psmatch2 union, pscore(pscore2) outcome(ln_wage) caliper(0.5)
    psmatch2 union, pscore(pscore3) outcome(ln_wage) caliper(0.5)

I get the same results in both of the psmatch2 calls. The issue may be with the bootstrapping procedure rather than with the call to psmatch2 itself.

Regards,
Judson

AW: AW: st: PSMATCH with 2 conditions

Dear Judson,

I just tried the multiplication by 10 and by 1000 on the nlswork dataset, and again the T-values are different (without bootstrapping).

When I use pscore2 (multiplication by 10) I get a T of 9.16, a Difference of .102809747 and an SE of .011229545. Using pscore3 (*1000) yields a T of 4.02, a Difference of .092227641 and an SE of .02294341.

The only explanation I have come up with is that (my) Stata "loses" decimals with each multiplication or performs a "mysterious" round-up. I.e., if my initial pscore is 0.28732124 (like row 3 in the dataset), then pscore2 is 720.28729 and pscore3 is 72000.287.

IF my theory holds, then multiplying by a huge number should make all decimals disappear. I generated a pscore4 (*1000000) and now I only see 72000000, but T is still 4.72, the Difference is .263281083 and the SE is .055774089, so this doesn't make sense either.

I'm using Stata 11.1. I really don't know what else can be wrong.

PS: I exported the data to Excel, and only the "short" numbers get exported (i.e. 72000.287, and NOT 72*1000+0.287321424).

Best,
Mihai

[Quoted: message from Caskey, Judson, sent Monday, 20 September 2010 17:48, Subject: re: AW: st: PSMATCH with 2 conditions]
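The "lost decimals" described above are what one would expect if the composite score is stored as a float, which is Stata's default storage type for `generate` unless `set type double` is in effect. A float carries a 24-bit significand, roughly 7 significant decimal digits, so the spacing between representable values is about 0.00006 near 720, about 0.008 near 72,000, and 8 near 72,000,000. The sketch below only illustrates that rounding with the numbers from the message; `float()` is Stata's built-in function that rounds a value to float precision, and no dataset is needed.

```stata
* The composite score rounded to float precision, for multipliers 10, 1000, 1000000.
display %16.8f float(72*10+0.28732124)
display %16.8f float(72*1000+0.28732124)
display %16.8f float(72*1000000+0.28732124)

* The same sums in double precision (Stata's arithmetic default) keep the p-score digits.
display %16.8f 72*10+0.28732124
display %16.8f 72*1000+0.28732124
display %16.8f 72*1000000+0.28732124
```

With the multiplier 1000000 the p-score is rounded away entirely, so every observation in a given year ends up with the same score and the choice of "nearest" neighbour depends on the sort order. That would account for the estimates changing with the multiplier.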

re: AW: AW: st: PSMATCH with 2 conditions

In reply to this post by Mihai-Andrei Popescu-Greaca

Try using "gen double" for the modified p-scores:

. webuse nlswork
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. logit union collgrad age tenure not_smsa c_city south nev_mar

Iteration 0:   log likelihood = -10360.082
Iteration 1:   log likelihood = -9886.8672
Iteration 2:   log likelihood = -9876.1757
Iteration 3:   log likelihood = -9876.1691
Iteration 4:   log likelihood = -9876.1691

Logistic regression                               Number of obs   =      18997
                                                  LR chi2(7)      =     967.83
                                                  Prob > chi2     =     0.0000
Log likelihood = -9876.1691                       Pseudo R2       =     0.0467

------------------------------------------------------------------------------
       union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    collgrad |   .3103226   .0424878     7.30   0.000     .2270481     .393597
         age |  -.0129597   .0032825    -3.95   0.000    -.0193933   -.0065262
      tenure |   .0889952   .0043032    20.68   0.000     .0805611    .0974293
    not_smsa |  -.0982961   .0466506    -2.11   0.035    -.1897296   -.0068625
      c_city |   .3455925   .0410881     8.41   0.000     .2650613    .4261236
       south |  -.6378885   .0380495   -16.76   0.000    -.7124641   -.5633128
     nev_mar |  -.0425097   .0461649    -0.92   0.357    -.1329912    .0479719
       _cons |  -1.076406   .1039223   -10.36   0.000     -1.28009   -.8727223
------------------------------------------------------------------------------

. predict pscore if e(sample), pr
(9537 missing values generated)

. gen double pscore2=year*10+pscore
(9537 missing values generated)

. gen double pscore3=year*1000+pscore
(9537 missing values generated)

. psmatch2 union, pscore(pscore2) outcome(ln_wage) caliper(0.5)
There are observations with identical propensity score values.
The sort order of the data could affect your results.
Make sure that the sort order is random before calling psmatch2.
(9537 missing values generated)
----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
         ln_wage  Unmatched | 1.92862097   1.70388928    .22473169   .007824215    28.72
                        ATT | 1.92862097   1.82488041   .103740566   .011091593     9.35
----------------------------+-----------------------------------------------------------
Note: S.E. for ATT does not take into account that the propensity score is estimated.

           | psmatch2:
 psmatch2: |   Common
 Treatment |  support
assignment | On suppor |     Total
-----------+-----------+----------
 Untreated |    14,531 |    14,531
   Treated |     4,466 |     4,466
-----------+-----------+----------
     Total |    18,997 |    18,997

. psmatch2 union, pscore(pscore3) outcome(ln_wage) caliper(0.5)
There are observations with identical propensity score values.
The sort order of the data could affect your results.
Make sure that the sort order is random before calling psmatch2.
(9537 missing values generated)
----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
         ln_wage  Unmatched | 1.92862097   1.70388928    .22473169   .007824215    28.72
                        ATT | 1.92862097   1.82488041   .103740566   .011091593     9.35
----------------------------+-----------------------------------------------------------
Note: S.E. for ATT does not take into account that the propensity score is estimated.

           | psmatch2:
 psmatch2: |   Common
 Treatment |  support
assignment | On suppor |     Total
-----------+-----------+----------
 Untreated |    14,531 |    14,531
   Treated |     4,466 |     4,466
-----------+-----------+----------
     Total |    18,997 |    18,997

Regards,

Judson Caskey
UCLA Anderson School of Management
110 Westwood Plaza, D416
Los Angeles, CA  90095
Office:                  (310)206-1503
Mobile:                (310)775-0080
[hidden email]
http://www.anderson.ucla.edu/x15538.xml
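To see how much the storage type matters across the whole sample rather than for one value, the float and double versions of the same composite score can be compared side by side. This is a rough sketch on the same nlswork example; the variable names `key_f`, `key_d` and `drift` are made up for illustration.

```stata
webuse nlswork, clear
logit union collgrad age tenure not_smsa c_city south nev_mar
predict pscore if e(sample), pr

* Same expression, two storage types.
gen float  key_f = year*1000 + pscore
gen double key_d = year*1000 + pscore

* Rounding error introduced by storing the key as a float.
gen double drift = abs(key_f - key_d)
summarize drift

* Ties in the matching variable: rounding to float can only add ties.
duplicates report key_f
duplicates report key_d
```

For keys between 68,000 and 88,000 the float spacing is 2^-7 (about 0.0078), so the drift stays below roughly 0.004. That is small relative to the caliper but not relative to the differences between neighbouring p-scores, which is what decides who is matched to whom.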