PSMATCH only IF
Dear Statalist users,

I am writing a study on the performance of Private Equity (PE) vs. Non-Private Equity (NPE) backed IPOs. In order to eliminate the endogeneity of being PE-backed, I want to perform propensity score matching by applying both local linear regression and k-nearest neighbors methods.

My depvar is PE_backed.
My [indepvars] are balance sheet and income statement figures like sales, total assets, operating income, etc.
My outcome is the buy-and-hold return 4 years after the IPO.

In my dataset I created a variable which takes values from 1 to 15, each number corresponding to the industry the PE- or NPE-backed firm belongs to. I also have a variable which indicates the year when the IPO occurred.

Now my question: I want to match firms with each other ONLY if they belong to the same industry and if the IPO occurred in the same YEAR. In other words, match firm A (PE-backed) with the firm B (NPE-backed) for which the distance between their propensity scores is the smallest AND they belong to the same industry and their IPOs occurred in the same year.

How can I compute this with psmatch2?

Thanks,
Mihai

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
See the following post for an example of forcing a match:
http://www.stata.com/statalist/archive/2010-09/msg00073.html

In your case, say you have a 4-digit industry code (e.g. SIC). You can first compute propensity scores, then make a new variable pscore2 that looks like:

    pscore2 = year*100000+industry*10+pscore

If you have year 1999, industry 1234 and p-score 0.25, you get:

    pscore2 = 199912340.25

If you put a caliper of, say, 0.5, it is impossible to match to a firm in a different industry/year.
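Applied to the IPO setting in the question, the composite-score idea might look like the sketch below. It is only an illustration under stated assumptions: the variable names sales, total_assets, op_income, industry, ipo_year and bhr4 are hypothetical stand-ins for the variables described in the question, the multipliers are chosen for an industry code running from 1 to 15, and only nearest-neighbor matching is shown.

```stata
* Hypothetical variable names: PE_backed (treatment), sales, total_assets,
* op_income (covariates), industry (coded 1-15), ipo_year, bhr4 (outcome).
logit PE_backed sales total_assets op_income
predict double pscore if e(sample), pr

* Composite score: year and industry sit to the left of the decimal point.
* industry*10 is at most 150, so it never spills into the year digits, and
* scores from different year/industry cells differ by at least 9.
* Storing the result as double keeps the decimals of pscore intact.
gen double pscore2 = ipo_year*1000 + industry*10 + pscore

* Nearest-neighbor matching; the 0.5 caliper rules out cross-cell matches.
psmatch2 PE_backed, pscore(pscore2) outcome(bhr4) neighbor(1) caliper(0.5)
```

The caliper also drops any within-cell pair whose propensity scores differ by more than 0.5, so it is worth checking how many treated firms remain on support.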
In reply to this post by Mihai-Andrei Popescu-Greaca
I would suggest another path to take here. cem (Coarsened Exact Matching) is a user-written program by Gary King (http://gking.harvard.edu/cem/) that allows you to force a match on a specific level of a variable (for example, force gender to be exact). In this case, I would match some variables within calipers and require exact matches where necessary on other variables.

I hope this helps,
Ariel

Date: Tue, 7 Sep 2010 11:07:37 -0700
From: "Caskey, Judson" <[hidden email]>
Subject: re: st: PSMATCH with 2 conditions
In reply to this post by Caskey, Judson
Hi Judson,
thanks for the great tip & sorry for the late reply but I've been pretty busy lately.

One thing though: my industry code is only 2-digit, so if I only want to match by industry, I just multiply industry by 10 and then add the pscore, as follows:

    gen pscore2=3-digit-industry*10+pscore

And then psmatch2 as indicated by you in the link.

THE PROBLEM occurred when I used 1000 instead of 10, and then 100000 instead of 1000; ALL 3 yielded different results (T & Z-scores when bootstrapping SEs).

Do you have an explanation for the different results?

Regards,
Mihai

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On behalf of Caskey, Judson
Sent: Tuesday, 7 September 2010 20:08
To: [hidden email]
Subject: re: st: PSMATCH with 2 conditions
Mihai,
I was unable to replicate the problem. Can you try replicating it on one of the public datasets and send an example? Here is what I tried:

    webuse nlswork
    logit union collgrad age tenure not_smsa c_city south nev_mar
    predict pscore if e(sample), pr
    gen pscore2=year*10+pscore
    gen pscore3=year*1000+pscore
    psmatch2 union, pscore(pscore2) outcome(ln_wage) caliper(0.5)
    psmatch2 union, pscore(pscore3) outcome(ln_wage) caliper(0.5)

I get the same results in both of the psmatch2 calls. The issue may be with the bootstrapping procedure rather than with the call to psmatch2 itself.

Regards,
Judson
Dear Judson,
Just tried out the multiplication with 10 and 1000 on the nlswork dataset, and again, the T-values are different (without bootstrapping).

When I use pscore2 (multiplication with 10) I get a T of 9.16, a Difference of .102809747 and an SE of .011229545. Using pscore3 (*1000) yields a T of 4.02, a Difference of .092227641 and an SE of .02294341.

The only explanation I came up with is that (my) Stata "loses" decimals with each multiplication or performs a "mysterious" round-up. I.e. if my initial pscore is 0.28732124 (like row 3 in the dataset), then the pscore2 is 720.28729 and the pscore3 is 72000.287. IF my theory holds, then by multiplying with a huge number, all decimals should disappear. I generated a pscore4 (*1000000) and now I only see 72000000, but T is still 4.72, Difference is .263281083 and SE .055774089, hence this doesn't make sense either.

I'm using Stata 11.1. I really don't know what else can be wrong.

PS: I exported the data to Excel, and only the "short" numbers get exported (i.e. 72000.287, and NOT 72*1000+0.287321424).

Best,
Mihai

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On behalf of Caskey, Judson
Sent: Monday, 20 September 2010 17:48
To: [hidden email]
Subject: re: AW: st: PSMATCH with 2 conditions
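The rounding described in this message can be checked directly. The sketch below is a minimal illustration, assuming the composite scores were stored in Stata's default float type, which holds roughly 7 significant digits. float() is a built-in function that rounds a value to float precision, and 72 is the year value from the example row above.

```stata
* -display- evaluates expressions in double precision, so the first line
* of each pair shows the intended value; float() then rounds it to what a
* float variable would actually store.
display %20.10f 72*10+0.28732124
display %20.10f float(72*10+0.28732124)

display %20.10f 72*1000+0.28732124
display %20.10f float(72*1000+0.28732124)

* With a multiplier of one million the p-score is rounded away entirely
display %20.10f float(72*1000000+0.28732124)
```

The larger the multiplier, the fewer decimals of the p-score survive. Distinct propensity scores then collapse into ties, which would plausibly change which controls get matched and hence the estimates.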
In reply to this post by Mihai-Andrei Popescu-Greaca
Try using "gen double" for the modified p-scores:
. webuse nlswork
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. logit union collgrad age tenure not_smsa c_city south nev_mar

Iteration 0:   log likelihood = -10360.082
Iteration 1:   log likelihood = -9886.8672
Iteration 2:   log likelihood = -9876.1757
Iteration 3:   log likelihood = -9876.1691
Iteration 4:   log likelihood = -9876.1691

Logistic regression                               Number of obs   =      18997
                                                  LR chi2(7)      =     967.83
                                                  Prob > chi2     =     0.0000
Log likelihood = -9876.1691                       Pseudo R2       =     0.0467

------------------------------------------------------------------------------
       union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    collgrad |   .3103226   .0424878     7.30   0.000     .2270481     .393597
         age |  -.0129597   .0032825    -3.95   0.000    -.0193933   -.0065262
      tenure |   .0889952   .0043032    20.68   0.000     .0805611    .0974293
    not_smsa |  -.0982961   .0466506    -2.11   0.035    -.1897296   -.0068625
      c_city |   .3455925   .0410881     8.41   0.000     .2650613    .4261236
       south |  -.6378885   .0380495   -16.76   0.000    -.7124641   -.5633128
     nev_mar |  -.0425097   .0461649    -0.92   0.357    -.1329912    .0479719
       _cons |  -1.076406   .1039223   -10.36   0.000     -1.28009   -.8727223
------------------------------------------------------------------------------

. predict pscore if e(sample), pr
(9537 missing values generated)

. gen double pscore2=year*10+pscore
(9537 missing values generated)

. gen double pscore3=year*1000+pscore
(9537 missing values generated)

. psmatch2 union, pscore(pscore2) outcome(ln_wage) caliper(0.5)
There are observations with identical propensity score values.
The sort order of the data could affect your results.
Make sure that the sort order is random before calling psmatch2.
(9537 missing values generated)
----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
         ln_wage  Unmatched | 1.92862097   1.70388928    .22473169   .007824215    28.72
                        ATT | 1.92862097   1.82488041   .103740566   .011091593     9.35
----------------------------+-----------------------------------------------------------
Note: S.E. for ATT does not take into account that the propensity score is estimated.

 psmatch2: |  psmatch2:
 Treatment |   Common
assignment |  support
           | On suppor |     Total
-----------+-----------+----------
 Untreated |    14,531 |    14,531
   Treated |     4,466 |     4,466
-----------+-----------+----------
     Total |    18,997 |    18,997

. psmatch2 union, pscore(pscore3) outcome(ln_wage) caliper(0.5)
There are observations with identical propensity score values.
The sort order of the data could affect your results.
Make sure that the sort order is random before calling psmatch2.
(9537 missing values generated)
----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
         ln_wage  Unmatched | 1.92862097   1.70388928    .22473169   .007824215    28.72
                        ATT | 1.92862097   1.82488041   .103740566   .011091593     9.35
----------------------------+-----------------------------------------------------------
Note: S.E. for ATT does not take into account that the propensity score is estimated.

 psmatch2: |  psmatch2:
 Treatment |   Common
assignment |  support
           | On suppor |     Total
-----------+-----------+----------
 Untreated |    14,531 |    14,531
   Treated |     4,466 |     4,466
-----------+-----------+----------
     Total |    18,997 |    18,997

Regards,
Judson Caskey
UCLA Anderson School of Management
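To see what the storage type changes across the whole dataset, the following sketch builds the same composite score twice on the nlswork example and compares them. It is an illustration only: the names ps_float, ps_double, err_float, err_double, tag_f and tag_d are made up for this example.

```stata
webuse nlswork, clear
logit union collgrad age tenure not_smsa c_city south nev_mar
predict double pscore if e(sample), pr

* Same formula, two storage types (generate defaults to float)
gen        ps_float  = year*1000 + pscore
gen double ps_double = year*1000 + pscore

* Recover the p-score from each composite and measure the rounding error
gen double err_float  = abs((ps_float  - year*1000) - pscore)
gen double err_double = abs((ps_double - year*1000) - pscore)
summarize err_float err_double

* Count distinct composite values: a much smaller count under float
* would indicate that rounding has created ties
egen tag_f = tag(year ps_float)
egen tag_d = tag(year ps_double)
count if tag_f
count if tag_d
```

If the float version shows a visibly larger error and fewer distinct values, that is consistent with psmatch2 picking different matches depending on the multiplier.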
Dear Judson,
I used "gen double" and now it works! Thanks for the priceless help.

Best regards,
Mihai

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On behalf of Caskey, Judson
Sent: Wednesday, 22 September 2010 04:25
To: [hidden email]
Subject: re: AW: AW: st: PSMATCH with 2 conditions