
PSMATCH with 2 conditions


PSMATCH with 2 conditions

Mihai-Andrei Popescu-Greaca
PSMATCH only IF

Dear Statalist users,
I am writing a study on the performance of Private Equity (PE) vs.
Non-Private Equity (NPE) backed IPOs. In order to eliminate the endogeneity
of being PE-backed, I want to perform propensity score matching by applying
both local linear regression and k-nearest-neighbor methods.

My depvar is PE_backed
My [indepvars] are balance sheet and income statement figures like sales,
total assets, operating income, etc.
My outcome is the buy-and-hold-return 4 years after the IPO

In my dataset I created a variable which takes values from 1 to 15, each
number corresponding to the industry the PE- or NPE-backed firm belongs to.
I also have a variable which indicates the year when the IPO occurred.

Now my question: I want to match firms with each other, ONLY if they belong
to the same industry and if the IPO occurred in the same YEAR. In other
words, match firm A (PE-backed) with the firm B (NPE-backed) for which the
distance between their propensity scores is the smallest AND they belong to
the same industry and their IPOs occurred in the same year.
How can I compute this with psmatch2?

Thanks
Mihai

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

re: PSMATCH with 2 conditions

Caskey, Judson
See the following post for an example of forcing a match:

http://www.stata.com/statalist/archive/2010-09/msg00073.html

In your case, say you have a 4-digit industry code (e.g. SIC). You can first compute propensity scores, then make a new variable pscore2 that looks like:

pscore2 = year*100000+industry*10+pscore

If you have year 1999, industry 1234 and p-score 0.25, you get:

pscore2 = 199912340.25

If you put a caliper of, say, 0.5, it is impossible to match to a firm in a different industry/year.
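
The composite-score idea above can be sketched on a public dataset. This is only an illustration under stated assumptions: nlswork stands in for the IPO data, with union in the role of PE_backed, ln_wage as the outcome, and ind_code and year as the industry and IPO year. The cell variable built with egen group() and the double storage type are additions not in the post above; psmatch2 is user-written and assumed to be installed (ssc install psmatch2).

```stata
* Sketch only: nlswork is a stand-in for the IPO data.
webuse nlswork, clear
logit union collgrad age tenure not_smsa c_city south nev_mar
predict double pscore if e(sample), pr

* One integer per industry-year cell (missing if either variable is missing)
egen long cell = group(year ind_code)

* Cells sit 10 apart and pscore lies in [0,1], so scores in different
* cells differ by at least 9. Stored as double to keep the p-score digits.
gen double pscore2 = cell*10 + pscore

* A caliper of 0.5 therefore cannot reach across cells
psmatch2 union, pscore(pscore2) outcome(ln_wage) caliper(0.5)
```

Numbering the cells with egen group() keeps the composite value small whatever the width of the industry code, which leaves more significant digits for the p-score itself.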


re: PSMATCH with 2 conditions

Ariel Linden, DrPH-2
In reply to this post by Mihai-Andrei Popescu-Greaca
I would suggest another path to take here. cem (Coarsened Exact Matching) is
a user-written program by Gary King and coauthors (http://gking.harvard.edu/cem/) that
allows you to force a match on a specific level of a variable (for example,
force gender to be exact). In this case, I would match some variables within
calipers and require exact matches where necessary on other variables.

I hope this helps

Ariel

Date: Tue, 7 Sep 2010 11:07:37 -0700
From: "Caskey, Judson" <[hidden email]>
Subject: re: st: PSMATCH with 2 conditions


AW: st: PSMATCH with 2 conditions

Mihai-Andrei Popescu-Greaca
In reply to this post by Caskey, Judson
Hi Judson,

thanks for the great tip & sorry for the late reply but I've been pretty
busy lately.

One thing though:
My industry is only 2 digit, so if I only want to match by industry, I just
multiply industry by 10 and then add the pscore, as follows:
gen pscore2=3-digit-industry*10+pscore
And then psmatch2 as indicated by you in the link.
The problem occurred when I used 1000 instead of 10, and then 100000 instead
of 1000: all three yielded different results (t- and z-scores when bootstrapping
SEs).

Do you have an explanation for the different results??

Regards,
Mihai

-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On behalf of Caskey, Judson
Sent: Tuesday, 7 September 2010 20:08
To: [hidden email]
Subject: re: st: PSMATCH with 2 conditions


re: AW: st: PSMATCH with 2 conditions

Caskey, Judson
Mihai,

I was unable to replicate the problem. Can you try replicating it on one of the public datasets and send an example?

Here is what I tried:

webuse nlswork
logit union collgrad age tenure not_smsa c_city south nev_mar
predict pscore if e(sample), pr
gen pscore2=year*10+pscore
gen pscore3=year*1000+pscore
psmatch2 union, pscore(pscore2) outcome(ln_wage) caliper(0.5)
psmatch2 union, pscore(pscore3) outcome(ln_wage) caliper(0.5)


I get the same results in both of the psmatch2 calls. The issue may be with the bootstrapping procedure rather than with the call to psmatch2, itself.

Regards,

Judson


AW: AW: st: PSMATCH with 2 conditions

Mihai-Andrei Popescu-Greaca
Dear Judson,

Just tried out the multiplication by 10 and 1000 on the nlswork dataset,
and again, the T-values are different (without bootstrapping). When I use
pscore2 (multiplication by 10) I get a T of 9.16, a Difference of .102809747
and an SE of .011229545. Using pscore3 (*1000) yields a T of 4.02, a Difference
of .092227641 and an SE of .02294341.

The only explanation I came up with is that (my) Stata "loses" decimals
with each multiplication or performs a "mysterious" rounding. I.e., if my
initial pscore is 0.28732124 (as in row 3 of the dataset), then pscore2
is 720.28729 and pscore3 is 72000.287. If my theory holds, then
multiplying by a huge number should make all decimals disappear. I generated a
pscore4 (*1000000) and now I only see 72000000, but T is still 4.72,
Difference is .263281083 and SE .055774089, hence this doesn't make sense
either.

I'm using Stata 11.1..I really don't know what else can be wrong..:(((((

PS: I exported the data to excel, and only the "short" numbers get exported
(i.e. 72000.287, and NOT 72*1000+0.287321424)
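
The rounding described above is what single-precision storage would produce: generate creates a float by default, and a float carries only about 7 significant digits, so the larger the year term, the fewer digits remain for the p-score. The following is a minimal sketch using Stata's float() function to mimic float storage; year 72 and the p-score 0.28732124 are the values quoted in this post, and the scalar names are arbitrary.

```stata
* Sketch: how much of the p-score survives float storage at each multiplier.
* year = 72 and p = 0.28732124 are the values quoted in the post above.
foreach k in 10 1000 1000000 {
    scalar as_float  = float(72*`k' + 0.28732124) - 72*`k'
    scalar as_double = (72*`k' + 0.28732124) - 72*`k'
    display "multiplier `k':  float " %11.8f as_float "   double " %11.8f as_double
}
```

Coarsened scores create many ties and near-ties inside each year, so the nearest neighbor picked for a treated observation can plausibly change with the multiplier, which would explain the differing estimates.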

Best,
Mihai

-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On behalf of Caskey, Judson
Sent: Monday, 20 September 2010 17:48
To: [hidden email]
Subject: re: AW: st: PSMATCH with 2 conditions


re: AW: AW: st: PSMATCH with 2 conditions

Caskey, Judson
In reply to this post by Mihai-Andrei Popescu-Greaca
Try using "gen double" for the modified p-scores:


. webuse nlswork
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. logit union collgrad age tenure not_smsa c_city south nev_mar

Iteration 0:   log likelihood = -10360.082  
Iteration 1:   log likelihood = -9886.8672  
Iteration 2:   log likelihood = -9876.1757  
Iteration 3:   log likelihood = -9876.1691  
Iteration 4:   log likelihood = -9876.1691  

Logistic regression                               Number of obs   =      18997
                                                  LR chi2(7)      =     967.83
                                                  Prob > chi2     =     0.0000
Log likelihood = -9876.1691                       Pseudo R2       =     0.0467

------------------------------------------------------------------------------
       union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    collgrad |   .3103226   .0424878     7.30   0.000     .2270481     .393597
         age |  -.0129597   .0032825    -3.95   0.000    -.0193933   -.0065262
      tenure |   .0889952   .0043032    20.68   0.000     .0805611    .0974293
    not_smsa |  -.0982961   .0466506    -2.11   0.035    -.1897296   -.0068625
      c_city |   .3455925   .0410881     8.41   0.000     .2650613    .4261236
       south |  -.6378885   .0380495   -16.76   0.000    -.7124641   -.5633128
     nev_mar |  -.0425097   .0461649    -0.92   0.357    -.1329912    .0479719
       _cons |  -1.076406   .1039223   -10.36   0.000     -1.28009   -.8727223
------------------------------------------------------------------------------

. predict pscore if e(sample), pr
(9537 missing values generated)

. gen double pscore2=year*10+pscore
(9537 missing values generated)

. gen double pscore3=year*1000+pscore
(9537 missing values generated)

. psmatch2 union, pscore(pscore2) outcome(ln_wage) caliper(0.5)
There are observations with identical propensity score values.
The sort order of the data could affect your results.
Make sure that the sort order is random before calling psmatch2.
(9537 missing values generated)
----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
         ln_wage  Unmatched | 1.92862097   1.70388928    .22473169   .007824215    28.72
                        ATT | 1.92862097   1.82488041   .103740566   .011091593     9.35
----------------------------+-----------------------------------------------------------
Note: S.E. for ATT does not take into account that the propensity score is estimated.

           | psmatch2:
 psmatch2: |   Common
 Treatment |  support
assignment | On suppor |     Total
-----------+-----------+----------
 Untreated |    14,531 |    14,531
   Treated |     4,466 |     4,466
-----------+-----------+----------
     Total |    18,997 |    18,997


. psmatch2 union, pscore(pscore3) outcome(ln_wage) caliper(0.5)
There are observations with identical propensity score values.
The sort order of the data could affect your results.
Make sure that the sort order is random before calling psmatch2.
(9537 missing values generated)
----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
         ln_wage  Unmatched | 1.92862097   1.70388928    .22473169   .007824215    28.72
                        ATT | 1.92862097   1.82488041   .103740566   .011091593     9.35
----------------------------+-----------------------------------------------------------
Note: S.E. for ATT does not take into account that the propensity score is estimated.

           | psmatch2:
 psmatch2: |   Common
 Treatment |  support
assignment | On suppor |     Total
-----------+-----------+----------
 Untreated |    14,531 |    14,531
   Treated |     4,466 |     4,466
-----------+-----------+----------
     Total |    18,997 |    18,997
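
The warning printed before each table asks for a random sort order, because tied scores are otherwise resolved by whatever order the data happen to be in. A minimal sketch of one way to do that before calling psmatch2 follows; the seed value and the variable name u are arbitrary choices, not taken from the log above.

```stata
* Sketch: put the data in a reproducible random order before matching.
webuse nlswork, clear
set seed 12345
gen double u = runiform()
sort u
```

Setting the seed first means the same order, and so the same tie-breaking, is obtained on every run.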


Regards,

Judson Caskey
UCLA Anderson School of Management
110 Westwood Plaza, D416
Los Angeles, CA  90095
Office:                  (310)206-1503
Mobile:                (310)775-0080
[hidden email]
http://www.anderson.ucla.edu/x15538.xml

AW: AW: AW: st: PSMATCH with 2 conditions

Mihai-Andrei Popescu-Greaca
Dear Judson,

I used "gen double" and now it works! Thanks for the priceless help.

Best regards,
Mihai

-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On behalf of Caskey, Judson
Sent: Wednesday, 22 September 2010 04:25
To: [hidden email]
Subject: re: AW: AW: st: PSMATCH with 2 conditions
