# Propensity Score Matching

4 messages

## Propensity Score Matching

This post has NOT been accepted by the mailing list yet.

Hi all,

I have a question on propensity score matching. I'm trying to evaluate the impact of migration on children's schooling. My data are cross-sectional, and I have no child-level data from before migration occurred. I do, however, have household-level data from before migration occurred. I therefore decided to match on household-level data, since they are measured before participation in migration.

Since my outcome is at the individual level, there may be individual characteristics that affect it, so estimating the impact of migration with a propensity score built only from household-level variables won't be enough.

My question is: can I estimate the impact of migration using propensity score matching (with household-level covariates) and also incorporate some individual-level variables? I'm thinking of estimating a model like this:

Sij = Mj + Gj + Aij + Bij + e

where

- Sij = schooling of child i in household j
- Mj = 1 for a migrant household, 0 for a household without a migrant
- Gj = propensity score for household j (the same for all kids in one household)
- Aij = for example, the age of child i
- Bij = for example, the sex of child i

Since children in the same household are related, I'm going to cluster the standard errors at the household level.

Is it possible to do this with propensity score matching? Could someone tell me how to do it in Stata? I've read a lot of references using PSM, but none of them uses additional variables to predict the ATT as in my problem.
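Read as a linear regression, the model in the post above could be written with explicit coefficients as follows. The intercept and the coefficient symbols are assumptions added here for illustration; the post gives none.

$$
S_{ij} = \alpha + \beta M_j + \gamma G_j + \delta_1 A_{ij} + \delta_2 B_{ij} + \varepsilon_{ij}
$$

Here $\beta$ is the coefficient on migrant status, and clustering at the household level allows $\varepsilon_{ij}$ to be correlated across children $i$ within the same household $j$. Entering $G_j$ as a regressor is regression adjustment on the propensity score rather than matching as such.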

## Re: Propensity Score Matching

Thanks Austin for taking the time to answer my question. I will consider your advice!

On another note, I have a question about the command `psmatch2`. What does `_weight` really contain? To my knowledge, it holds the number of times a particular observation was used as a match (and it is missing for a control that was never used as a match).

Does this mean that running `regress y treatment variable if _weight!=.` will produce the same result as `psmatch2 treatment, outcome(y) norepl` for every matching algorithm? I have tested it, and the two commands produce the same result for 1:1 matching but not for any other algorithm.

What should I do with `_weight` in a regression command to replicate the result of `psmatch2`? For kernel and radius matching, the command that comes closest to `psmatch2` is `regress y treatment variable [aw=_weight] if _weight!=.`, but the result is still not exactly the same. Any suggestions?

Any help is very much appreciated. Thanks!

Niken
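The link between match-frequency weights and the matching estimate can be sketched outside Stata. The following is a minimal Python illustration, not `psmatch2` itself: the toy scores and outcomes are made up, and `nearest_control` is a hypothetical helper. It assumes 1-nearest-neighbor matching with replacement and no ties, and shows that the mean of matched-pair differences equals the treated mean minus the frequency-weighted control mean.

```python
# Toy data: (propensity score, outcome). All values are made up for illustration.
treated = [(0.62, 9.0), (0.70, 11.0), (0.55, 8.0), (0.81, 12.0)]
controls = [(0.60, 7.0), (0.30, 5.0), (0.78, 10.0)]


def nearest_control(score, controls):
    """Index of the control whose score is closest to `score` (hypothetical helper)."""
    return min(range(len(controls)), key=lambda j: abs(controls[j][0] - score))


# 1-nearest-neighbor matching with replacement: a control may be reused.
matches = [nearest_control(ps, controls) for ps, _ in treated]

# ATT as the mean of treated-minus-matched-control outcome differences.
att_pairs = sum(y - controls[j][1] for (_, y), j in zip(treated, matches)) / len(treated)

# Frequency weights: how many times each control was used as a match.
# None stands in for a missing weight on a control that was never used.
weight = [matches.count(j) or None for j in range(len(controls))]

# ATT as the treated mean minus the weighted mean of the used controls.
used = [(w, y) for w, (_, y) in zip(weight, controls) if w is not None]
treated_mean = sum(y for _, y in treated) / len(treated)
control_mean = sum(w * y for w, y in used) / sum(w for w, _ in used)
att_weighted = treated_mean - control_mean

print(matches, weight, att_pairs, att_weighted)
```

A weighted least-squares regression of the outcome on a treatment dummy alone has a slope equal to this difference in weighted means, which is why a weighted regression can reproduce the point estimate in this simple case. Whether the same holds for kernel or radius weights in `psmatch2` is not something this sketch establishes.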

## Re: Propensity Score Matching

Dear Ariel,

Thank you for your information, that's a great help! My goal is to show that a weighted regression using the weights provided by `psmatch2` yields the same result as directly typing `psmatch2 x, pscore(_ps) outcome(y)`. So you really answered my question, thanks again!

I have another question, and I hope someone in this forum can explain it to me. I understand that the variable `_weight` stores the weights generated by `psmatch2`. To be more precise, let me describe my dataset: it consists of 702 treated observations and 289 controls.

For example, with nearest-neighbor matching without replacement, I obtain 280 treated and 280 controls on common support. The variable `_weight` has 560 non-missing values, all equal to 1. So every observation on common support has a weight.

What confuses me is that when I match with replacement, not every observation has a weight. Under this algorithm I get 630 treated and 289 controls on common support, but `_weight` has only 650 non-missing values: 623 for treated and 27 for controls. Moreover, the weights for the control observations are very large, such as 30 and 60.

I don't understand how this works. If I have 289 controls without replacement, shouldn't I have at least a similar number of control observations with replacement? And why, for NN matching with replacement, do some observations on common support have a missing weight?

Thanks in advance!

Best,
Niken
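The contrast between the two algorithms can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of `psmatch2` internals: the scores are made up, both helper functions are hypothetical, matching without replacement is done greedily in data order, and `None` stands in for a missing weight.

```python
from collections import Counter

# Hypothetical propensity scores: more treated units than controls.
treated = [0.80, 0.82, 0.85, 0.79, 0.90, 0.40]
controls = [0.81, 0.35, 0.10]


def match_with_replacement(treated, controls):
    """Each treated unit takes its nearest control; controls can be reused."""
    return [min(range(len(controls)), key=lambda j: abs(controls[j] - t))
            for t in treated]


def match_without_replacement(treated, controls):
    """Greedy 1:1 matching in data order; each control is used at most once."""
    free = set(range(len(controls)))
    pairs = {}
    for i, t in enumerate(treated):
        if not free:
            break  # remaining treated units stay unmatched
        j = min(free, key=lambda k: abs(controls[k] - t))
        pairs[i] = j
        free.remove(j)
    return pairs


# With replacement: count how often each control is chosen (None = never used).
with_repl = match_with_replacement(treated, controls)
use_count = Counter(with_repl)
weights_with = [use_count.get(j) for j in range(len(controls))]

# Without replacement: every matched control gets weight 1.
without_repl = match_without_replacement(treated, controls)
weights_without = [1 if j in without_repl.values() else None
                   for j in range(len(controls))]

print(weights_with, weights_without)
```

In this toy case, matching with replacement piles five of the six treated units onto one control and leaves another control unused, so few controls carry weights and those weights are large. If each matched treated unit contributes exactly one match, the control weights sum to the number of matched treated units; under that assumption, 623 treated spread over 27 controls would imply an average control weight of about 23.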