I have a question about working with very large data sets (combined
sizes ~ 40 gig) to run analysis when only 6 gig of memory is available. A second complicating factor is that I need to join some of these data sets together based on a date range or similar join rule. In Oracle, I could query out only the columns I need and then join them to other files using a rule- such as the dates being within "x" number of days of each other. I cannot get ""merge"" in stata to accept these kinds of date ranges. Here are an example of two datasets to join ***subdataset*** date1 var1 extravar extravar1 10/22/2008 3 44 44 02/01/2001 5 44 44 05/24/2005 9 44 44 12/12/2012 99 44 44 12/29/2012 100 44 44 ***big dataset*** date1 var2 extravar extravar1 10/20/2008 500 44 44 02/07/2001 500 44 44 05/20/2005 900 44 44 12/12/2015 990 44 44 01/01/1999 1000 44 44 01/01/1970 2000 44 44 01/01/1970 2222 44 44 12/01/2012 7777 44 55 I need to join by ""date1"" and load up a data set for analysis with ONLY ""date1"", ""var1"", and ""extravar1"". Thanks for helping. DFS * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/ |
<>
On Jan 1, 2010, at 10:37 AM, David Souther wrote: > I have a question about working with very large data sets (combined > sizes ~ 40 gig) to run analysis when only 6 gig of memory is > available. .What format are the data stored in? If they are already in Stata format (.dta), then you can use the -use- command and specify the variables and observations you want to use, e.g. use price mpg for in 1/25 if for<1 using auto.dta, clear This would "query" (as you put it) out only those 3 variables in rows 1/25 where foreign<1. However, if you were starting from a very large comma/tab-delimited file (and you weren't able to convert the file to Stata format via Stat Transfer & the file was too large to simply -insheet- ), you could use -chunky- (from SSC) to bring in the vars/rows you needed. The help file for -chunky- has a loop for this process already spelled out. > I cannot get ""merge"" in stata to accept these kinds > of date ranges. What commands did you attempt? You can use -nearmrg- (from SSC) to do some date range matches. Here's an example of how to get it to work with your example data; however, please note a few things: (1) I am assuming you want to "update" (see the -merge- help file about this option) extravar1 since its value changes in the "big" dataset; (2) my example matches the "sub" dataset record that is closest to & greater than the nearest match in the "big" dataset, then I remove records where the "sub" date is > 90 days from the "big" date **************start clear **"sub" dataset** input str10 date1 var1 extravar extravar1 "10/22/2008" 3 44 44 "02/01/2001" 5 44 44 "05/24/2005" 9 44 44 "12/12/2012" 99 44 44 "12/29/2012" 100 44 44 end gen date2 = date(date1, "MDY") sort date2 drop date1 extravar compress save "using.dta", replace ******** **"big" dataset** clear input str10 date1 var2 extravar extravar1 "10/20/2008" 500 44 44 "02/07/2001" 500 44 44 "05/20/2005" 900 44 44 "12/12/2015" 990 44 44 "01/01/1999" 1000 44 44 "01/01/1970" 2000 44 44 "01/01/1970" 2222 44 44 "12/01/2012" 7777 44 55 end gen date2 = date(date1, "MDY") sort date2 drop date1 compress save "master.dta", replace ******** **merge** cap ssc install nearmrg **using only what's needed** clear use date2 extravar1 using "master.dta" nearmrg using "using.dta", upper nearvar(date2) genmatch(match) _merge(merge) update nokeep compress rename date2 date1 format date1 %td format match %td li date1 match var* extravar1 **keep only within certain date range of "x"=90 days** g diff = match-date1 li date1 diff keep if diff<=90 //-matching on 90 days from "big" date drop diff merge li * save "combined.dta", replace **your analysis here** reg var1 ext **************end Note: From your post it sounds like you do not want to load all of the 40G of data into memory at one time for analysis (that is, you only want some of the variables loaded in), but depending on the number of observations, your dataset may still be quite large, so you may want to consider the OS, physical memory, and virtual memory on your machine. See these links: http://www.stata.com/statalist/archive/2007-05/msg00326.html http://www.stata.com/statalist/archive/2009-12/msg00980.html Good luck, Eric __ Eric A. Booth Public Policy Research Institute Texas A&M University [hidden email] Office: +979.845.6754 On Jan 1, 2010, at 10:37 AM, David Souther wrote: > I have a question about working with very large data sets (combined > sizes ~ 40 gig) to run analysis when only 6 gig of memory is > available. > A second complicating factor is that I need to join some of these data > sets together based on a date range or similar join rule. In Oracle, > I could query out only the columns I need and then join them to other > files using a rule- such as the dates being within "x" number of days > of each other. I cannot get ""merge"" in stata to accept these kinds > of date ranges. Here are an example of two datasets to join > > ***subdataset*** > date1 var1 extravar extravar1 > 10/22/2008 3 44 44 > 02/01/2001 5 44 44 > 05/24/2005 9 44 44 > 12/12/2012 99 44 44 > 12/29/2012 100 44 44 > > ***big dataset*** > date1 var2 extravar extravar1 > 10/20/2008 500 44 44 > 02/07/2001 500 44 44 > 05/20/2005 900 44 44 > 12/12/2015 990 44 44 > 01/01/1999 1000 44 44 > 01/01/1970 2000 44 44 > 01/01/1970 2222 44 44 > 12/01/2012 7777 44 55 > > > I need to join by ""date1"" and load up a data set for analysis with > ONLY ""date1"", ""var1"", and ""extravar1"". Thanks for helping. > > DFS > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/ |
Dear all,
Is there anyone that have experienced the same question with mine in STATA? Please let me know how you did it. My question is to test the equality of the two IV estimators. But I have no idea of how to define an unrestricted regression expression in STATA. *restricted model: all coefficients are equal between the two groups ivreg y (x1 x2 = z1 z2) *unrestircted model: the coefficient for x1 is different but the coefficient for x2 is equal javanfish. * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/ |
Hi:
I suspect you could test this with a structural equation modeling program (using Maximum Likelihood estimation) like glamm (I use Mplus for those purposes because it is much faster with ML estimation than glamm is; however, I don't know glamm well enough to make a definitive statement on this). Basically, in the case of a structural-equation modeling program you would define a stacked (multisample or multiple group model) and the test for the equality of coefficient/2 across groups (don't forget to allow the disturbances of y x1 and x2 to correlate, otherwise you will get OLS estimates). HTH, John. ____________________________________________________ Prof. John Antonakis, Associate Dean Faculty of Business and Economics Department of Organizational Behavior University of Lausanne Internef #618 CH-1015 Lausanne-Dorigny Switzerland Tel ++41 (0)21 692-3438 Fax ++41 (0)21 692-3305 Faculty page: http://www.hec.unil.ch/people/jantonakis Personal page: http://www.hec.unil.ch/jantonakis ____________________________________________________ On 01.01.2010 23:33, Sungbok Lee wrote: > Dear all, > > Is there anyone that have experienced the same question with mine in STATA? > Please let me know how you did it. > > My question is to test the equality of the two IV estimators. But I have no > idea of how to define an unrestricted regression expression in STATA. > > *restricted model: all coefficients are equal between the two groups > ivreg y (x1 x2 = z1 z2) > > *unrestircted model: the coefficient for x1 is different but the coefficient > for x2 is equal > > > javanfish. > > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/ |
Powered by Nabble | Edit this page |