Dear Sven,

50,000 regressions on an 8-observation dataset with two variables should
take about 30 seconds (see below).

So don't generate the large dataset, but rather run the regressions

right away when you generate your simulated data.

You don't need to save the 50,000 x 8 observations you generated: since
you are [presumably] also simulating them in Stata, rerunning your
do-file will reproduce exactly the same data, provided you set the
random-number seed (-set seed-).
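
A minimal sketch of that simulate-and-regress-right-away approach, for illustration only. The data-generating step (y drawn from a standard normal, x = 0.5*y + noise), the seed value, and the results file name regcoeffs are placeholder assumptions, not your actual simulation. The variable names x, y, and run follow your post. Coefficients are collected with -postfile-, so the 400,000 observations are never held in memory:

```stata
* Sketch: simulate 8 observations per replication and regress immediately.
* The data-generating process below is a made-up placeholder.
clear all
set seed 12345                      // fixed seed: a rerun gives the same draws

tempname results
postfile `results' run regcoeff using regcoeffs, replace

forvalues s = 1/50000 {
    quietly {
        drop _all
        set obs 8
        generate y = rnormal()
        generate x = 0.5*y + rnormal()   // assumed relationship, illustration only
        regress x y
    }
    post `results' (`s') (_b[y])    // store simulation ID and slope on y
}
postclose `results'

use regcoeffs, clear                // one row per simulation: run, regcoeff
```

The resulting file has one row per simulation, which also avoids the bookkeeping of writing each coefficient back into row `s' of the big dataset.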

On the other hand, since you need only one coefficient from this
trivial regression, you may ask yourself whether the -regress-
artillery is really necessary here, or whether a simple formula, such as the one here:
http://en.wikipedia.org/wiki/Regression_analysis
would suffice (and be faster).
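
For a regression of x on a single regressor y with a constant, the slope is sum((y - ybar)*(x - xbar)) / sum((y - ybar)^2), so it can be computed for all simulations at once without a loop. The sketch below assumes the variable names from your post (run, x, y). The 24-observation toy dataset at the top is made up so that the example runs on its own; with your real data you would skip that part:

```stata
* Toy data (assumption, illustration only): 3 simulations of 8 observations
clear
set seed 12345
set obs 24
generate run = ceil(_n/8)
generate y = rnormal()
generate x = 0.5*y + rnormal()

* Closed-form OLS slope of x on y, computed within each simulation
egen double ybar = mean(y), by(run)
egen double xbar = mean(x), by(run)
generate double dxy = (y - ybar)*(x - xbar)
generate double dyy = (y - ybar)^2
egen double sxy = total(dxy), by(run)
egen double syy = total(dyy), by(run)
generate double regcoeff = sxy/syy   // constant within each run

* Sanity check against -regress- for the first simulation
quietly regress x y if run == 1
display _b[y] "  " regcoeff[1]
```

Here regcoeff is repeated on all 8 rows of each simulation; keeping one row per run (for example with -egen tag-) gives one coefficient per simulation.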

In any case, don't forget to specify -quietly-. I am almost sure you
don't intend to review the output of the 50,000 regressions, and
suppressing it speeds up the program a lot.

Best,

Sergiy Radyakin.

PS: I am strongly convinced you don't need access to more than 1GB of memory
for the task of running univariate regressions on
8-observation datasets.

. do "R:\TEMP\STD04000000.tmp"

. set rmsg on

r; t=0.00 10:42:16

. sysuse auto, clear

(1978 Automobile Data)

r; t=0.00 10:42:16

. keep in 1/8

(66 observations deleted)

r; t=0.00 10:42:16

.

. forvalues i=1/50000 {

2. qui regress price weight

3. }

r; t=26.53 10:42:42

.

end of do-file

r; t=26.53 10:42:42

On Mon, Nov 15, 2010 at 10:16 AM, Trelle Sven <[hidden email]> wrote:

> Dear all,

> I have a large (simulated) dataset with 400,000 observations (from

> overall 50,000 simulations each creating 8 observations). I need to

> perform a linear regression for each simulation separately. I noticed

> the following:

>

> 1) keeping all observations in the dataset and looping through the

> simulations is very inefficient i.e. it takes several hours to run e.g.

> * first example starts; run is an ID for simulation

> gen regcoeff = .

> forval s=1/50000 {

> regress x y if run==`s'

> replace regcoeff = _b[y] if _n==`s'

> }

> * first example ends

>

> 2) preserving and restoring is even more time-consuming

>

> 3) I thought of creating a loop as before but load the data at the

> beginning and then keeping only the data for the particular simulation.

> However, it implies that the data is loaded 50,000 times (because it

> comes from a server with suboptimal connection speed this is also not

> optimal) and it would make storage of the results also a little bit

> difficult

> * second example starts

> gen regcoeff = .

> save sim.dta, replace

> local coeff = 0 // dummy for first run of loop

> local p = 1 // dummy for first run of loop

> forval s=1/50000 {

> use sim.dta, clear

> replace regcoeff = `coeff' if _n==`p'

> save sim.dta, replace

> keep if run==`s'

> regress x y

> local coeff = _b[y]

> local p=`s'

> }

> use sim.dta, clear

> replace regcoeff = `coeff' if _n==`p'

> save sim.dta, replace

> * second example ends

>

> I am sure there is a better way of doing this.

> If there is anybody who has better ideas I would appreciate any

> suggestions/help.

>

> All the best

> Sven

>

>

> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
