Hi,

I'm running panel data regressions with ~1 million observations on voting
data. There are ~3000 fixed effects covering:

- voting precincts (~2200, identified by tracking over time)
- candidates (~500, identified by candidate order rotations)
- offices (~200, identified by variation in contest ballot position)

There is no explicit time effect; time is implicitly indexed by "offices",
since in my dataset Governor in 1992 is different from Governor in 1994.

With 1 million observations, the dataset becomes enormous as I add all of
my fixed effects. I have some questions about how to implement these
regressions.

1) What are the memory demands, beyond the current dataset size, of running
-xtreg- with a declared panel variable? If my dataset is 1GB and I specify
-i(precinct)- as an option when there are 2000+ precincts, how much more
memory does Stata need to execute the command? Does it create the dummies
during the calculation, or does it perform the within transformation? I'm
assuming the latter, but 0.5GB of extra RAM beyond the 1GB dataset is not
enough.
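For a rough sense of scale, here is a back-of-the-envelope calculation (in
Python, purely illustrative; the 8-bytes-per-cell figure assumes
double-precision storage and need not match Stata's internal representation):

```python
# Illustrative memory estimates, assuming 1e6 observations,
# 2200 precinct dummies, and 8 bytes per stored cell.
n_obs = 1_000_000
n_precincts = 2_200
bytes_per_cell = 8  # double precision (assumption)

# Explicit dummy variables: one extra column per precinct.
dummy_bytes = n_obs * n_precincts * bytes_per_cell
print(f"dummy matrix: {dummy_bytes / 2**30:.1f} GiB")  # ~16.4 GiB

# Within transformation: no new columns per group, just a demeaned
# copy of each existing regressor.
within_bytes = n_obs * bytes_per_cell  # one demeaned variable
print(f"one demeaned variable: {within_bytes / 2**20:.1f} MiB")  # ~7.6 MiB
```

If -xtreg- really did build the dummy matrix, the memory gap between the two
approaches would be enormous, which is why the within transformation is the
plausible implementation.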

2) Any feedback on the options for making the computation feasible would be
GREATLY appreciated. I'm attempting to implement them now, and some are
either memory-restricted or give other errors:

i) Get rid of the precinct and candidate fixed effects by creating spells
through

    egen spells = group(precinct candidate)

and then specifying -i(spells)- in -xtreg-.

ii) Apply the within transformation directly. Usually this is done over
time, but since I don't have an explicit time variable, and I want to sweep
out the largest number of fixed effects, I could do the transform on
precincts. Or on the precinct-candidate spells from (i)?

iii) -felsdvreg-: I ran into memory restrictions.

iv) -a2reg-: I received a non-conformability error and have no idea how to
get around it.

v) Are there other methods that I'm missing?
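To make options (i) and (ii) concrete outside Stata, here is a small Python
sketch with a hypothetical toy dataset: the spell IDs mimic -egen spells =
group(precinct candidate)-, and the demeaning step is the within
transformation on those spells.

```python
from collections import defaultdict

# Toy rows standing in for the voting panel (hypothetical values):
# (precinct, candidate, voteshare)
rows = [
    (1, "a", 0.4), (1, "b", 0.6),
    (2, "a", 0.3), (2, "a", 0.5), (2, "b", 0.7),
]

# (i) Spell IDs: one integer per precinct-candidate pair, the analogue
# of -egen spells = group(precinct candidate)-.
spell_id = {}
spells = []
for precinct, candidate, _ in rows:
    key = (precinct, candidate)
    if key not in spell_id:
        spell_id[key] = len(spell_id) + 1  # Stata's group() is 1-based
    spells.append(spell_id[key])

# (ii) Within transformation: subtract each spell's mean, sweeping out
# the precinct-by-candidate fixed effects without creating any dummies.
sums, counts = defaultdict(float), defaultdict(int)
for sid, (_, _, y) in zip(spells, rows):
    sums[sid] += y
    counts[sid] += 1
means = {sid: sums[sid] / counts[sid] for sid in sums}
within = [y - means[sid] for sid, (_, _, y) in zip(spells, rows)]
```

After demeaning on spells, only the office fixed effects (~200) would remain
to be handled explicitly, which is a far smaller problem.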

On my 32-bit Stata/SE I can get about 1.4GB of RAM. Alternatively, I can
sometimes get 2.5GB of physical RAM, plus more virtual memory, on quad-core
64-bit Linux servers; these are shared machines that generally carry a heavy
load of Matlab and simulation jobs.

Ultimately it's hard for me to know even how much RAM I need to specify or
hunt down.

Thanks so much in advance for any help and suggestions you can provide.

Cheers,

Scott Nicholson
Dept of Economics
Stanford University

____________________________________________________________________

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/