st: computational problems with many (~3000) fixed effects

Scott Nicholson

I'm running panel data regressions with ~1 million observations on voting
data. There are ~3000 fixed effects covering

-voting precincts (~2200, identified by tracking over time)
-candidates (~500, identified by candidate order rotations)
-offices (~200, identified by variation in contest ballot position)

There is no explicit time effect; time is implicitly indexed by "offices",
since in my dataset Governor in 1992 is a different office from Governor in 1994.

With 1 million observations, the dataset becomes enormous as I add all of my
fixed effects. I have some questions on how to implement these regressions.
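To put a rough number on "enormous": the arithmetic below is only a back-of-envelope sketch using the approximate counts above (1 million observations, 3000 dummies). It assumes the dummies are physically stored as variables, either as -byte- (1 byte each) or as -double- (8 bytes each), and says nothing about what any estimation command allocates internally. It works out to roughly 2.8 GB for bytes and about 22 GB for doubles, on top of the base dataset.

```stata
* Back-of-envelope only: approximate counts from this post, not exact figures
local N = 1e6
local K = 3000
* One byte per dummy (Stata's -byte- storage type)
display "dummies stored as byte:   " %5.1f `N'*`K'/2^30 " GB"
* Eight bytes per dummy (Stata's -double- storage type)
display "dummies stored as double: " %5.1f `N'*`K'*8/2^30 " GB"
```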

1) What are the memory demands beyond the current dataset size when running
-xtreg- and declaring a panel variable? If my dataset is 1GB, and I specify
-i(precinct)- as an option when there are 2000+ precincts, how much more
memory does Stata need to execute the command? Does it create the dummies
during the calculation, or does it perform the within-transformation? I'm
assuming the latter but 0.5GB of extra RAM beyond the 1GB dataset is not

2) Any feedback on the options for making the computation feasible would be
GREATLY appreciated. I'm attempting to implement them now and some are either
memory-restricted or give other errors:

i) get rid of the precinct and candidate fixed effects by creating spells
   egen spells = group(precinct candidate)

and then specifying -i(spells)- in -xtreg-.

ii) Apply the within transformation. Usually this is done over time, but since I
don't have an explicit time variable, and I want to sweep out the largest
number of fixed effects, I could do the transform on precincts. Or spells? Or
redefining spells to be precinct-candidate, then transforming on spells?

iii) felsdvreg: Ran into memory restrictions.

iv) a2reg: I received a non-conformability error and have no idea how to get
around that.

v) Are there other methods that I'm missing???
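To make options (i) and (ii) concrete, this is roughly what I have in mind. It is a sketch only: y, x1 and x2 are placeholder names rather than variables from my data, and I am assuming the ~200 office effects can enter as explicit dummies through -xi- while the precinct and candidate effects are absorbed through the spell identifier. One caveat I am aware of: if office never varies within a precinct-candidate spell, the office dummies will be dropped as collinear with the absorbed effects.

```stata
* Sketch only: y, x1, x2 are placeholder variable names
* One group per observed precinct-candidate combination
egen spells = group(precinct candidate)

* Absorb the spell effects via the within transformation;
* office effects are added as explicit dummies by -xi-
xi: areg y x1 x2 i.office, absorb(spells)

* The same coefficients on x1 and x2 via the fixed-effects estimator
xi: xtreg y x1 x2 i.office, fe i(spells)
```

Either way only the office dummies are created as variables, so the extra storage is on the order of 200 columns rather than 3000.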

On my 32-bit Stata/SE I can get about 1.4GB of RAM. Alternatively I can
sometimes get 2.5GB of physical RAM and more of virtual memory on quad-core
64-bit Linux servers. These are shared servers and generally experience a
heavy load of Matlab and simulation jobs.

Ultimately it's hard for me to even know how much RAM I need to specify or
need to hunt down.

Thanks so much in advance for any help and suggestions you can provide.

Scott Nicholson
Dept of Economics
Stanford University

