glm executes very very slow

glm executes very very slow

G. Dai
Hi all,
I guess it might be useful to open a new thread for my problem.

I'm using the glm command to estimate a fractional probit model.
However, when the do-file reaches the following glm estimation,
it becomes very, very slow. It actually took hours to finish the estimation.

I tried it on STATA SE 11.1 on a Mac and on STATA MP 11.0 on the department
server.

Any help is appreciated.

Guang

************************excerpt begins******************
svy: glm   r`i'pin r`i'`x'nhu /*parental NH stay experience*/
               h`i'itot h`i'atotf /*income and financial wealth*/
               r`i'conde r`i'adlsa r`i'iadlsa /*health condition and status*/
               r`i'nrshom /*past NH experience*/
               raedyrs  r`i'agey_b  ragender /*identity*/
               h`i'child r`i'hiltc /*health insurance and children*/
               i.r`i'cenreg,fam(bi 1) link(probit);
***********************excerpt ends*****************
Note: r`i'pin is the fractional outcome variable. It ranges from 0 to 1,
with a large mass point at 0.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Re: glm executes very very slow

G. Dai
FYI, the dataset is about 50 MB, with about 4,000 observations.

On Wed, Jul 7, 2010 at 9:03 AM, G. Dai <[hidden email]> wrote:

> I'm using the glm command to estimate a fractional probit model.
> However, when the do-file reaches the following glm estimation,
> it becomes very, very slow. It actually took hours to finish the estimation.

Re: glm executes very very slow

G. Dai
For reference only: if I replace i.r`i'cenreg with r`i'cenreg, the
glm command runs through, but I don't know why. Maybe a bug in glm?

On Wed, Jul 7, 2010 at 10:17 AM, G. Dai <[hidden email]> wrote:

> FYI, the dataset is about 50 MB, with about 4,000 observations.

RE: glm executes very very slow

Nick Cox
I very much doubt the idea of a bug in -glm-. The point is that your syntax change is not at all trivial: it could mean many fewer parameters to estimate. Otherwise put, the issue is almost certainly the difficulty of fitting your model, rather than the size of the dataset or any problem in Stata.

Nick
[hidden email]


Re: glm executes very very slow

G. Dai
Maybe. r`i'cenreg takes only the values 1, 2, 3, 4, and 5.

On Wed, Jul 7, 2010 at 11:03 AM, Nick Cox <[hidden email]> wrote:

> I very much doubt the idea of a bug in -glm-. The point is that your syntax change is not at all trivial: it could mean many fewer parameters to estimate. Otherwise put, the issue is almost certainly the difficulty of fitting your model, rather than the size of the dataset or any problem in Stata.

RE: glm executes very very slow

Nick Cox
As I understand it there are two hypotheses on the table for "very very slow":

1. A bug in -glm-.
2. The complexity of your model.

I put all my notional money on #2.

But regardless of that, how many parameters are you estimating here?

There's a kind of common presumption here that more experienced Stata users acquire some sort of Stata genius that allows them to diagnose remote problems with astounding acuteness. Not really; they're just more experienced and remember some of the mistakes they have made themselves.

All this relatively experienced Stata user sees here is

1. a complicated command
2. references to a dataset I can not experiment with
3. -svy-, which is usually a sign of difficulties.

So remote diagnosis is difficult.

What you could do is experiment with a much simpler version of your model just to see if it too runs very slowly.

Nick
[hidden email]
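
One way to act on the suggestion to try a much simpler model is to build the model up in steps and watch where fitting starts to struggle. The following is only a sketch under stated assumptions: it uses the auto dataset shipped with Stata rather than the poster's data, the variable share is a made-up fractional outcome, and -svy- is left out so that any slowdown can be attributed to the model itself.

```stata
* Sketch only: auto.dta stands in for the poster's data, which is not available.
sysuse auto, clear

* Hypothetical fractional outcome in (0, 1]; mpg does not exceed 41 in auto.dta.
generate double share = mpg/41

* Step 1: one continuous covariate. iterate() caps the number of iterations
* so that a hard-to-fit model stops instead of running for hours.
glm share weight, family(binomial) link(probit) vce(robust) iterate(50)

* Step 2: add a well-populated factor variable.
glm share weight i.foreign, family(binomial) link(probit) vce(robust) iterate(50)

* Step 3: add the factor variable suspected of causing trouble.
glm share weight i.foreign i.rep78, family(binomial) link(probit) vce(robust) iterate(50)
```

If one step is markedly slower than the one before it, or stops at the iteration cap, the term added in that step is the first thing to inspect.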


Re: glm executes very very slow

G. Dai
I totally agree with the statement about that presumption. We are all learning by doing.
To find the reason for the slowness, I did the following.
r`i'cenreg takes only integer values from 1 to 5, and there are about 4,000 observations.

First, I ran
  xi i.r`i'cenreg
and then ran glm with i.r`i'cenreg replaced by the _Ir`i'cenreg dummy variables.
It was still very slow.

I then found that only 2 observations have the value 1 for the variable _Ir`i'cenreg_5.

Second, I dropped _Ir`i'cenreg_5 and repeated the estimation. glm
ran through and was no longer slow.

In sum, I guess the cause is that only two observations have r`i'cenreg == 5,
which makes the model very hard for glm to fit and thus very, very slow.
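
A sparse category like this can be spotted before fitting anything by tabulating the categorical regressor. A minimal sketch, assuming the auto dataset shipped with Stata as a stand-in: rep78 plays the role of r`i'cenreg here and is not the poster's variable, and the threshold of 5 observations is an arbitrary choice for illustration.

```stata
* Sketch only: auto.dta and rep78 are stand-ins for the poster's data.
sysuse auto, clear

* Frequency of each category, including missing values; a level with
* only a handful of observations is a warning sign.
tabulate rep78, missing

* Flag sparse levels programmatically (levelsof skips missing values).
levelsof rep78, local(levels)
foreach l of local levels {
    quietly count if rep78 == `l'
    if r(N) < 5 {
        display "rep78 == `l' has only " r(N) " observations"
    }
}
```

Levels flagged this way can then be merged with a neighbouring category or excluded before the model is fit.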



On Wed, Jul 7, 2010 at 11:21 AM, Nick Cox <[hidden email]> wrote:

> What you could do is experiment with a much simpler version of your model just to see if it too runs very slowly.

Re: glm executes very very slow

Steven Samuels
CONTENTS DELETED
The author has deleted this message.

Re: glm executes very very slow

G. Dai
Good advice. What confused me is that probit runs on the data very
well, while glm does not.
One lesson: do not trust any command you did not write yourself.

On Wed, Jul 7, 2010 at 12:36 PM, Steve Samuels <[hidden email]> wrote:

> What you didn't tell us, as requested by the FAQ, is what the Stata
> results were; the coefficient for that small category would have
> stood out.
>
> The main lesson I hope you take away is not about Stata: examine your
> data before throwing it into a model. That small category would have
> caused problems in any regression.
>
> By the way, the correct spelling of the program we all use is
> "Stata"--this is also in the FAQ.
>
> Steve
>
> On Wed, Jul 7, 2010 at 2:47 PM, G. Dai <[hidden email]> wrote:
>> In sum, I guess the cause is that only two observations have r`i'cenreg == 5,
>> which makes the model very hard for glm to fit and thus very, very slow.

Re: glm executes very very slow

nshephard
Administrator
On Wed, Jul 7, 2010 at 8:56 PM, G. Dai <[hidden email]> wrote:
> Good advice. What confused me is that probit runs on the data very
> well, while glm does not.
> One lesson: do not trust any command you did not write yourself.

Good luck writing your own stats package from scratch and validating
it then! ;-)

-glm- (and -probit- for that matter) is an official Stata command and
will have been tested thoroughly by StataCorp.

Neil


--
"... no scientific worker has a fixed level of significance at which
from year to year, and in all circumstances, he rejects hypotheses; he
rather gives his mind to each particular case in the light of his
evidence and his ideas." - Sir Ronald A. Fisher (1956)

Email - [hidden email]
Website - http://slack.ser.man.ac.uk/
Photos - http://www.flickr.com/photos/slackline/