

I've got a panel dataset (xt) uniquely identified by subject (id) and time (hour), sorted by id hour.
I'd like to generate a variable that counts the cumulative (within id, across hour) number of hours that a variable is less than 50.
My code so far:
gen varxl50 = varx <50 if varx <.
bysort patnum (hour): gen varxl50sum = sum(varxl50)
I'm running into problems because of missing values I think.
Does this code look right?
Is there a more succinct way to code this?
Thanks in advance
George Hoffman
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/


We need to know more about how hour is defined and measured. Is it a time since some zero, or a duration? Show a segment of your data for one subject.
Nick
[hidden email]


bysort id (hour) : gen mysum = sum(varx < 50)
Nick
[hidden email]
Hoffman, George
id: integers 1, 2, ..., 200
hour: integers 1, 2, ..., 48
varx: continuous, 0-100, and missing


This works. Thanks!
-----Original Message-----
From: [hidden email] [mailto: [hidden email]] On Behalf Of Nick Cox
Sent: Monday, November 29, 2010 12:13 PM
To: ' [hidden email]'
Subject: st: RE: RE: RE: summarize conditions within subjects in panel data
bysort id (hour) : gen mysum = sum(varx < 50)
Nick
[hidden email]


Not quite.
The problem is with missing values.
bysort id (hour) : gen mysum = sum(varx < 50)
The function sum(varx < 50) reports 0 if varx is missing.
But if varx is missing for all of the hours in a given id, I'd like mysum = sum(varx < 50) to be missing.
If I add "if varx < ." to the end of the bysort ... command, then the sum is missing if varx is missing in the last hour.
This is a generic issue that I've been thinking wrongly about for years, and need correction!
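
The behaviour described above can be reproduced on a toy dataset. This is only a minimal sketch: the six observations and the variable name mysum_if are made up for illustration and are not from the original post.

```stata
clear
input id hour varx
1 1 40
1 2 .
1 3 60
1 4 .
2 1 .
2 2 .
end

* varx < 50 evaluates to 0 (false) when varx is missing, because Stata
* treats missing as larger than any number; sum() carries the count forward
bysort id (hour): generate mysum = sum(varx < 50)

* with the if qualifier, observations where varx is missing are skipped,
* so the new variable is missing in exactly those observations
bysort id (hour): generate mysum_if = sum(varx < 50) if varx < .

list, sepby(id)
```

In this sketch subject 2, whose varx is missing throughout, gets mysum equal to 0 rather than missing, while mysum_if is missing in every observation where varx is missing, including the last hour of subject 1.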
-----Original Message-----
From: [hidden email] [mailto: [hidden email]] On Behalf Of Hoffman, George
Sent: Monday, November 29, 2010 12:26 PM
To: [hidden email]
Subject: st: RE: RE: RE: RE: summarize conditions within subjects in panel data
This works. Thanks!


This came up in a different form a few days ago. See my post on 24 Nov
< http://www.hsph.harvard.edu/cgibin/lwgate/STATALIST/archives/statalist.1011/date/article968.html>
bysort id (hour) : gen mysum = sum(varx < 50)
bysort id (varx) : replace mysum = . if missing(varx[1]) & missing(varx[_N])
Nick
[hidden email]
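
A small worked example may help show what the two lines do. The data below are invented for illustration, and the final sort is an addition here, assuming the original id hour order is wanted back afterwards.

```stata
clear
input id hour varx
1 1 40
1 2 .
1 3 10
2 1 .
2 2 .
end

* running count of hours with varx below 50, within id in hour order
bysort id (hour): generate mysum = sum(varx < 50)

* sorting on varx within id puts missing values last, so a missing
* first value means every value in that panel is missing
bysort id (varx): replace mysum = . if missing(varx[1]) & missing(varx[_N])

* the second bysort re-ordered the data; go back to panel order
sort id hour
list, sepby(id)
```

Subject 1 ends with a count of 2 (the missing hour adds nothing), and subject 2, where varx is never observed, is set to missing.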


You could package this, but at root Stata needs to look at _all_ the values for a panel before it can decide that _all_ are missing. Hence I think there isn't a one-line solution, except trivially if you write a program to do it.
Nick
[hidden email]
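
As a rough sketch of what such packaging might look like: the program name cumbelow and its options by(), time(), generate() and cutoff() are hypothetical names chosen for this illustration, not an existing command. Internally it still takes two passes over each panel.

```stata
capture program drop cumbelow
program define cumbelow
    * hypothetical wrapper: running count of values below a cutoff,
    * set to missing for panels in which the variable is never observed
    syntax varname(numeric), BY(varname) TIME(varname) GENerate(name) [CUToff(real 50)]
    confirm new variable `generate'

    * pass 1: running count within panel, in time order
    bysort `by' (`time'): generate long `generate' = sum(`varlist' < `cutoff')

    * pass 2: missings sort last, so a missing first value means all missing
    bysort `by' (`varlist'): replace `generate' = . if missing(`varlist'[1])

    * leave the data in panel order
    sort `by' `time'
end

* example call on made-up data
clear
input id hour varx
1 1 40
1 2 .
1 3 60
1 4 10
2 1 .
2 2 .
end
cumbelow varx, by(id) time(hour) generate(mysum)
list, sepby(id)
```

The call itself is one line, which is the trivial sense in which a program gives a one-line solution.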
-----Original Message-----
From: [hidden email] [mailto: [hidden email]] On Behalf Of Nick Cox
Sent: 29 November 2010 18:50
To: ' [hidden email]'
Subject: st: RE: RE: RE: RE: RE: RE: summarize conditions within subjects in panel data
This came up in a different form a few days ago. See my post on 24 Nov
< http://www.hsph.harvard.edu/cgibin/lwgate/STATALIST/archives/statalist.1011/date/article968.html>
bysort id (hour) : gen mysum = sum(varx < 50)
bysort id (varx) : replace mysum = . if missing(varx[1]) & missing(varx[_N])
Nick
[hidden email]


On Mon, Nov 29, 2010 at 10:40 AM, Maarten buis < [hidden email]> wrote:
>  On Mon, 29/11/10, Beatrice Crozza wrote:
>> I have a problem with the memory allocation in Stata.
>>
>> My laptop has 3GB RAM. I set the virtual memory between
>> 500m and 3GB. I use WindowsXp.
>>
>> When I give the command to Stata:
>> set memory 800m
>> I receive the error message that there isn't enough memory.
>> Why Stata doesn't use the virtual memory?
>
> < http://www.stata.com/support/faqs/win/winmemory.html>
Given that this file is hosted on the StataCorp's web site, would it
be possible to extend the famous
"Think of Stata's data area as the area of a rectangle" message to
include a reference to the FAQ page
quoted by Neil and Maarten?
If for some reason it is inconvenient to modify the code, then perhaps
the help file for "set mem"?
Thank you, Sergiy


Thanks again.
If only by, and bysort, could take a reverse modifier (like gsort id -hour)
-----Original Message-----
From: [hidden email] [mailto: [hidden email]] On Behalf Of Nick Cox
Sent: Monday, November 29, 2010 1:00 PM
To: ' [hidden email]'
Subject: st: RE: RE: RE: RE: RE: RE: RE: summarize conditions within subjects in panel data
You could package this, but at root Stata needs to look at _all_ the values for a panel before it can decide that _all_ are missing. Hence I think there isn't a oneline solution, except trivially if you write a program to do it.
Nick
[hidden email]


That is possible as Dimitriy pointed out, but it wouldn't make the problem soluble in one line, as the missings would just be reversed in time.
Nick
[hidden email]
Dimitriy Masterov
=================
I think you can construct a "fake" hour variable easily:
gsort id -hour;
bys id: gen hour2=_n;
bysort id hour2: do your thing
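
One way to try the reverse-order idea is to sort on a negated copy of hour, which bysort accepts directly. The sketch below uses invented data and the illustrative names neghour and revsum; it suggests why reversing the order does not remove the need for a second pass.

```stata
clear
input id hour varx
1 1 40
1 2 .
1 3 10
2 1 .
2 2 .
end

* a negated copy of hour lets bysort run through each panel backwards
generate neghour = -hour
bysort id (neghour): generate revsum = sum(varx < 50)

sort id hour
list, sepby(id)
```

Here revsum counts from the last hour back to the first, but subject 2, with varx missing throughout, still gets 0 rather than missing.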


I've settled on a two-line solution.
In the process, I've discovered that my favorite user-written command, defv, will take bysort as an option also!
Example:
defv bysort id (hour): varx50sum = sum(varx<50)
defv bysort id (varx) : varx50sum = . if missing(varx[1]) & missing(varx[_N])
STB-51  dm50.1  . . . . . . . . . . . . . . . . . . . . .  Update to defv
        (help defv if installed)  . . . . . . . . . . . .  J. R. Gleason
        9/99    p.2; STB Reprints Vol 9, pp.14-15
        updated to Stata 6 and improved
-----Original Message-----
From: [hidden email] [mailto: [hidden email]] On Behalf Of Nick Cox
Sent: Tuesday, November 30, 2010 4:48 AM
To: ' [hidden email]'
Subject: st: RE: RE: RE: RE: RE: RE: RE: RE: RE: summarize conditions within subjects in panel data
That is possible as Dimitriy pointed out, but it wouldn't make the problem soluble in one line, as the missings would just be reversed in time.
Nick
[hidden email]

