

This seems like the sort of thing that characteristics are designed for:
see help char.
Nick Winter
[hidden email] wrote:
> Hi Statalisters,
>
> the following is a "dirty" but, at least for me, useful trick:
>
> I produce a lot of graphs in batch mode. Layout usually needs a lot of
> tweaking (titles, labels, formats).
> So I had to write many "foreach" loops with 6, 7, 8 or more parallel
> lists to specify individual layout parameters.
>
> One example: a metric variable (length; not really metric, but an integer
> in auto.dta) and a 0/1 variable (foreign):
>
> clear
> sysuse auto
> foreach x of var length foreign {
> graph bar `x',over(rep78) blabel(bar)
> sleep 2000
> }
>
> The mean of the 0/1 variable is better displayed as a percentage. To make
> it look good, I would multiply foreign by 100, label the axis "Percentage",
> write "Percentage" into the title and set the label format to %3.1f.
>
> It would be nice to be able to attach such display information to the
> variable, so one could take these meta-parameters from the dataset
> instead of specifying them by hand each time. There seems to be no
> regular way to do so.
> As a workaround for this, labels for extended missing values (.a, .b, .c
> ... .z) came to my mind which can be set for all numerical variables. I
> never use any more than ".a" or ".b" so why not store some information
> in the value label of some (by me) unused missing value like ".l"?
>
> The following code stores some basic information on type of display and
> label format to value label ".l" of variables. Later two graphs are
> produced using this information.
>
>
> *************************************
> *** Create example dataset
> clear
> sysuse auto
> *** Create metadata codes
> foreach var of varlist _all {
>     // only for numeric variables
>     local type : type `var'
>     if inlist("`type'", "byte", "int", "long", "float", "double") == 0 continue
>     local form "m21" // default: display as mean, label format 2.1
>
>     *** Example 1: m: display as mean, label format 3.0
>     if inlist("`var'", "length") == 1 local form "m30"
>
>     *** Example 2: p: display as percentage, format 3.1
>     if inlist("`var'", "foreign") == 1 local form "p31"
>
>     *** Each var gets a new value label (templbl`var'). Existing value labels are copied.
>     local lbl`var' : value label `var'
>     if "`lbl`var''" == "" {
>         cap label drop templbl`var'
>         label define templbl`var' .l "`form'"
>         label values `var' templbl`var'
>     }
>     else {
>         cap label drop templbl`var'
>         label copy `lbl`var'' templbl`var'
>         label define templbl`var' .l "`form'" , add
>         label values `var' templbl`var'
>     }
> }
>
> *** Now set up the graphs:
> local varover "rep78"
> local varlab : variable label `varover'
> foreach var of varlist length foreign {
>     local varlab2 : variable label `var'
>     local how : label templbl`var' .l // ".l" label content into a local
>     gen xvar = `var' // in order not to alter the original var, xvar is used in the graph
>     if substr("`how'",1,1) == "p" replace xvar = `var' * 100 // multiply by 100 if var displays percentage
>     if substr("`how'",1,1) == "m" local value = "Mean" // label for titles
>     if substr("`how'",1,1) == "p" local value = "Percentage" // label for titles
>     local form = "%" + substr("`how'",2,1) + "." + substr("`how'",3,1) + "f" // local for label formatting
>     graph bar xvar, over(`varover') title("`varlab2' over `varlab' (`value')") ytitle("`value'") blabel(bar,format(`form'))
>     sleep 2000
>     drop xvar
> }
> ************************************
>
>
> Stefan
>
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/

Nicholas Winter 434.924.6994 t
Assistant Professor 434.924.3359 f
Department of Politics [hidden email] e
University of Virginia faculty.virginia.edu/nwinter w
PO Box 400787, 100 Cabell Hall
Charlottesville, VA 22904


The Stata philosophy here, as indeed often elsewhere, is that it
provides low-level tools so that you can use them as you wish for a
variety of higher-level purposes.
In particular, Stata clearly has no concept of a variable that should be
displayed in percent terms. That's entirely a user preference.
To me, the natural low-level tool here for recording such a preference
is that of characteristics. What you did worked for you, but I'd rather
reserve missing value support for missing values. Positively, using
characteristics is an example of attaching information to the variable,
exactly as you wish.
You could define characteristics like this

    char foreign[showpc] "percent"

and then in a loop condition on such a characteristic being found,
e.g.

    foreach v of var <varlist> {
        if "`: char `v'[showpc]'" == "percent" {
            <whatever>
        }
        else {
            <whatever else>
        }
    }
Note that it is not an error to refer to a nonexistent characteristic.
It is just treated as if it were an empty string. So, I don't need to
define this characteristic when I don't need it.
Nick
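A minimal sketch of how the suggestion above could look for the two variables from the original post, using auto.dta. The characteristic name showpc follows the message above; the temporary variable, the titles and the formats are illustrative assumptions, not anyone's posted code.

```stata
sysuse auto, clear

* flag the 0/1 variable; unflagged variables fall through to the else branch
char foreign[showpc] "percent"

foreach v of varlist length foreign {
    local vlab : variable label `v'
    if "`: char `v'[showpc]'" == "percent" {
        * work on a scaled copy so the original variable is left untouched
        tempvar pc
        generate `pc' = 100 * `v'
        graph bar (mean) `pc', over(rep78) blabel(bar, format(%3.1f)) ///
            ytitle("Percentage") title("`vlab' (percentage)")
    }
    else {
        graph bar (mean) `v', over(rep78) blabel(bar, format(%3.0f)) ///
            ytitle("Mean") title("`vlab' (mean)")
    }
}
```

Because an undefined characteristic is read as an empty string, length needs no setup at all and simply takes the else branch.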


 On Mon, 16/11/09, [hidden email] wrote:
> the following is a "dirty" but, at least for me, useful
> trick:
>
> I produce a lot of graphs in batch mode. Layout usually
> needs a lot of tweaking (titles, labels, formats).
<snip>
> It would be nice to be able to attach such display
> information to the variable, so one could take these
> meta-parameters from the dataset instead of specifying
> them by hand each time. There seems to be no regular way
> to do so.
> As a workaround for this, labels for extended missing
> values (.a, .b, .c ... .z) came to my mind which can be
> set for all numerical variables. I never use any more
> than ".a" or ".b" so why not store some information
> in the value label of some (by me) unused missing value
> like ".l"?
<snip>
You can make your trick less "dirty" by storing this
meta-information in what Stata calls characteristics;
see: help char. This allows you to attach this kind of
information to a variable without running the risk of
getting weird results if at some future date you do
happen to use the ".l" extended missing value. You can
extract that information using the char extended macro
function; see: help extended_fcn
Hope this helps,
Maarten

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany
http://www.maartenbuis.nl
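As a rough illustration of the store-and-extract cycle described above, the sketch below attaches two pieces of display information to each variable and reads them back with the char extended macro function. The characteristic names gvalue and gformat are hypothetical, chosen only for this example.

```stata
sysuse auto, clear

* hypothetical characteristic names: gvalue (what is shown), gformat (bar label format)
char length[gvalue]   "Mean"
char length[gformat]  "%3.0f"
char foreign[gvalue]  "Percentage"
char foreign[gformat] "%3.1f"

foreach v of varlist length foreign {
    * the char extended macro function copies a characteristic into a local
    local value  : char `v'[gvalue]
    local format : char `v'[gformat]
    display as text "`v': show `value' with format `format'"
}

* list everything attached to one variable
char list foreign[]
```

Characteristics are saved with the dataset, so information set this way is still there after save and use.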


[hidden email] wrote:
> Hi Statalisters,
>
> the following is a "dirty" but, at least for me, useful trick:
>
Several have suggested using characteristics to store this kind of
info, and I concur. The main disadvantage of both schemes
is that it can be difficult to figure out which variables have
which characteristics. I wrote the program below to list variables
that either have a certain characteristic defined, or have a
certain value for a given characteristic. I.e.,

    dchar, char(graph percent)

will list all variables with characteristic 'graph' defined
and equal to 'percent'; or

    dchar, char(graph)

will simply list all variables with characteristic 'graph'
defined. The program also takes a varlist, to limit the search,
and stores the hit list in r(varlist).
hth,
Jeph
*****************************************
* DCHAR
* program to get list of variables with
* certain characteristics
*!
*! version 1.0 27 Aug 2009
*****************************************
program define dchar, rclass
    syntax [varlist] , CHar(string)
    local char1 : word 1 of `char'
    local char2 : word 2 of `char'
    local rlist ""
    local numvars 0
    di ""
    foreach V of varlist `varlist' {
        local charlist : char `V'[]
        local found : list char1 in charlist
        if `found'>0 {
            if "`char2'"!="" {
                local charval : char `V'[`char1']
                if "`charval'"=="`char2'" {
                    di in y "`V'" _col(40) in g ///
                        "`char1'" _col(50) "`char2'"
                    local rlist "`rlist'`V' "
                    local numvars = `numvars'+1
                }
            }
            else {
                local charval : char `V'[`char1']
                di in y "`V'" _col(40) in g ///
                    "`char1'" _col(50) "`charval'"
                local rlist "`rlist'`V' "
                local numvars = `numvars'+1
            }
        }
    }
    return local varlist "`rlist'"
    return scalar numvars= `numvars'
end
*****************************************


Note that the official Stata command ds already has options has()
and not() that permit identification of variables with or without
named characteristics (but not their contents, as here).
There is much scope for different programming styles, but flagging that
a variable has a particular property can be done by assigning any
nonempty text. Putting informative detail _inside_ characteristics has
disadvantages as well as advantages, as the programmer is
correspondingly obliged to, or able to, look inside.
Nick
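A short sketch of the ds route mentioned above, under the assumption that two variables in auto.dta carry a characteristic named graph (the name used in the dchar examples). ds, has(char ...) reports which variables have the characteristic; checking what it contains still takes a loop over r(varlist). The local names flagged and pctvars are illustrative.

```stata
sysuse auto, clear
char foreign[graph] "percent"
char length[graph]  "mean"

* ds finds the variables that have the characteristic defined ...
ds, has(char graph)
local flagged `r(varlist)'

* ... and a loop is still needed to filter on what it contains
local pctvars
foreach v of local flagged {
    if "`: char `v'[graph]'" == "percent" {
        local pctvars `pctvars' `v'
    }
}
display "`pctvars'"
```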


Yes, originally I wrote this as a wrapper for ds, but found I
still had to loop over the return list to check for assigned values
so bagged ds altogether in the end.
That said, obviously, if you don't care what the value of a
characteristic is, there is no reason to assign such values.
I have datasets that consist of dozens of linked tables containing
thousands of variables. The more information I can attach to a
variable the first time I look at it, the more I can automate
my analyses and output, which in such a situation is highly
desirable.
cheers,
J


We agree.
Nick


Dear Statalisters,
Is there any univariate table command available that displays a bottom to top or reverse cumulative distribution?
One example: I have data on the number of vaccine doses given (0..9) and want to know the percentage of cases having at least 4 or 3 or 2 doses.
All I found in Statalist was a thread from 2003 discussing some workarounds.
Stefan Gawrich
Hesse State Health Office (HLPUG)
Dillenburg, Germany


 Stefan Gawrich wrote:
> Is there any univariate table command available that displays a
> bottom to top or reverse cumulative distribution? One example:
> I have data on the number of vaccine doses given (0..9) and want
> to know the percentage of cases having at least 4 or 3 or 2 doses.
I don't know of such a program, but it is easy enough to create it.
Below is such a program, reversecum. It gives for each value the
percentage of observations that have that value or more (excluding
missing values); it allows for fweights and for if and in
conditioning.
This program will be available on your computer like any other
Stata command if you copy the lines from the one starting with
"*! version..." through "end" (inclusive) into a file, call it
reversecum.ado, and save it in your personal ado folder
(type adopath in Stata to find out where that is).
Alternatively, you can put these lines at the top of your do-file,
and this program will then be available while running that do-file,
as in the example below.
*--- begin example ---
program drop _all
*! version 1.0.0 MLB 20Jan2010
program reversecum
    syntax varname [if] [in] [fweight]
    marksample touse
    tempvar _freq cum r_cum
    if "`weight'" != "" {
        local wgt "[`weight'`exp']"
    }
    preserve
    contract `varlist' if `touse' `wgt', ///
        freq(`_freq') cpercent(`cum') nomiss
    qui gen double `r_cum' = 100 - `cum'[_n-1]
    qui replace `r_cum' = 100 in 1
    format `r_cum' %8.2f
    label var `r_cum' "reverse Cum"
    tabdisp `varlist', cell(`_freq' `r_cum')
    restore
end
sysuse auto, clear
reversecum rep78
*--- end example ---
( For more on how to use examples I sent to statalist see:
http://www.maartenbuis.nl/stata/exampleFAQ.html )
Hope this helps,
Maarten

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany
http://www.maartenbuis.nl


Similarly:
*************
sysuse auto, clear
cumul rep78, generate(cumrep78) equal
gen reversecumrep78=1-cumrep78
bys rep78: egen freq=count(rep78)
bys rep78: keep if _n==1
l rep78 freq reversecumrep78
*************
HTH
Martin


Maarten and Martin correctly pointed out that you can recreate such tabulations for yourself from first principles.
However, a more elaborate canned alternative is available. See groups from SSC. There was some discussion in

SJ-3-4  pr0011  . . . . . . . .  Speaking Stata: Problems with tables, Part II
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q4/03   SJ 3(4):420--439                                 (no commands)
        reviews three user-written commands (tabcount, makematrix,
        and groups) as different approaches to tabulation problems

which is available online at the Stata Journal website.
Nick


Another elementary trick is to create a negated version of the variable and tabulate in terms of that, removing minus signs from the table via an edit. In its simplest form that requires values to be strictly positive. And any value labels would need to be edited too.
clonevar negx = x
replace negx = -negx
tab negx
Nick
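One possible way to combine the negation idea with the cumul approach posted earlier in the thread: because cumul with the equal option gives the share of values "smaller than or equal", running it on the negated variable yields the share "at least x" on the original. This is only a sketch on auto.dta; the variable names negrep78, atleast and first are made up for the example.

```stata
sysuse auto, clear

* negate, so that "smaller than or equal" on the copy means "at least" on the original
generate negrep78 = -rep78

* equal gives tied observations the same (highest) cumulative value
cumul negrep78, generate(atleast) equal
replace atleast = 100 * atleast

* show one row per distinct nonmissing value
bysort rep78: generate byte first = _n == 1
format atleast %5.1f
list rep78 atleast if first & !missing(rep78), noobs
```

Missing values stay missing after negation, so they are left out of the distribution, as in reversecum.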


Thanks to Martin, Maarten and Nick for your help!
There are many ways to calculate reverse cumulative distributions in a do-file, and I have already used some of them. What I was looking for was a quick and easy solution. I often have people on the phone asking for results, and it occurred to me that I couldn't produce this kind of data on the fly (calculating in my head is one option, but it may be error-prone while talking on the phone).
- "cumul, equal" works, but requires some additional programming.
- "groups, show(rvpercent)" is very flexible but, as far as I can see, not exactly what I wanted. Options like RVpercent, rpercent or Rpercent produce reverse cumulative distributions of the type "percentage of cases bigger than x", not "bigger than or equal to x". It works, but for "equal or bigger" one has to pick the result from the previous line.
- reversecum works fine. It's a bit slow (compared to the tabulate command) with big datasets because of the use of preserve, contract and restore, but it produces what I was looking for.
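A possible way around the preserve/contract/restore overhead, given here only as an untimed sketch, is to let tabulate save its frequencies in a matrix and accumulate from there. The matrix names F and V and the local atleast are made up for this illustration; this is not how reversecum or groups work internally.

* begin example
sysuse auto, clear
quietly tabulate rep78, matcell(F) matrow(V)
local N = r(N)            // number of nonmissing observations
local atleast = `N'       // running count of observations >= current value
forvalues i = 1/`=rowsof(F)' {
    // value, frequency, percentage with that value or more
    display %8.0g V[`i',1] %8.0f F[`i',1] %10.2f 100*`atleast'/`N'
    local atleast = `atleast' - F[`i',1]
}
* end example

For rep78 the last column should come out as 100.00, 97.10, 85.51, 42.03 and 15.94. The sketch ignores weights and if/in conditions, and storing the frequencies in a matrix is only practical for variables with a modest number of distinct values.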
Thanks
Stefan


I'll look at groups to see if this functionality can be added.
Nick


Thanks to Kit Baum, a revised version of groups with an extra ge option has been placed on SSC. I believe that option suits Stefan's problem.
Stata 8 is required.
For an overview without installation:
. ssc type groups.hlp
For installation or reinstallation, use ssc or adoupdate.
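As a sketch, either of the following lines should do that, assuming an internet connection and that the package is still hosted on SSC under the name groups:

* begin example
ssc install groups, replace    // install, or overwrite an older copy
adoupdate groups, update       // alternatively, update an existing installation
* end example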
As someone, some day, may want cumulatives to be "less than" rather than "less than or equal to" I will shortly add an lt option.
Past Fortran (or FORTRAN) users may enjoy a brief moment of nostalgia.
Nick


Thanks Nick for the "ge" addition to 'groups', works great.
Maarten Buis wrote:
> tabulate is a built-in command (i.e. it is part of the
> executable, not written in ado or Mata), so it is unlikely
> that any user-written command can improve on that.
That's right, and I didn't expect it to.
What I meant to write is that calculations can take significant time in big datasets.
Performance example (dual-core PC, 2 GB RAM, Stata 11 MP):
1. One variable in a dataset
set rmsg on
set obs 2000000
gen x = int(uniform() * 10)
tab x // 0.44 sec.
groups x,show(freq rpercent) ge // 12.8 sec.
reversecum x // 5.0 sec.
2. 20 variables in a dataset
clear
set obs 2000000
forval x = 1/20 {
gen x`x' = int(uniform() * 10)
}
tab x1 // 0.44 sec.
groups x1,show(freq rpercent) ge // 14.2 sec.
reversecum x1 // 29.2 sec.
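For side-by-side comparisons like these, Stata's timer command is an alternative to set rmsg on. The following is only a sketch: it assumes reversecum has been defined as posted earlier in the thread, it uses runiform() where the runs above used uniform(), and the timings will of course depend on the machine.

* begin example
clear
set obs 2000000
gen x = int(runiform() * 10)
timer clear
timer on 1
quietly tabulate x
timer off 1
timer on 2
quietly reversecum x
timer off 2
timer list          // elapsed seconds for timers 1 and 2
* end example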
BTW: The first time (0.44 sec.) reminds me of my first Stata experience a decade ago.
I worked with SPSS at university before and couldn't believe how fast Stata was.
(instant tabulate results...wow)
Thanks again
Stefan


Thanks again to Kit, a version with the extra lt option is now on SSC.
Please install or reinstall using ssc or adoupdate if interested.
The nub of the matter is best shown by an example:
. sysuse auto
. groups rep78, show(F f Rf) lt
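  +-----------------------------+
  | rep78    # <   Freq.    # > |
  |-----------------------------|
  |     1      0       2     67 |
  |     2      2       8     59 |
  |     3     10      30     29 |
  |     4     40      18     11 |
  |     5     58      11      0 |
  +-----------------------------+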
. groups rep78, show(F f Rf) ge
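  +-----------------------------+
  | rep78   # <=   Freq.   # >= |
  |-----------------------------|
  |     1      2       2     69 |
  |     2     10       8     67 |
  |     3     40      30     59 |
  |     4     58      18     29 |
  |     5     69      11     11 |
  +-----------------------------+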
Nick


Marcello Pagano wrote:
> New subscribers will have to go through a screening process that will add time, hopefully less than a day, before they can subscribe. The screening process is designed to keep some, hopefully very few, people out.
I'm a frequent reader of the Statalist archive but an infrequent writer. I can't handle all these mails on my work account, so I only subscribe temporarily to ask or contribute something.
I don't want to cause any trouble or work with this behaviour. It would be nice to have quick subscription for people already known to majordomo, if possible.
Stefan Gawrich
Dillenburg
Germany
