This seems like the sort of thing that characteristics are designed for
-- see -help char-.

- Nick Winter

[hidden email] wrote:
> Hi Statalisters,
>
> the following is a "dirty" but - at least for me - useful trick:
>
> I produce a lot of graphs in batch mode. Layout usually needs a lot of
> tweaking (titles, labels, formats).
> So I had to write many "foreach" loops with 6, 7, 8 or more parallel
> lists to specify individual layout parameters.
>
> One example: Some metric var (length) (not really metric but integer in
> auto.dta) and a 0/1-var (foreign):
>
> clear
> sysuse auto
> foreach x of var length foreign {
>     graph bar `x', over(rep78) blabel(bar)
>     sleep 2000
> }
>
> The 0/1-mean is better displayed as a proportion. To make it look good,
> I would multiply foreign by 100, label the axis "Percentage", write
> "Percentage" into the title and set the label format to %3.1f.
>
> It would be nice to be able to attach such display information to the
> variable, so one could take these meta-parameters from the dataset
> instead of specifying them by hand each time. There seems to be no
> regular way to do so.
> As a workaround for this, labels for extended missing values (.a, .b, .c
> ... .z) came to my mind which can be set for all numerical variables. I
> never use any more than ".a" or ".b" so why not store some information
> in the value label of some (by me) unused missing value like ".l"?
>
> The following code stores some basic information on type of display and
> label format to value label ".l" of variables. Later two graphs are
> produced using this information.
>
>
> *************************************
> *** Create example dataset
> clear
> sysuse auto
> *** Create metadata codes
> foreach var of varlist _all {
>     // only for numerical vars
>     local type : type `var'
>     if inlist("`type'", "byte", "int", "long", "float", "double") == 0 continue
>     local form "m21" // default : display as mean, label format 2.1
>
>     *** Example 1: m: display as mean, label format 3.0
>     if inlist("`var'", "length") == 1 local form "m30"
>
>     *** Example 2: p: display as percentage, format 3.1
>     if inlist("`var'", "foreign") == 1 local form "p31"
>
>     *** Each var gets a new value label (templbl`var'). Existing value labels are copied.
>     local lbl`var' : value label `var'
>     if "`lbl`var''" == "" {
>         cap label drop templbl`var'
>         label define templbl`var' .l "`form'"
>         label values `var' templbl`var'
>     }
>     else {
>         cap label drop templbl`var'
>         label copy `lbl`var'' templbl`var'
>         label define templbl`var' .l "`form'" , add
>         label values `var' templbl`var'
>     }
> }
>
> *** Now set up the graphs:
> local varover "rep78"
> local varlab : variable label `varover'
> foreach var of varlist length foreign {
>     local varlab2 : variable label `var'
>     local how : label templbl`var' .l // ".l" label content into a local
>     gen xvar = `var' // in order not to alter the original var, xvar is used in the graph
>     if substr("`how'",1,1) == "p" replace xvar = `var' * 100 // multiply with 100 if var displays percentage
>     if substr("`how'",1,1) == "m" local value = "Mean" // Label for titles
>     if substr("`how'",1,1) == "p" local value = "Percentage" // Label for titles
>     local form = "%" + substr("`how'",2,1) + "." + substr("`how'",3,1) + "f" // local for label formatting
>     graph bar xvar, over(`varover') title("`varlab2' over `varlab' (`value')") ytitle("`value'") blabel(bar,format(`form'))
>     sleep 2000
>     drop xvar
> }
> ************************************
>
>
> Stefan

--
--------------------------------------------------------------
Nicholas Winter                     434.924.6994 t
Assistant Professor                 434.924.3359 f
Department of Politics              [hidden email] e
University of Virginia              faculty.virginia.edu/nwinter w
PO Box 400787, 100 Cabell Hall
Charlottesville, VA 22904

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
|
In reply to this post by Stefan.Gawrich
The Stata philosophy here, as indeed often elsewhere, is that it
provides low-level tools so that you can use them as you wish for a
variety of higher level purposes.

In particular, Stata clearly has no concept of a variable that should
be displayed in percent terms. That's entirely a user preference.

To me, the natural low-level tool here for recording such a preference
is that of characteristics. What you did worked for you, but I'd rather
reserve missing value support for missing values. Positively, using
characteristics is an example of attaching information to the variable,
exactly as you wish.

You could define characteristics like this

    char foreign[showpc] "percent"

and then in a loop condition on such a characteristic being found, e.g.

    foreach v of var <varlist> {
        if "`: char `v'[showpc]'" == "percent" {
            <whatever>
        }
        else {
            <whatever else>
        }
    }

Note that it is not an error to refer to a non-existent characteristic.
It is just treated as if it were an empty string. So, I don't need to
define this characteristic when I don't need it.

Nick
[hidden email]
|
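As a rough sketch of how the characteristic approach described above could replace the value-label trick in the original graph loop, the code below stores a display flag and a label format as characteristics and reads them back inside the loop. The name showpc comes from the message above; the name fmt and the fallback format %3.1f are assumptions made for illustration only, not anything Stata defines.

```stata
sysuse auto, clear

* attach display preferences to the variables themselves
char foreign[showpc] "percent"
char foreign[fmt]    "%3.1f"
char length[fmt]     "%3.0f"

local over rep78
local overlab : variable label `over'

foreach v of varlist length foreign {
    local vlab : variable label `v'

    * label format: fall back to an assumed default if none was stored
    local fmt : char `v'[fmt]
    if "`fmt'" == "" local fmt "%3.1f"

    * work on a temporary copy so the original variable is untouched
    tempvar y
    if "`: char `v'[showpc]'" == "percent" {
        generate `y' = 100 * `v'
        local what "Percentage"
    }
    else {
        generate `y' = `v'
        local what "Mean"
    }

    graph bar `y', over(`over') blabel(bar, format(`fmt')) ///
        title("`vlab' over `overlab' (`what')") ytitle("`what'")
    drop `y'
}
```

Because an undefined characteristic reads as an empty string, only variables that deviate from the default need any characteristic at all.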
In reply to this post by Stefan.Gawrich
--- On Mon, 16/11/09, [hidden email] wrote:
> the following is a "dirty" but - at least for me - useful
> trick:
>
> I produce a lot of graphs in batch mode. Layout usually
> needs a lot of tweaking (titles, labels, formats).
<snip>
> It would be nice to be able to attach such display
> information to the variable, so one could take these
> meta-parameters from the dataset instead of specifying
> them by hand each time. There seems to be no regular way
> to do so.
> As a workaround for this, labels for extended missing
> values (.a, .b, .c ... .z) came to my mind which can be
> set for all numerical variables. I never use any more
> than ".a" or ".b" so why not store some information
> in the value label of some (by me) unused missing value
> like ".l"?

You can make your trick less "dirty" by storing this
meta-information in what Stata calls -characteristics-, see
-help char-. This allows you to attach this kind of
information to a variable without running the risk that you
get weird results when at some future date you do happen to
use the ".l" extended missing value. You can extract that
information using the -char- extended macro function, see
-help extended_fcn-.

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------
|
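A minimal sketch of the -char- command and the -char- extended macro function mentioned above, reusing the display codes "m30" and "p31" from the original post. The characteristic name dispcode is a made-up name for illustration. The last step saves and reloads the data through a temporary file to show that characteristics are stored with the dataset.

```stata
sysuse auto, clear

* store the display codes from the original post as characteristics
char length[dispcode]  "m30"
char foreign[dispcode] "p31"

* list everything attached to one variable
char list foreign[]

* read a characteristic back with the extended macro function
local how : char foreign[dispcode]
display "`how'"

* a characteristic that was never defined is simply empty
local none : char mpg[dispcode]
display "[`none']"

* characteristics are saved with the data
tempfile t
save `t'
use `t', clear
display "`: char foreign[dispcode]'"
```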
In reply to this post by Stefan.Gawrich
[hidden email] wrote:
> Hi Statalisters,
>
> the following is a "dirty" but - at least for me - useful trick:
>

Several have suggested using characteristics to store this kind of
info, and I concur. The main disadvantage of both schemes
is that it can be difficult to figure out which variables have
which characteristics - I wrote the program below to list variables
that either have a certain characteristic defined, or have a
certain value for a given characteristic. I.e.,

    dchar, char(graph percent)

will list all variables with characteristic 'graph' defined
and equal to 'percent'; or

    dchar, char(graph)

will simply list all variables with characteristic 'graph'
defined. The program also takes a varlist, to limit the search,
and stores the hit list in r(varlist).

hth,
Jeph


*****************************************
* DCHAR
* program to get list of variables with
* certain characteristics
*!
*! version 1.0 27 Aug 2009
*****************************************

program define dchar, rclass
    syntax [varlist] , CHar(string)
    local char1 : word 1 of `char'
    local char2 : word 2 of `char'
    local rlist ""
    local numvars 0
    di ""
    foreach V of varlist `varlist' {
        local charlist : char `V'[]
        local found : list char1 in charlist
        if `found'>0 {
            if "`char2'"!="" {
                local charval : char `V'[`char1']
                if "`charval'"=="`char2'" {
                    di in y "`V'" _col(40) in g ///
                        "`char1'" _col(50) "`char2'"
                    local rlist "`rlist'`V' "
                    local numvars = `numvars'+1
                }
            }
            else {
                local charval : char `V'[`char1']
                di in y "`V'" _col(40) in g ///
                    "`char1'" _col(50) "`charval'"
                local rlist "`rlist'`V' "
                local numvars = `numvars'+1
            }
        }
    }
    return local varlist "`rlist'"
    return scalar numvars= `numvars'
end
*****************************************
|
Note that the official Stata command -ds- already has options -has()-
and -not()- that permit identification of variables with or without
named characteristics (but not their contents, as here).

There is much scope for different programming styles, but flagging that
a variable has a particular property can be done by assigning any
non-empty text. Putting informative detail _inside_ characteristics has
disadvantages as well as advantages, as the programmer is
correspondingly obliged to, or able to, look inside.

Nick
[hidden email]
|
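A short sketch of the -ds- options referred to above. The characteristic name showpc and the values assigned to it are illustrative assumptions. Note that length is deliberately given the value "no": -ds, has(char ...)- reports it anyway, because it tests only whether the characteristic exists, not what it contains.

```stata
sysuse auto, clear
char foreign[showpc] "percent"
char length[showpc]  "no"

* variables that have the characteristic, whatever its content
ds, has(char showpc)
local flagged `r(varlist)'

* variables that do not have it
ds, not(char showpc)
local unflagged `r(varlist)'

display "flagged:   `flagged'"
display "unflagged: `unflagged'"
```

This is why a pure presence flag (any non-empty text) fits -ds- well, while content-dependent selection needs an extra step.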
Yes, originally I wrote this as a wrapper for -ds-, but found I
still had to loop over the return list to check for assigned
values so bagged -ds- altogether in the end.

That said, obviously, if you don't care what the value of a
characteristic is, there is no reason to assign such values.

I have datasets that consist of dozens of linked tables containing
thousands of variables. The more information I can attach to a
variable the first time I look at it, the more I can automate my
analyses and output, which in such a situation is highly desirable.

cheers,
J
|
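The two-step pattern described above (find candidates with -ds-, then loop over the returned list to check values) might look like the following sketch. The characteristic name graph and the value "percent" are taken from the dchar examples earlier in the thread; the macro names candidates and hits are arbitrary.

```stata
sysuse auto, clear
char foreign[graph] "percent"
char length[graph]  "mean"

* step 1: -ds- finds every variable that has the characteristic
quietly ds, has(char graph)
local candidates `r(varlist)'

* step 2: loop over the candidates and keep those with the wanted value
local hits
foreach v of local candidates {
    if "`: char `v'[graph]'" == "percent" {
        local hits `hits' `v'
    }
}
display "`hits'"
```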
We agree.
Nick
[hidden email]
|
In reply to this post by Stefan.Gawrich
Dear Statalisters,
Is there any univariate table command available that displays a
bottom to top or reverse cumulative distribution?

One example: I have data on the number of vaccine doses given (0..9)
and want to know the percentage of cases having at least 4 or 3 or 2
doses.

All I found in Statalist was a thread from 2003 discussing some
workarounds.

Stefan Gawrich
Hesse State Health Office (HLPUG)
Dillenburg, Germany
|
Have you had a look at -cumul-? One minus its results should be the
reverse cumulative distribution.

HTH
Martin

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of [hidden email]
Sent: Wednesday, 20 January 2010 10:03
To: [hidden email]
Subject: st: bottom to top or reverse cumulative distribution in table command?
|
In reply to this post by Stefan.Gawrich
--- Stefan Gawrich wrote:
> Is there any univariate table command available that displays a
> bottom to top or reverse cumulative distribution? One example:
> I have data on the number of vaccine doses given (0..9) and want
> to know the percentage of cases having at least 4 or 3 or 2 doses.

I don't know of such a program, but it is easy enough to create it.
Below is such a program -reversecum-. It gives for each value the
percentage of observations that have that value or more (excluding
missing values), and it allows for -fweights- and -if- and -in-
conditioning.

This program will normally be available on your computer like any
other Stata command if you copy the lines from the one starting with
"*! version..." through "end" (inclusive) into a file, call it
reversecum.ado and save it in your personal ado folder (type -adopath-
in Stata to find out where that is). Alternatively, you can put these
lines at the top of your do-file, and the program will then be
available while running that do-file, as in the example below.

*-------------- begin example -----------------
program drop _all
*! version 1.0.0 MLB 20Jan2010
program reversecum
    syntax varname [if] [in] [fweight]
    marksample touse
    tempvar _freq cum r_cum
    if "`weight'" != "" {
        local wgt "[`weight'`exp']"
    }
    preserve
    contract `varlist' if `touse' `wgt', ///
        freq(`_freq') cpercent(`cum') nomiss
    qui gen double `r_cum' = 100 - `cum'[_n-1]
    qui replace `r_cum' = 100 in 1
    format `r_cum' %8.2f
    label var `r_cum' "reverse Cum"
    tabdisp `varlist', cell(`_freq' `r_cum')
    restore
end

sysuse auto, clear
reversecum rep78
*-------------------- end example -----------------------
( For more on how to use examples I sent to statalist see:
http://www.maartenbuis.nl/stata/exampleFAQ.html )

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------
|
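The "that value or more, excluding missing values" definition used by -reversecum- above can be cross-checked with plain -count-. The sketch below is an independent check on rep78 from auto.dta, not part of the program; the loop bounds 1/5 assume the values that rep78 takes in that dataset. The explicit !missing() condition matters because Stata treats missing as larger than any number.

```stata
sysuse auto, clear

* denominator: nonmissing observations only
quietly count if !missing(rep78)
local N = r(N)

* percentage of observations with at least k, for each k
forvalues k = 1/5 {
    quietly count if rep78 >= `k' & !missing(rep78)
    display "at least `k': " %6.2f (100 * r(N) / `N')
}
```

For a single threshold asked for on the spot, one such -count- and a -display- may already be enough.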
Similarly:

*************
sysuse auto, clear
cumul rep78, generate(cumrep78) equal
gen reversecumrep78=1-cumrep78
bys rep78: egen freq=count(rep78)
bys rep78: keep if _n==1
l rep78 freq reversecumrep78
*************

HTH
Martin

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Maarten buis
Sent: Wednesday, 20 January 2010 10:49
To: [hidden email]
Subject: Re: st: bottom to top or reverse cumulative distribution in table command?
|
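One point worth spelling out about the -cumul- route: with the equal option, -cumul- returns the share of observations less than or equal to each value, so one minus that is the share strictly above the value, not "at least" the value. A possible adjustment, offered only as a sketch, is to add back the share of the value itself. The variable names cum, freq and atleast are arbitrary.

```stata
sysuse auto, clear

* empirical cdf: share of observations <= each value
cumul rep78, generate(cum) equal

* frequency of each value and the nonmissing total
quietly count if !missing(rep78)
local N = r(N)
bysort rep78: generate freq = _N if !missing(rep78)

* share >= value  =  1 - share <= value  +  share == value
generate atleast = 100 * (1 - cum + freq / `N')

bysort rep78: keep if _n == 1
list rep78 freq atleast if !missing(rep78), noobs
```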
Maarten and Martin correctly pointed out that you can re-create such tabulations for yourself from first principles.
However, a more elaborate canned alternative is available. See -groups-
from SSC. There was some discussion in

SJ-3-4  pr0011  . . . . . Speaking Stata: Problems with tables, Part II
        . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q4/03   SJ 3(4):420--439                         (no commands)
        reviews three user-written commands (tabcount, makematrix,
        and groups) as different approaches to tabulation problems

which is available on-line at the Stata Journal website.

Nick
[hidden email]
|
Another elementary trick is to create a negated version of the variable and tabulate in terms of that, removing minus signs from the table via an edit. In its simplest form that requires values to be strictly positive. And any value labels would need to be edited too.
clonevar negx = x
replace negx = -negx
tab negx

Nick
[hidden email]
|
Thanks to Martin, Maarten and Nick for your help!
There are many ways to calculate reverse cumulative distributions in a do-file, and I have already used them. What I was looking for was a quick and easy solution. I often have people on the phone asking for results, and it occurred to me that I couldn't produce this kind of data on the fly (calculating in my head is one option, but it is error-prone while talking on the phone).

- "cumul, equal" works, but requires some additional programming.
- "groups, show(rvpercent)" is very flexible but, as far as I can see, not exactly what I wanted. Options like RVpercent, rpercent or Rpercent produce reverse cumulative distributions of the type "percentage of cases bigger than x", not "bigger than or equal to x". It works, but for "equal or bigger" one has to pick the result from the previous line.
- reversecum works fine. It's a bit slow (compared to the tabulate command) with big datasets because of the use of preserve, contract and restore, but it produces what I was looking for.

Thanks
Stefan
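For the single-threshold question Stefan describes ("what percentage have at least k?"), one possibility that is not proposed anywhere in the thread is to take the mean of a 0/1 indicator. The lines below are only a sketch under that assumption; the threshold 3, the variable name atleast3 and the use of auto.dta are illustrative choices.

```stata
* Sketch only: share of cases with rep78 of at least 3, missings excluded.
* The threshold (3) and the name atleast3 are illustrative, not from the thread.
sysuse auto, clear
generate byte atleast3 = rep78 >= 3 if !missing(rep78)
summarize atleast3, meanonly
display %5.1f 100*r(mean) " percent of nonmissing cases have rep78 >= 3"
```

The `if !missing(rep78)` qualifier matters: Stata treats missing values as larger than any number, so without it the missing cases would be counted as "at least 3".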
I'll look at -groups- to see if this functionality can be added.
Nick
[hidden email]
In reply to this post by Stefan.Gawrich
--- On Wed, 20/1/10, [hidden email] wrote:
> - reversecum works fine. It's a bit slow (compared to the
> tabulate-command)

-tabulate- is a built-in command (i.e. it is part of the executable, not written in ado or Mata), so it is unlikely that any user-written command can improve on that.

-- Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------
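Since -tabulate- is the fast part, one way to avoid preserve, contract and restore altogether might be to let -tabulate- do the counting and build the "value or more" percentages from its saved matrices. The program below is a hypothetical sketch, not Maarten's -reversecum- and not part of -groups-; the name revcum2 is invented here, and it handles neither weights nor variables with more distinct values than -tabulate- accepts.

```stata
* Hypothetical helper (name revcum2 is made up for this sketch):
* reverse cumulative percentages taken from -tabulate- results,
* so the data in memory are never changed.
capture program drop revcum2
program revcum2
    version 9
    syntax varname(numeric) [if] [in]
    marksample touse                 // also drops missing values of varname
    tempname freq vals
    quietly tabulate `varlist' if `touse', matcell(`freq') matrow(`vals')
    local total = r(N)               // nonmissing observations used
    local k = r(r)                   // number of distinct values
    local running = `total'          // cases with this value or more
    display as text "`varlist'" _col(14) "Freq." _col(24) "% >= value"
    forvalues i = 1/`k' {
        display as result `vals'[`i',1] _col(12) %7.0f `freq'[`i',1] ///
            _col(24) %8.2f (100*`running'/`total')
        local running = `running' - `freq'[`i',1]
    }
end

sysuse auto, clear
revcum2 rep78
```

For rep78 in auto.dta the first row should show 100.00, because every nonmissing case has the lowest value or more. Whether this is noticeably faster than -reversecum- on large data would need to be timed.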
In reply to this post by Nick Cox
Thanks to Kit Baum, a revised version of -groups- with an extra -ge- option has been placed on SSC. I believe that option suits Stefan's problem.
Stata 8 is required. For an overview without installation:

    . ssc type groups.hlp

For installation or re-installation, use -ssc- or -adoupdate-.

As someone, some day, may want cumulatives to be "less than" rather than "less than or equal to", I will shortly add an -lt- option. Past Fortran (or FORTRAN) users may enjoy a brief moment of nostalgia.

Nick
[hidden email]
In reply to this post by Maarten buis
Thanks, Nick, for the "ge" addition to -groups-; it works great.

Maarten Buis wrote:
> -tabulate- is a built-in command (i.e. it is part of the executable,
> not written in ado or mata), so it is unlikely that any user written
> command can improve on that.

That's right, and I didn't expect otherwise. What I meant to write is that calculations can take significant time in big datasets.

Performance example (dual-core PC, 2 GB RAM, Stata 11 MP):

1. One variable in a dataset

    set rmsg on
    clear
    set obs 2000000
    gen x = int(uniform() * 10)
    tab x                               // 0.44 sec.
    groups x, show(freq rpercent) ge    // 12.8 sec.
    reversecum x                        // 5.0 sec.

2. 20 variables in a dataset

    clear
    set obs 2000000
    forval x = 1/20 {
        gen x`x' = int(uniform() * 10)
    }
    tab x1                              // 0.44 sec.
    groups x1, show(freq rpercent) ge   // 14.2 sec.
    reversecum x1                       // 29.2 sec.

BTW: The first timing (0.44 sec.) reminds me of my first Stata experience a decade ago. I had worked with SPSS at university before and couldn't believe how fast Stata was (instant tabulate results... wow).

Thanks again
Stefan
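A comparison along these lines can also be scripted with Stata's -timer- commands instead of reading the -set rmsg- messages. The sketch below is an assumption-laden variation, not Stefan's benchmark: it uses only built-in commands (so neither -groups- nor -reversecum- needs to be installed), it uses runiform() rather than uniform(), and the seed and timer numbers are arbitrary. It times -tabulate- against the preserve, contract and restore sequence that -reversecum- relies on.

```stata
* Sketch: time -tabulate- against preserve/contract/restore.
* Seed, variable names and timer slots 1 and 2 are arbitrary choices.
clear
set seed 12345
set obs 2000000
forvalues j = 1/20 {
    generate byte x`j' = int(runiform() * 10)
}

timer clear
timer on 1
quietly tabulate x1
timer off 1

timer on 2
preserve
contract x1, freq(n) nomiss
restore
timer off 2

timer list      // timer 1: tabulate; timer 2: preserve + contract + restore
```

With 20 variables in memory, -preserve- has to save the whole dataset, which may be why Stefan's -reversecum- timing grew from 5.0 to 29.2 seconds while -tabulate- stayed at 0.44 seconds. Actual timings will depend on the machine.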
In reply to this post by Nick Cox
Thanks again to Kit, a version with the extra -lt- option is now on SSC.
Please install or re-install using -ssc- or -adoupdate- if interested. The nub of the matter is best shown by an example:

    . sysuse auto

    . groups rep78, show(F f Rf) lt

      +---------------------------+
      | rep78   # <   Freq.   # > |
      |---------------------------|
      |     1     0       2    67 |
      |     2     2       8    59 |
      |     3    10      30    29 |
      |     4    40      18    11 |
      |     5    58      11     0 |
      +---------------------------+

    . groups rep78, show(F f Rf) ge

      +-----------------------------+
      | rep78   # <=   Freq.   # >= |
      |-----------------------------|
      |     1      2       2     69 |
      |     2     10       8     67 |
      |     3     40      30     59 |
      |     4     58      18     29 |
      |     5     69      11     11 |
      +-----------------------------+

Nick
[hidden email]
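The "# <" column under -lt- and the "# >=" column under -ge- in Nick's example are complements: for each value they add up to the number of nonmissing observations. The lines below are a small check of that relationship using only built-in commands; they are a sketch for illustration and do not show how -groups- computes its columns.

```stata
* Sketch: for each value of rep78, the count below it plus the count at or
* above it should equal the nonmissing total (69 in auto.dta).
sysuse auto, clear
quietly count if !missing(rep78)
local total = r(N)
forvalues v = 1/5 {
    quietly count if rep78 < `v'
    local lt = r(N)
    quietly count if rep78 >= `v' & !missing(rep78)
    local ge = r(N)
    display "rep78 = `v':  # < = `lt'   # >= = `ge'"
    assert `lt' + `ge' == `total'
}
```

The `!missing(rep78)` condition is needed only in the `>=` count, because missing values sort above every number and are therefore never counted by `rep78 < `v''.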
In reply to this post by Stefan.Gawrich
Marcello Pagano wrote:
> New subscribers will have to go through a screening process that will add time, hopefully less than a day, before they can subscribe. The screening process is designed to keep some, hopefully very few, people out.

I'm a frequent reader of the Statalist archive but an infrequent writer. I can't handle all these mails on my work account, so I only subscribe temporarily to ask or contribute something. I don't want to cause any trouble or extra work with this behaviour. It would be nice to have quick subscription for people already known to majordomo, if possible.

Stefan Gawrich
Dillenburg
Germany