st: AW: Graph: Colouring table cells based on conditions or data distribution

Re: st: "dirty trick": metadata in extended missing value labels

Nick Winter
This seems like the sort of thing that characteristics are designed for
-- see -help char-

- Nick Winter


[hidden email] wrote:

> Hi Statalisters,
>
> the following is a "dirty" but - at least for me - useful trick:
>
> I produce a lot of graphs in batch mode. Layout usually needs a lot of
> tweaking (titles, labels, formats).
> So I had to write many "foreach" loops with 6, 7, 8 or more parallel
> lists to specify individual layout parameters.
>
> One example: Some metric var (length) (not really metric but integer in
> auto.dta) and a 0/1-var (foreign):
>
> clear
> sysuse auto
> foreach x of var length foreign {
> graph bar `x',over(rep78) blabel(bar)
> sleep 2000
> }
>
> The 0/1-mean is better displayed as a proportion. To make it look good,
> I would multiply foreign by 100, label the axis "Percentage", write
> "Percentage" into the title and set the label format to %3.1f.
>
> It would be nice to be able to attach such display information to the
> variable, so one could take these meta-parameters from the dataset
> instead of specifying them by hand each time. There seems to be no
> regular way to do so.
> As a workaround for this, labels for extended missing values (.a, .b, .c
> ... .z) came to my mind which can be set for all numerical variables. I
> never use any more than ".a" or ".b" so why not store some information
> in the value label of some (by me) unused missing value like ".l"?
>
> The following code stores some basic information on type of display and
> label format to value label ".l" of variables. Later two graphs are
> produced using this information.
>
>
> *************************************
> *** Create example dataset
> clear
> sysuse auto
> *** Create metadata codes
> foreach var of varlist _all {
> // only for numerical vars
> local type : type `var'
> if inlist("`type'", "byte", "int", "long", "float", "double") == 0 continue
> local form "m21" // default : display as mean, label format 2.1
>
> *** Example 1: m: display as mean, label format 3.0
> if inlist("`var'", "length") == 1 local form "m30"
>
> *** Example 2: p: display as percentage, format 3.1
> if inlist("`var'", "foreign") == 1 local form "p31"
>
> *** Each var gets a new value label (templbl`var').
> *** Existing value labels are copied.
> local lbl`var' : value label `var'
> if "`lbl`var''" == "" {
> cap label drop templbl`var'
> label define templbl`var' .l "`form'"
> label values `var' templbl`var'
> }
> else {
> cap label drop templbl`var'
> label copy `lbl`var'' templbl`var'  
> label define templbl`var' .l "`form'" , add
> label values `var' templbl`var'
> }
> }
>
> *** Now set up the graphs:
> local varover "rep78"
> local varlab : variable label `varover'
> foreach var of varlist length foreign {
> local varlab2 : variable label `var'
> local how :  label  templbl`var' .l // ".l" label content into a local
> // xvar is used in the graph so that the original var is not altered
> gen xvar = `var'
> // multiply by 100 if var is displayed as a percentage
> if substr("`how'",1,1) == "p" replace xvar = `var' * 100
> // label for titles
> if substr("`how'",1,1) == "m" local value = "Mean"
> if substr("`how'",1,1) == "p" local value = "Percentage"
> // local for label formatting
> local form = "%" + substr("`how'",2,1) + "." + substr("`how'",3,1) + "f"
> graph bar xvar, over(`varover') ///
>     title("`varlab2' over `varlab' (`value')") ///
>     ytitle("`value'") blabel(bar,format(`form'))
> sleep 2000
> drop xvar
> }
> ************************************
>
>
> Stefan
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

--
--------------------------------------------------------------
Nicholas Winter                                 434.924.6994 t
Assistant Professor                             434.924.3359 f
Department of Politics                  [hidden email] e
University of Virginia          faculty.virginia.edu/nwinter w
PO Box 400787, 100 Cabell Hall
Charlottesville, VA 22904


st: RE: "dirty trick": metadata in extended missing value labels

Nick Cox
In reply to this post by Stefan.Gawrich
The Stata philosophy here, as indeed often elsewhere, is that it
provides low-level tools so that you can use them as you wish for a
variety of higher level purposes.

In particular, Stata clearly has no concept of a variable that should be
displayed in percent terms. That's entirely a user preference.

To me, the natural low-level tool here for recording such a preference
is that of characteristics. What you did worked for you, but I'd rather
reserve missing value support for missing values. Positively, using
characteristics is an example of attaching information to the variable,
exactly as you wish.

You could define characteristics like this

char foreign[showpc] "percent"

and then in a loop condition on such a characteristic being found

e.g.

foreach v of var <varlist> {
        if "`: char `v'[showpc]'" == "percent" {
                <whatever>
        }
        else {
                <whatever else>
        }
}

Note that it is not an error to refer to a non-existent characteristic.
It is just treated as if it were an empty string. So, I don't need to
define this characteristic when I don't need it.
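
To make the placeholders concrete, here is a minimal sketch of the batch graphs from the original post driven by characteristics. The characteristic name showpc is the one used above; fmt is a hypothetical second characteristic introduced only for this illustration, and the default format is an assumption.

```stata
sysuse auto, clear

* showpc as above; fmt is a hypothetical characteristic for the label format
char foreign[showpc] "percent"
char foreign[fmt] "%3.1f"
char length[fmt] "%3.0f"

foreach v of varlist length foreign {
    local vlab : variable label `v'
    local fmt : char `v'[fmt]
    * an undefined characteristic comes back empty, so fall back to a default
    if "`fmt'" == "" local fmt "%2.1f"
    if "`: char `v'[showpc]'" == "percent" {
        * graph a scaled copy so the original variable is left untouched
        tempvar y
        generate double `y' = 100 * `v'
        graph bar `y', over(rep78) title("`vlab' (Percentage)") ///
            ytitle("Percentage") blabel(bar, format(`fmt'))
    }
    else {
        graph bar `v', over(rep78) title("`vlab' (Mean)") ///
            ytitle("Mean") blabel(bar, format(`fmt'))
    }
}
```

Nothing in the loop needs a parallel list: each variable carries its own settings, and variables without them take the defaults.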

Nick
[hidden email]


Re: st: "dirty trick": metadata in extended missing value labels

Maarten buis
In reply to this post by Stefan.Gawrich
--- On Mon, 16/11/09, [hidden email] wrote:
> the following is a "dirty" but - at least for me - useful
> trick:
>
> I produce a lot of graphs in batch mode. Layout usually
> needs a lot of tweaking (titles, labels, formats).
<snip>

> It would be nice to be able to attach such display
> information to the variable, so one could take these
> meta-parameters from the dataset instead of specifying
> them by hand each time. There seems to be no regular way
> to do so.
> As a workaround for this, labels for extended missing
> values (.a, .b, .c ... .z) came to my mind which can be
> set for all numerical variables. I never use any more
> than ".a" or ".b" so why not store some information
> in the value label of some (by me) unused missing value
> like ".l"?
<snip>

You can make your trick less "dirty" by storing this
meta-information in what Stata calls
-characteristics-; see -help char-. This allows
you to attach this kind of information to a variable
without running the risk of weird results
if at some future date you do happen to use the ".l"
extended missing value. You can extract that
information using the -char- extended macro function; see
-help extended_fcn-.
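
A minimal sketch of that round trip, assuming a characteristic named dispfmt (a name made up for this illustration) that holds the same "p31" code as in the original post:

```stata
sysuse auto, clear

* dispfmt is a hypothetical characteristic name chosen for this sketch
char foreign[dispfmt] "p31"

* read the stored text back into a local macro
local how : char foreign[dispfmt]
display "`how'"

* list every characteristic attached to foreign
char list foreign[]

* defining a characteristic with no text clears it again
char foreign[dispfmt]
```

The value labels of foreign are not touched at any point, and characteristics are saved with the dataset, so the information survives -save- and -use-.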

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------





     


Re: st: "dirty trick": metadata in extended missing value labels

Jeph Herrin
In reply to this post by Stefan.Gawrich


[hidden email] wrote:
> Hi Statalisters,
>
> the following is a "dirty" but - at least for me - useful trick:
>


Several have suggested using characteristics to store this kind of
info, and I concur. The main disadvantage of both schemes
is that it can be difficult to figure out which variables have
which characteristics. I wrote the program below to list variables
that either have a certain characteristic defined, or have a
certain value for a given characteristic. For example,

  dchar, char(graph percent)

will list all variables with characteristic 'graph' defined
and equal to 'percent'; or

  dchar, char(graph)

will simply list all variables with characteristic 'graph'
defined. The program also takes a varlist, to limit the search,
and stores the hit list in r(varlist) and the number of hits
in r(numvars).

hth,
Jeph



*****************************************
*   DCHAR
*   program to get list of variables with
*   certain characteristics
*!
*!  version 1.0       27 Aug 2009
*****************************************

program define dchar, rclass
         syntax [varlist] , CHar(string)
         local char1 : word 1 of `char'
         local char2 : word 2 of `char'
         local rlist ""
         local numvars 0
         di ""
         foreach V of varlist `varlist' {
                 local charlist : char `V'[]
                 local found : list char1 in charlist
                 if `found'>0 {
                         if "`char2'"!="" {
                                 local charval : char `V'[`char1']
                                 if "`charval'"=="`char2'" {
                                         di in y "`V'" _col(40) in g ///
                                          "`char1'"  _col(50) "`char2'"
                                         local rlist "`rlist'`V' "
                                         local numvars = `numvars'+1
                                 }
                         }
                         else {
                                 local charval : char `V'[`char1']
                                 di in y "`V'" _col(40) in g ///
                                   "`char1'" _col(50) "`charval'"
                                 local rlist "`rlist'`V' "
                                 local numvars = `numvars'+1
                         }
                 }
         }
         return local varlist "`rlist'"
         return scalar numvars= `numvars'
end
*****************************************

RE: st: "dirty trick": metadata in extended missing value labels

Nick Cox
Note that the official Stata command -ds- already has options -has()-
and -not()- that permit identification of variables with or without
named characteristics (but not their contents, as here).

There is much scope for different programming styles, but flagging that
a variable has a particular property can be done by assigning any
non-empty text. Putting informative detail _inside_ characteristics has
disadvantages as well as advantages, as the programmer is
correspondingly obliged to, or able to, look inside.
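
As a rough sketch of the difference: -ds- finds the variables that carry a named characteristic, and a short loop over r(varlist) is then needed to look at the contents. The characteristic name showpc and its values are only illustrative.

```stata
sysuse auto, clear
char foreign[showpc] "percent"
char length[showpc] "mean"

* variables that have a showpc characteristic, whatever it contains
ds, has(char showpc)
local flagged `r(varlist)'

* keep only those whose characteristic equals "percent"
local pcvars
foreach v of local flagged {
    if "`: char `v'[showpc]'" == "percent" local pcvars `pcvars' `v'
}
display "`pcvars'"
```

If the mere presence of the characteristic is the flag, the -ds- line alone is enough and the loop can be dropped.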

Nick
[hidden email]


Re: st: "dirty trick": metadata in extended missing value labels

Jeph Herrin

Yes, originally I wrote this as a wrapper for -ds-, but found I
still had to loop over the return list to check for assigned values
so bagged -ds- altogether in the end.

That said, obviously, if you don't care what the value of a
characteristic is, there is no reason to assign such values.
I have datasets that consist of dozens of linked tables containing
thousands of variables. The more information I can attach to a
variable the first time I look at it, the more I can automate
my analyses and output, which in such a situation is highly
desirable.

cheers,
J




RE: st: "dirty trick": metadata in extended missing value labels

Nick Cox
We agree.

Nick
[hidden email]


st: bottom to top or reverse cumulative distribution in table command?

Stefan.Gawrich
In reply to this post by Stefan.Gawrich
Dear Statalisters,

Is there any univariate table command available that displays a bottom to top or reverse cumulative distribution?
One example: I have data on the number of vaccine doses given (0..9) and want to know the percentage of cases having at least 4 or 3 or 2 doses.

All I found in Statalist was a thread from 2003 discussing some workarounds.


Stefan Gawrich
Hesse State Health Office (HLPUG)
Dillenburg, Germany







st: AW: bottom to top or reverse cumulative distribution in table command?

Martin Weiss-5

<>


Have you had a look at -cumul-? One minus its results should be the reverse cumulative distribution.



HTH
Martin


Re: st: bottom to top or reverse cumulative distribution in table command?

Maarten buis
In reply to this post by Stefan.Gawrich
--- Stefan Gawrich wrote:
> Is there any univariate table command available that displays a
> bottom to top or reverse cumulative distribution? One example:
> I have data on the number of vaccine doses given (0..9) and want
> to know the percentage of cases having at least 4 or 3 or 2 doses.

I don't know of such a program, but it is easy enough to create it.
Below is such a program -reversecum-. It gives for each value the
percentage of observations that have that value or more (excluding
missing values), it allows for -fweights- and -if- and -in-
conditioning.

This program will be available on your computer like any
other Stata command if you copy the lines from
"*! version..." through "end" (inclusive) into a file
called reversecum.ado and save it in your personal ado folder
(type -adopath- in Stata to find out where that is).

Alternatively, you can put these lines at the top of your do-file,
and the program will then be available while running that do-file,
as in the example below.

*-------------- begin example -----------------
program drop _all
*! version 1.0.0 MLB 20Jan2010
program reversecum
        syntax varname [if] [in] [fweight]
        marksample touse
        tempvar _freq cum r_cum
        if "`weight'" != "" {
                local wgt "[`weight'`exp']"
        }
        preserve
        contract `varlist' if `touse' `wgt', ///
                 freq(`_freq') cpercent(`cum') nomiss
        qui gen double `r_cum' = 100 - `cum'[_n-1]
        qui replace `r_cum' = 100 in 1
        format `r_cum' %8.2f
        label var `r_cum' "reverse Cum"
        tabdisp `varlist', cell(`_freq' `r_cum')
        restore
end

sysuse auto, clear
reversecum rep78
*-------------------- end example -----------------------
( For more on how to use examples I sent to statalist see:
 http://www.maartenbuis.nl/stata/exampleFAQ.html )

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------


     


AW: st: bottom to top or reverse cumulative distribution in table command?

Martin Weiss-5

<>

Similarly:


*************
sysuse auto, clear
cumul rep78, generate(cumrep78) equal
gen reversecumrep78=1-cumrep78
bys rep78: egen freq=count(rep78)
bys rep78: keep if _n==1
l rep78 freq reversecumrep78
*************
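
One minus the cumulative proportion from -cumul, equal- is the share of observations strictly above each value, so the highest category shows 0. For "at least" shares, as in the vaccine dose question, a minimal sketch could accumulate the frequencies from the top instead. The variable names n and atleast are made up for this illustration.

```stata
sysuse auto, clear

* one row per distinct value with its frequency; missing values are dropped
contract rep78, freq(n) nomiss

* number of nonmissing observations
quietly summarize n
local N = r(sum)

* running total from the highest value down: share with this value or more
gsort -rep78
generate double atleast = 100 * sum(n) / `N'

sort rep78
format atleast %6.2f
list rep78 n atleast, noobs
```

The lowest value always shows 100, since every nonmissing observation has at least that value.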



HTH
Martin


-----Urspr√ľngliche Nachricht-----
Von: [hidden email] [mailto:[hidden email]] Im Auftrag von Maarten buis
Gesendet: Mittwoch, 20. Januar 2010 10:49
An: [hidden email]
Betreff: Re: st: bottom to top or reverse cumulative distribution in table command?

--- Stefan Gawrich wrote:
> Is there any univariate table command available that displays a
> bottom to top or reverse cumulative distribution? One example:
> I have data on the number of vaccine doses given (0..9) and want
> to know the percentage of cases having at least 4 or 3 or 2 doses.

I don't know of such a program, but it is easy enough to create it.
Below is such a program -reversecum-. It gives for each value the
percentage of observations that have that value or more (excluding
missing values), it allows for -fweights- and -if- and -in-
conditioning.

This program will be normally available on your computer like any
other Stata command, if you copy the line starting with
"*! version..." and ending with "end" (inclusive) into a file
and call it reversecum.ado and save it in your personal ado folder
(type in Stata -adopath- to find out where that is).

Alternatively, you can put these lines at the top of your do-file,
and this program will than be available while running that do-file,
like in the example below.

*-------------- begin example -----------------
program drop _all
*! version 1.0.0 MLB 20Jan2010
program reversecum
        syntax varname [if] [in] [fweight]
        marksample touse
        tempvar _freq cum r_cum
        if "`weight'" != "" {
                local wgt "[`weight'`exp']"
        }
        preserve
        contract `varlist' if `touse' `wgt', ///
                 freq(`_freq') cpercent(`cum') nomiss
        qui gen double `r_cum' = 100 - `cum'[_n-1]
        qui replace `r_cum' = 100 in 1
        format `r_cum' %8.2f
        label var `r_cum' "reverse Cum"
        tabdisp `varlist', cell(`_freq' `r_cum')
        restore
end

sysuse auto, clear
reversecum rep78
*-------------------- end example -----------------------
( For more on how to use examples I sent to statalist see:
 http://www.maartenbuis.nl/stata/exampleFAQ.html )
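
To illustrate the -if- and -fweight- support described above, here is a minimal sketch. It assumes -reversecum- has already been defined as in the example; the variable name n is only an illustration.

```stata
sysuse auto, clear

* restrict the table to domestic cars via -if-
reversecum rep78 if foreign == 0

* collapse to one row per value with a frequency count in n,
* then feed the counts back in as frequency weights
contract rep78, freq(n) nomiss
reversecum rep78 [fweight = n]
```

The weighted call on the contracted data should reproduce the table that the unweighted call gives on the full dataset.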

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------


     


RE: st: bottom to top or reverse cumulative distribution in table command?

Nick Cox
Maarten and Martin correctly pointed out that you can re-create such tabulations for yourself from first principles.

However, a more elaborate canned alternative is available. See -groups- from SSC. There was some discussion in

SJ-3-4  pr0011  . . . . . . . .  Speaking Stata: Problems with tables, Part II
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q4/03   SJ 3(4):420--439                                 (no commands)
        reviews three user-written commands (tabcount, makematrix,
        and groups) as different approaches to tabulation problems


which is available on-line at the Stata Journal website.

Nick
[hidden email]



RE: st: bottom to top or reverse cumulative distribution in table command?

Nick Cox
Another elementary trick is to create a negated version of the variable and tabulate in terms of that, removing minus signs from the table via an edit. In its simplest form that requires values to be strictly positive. And any value labels would need to be edited too.

clonevar negx = x
replace negx = -negx
tab negx
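
A minimal sketch of this trick on the auto data, where the variable name negrep78 is only an illustration. Because the negated values sort from the largest original value to the smallest, the cumulative percent column of -tabulate- can be read as the percentage of cases with that value or more.

```stata
sysuse auto, clear
clonevar negrep78 = rep78
replace negrep78 = -negrep78
* the row for -4 shows, under Cum., the share with rep78 >= 4
tab negrep78
```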

Nick
[hidden email]




AW: st: bottom to top or reverse cumulative distribution in table command?

Stefan.Gawrich
Thanks to Martin, Maarten and Nick for your help!

There are many ways to calculate reverse cumulative distributions in a do-file, and I have already used some of them. What I was looking for was a quick and easy solution. People often phone me asking for results, and it occurred to me that I couldn't produce this kind of table on the fly (mental arithmetic is an option, but it is error-prone while talking on the phone).


- "cumul, equal" works, but requires some additional programming.

- "groups, show(rvpercent)" is very flexible but, as far as I can see, not exactly what I wanted. Options like RVpercent, rpercent or Rpercent produce reverse cumulative distributions of the type "percentage of cases greater than x", not "greater than or equal to x". It works, but for "greater than or equal to" one has to read the result from the previous line.

- reversecum works fine. It is a bit slow with big datasets (compared with the tabulate command) because it uses preserve, contract and restore, but it produces what I was looking for.
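
For ad hoc use on large datasets, one possible shortcut is to let -tabulate- do the counting and build the "at least" percentages from its saved results, which avoids preserve, contract and restore. This is only a sketch: the matrix names freqs and vals and the local names are arbitrary, and it has not been timed.

```stata
sysuse auto, clear
* one-way table; store frequencies in freqs and the distinct values in vals
quietly tabulate rep78, matcell(freqs) matrow(vals)
local n = r(N)
local k = r(r)
* run holds the number of observations with this value or more
local run = `n'
forvalues i = 1/`k' {
    display %6.0f vals[`i',1] " " %8.0f freqs[`i',1] " " %9.2f 100*`run'/`n'
    local run = `run' - freqs[`i',1]
}
```

Each displayed row gives the value, its frequency, and the percentage of non-missing cases with that value or more.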


Thanks

Stefan






RE: st: bottom to top or reverse cumulative distribution in table command?

Nick Cox
I'll look at -groups- to see if this functionality can be added.

Nick
[hidden email]



Re: AW: st: bottom to top or reverse cumulative distribution in table command?

Maarten buis
In reply to this post by Stefan.Gawrich
--- On Wed, 20/1/10, [hidden email] wrote:
> - reversecum works fine. It's a bit slow (compared to the
> tabulate-command)

-tabulate- is a built-in command (i.e. it is part of the
executable, not written in ado or Mata), so it is unlikely
that any user-written command can improve on that.

-- Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------


     


RE: st: bottom to top or reverse cumulative distribution in table command?

Nick Cox
In reply to this post by Nick Cox
Thanks to Kit Baum, a revised version of -groups- with an extra -ge- option has been placed on SSC. I believe that option suits Stefan's problem.

Stata 8 is required.

For an overview without installation:

. ssc type groups.hlp

For installation or re-installation, use -ssc- or -adoupdate-.

As someone, some day, may want cumulatives to be "less than" rather than "less than or equal to" I will shortly add an -lt- option.

Past Fortran (or FORTRAN) users may enjoy a brief moment of nostalgia.

Nick
[hidden email]




Re: AW: st: bottom to top or reverse cumulative distribution in table command?

Stefan.Gawrich
In reply to this post by Maarten buis

Thanks Nick for the "ge" addition to 'groups', works great.

Maarten Buis wrote:
> -tabulate- is a built-in command (i.e. it is part of the
> executable, not written in ado or mata), so it is unlikely
> that any user written command can improve on that.


That's right, and I didn't expect a user-written command to match it.
What I meant to say is that the calculations can take considerable time in big datasets.


Performance example (dual-core PC, 2 GB RAM, Stata 11 MP):

1. One variable in a dataset

clear
set rmsg on
set obs 2000000
gen x = int(uniform() * 10)

tab x // 0.44 sec.
groups x, show(freq rpercent) ge // 12.8 sec.
reversecum x // 5.0 sec.


2. 20 variables in a dataset

clear
set obs 2000000
forval x = 1/20 {
    gen x`x' = int(uniform() * 10)
}
tab x1 // 0.44 sec.
groups x1, show(freq rpercent) ge // 14.2 sec.
reversecum x1 // 29.2 sec.



BTW: The first time (0.44 sec.) reminds me of my first Stata experience a decade ago.
I worked with SPSS at university before and couldn't believe how fast Stata was.
(instant tabulate results...wow)


Thanks again

Stefan




RE: st: bottom to top or reverse cumulative distribution in table command?

Nick Cox
In reply to this post by Nick Cox
Thanks again to Kit, a version with the extra -lt- option is now on SSC.
Please install or re-install using -ssc- or -adoupdate- if interested.

The nub of the matter is best shown by an example:

. sysuse auto

. groups rep78, show(F f Rf) lt

  +---------------------------+
  | rep78   # <   Freq.   # > |
  |---------------------------|
  |     1     0       2    67 |
  |     2     2       8    59 |
  |     3    10      30    29 |
  |     4    40      18    11 |
  |     5    58      11     0 |
  +---------------------------+

. groups rep78, show(F f Rf) ge

  +-----------------------------+
  | rep78   # <=   Freq.   # >= |
  |-----------------------------|
  |     1      2       2     69 |
  |     2     10       8     67 |
  |     3     40      30     59 |
  |     4     58      18     29 |
  |     5     69      11     11 |
  +-----------------------------+

Nick
[hidden email]



st: New rules

Stefan.Gawrich
In reply to this post by Stefan.Gawrich
Marcello Pagano wrote:
> New subscribers will have to go through a screening process that will add time, hopefully less than a day, before they can subscribe. The screening process is designed to keep some, hopefully very few, people out.

I'm a frequent reader of the Statalist archive but an infrequent writer. I can't handle all these mails on my work account, so I only subscribe temporarily to ask or contribute something.

I don't want to cause any trouble or extra work with this behaviour. It would be nice to have a quick subscription route for people already known to majordomo, if possible.


Stefan Gawrich
Dillenburg
Germany

