st: Data file restructure

Michael Beets
Data transformation



I have a wide-format data set (each row is an individual measured
over time) containing 15 measurements beginning in 1981 and ending in
2006 (example: m81, m82, m85, m86… m2006). In the initial year of
measurement (1981), individuals ranged in age from 16 to 24 years.
The age of each individual at each measurement occasion is held in 15
age variables (example: a81, a82, a85, a86… a2006).



What I want to do is restructure the dataset so that the variables are
indexed by age (in years) rather than by calendar year, with each
individual's measurement listed under the age at which it was taken.



Here is an example of what I think this would look like:



Current dataset:

Id    m81   m82   m85   a81   a82   a85
1     1.5   1.75  2     16    17    20
2     2.3   2.5   2.6   17    18    21
3     1.8   2     2.3   24    25    28





Restructured dataset

Id    y16   y17   y18   …     y20   y21   y22   y23   y24   y25   …     y28
1     1.5   1.75              2
2           2.3   2.5               2.6
3                                                     1.8   2           2.3



I've attempted to use "reshape", but it does not seem to give me the
data structure I am after. I also attempted to write some code using
"gen y16 = m81 if a81==16", but this is cumbersome and error-prone.
Is there an easier set of commands to restructure the data as
indicated above?

Thanks.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Re: st: Data file restructure

Scott Merryman
Something like this?

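clear
input id m81 m82 m85 a81 a82 a85
1 1.5 1.75 2   16 17 20
2 2.3 2.5  2.6 17 18 21
3 1.8 2    2.3 24 25 28
end
* wide -> long: one row per id and year, with variables m and a
reshape long m a, i(id) j(year)
list
* year is no longer needed; age becomes the new index
drop year
rename m y
* long -> wide: one row per id, one y variable per observed age
reshape wide y, i(id) j(a)
list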
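
With the full data, the second -reshape- will stop with an error if an age is missing or if the same age occurs twice for one individual (for example, two measurements taken within the same year of age). Neither case arises in the toy data above, so whether it matters for the real file is an assumption. A minimal sketch of guarding against both before going wide:

```stata
clear
input id m81 m82 m85 a81 a82 a85
1 1.5 1.75 2   16 17 20
2 2.3 2.5  2.6 17 18 21
3 1.8 2    2.3 24 25 28
end

reshape long m a, i(id) j(year)

* reshape wide needs a nonmissing j() variable
* (assumption: the real data may have occasions with no recorded age)
drop if missing(a)

* stops with an error unless id and a jointly identify the observations
isid id a

drop year
rename m y
reshape wide y, i(id) j(a)
list
```

If -isid- fails, -duplicates report id a- shows how many repeated id-age pairs there are, and some rule for choosing between them is needed before the second -reshape-.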

Scott


On Mon, Dec 8, 2008 at 7:25 AM, Michael Beets <[hidden email]> wrote:

> Is there an easier set of commands to restructure the data as
> indicated above?

RE: st: Data file restructure

Nick Cox
Scott's solution could be followed by

collapse y*, by(id)

Two more general comments:

1. Michael wants to go from one wide structure to another wide structure.
That implies (at least) two -reshape-s, as Scott's code makes clear.
The FAQ

FAQ     . . . . . . . . . . . . . . . . . . . . .  Problems with reshape
        12/03   I am having problems with the reshape command. Can
                you give further guidance?
                http://www.stata.com/support/faqs/data/reshape3.html

has more examples of a similar kind.

2. Although Michael can get what he wants, for most purposes the data
structure he asks for is, at least in Stata, much more awkward than the
equivalent long structure.
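
To illustrate the second point, here is a rough sketch of working with the long structure directly, using Scott's example data. The variable names y and age are chosen here for illustration and are not from the original posts.

```stata
clear
input id m81 m82 m85 a81 a82 a85
1 1.5 1.75 2   16 17 20
2 2.3 2.5  2.6 17 18 21
3 1.8 2    2.3 24 25 28
end

* one row per person and measurement occasion
reshape long m a, i(id) j(year)
rename m y
rename a age

* summaries by age need no further restructuring
tabstat y, by(age) statistics(mean n)

* declare the data as a panel indexed by age
* (assumes each id-age pair occurs only once)
xtset id age
```

In this layout, age is an ordinary variable that can be used in -by:-, in -if- conditions, or as a regressor, rather than being encoded in variable names.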

Nick
[hidden email]
