Transposing datasets

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

Transposing datasets

Kaspar Dardas-2
Hi guys,

I have a dataset with about 32000 observations, which is in long
format (see structure below). gvkey is the identifier for a firm
(about 600 different firms), datadate is the monthend value between
2002 and 2010, which of course repeats in the dataset (again, long
format) and mcap_sum is my observation, which is different for each
month and gvkey.

gvkey datadate mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
212782 30sep2005 5302.5
212782 31oct2005 5253.5
212782 30nov2005 5642.7
212782 31dec2005 6230
etc...

Well, I would like to transpose my dataset so it shows each month as a
variable and the observations are mcap_sums. My tries with reshape
failed miserably. (xpose wont work because I still want to keep
mcap_sum as an observation).. Does anybody has a suggestion to solve
this quickly?

gvkey    31dec2005   30nov2005   31oct2005
212782  6230              5642.7           5253.5   ...........


Best,

Kaspar
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
Reply | Threaded
Open this post in threaded view
|

Re: Transposing datasets

nshephard
Administrator
On Mon, Aug 2, 2010 at 11:37 AM, Kaspar Dardas
<[hidden email]> wrote:

> I have a dataset with about 32000 observations, which is in long
> format (see structure below). gvkey is the identifier for a firm
> (about 600 different firms), datadate is the monthend value between
> 2002 and 2010, which of course repeats in the dataset (again, long
> format) and mcap_sum is my observation, which is different for each
> month and gvkey.
>
> gvkey   datadate        mcap_sum
> 212782  30jun2005       4946.9
> 212782  31jul2005       5042.1
> 212782  31aug2005       5145
> 212782  30sep2005       5302.5
> 212782  31oct2005       5253.5
> 212782  30nov2005       5642.7
> 212782  31dec2005       6230
> etc...
>
> Well, I would like to transpose my dataset so it shows each month as a
> variable and the observations are mcap_sums. My tries with reshape
> failed miserably.

Its useful and informative to post the code that "failed miserably"
and the output that is associated with it, that way people can see
what you've been trying and point out where you've gone wrong (all
recommended as good practice in the Statalist FAQ, linked from the
bottom of every message to the list).

In the absence of this I'd suggest

reshape wide mcap_sum, i(gvkey) j(datadate)

...but this may be complicated by the format of 'datadate' if it is
formatted as a date variable (as underneath the formatting which
presents it as you've listed it will be stored as a numeric of sorts).

> (xpose wont work because I still want to keep
> mcap_sum as an observation).. Does anybody has a suggestion to solve
> this quickly?
>
> gvkey    31dec2005   30nov2005   31oct2005
> 212782  6230              5642.7           5253.5   ...........

You'll never achieve these names as Stata variables are not allowed to
start with numbers.  -renvars- (install with -ssc install renvars-)
may be of use in renaming variables here though.

And whats urgent for you isn't likely to be for anyone else on the list ;-)

Neil

--
"... no scientific worker has a fixed level of significance at which
from year to year, and in all circumstances, he rejects hypotheses; he
rather gives his mind to each particular case in the light of his
evidence and his ideas." - Sir Ronald A. Fisher (1956)

Email - [hidden email]
Website - http://slack.ser.man.ac.uk/
Photos - http://www.flickr.com/photos/slackline/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
Reply | Threaded
Open this post in threaded view
|

AW: Transposing datasets

Martin Weiss-5
In reply to this post by Kaspar Dardas-2

<>



*************
clear*

inp gvkey str10 datadate mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
212782 30sep2005 5302.5
212782 31oct2005 5253.5
212782 30nov2005 5642.7
212782 31dec2005 6230  
212783 30jun2005 496.9
212783 31jul2005 502.1
212783 31aug2005 515
212783 30sep2005 530.5
212783 31oct2005 523.5
212783 30nov2005 562.7
212783 31dec2005 630  
end

li, noo sepby(gvkey)

reshape wide mcap, i(gvkey) j(datadate) string

foreach var of varlist mcap_sum*{
        ren `var' `=subinstr("`var'", "mcap_sum", "date_", 1)'
}

l

*************



HTH
Martin

-----Ursprüngliche Nachricht-----
Von: [hidden email] [mailto:[hidden email]] Im Auftrag von Kaspar Dardas
Gesendet: Montag, 2. August 2010 13:37
An: [hidden email]
Betreff: st: Transposing datasets

Hi guys,

I have a dataset with about 32000 observations, which is in long
format (see structure below). gvkey is the identifier for a firm
(about 600 different firms), datadate is the monthend value between
2002 and 2010, which of course repeats in the dataset (again, long
format) and mcap_sum is my observation, which is different for each
month and gvkey.

gvkey datadate mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
212782 30sep2005 5302.5
212782 31oct2005 5253.5
212782 30nov2005 5642.7
212782 31dec2005 6230
etc...

Well, I would like to transpose my dataset so it shows each month as a
variable and the observations are mcap_sums. My tries with reshape
failed miserably. (xpose wont work because I still want to keep
mcap_sum as an observation).. Does anybody has a suggestion to solve
this quickly?

gvkey    31dec2005   30nov2005   31oct2005
212782  6230              5642.7           5253.5   ...........


Best,

Kaspar
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
Reply | Threaded
Open this post in threaded view
|

AW: st: Transposing datasets

Martin Weiss-5
In reply to this post by nshephard

<>


"And whats urgent for you isn't likely to be for anyone else on the list
;-)"


Re histrionics, see NJC`s
http://www.stata.com/statalist/archive/2008-04/msg01286.html



"You'll never achieve these names as Stata variables are not allowed to
start with numbers."


-help f_strtoname- comes to mind...



HTH
Martin


-----Ursprüngliche Nachricht-----
Von: [hidden email]
[mailto:[hidden email]] Im Auftrag von Neil Shephard
Gesendet: Montag, 2. August 2010 13:48
An: [hidden email]
Betreff: Re: st: Transposing datasets

On Mon, Aug 2, 2010 at 11:37 AM, Kaspar Dardas
<[hidden email]> wrote:

> I have a dataset with about 32000 observations, which is in long
> format (see structure below). gvkey is the identifier for a firm
> (about 600 different firms), datadate is the monthend value between
> 2002 and 2010, which of course repeats in the dataset (again, long
> format) and mcap_sum is my observation, which is different for each
> month and gvkey.
>
> gvkey   datadate        mcap_sum
> 212782  30jun2005       4946.9
> 212782  31jul2005       5042.1
> 212782  31aug2005       5145
> 212782  30sep2005       5302.5
> 212782  31oct2005       5253.5
> 212782  30nov2005       5642.7
> 212782  31dec2005       6230
> etc...
>
> Well, I would like to transpose my dataset so it shows each month as a
> variable and the observations are mcap_sums. My tries with reshape
> failed miserably.

Its useful and informative to post the code that "failed miserably"
and the output that is associated with it, that way people can see
what you've been trying and point out where you've gone wrong (all
recommended as good practice in the Statalist FAQ, linked from the
bottom of every message to the list).

In the absence of this I'd suggest

reshape wide mcap_sum, i(gvkey) j(datadate)

...but this may be complicated by the format of 'datadate' if it is
formatted as a date variable (as underneath the formatting which
presents it as you've listed it will be stored as a numeric of sorts).

> (xpose wont work because I still want to keep
> mcap_sum as an observation).. Does anybody has a suggestion to solve
> this quickly?
>
> gvkey    31dec2005   30nov2005   31oct2005
> 212782  6230              5642.7           5253.5   ...........

You'll never achieve these names as Stata variables are not allowed to
start with numbers.  -renvars- (install with -ssc install renvars-)
may be of use in renaming variables here though.

And whats urgent for you isn't likely to be for anyone else on the list ;-)

Neil

--
"... no scientific worker has a fixed level of significance at which
from year to year, and in all circumstances, he rejects hypotheses; he
rather gives his mind to each particular case in the light of his
evidence and his ideas." - Sir Ronald A. Fisher (1956)

Email - [hidden email]
Website - http://slack.ser.man.ac.uk/
Photos - http://www.flickr.com/photos/slackline/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
Reply | Threaded
Open this post in threaded view
|

RE: Transposing datasets

Eric A. Booth
In reply to this post by Kaspar Dardas-2
<>

You can't name a new variable with a number as the first character (e.g., "31jun1980").  So, -tostring- your datadate var first:

*********!
clear
inp gvkey   str20(datadate)        mcap_sum
212782  30jun2005       4946.9
212782  31jul2005       5042.1
212782  31aug2005       5145
212782  30sep2005       5302.5
212782  31oct2005       5253.5
212782  30nov2005       5642.7
212782  31dec2005       6230
end
**set up data**
g datadate2 = date(datadate, "DMY")
format datadate2 %td
drop datadate
rename datadate2 datadate


//1. make date a string var//
tostring datadate, force replace u


//2.  reshape wide using datadate//
 g id = _n
reshape wide mcap_sum, i(id) j(datadate) string
ds mcap_*
//3. move all obs for a gvkey to one line//
foreach v in `r(varlist)' {
        bys gvkey: egen `v'2 = max(`v')
        drop `v'
        rename `v'2 `v'
          }
        by gvkey: g o = 1 ==_n
        keep if o==1
        drop o
       
*********!
 ~  Eric
____________
______________________________
________________________________________
From: [hidden email] [[hidden email]] on behalf of Kaspar Dardas [[hidden email]]
Sent: Monday, August 02, 2010 6:37 AM
To: [hidden email]
Subject: st: Transposing datasets

Hi guys,

I have a dataset with about 32000 observations, which is in long
format (see structure below). gvkey is the identifier for a firm
(about 600 different firms), datadate is the monthend value between
2002 and 2010, which of course repeats in the dataset (again, long
format) and mcap_sum is my observation, which is different for each
month and gvkey.

gvkey   datadate        mcap_sum
212782  30jun2005       4946.9
212782  31jul2005       5042.1
212782  31aug2005       5145
212782  30sep2005       5302.5
212782  31oct2005       5253.5
212782  30nov2005       5642.7
212782  31dec2005       6230
etc...

Well, I would like to transpose my dataset so it shows each month as a
variable and the observations are mcap_sums. My tries with reshape
failed miserably. (xpose wont work because I still want to keep
mcap_sum as an observation).. Does anybody has a suggestion to solve
this quickly?

gvkey    31dec2005   30nov2005   31oct2005
212782  6230              5642.7           5253.5   ...........


Best,

Kaspar
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
Reply | Threaded
Open this post in threaded view
|

AW: RE: Transposing datasets

Martin Weiss-5

<>

Why do we need another "id" variable? In Eric`s code, it is created via

*************
g id = _n
*************

Is "gvkey" not supposed to be the "id" variable?


HTH
Martin


-----Ursprüngliche Nachricht-----
Von: [hidden email]
[mailto:[hidden email]] Im Auftrag von Eric Booth
Gesendet: Montag, 2. August 2010 14:51
An: [hidden email]
Betreff: st: RE: Transposing datasets

<>

You can't name a new variable with a number as the first character (e.g.,
"31jun1980").  So, -tostring- your datadate var first:

*********!
clear
inp gvkey   str20(datadate)        mcap_sum
212782  30jun2005       4946.9
212782  31jul2005       5042.1
212782  31aug2005       5145
212782  30sep2005       5302.5
212782  31oct2005       5253.5
212782  30nov2005       5642.7
212782  31dec2005       6230
end
**set up data**
g datadate2 = date(datadate, "DMY")
format datadate2 %td
drop datadate
rename datadate2 datadate


//1. make date a string var//
tostring datadate, force replace u


//2.  reshape wide using datadate//
 g id = _n
reshape wide mcap_sum, i(id) j(datadate) string
ds mcap_*
//3. move all obs for a gvkey to one line//
foreach v in `r(varlist)' {
        bys gvkey: egen `v'2 = max(`v')
        drop `v'
        rename `v'2 `v'
          }
        by gvkey: g o = 1 ==_n
        keep if o==1
        drop o
       
*********!
 ~  Eric
____________
______________________________
________________________________________
From: [hidden email]
[[hidden email]] on behalf of Kaspar Dardas
[[hidden email]]
Sent: Monday, August 02, 2010 6:37 AM
To: [hidden email]
Subject: st: Transposing datasets

Hi guys,

I have a dataset with about 32000 observations, which is in long
format (see structure below). gvkey is the identifier for a firm
(about 600 different firms), datadate is the monthend value between
2002 and 2010, which of course repeats in the dataset (again, long
format) and mcap_sum is my observation, which is different for each
month and gvkey.

gvkey   datadate        mcap_sum
212782  30jun2005       4946.9
212782  31jul2005       5042.1
212782  31aug2005       5145
212782  30sep2005       5302.5
212782  31oct2005       5253.5
212782  30nov2005       5642.7
212782  31dec2005       6230
etc...

Well, I would like to transpose my dataset so it shows each month as a
variable and the observations are mcap_sums. My tries with reshape
failed miserably. (xpose wont work because I still want to keep
mcap_sum as an observation).. Does anybody has a suggestion to solve
this quickly?

gvkey    31dec2005   30nov2005   31oct2005
212782  6230              5642.7           5253.5   ...........


Best,

Kaspar
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
Reply | Threaded
Open this post in threaded view
|

RE: AW: RE: Transposing datasets

Eric A. Booth
<>

I misread in the OP what gvkey was...you're right, there's no need for "id".  My post was delayed--I had sent my email before yours came through-- so I hadn't intended for mine to be some kind of comment/alternate to your ppost, as yours was clearly better.   ( Also, I learned something about using subinstr() in the rename command from your post, thanks )

~ Eric
________________________________________________________________________________
________________________________________________________________________________
From: [hidden email] [[hidden email]] on behalf of Martin Weiss [[hidden email]]
Sent: Monday, August 02, 2010 8:02 AM
To: [hidden email]
Subject: st: AW: RE: Transposing datasets

<>

Why do we need another "id" variable? In Eric`s code, it is created via

*************
g id = _n
*************

Is "gvkey" not supposed to be the "id" variable?


HTH
Martin


-----Ursprüngliche Nachricht-----
Von: [hidden email]
[mailto:[hidden email]] Im Auftrag von Eric Booth
Gesendet: Montag, 2. August 2010 14:51
An: [hidden email]
Betreff: st: RE: Transposing datasets

<>

You can't name a new variable with a number as the first character (e.g.,
"31jun1980").  So, -tostring- your datadate var first:

*********!
clear
inp gvkey   str20(datadate)        mcap_sum
212782  30jun2005       4946.9
212782  31jul2005       5042.1
212782  31aug2005       5145
212782  30sep2005       5302.5
212782  31oct2005       5253.5
212782  30nov2005       5642.7
212782  31dec2005       6230
end
**set up data**
g datadate2 = date(datadate, "DMY")
format datadate2 %td
drop datadate
rename datadate2 datadate


//1. make date a string var//
tostring datadate, force replace u


//2.  reshape wide using datadate//
 g id = _n
reshape wide mcap_sum, i(id) j(datadate) string
ds mcap_*
//3. move all obs for a gvkey to one line//
foreach v in `r(varlist)' {
        bys gvkey: egen `v'2 = max(`v')
        drop `v'
        rename `v'2 `v'
          }
        by gvkey: g o = 1 ==_n
        keep if o==1
        drop o

*********!
 ~  Eric
____________
______________________________
________________________________________
From: [hidden email]
[[hidden email]] on behalf of Kaspar Dardas
[[hidden email]]
Sent: Monday, August 02, 2010 6:37 AM
To: [hidden email]
Subject: st: Transposing datasets

Hi guys,

I have a dataset with about 32000 observations, which is in long
format (see structure below). gvkey is the identifier for a firm
(about 600 different firms), datadate is the monthend value between
2002 and 2010, which of course repeats in the dataset (again, long
format) and mcap_sum is my observation, which is different for each
month and gvkey.

gvkey   datadate        mcap_sum
212782  30jun2005       4946.9
212782  31jul2005       5042.1
212782  31aug2005       5145
212782  30sep2005       5302.5
212782  31oct2005       5253.5
212782  30nov2005       5642.7
212782  31dec2005       6230
etc...

Well, I would like to transpose my dataset so it shows each month as a
variable and the observations are mcap_sums. My tries with reshape
failed miserably. (xpose wont work because I still want to keep
mcap_sum as an observation).. Does anybody has a suggestion to solve
this quickly?

gvkey    31dec2005   30nov2005   31oct2005
212782  6230              5642.7           5253.5   ...........


Best,

Kaspar
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
Reply | Threaded
Open this post in threaded view
|

AW: RE: AW: RE: Transposing datasets

Martin Weiss-5

<>

"( Also, I learned something about using subinstr() in the rename command
from your post, thanks )"


Cheers! I love to work from first principles whenever possible, so my use of
-subinstr()- was not intended to detract from the appeal of NJC`s -findit
renvars-...



HTH
Martin


-----Ursprüngliche Nachricht-----
Von: [hidden email]
[mailto:[hidden email]] Im Auftrag von Eric Booth
Gesendet: Montag, 2. August 2010 15:11
An: [hidden email]
Betreff: st: RE: AW: RE: Transposing datasets

<>

I misread in the OP what gvkey was...you're right, there's no need for "id".
My post was delayed--I had sent my email before yours came through-- so I
hadn't intended for mine to be some kind of comment/alternate to your post,
as yours was clearly better.   ( Also, I learned something about using
subinstr() in the rename command from your post, thanks )

~ Eric
____________________________________________________________________________
____
____________________________________________________________________________
____
From: [hidden email]
[[hidden email]] on behalf of Martin Weiss
[[hidden email]]
Sent: Monday, August 02, 2010 8:02 AM
To: [hidden email]
Subject: st: AW: RE: Transposing datasets

<>

Why do we need another "id" variable? In Eric`s code, it is created via

*************
g id = _n
*************

Is "gvkey" not supposed to be the "id" variable?


HTH
Martin


-----Ursprüngliche Nachricht-----
Von: [hidden email]
[mailto:[hidden email]] Im Auftrag von Eric Booth
Gesendet: Montag, 2. August 2010 14:51
An: [hidden email]
Betreff: st: RE: Transposing datasets

<>

You can't name a new variable with a number as the first character (e.g.,
"31jun1980").  So, -tostring- your datadate var first:

*********!
clear
inp gvkey   str20(datadate)        mcap_sum
212782  30jun2005       4946.9
212782  31jul2005       5042.1
212782  31aug2005       5145
212782  30sep2005       5302.5
212782  31oct2005       5253.5
212782  30nov2005       5642.7
212782  31dec2005       6230
end
**set up data**
g datadate2 = date(datadate, "DMY")
format datadate2 %td
drop datadate
rename datadate2 datadate


//1. make date a string var//
tostring datadate, force replace u


//2.  reshape wide using datadate//
 g id = _n
reshape wide mcap_sum, i(id) j(datadate) string
ds mcap_*
//3. move all obs for a gvkey to one line//
foreach v in `r(varlist)' {
        bys gvkey: egen `v'2 = max(`v')
        drop `v'
        rename `v'2 `v'
          }
        by gvkey: g o = 1 ==_n
        keep if o==1
        drop o

*********!
 ~  Eric
____________
______________________________
________________________________________
From: [hidden email]
[[hidden email]] on behalf of Kaspar Dardas
[[hidden email]]
Sent: Monday, August 02, 2010 6:37 AM
To: [hidden email]
Subject: st: Transposing datasets

Hi guys,

I have a dataset with about 32000 observations, which is in long
format (see structure below). gvkey is the identifier for a firm
(about 600 different firms), datadate is the monthend value between
2002 and 2010, which of course repeats in the dataset (again, long
format) and mcap_sum is my observation, which is different for each
month and gvkey.

gvkey   datadate        mcap_sum
212782  30jun2005       4946.9
212782  31jul2005       5042.1
212782  31aug2005       5145
212782  30sep2005       5302.5
212782  31oct2005       5253.5
212782  30nov2005       5642.7
212782  31dec2005       6230
etc...

Well, I would like to transpose my dataset so it shows each month as a
variable and the observations are mcap_sums. My tries with reshape
failed miserably. (xpose wont work because I still want to keep
mcap_sum as an observation).. Does anybody has a suggestion to solve
this quickly?

gvkey    31dec2005   30nov2005   31oct2005
212782  6230              5642.7           5253.5   ...........


Best,

Kaspar
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
Reply | Threaded
Open this post in threaded view
|

AW: AW: RE: AW: RE: Transposing datasets

Martin Weiss-5

<>

Nick himself advocated "first principles" in
http://www.stata.com/statalist/archive/2010-05/msg01165.html, btw...



HTH
Martin


-----Ursprüngliche Nachricht-----
Von: [hidden email]
[mailto:[hidden email]] Im Auftrag von Martin Weiss
Gesendet: Montag, 2. August 2010 15:20
An: [hidden email]
Betreff: st: AW: RE: AW: RE: Transposing datasets


<>

"( Also, I learned something about using subinstr() in the rename command
from your post, thanks )"


Cheers! I love to work from first principles whenever possible, so my use of
-subinstr()- was not intended to detract from the appeal of NJC`s -findit
renvars-...



HTH
Martin


-----Ursprüngliche Nachricht-----
Von: [hidden email]
[mailto:[hidden email]] Im Auftrag von Eric Booth
Gesendet: Montag, 2. August 2010 15:11
An: [hidden email]
Betreff: st: RE: AW: RE: Transposing datasets

<>

I misread in the OP what gvkey was...you're right, there's no need for "id".
My post was delayed--I had sent my email before yours came through-- so I
hadn't intended for mine to be some kind of comment/alternate to your post,
as yours was clearly better.   ( Also, I learned something about using
subinstr() in the rename command from your post, thanks )

~ Eric
____________________________________________________________________________
____
____________________________________________________________________________
____
From: [hidden email]
[[hidden email]] on behalf of Martin Weiss
[[hidden email]]
Sent: Monday, August 02, 2010 8:02 AM
To: [hidden email]
Subject: st: AW: RE: Transposing datasets

<>

Why do we need another "id" variable? In Eric`s code, it is created via

*************
g id = _n
*************

Is "gvkey" not supposed to be the "id" variable?


HTH
Martin


-----Ursprüngliche Nachricht-----
Von: [hidden email]
[mailto:[hidden email]] Im Auftrag von Eric Booth
Gesendet: Montag, 2. August 2010 14:51
An: [hidden email]
Betreff: st: RE: Transposing datasets

<>

You can't name a new variable with a number as the first character (e.g.,
"31jun1980").  So, -tostring- your datadate var first:

*********!
clear
inp gvkey   str20(datadate)        mcap_sum
212782  30jun2005       4946.9
212782  31jul2005       5042.1
212782  31aug2005       5145
212782  30sep2005       5302.5
212782  31oct2005       5253.5
212782  30nov2005       5642.7
212782  31dec2005       6230
end
**set up data**
g datadate2 = date(datadate, "DMY")
format datadate2 %td
drop datadate
rename datadate2 datadate


//1. make date a string var//
tostring datadate, force replace u


//2.  reshape wide using datadate//
 g id = _n
reshape wide mcap_sum, i(id) j(datadate) string
ds mcap_*
//3. move all obs for a gvkey to one line//
foreach v in `r(varlist)' {
        bys gvkey: egen `v'2 = max(`v')
        drop `v'
        rename `v'2 `v'
          }
        by gvkey: g o = 1 ==_n
        keep if o==1
        drop o

*********!
 ~  Eric
____________
______________________________
________________________________________
From: [hidden email]
[[hidden email]] on behalf of Kaspar Dardas
[[hidden email]]
Sent: Monday, August 02, 2010 6:37 AM
To: [hidden email]
Subject: st: Transposing datasets

Hi guys,

I have a dataset with about 32000 observations, which is in long
format (see structure below). gvkey is the identifier for a firm
(about 600 different firms), datadate is the monthend value between
2002 and 2010, which of course repeats in the dataset (again, long
format) and mcap_sum is my observation, which is different for each
month and gvkey.

gvkey   datadate        mcap_sum
212782  30jun2005       4946.9
212782  31jul2005       5042.1
212782  31aug2005       5145
212782  30sep2005       5302.5
212782  31oct2005       5253.5
212782  30nov2005       5642.7
212782  31dec2005       6230
etc...

Well, I would like to transpose my dataset so it shows each month as a
variable and the observations are mcap_sums. My tries with reshape
failed miserably. (xpose wont work because I still want to keep
mcap_sum as an observation).. Does anybody has a suggestion to solve
this quickly?

gvkey    31dec2005   30nov2005   31oct2005
212782  6230              5642.7           5253.5   ...........


Best,

Kaspar
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
Reply | Threaded
Open this post in threaded view
|

RE: AW: AW: RE: AW: RE: Transposing datasets

Nick Cox
No arguments here.

Sometimes we all just want to get results quickly and accurately, and
care not a jot about logic. Sometimes understanding exactly what is
going on is the best prescription because it is all too likely that your
next such problem will not be identical and you will need to know how to
change the solution.

Nick
[hidden email]

Martin Weiss

Nick himself advocated "first principles" in
http://www.stata.com/statalist/archive/2010-05/msg01165.html, btw...

Martin Weiss

"( Also, I learned something about using subinstr() in the rename
command
from your post, thanks )"

Cheers! I love to work from first principles whenever possible, so my
use of
-subinstr()- was not intended to detract from the appeal of NJC`s
-findit
renvars-...

Eric Booth

( Also, I learned something about using
subinstr() in the rename command from your post, thanks )

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
Reply | Threaded
Open this post in threaded view
|

RE: AW: AW: RE: AW: RE: Transposing datasets

Kaspar Dardas-2
Hey,

thanks guys. Indeed, I have stored datadate in %td- format, which
caused my problems with the reshape command.
Now all works fine.

Best,

Kaspar





2010/8/2 Nick Cox <[hidden email]>:

> No arguments here.
>
> Sometimes we all just want to get results quickly and accurately, and
> care not a jot about logic. Sometimes understanding exactly what is
> going on is the best prescription because it is all too likely that your
> next such problem will not be identical and you will need to know how to
> change the solution.
>
> Nick
> [hidden email]
>
> Martin Weiss
>
> Nick himself advocated "first principles" in
> http://www.stata.com/statalist/archive/2010-05/msg01165.html, btw...
>
> Martin Weiss
>
> "( Also, I learned something about using subinstr() in the rename
> command
> from your post, thanks )"
>
> Cheers! I love to work from first principles whenever possible, so my
> use of
> -subinstr()- was not intended to detract from the appeal of NJC`s
> -findit
> renvars-...
>
> Eric Booth
>
> ( Also, I learned something about using
> subinstr() in the rename command from your post, thanks )
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
Reply | Threaded
Open this post in threaded view
|

AW: st: RE: AW: AW: RE: AW: RE: Transposing datasets

Martin Weiss-5

<>


" Indeed, I have stored datadate in %td- format, which
caused my problems with the reshape command."




The display -format- should not matter at all, I think, for -reshape- purposes. This code runs w/o problems:


*************
clear*

inp gvkey str10 datadate mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
212782 30sep2005 5302.5
212782 31oct2005 5253.5
212782 30nov2005 5642.7
212782 31dec2005 6230  
212783 30jun2005 496.9
212783 31jul2005 502.1
212783 31aug2005 515
212783 30sep2005 530.5
212783 31oct2005 523.5
212783 30nov2005 562.7
212783 31dec2005 630  
end

li, noo sepby(gvkey)

gen datadate2=date(datadate, "DMY")
form datadate2 %td

drop datadate

reshape wide mcap, i(gvkey) j(datadate2)

foreach var of varlist mcap_sum*{
        ren `var' `=subinstr("`var'", "mcap_sum", "date_", 1)'
}

l
*************



HTH
Martin


-----Ursprüngliche Nachricht-----
Von: [hidden email] [mailto:[hidden email]] Im Auftrag von Kaspar Dardas
Gesendet: Montag, 2. August 2010 15:44
An: [hidden email]
Betreff: Re: st: RE: AW: AW: RE: AW: RE: Transposing datasets

Hey,

thanks guys. Indeed, I have stored datadate in %td- format, which
caused my problems with the reshape command.
Now all works fine.

Best,

Kaspar





2010/8/2 Nick Cox <[hidden email]>:

> No arguments here.
>
> Sometimes we all just want to get results quickly and accurately, and
> care not a jot about logic. Sometimes understanding exactly what is
> going on is the best prescription because it is all too likely that your
> next such problem will not be identical and you will need to know how to
> change the solution.
>
> Nick
> [hidden email]
>
> Martin Weiss
>
> Nick himself advocated "first principles" in
> http://www.stata.com/statalist/archive/2010-05/msg01165.html, btw...
>
> Martin Weiss
>
> "( Also, I learned something about using subinstr() in the rename
> command
> from your post, thanks )"
>
> Cheers! I love to work from first principles whenever possible, so my
> use of
> -subinstr()- was not intended to detract from the appeal of NJC`s
> -findit
> renvars-...
>
> Eric Booth
>
> ( Also, I learned something about using
> subinstr() in the rename command from your post, thanks )
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
Reply | Threaded
Open this post in threaded view
|

RE: AW: AW: RE: AW: RE: Transposing datasets

Kaspar Dardas-2
In reply to this post by Kaspar Dardas-2
all right, I made reference to Eric's approach...

2010/8/2 Martin Weiss <[hidden email]>:

>
> <>
>
>
> " Indeed, I have stored datadate in %td- format, which
> caused my problems with the reshape command."
>
>
>
>
> The display -format- should not matter at all, I think, for -reshape- purposes. This code runs w/o problems:
>
>
> *************
> clear*
>
> inp gvkey str10 datadate mcap_sum
> 212782  30jun2005       4946.9
> 212782  31jul2005       5042.1
> 212782  31aug2005       5145
> 212782  30sep2005       5302.5
> 212782  31oct2005       5253.5
> 212782  30nov2005       5642.7
> 212782  31dec2005       6230
> 212783  30jun2005       496.9
> 212783  31jul2005       502.1
> 212783  31aug2005       515
> 212783  30sep2005       530.5
> 212783  31oct2005       523.5
> 212783  30nov2005       562.7
> 212783  31dec2005       630
> end
>
> li, noo sepby(gvkey)
>
> gen datadate2=date(datadate, "DMY")
> form datadate2 %td
>
> drop datadate
>
> reshape wide mcap, i(gvkey) j(datadate2)
>
> foreach var of varlist mcap_sum*{
>        ren `var' `=subinstr("`var'", "mcap_sum", "date_", 1)'
> }
>
> l
> *************
>
>
>
> HTH
> Martin
>
>
> -----Ursprüngliche Nachricht-----
> Von: [hidden email] [mailto:[hidden email]] Im Auftrag von Kaspar Dardas
> Gesendet: Montag, 2. August 2010 15:44
> An: [hidden email]
> Betreff: Re: st: RE: AW: AW: RE: AW: RE: Transposing datasets
>
> Hey,
>
> thanks guys. Indeed, I have stored datadate in %td- format, which
> caused my problems with the reshape command.
> Now all works fine.
>
> Best,
>
> Kaspar
>
>
>
>
>
> 2010/8/2 Nick Cox <[hidden email]>:
>> No arguments here.
>>
>> Sometimes we all just want to get results quickly and accurately, and
>> care not a jot about logic. Sometimes understanding exactly what is
>> going on is the best prescription because it is all too likely that your
>> next such problem will not be identical and you will need to know how to
>> change the solution.
>>
>> Nick
>> [hidden email]
>>
>> Martin Weiss
>>
>> Nick himself advocated "first principles" in
>> http://www.stata.com/statalist/archive/2010-05/msg01165.html, btw...
>>
>> Martin Weiss
>>
>> "( Also, I learned something about using subinstr() in the rename
>> command
>> from your post, thanks )"
>>
>> Cheers! I love to work from first principles whenever possible, so my
>> use of
>> -subinstr()- was not intended to detract from the appeal of NJC`s
>> -findit
>> renvars-...
>>
>> Eric Booth
>>
>> ( Also, I learned something about using
>> subinstr() in the rename command from your post, thanks )
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
Reply | Threaded
Open this post in threaded view
|

RE: Transposing datasets

Nick Cox
In reply to this post by nshephard
Thanks to Neil for the mention of -renvars- by Jeroen Weesie and myself.

However, -renvars- is to be downloaded from the Stata Journal site, not SSC.

If people are unclear about where my stuff may be found, a package -njc_stuff-, which _is_ to be found at SSC, is dedicated to that purpose.

Nick
[hidden email]

Neil Shephard replied to Kaspar Dardas

> I have a dataset with about 32000 observations, which is in long
> format (see structure below). gvkey is the identifier for a firm
> (about 600 different firms), datadate is the monthend value between
> 2002 and 2010, which of course repeats in the dataset (again, long
> format) and mcap_sum is my observation, which is different for each
> month and gvkey.
>
> gvkey   datadate        mcap_sum
> 212782  30jun2005       4946.9
> 212782  31jul2005       5042.1
> 212782  31aug2005       5145
> 212782  30sep2005       5302.5
> 212782  31oct2005       5253.5
> 212782  30nov2005       5642.7
> 212782  31dec2005       6230
> etc...
>
> Well, I would like to transpose my dataset so it shows each month as a
> variable and the observations are mcap_sums. My tries with reshape
> failed miserably.

It's useful and informative to post the code that "failed miserably"
and the output that is associated with it; that way people can see
what you've been trying and point out where you've gone wrong (all
recommended as good practice in the Statalist FAQ, linked from the
bottom of every message to the list).

In the absence of this I'd suggest

reshape wide mcap_sum, i(gvkey) j(datadate)

...but this may be complicated by the format of 'datadate' if it is
formatted as a date variable (as underneath the formatting which
presents it as you've listed it will be stored as a numeric of sorts).

> (xpose wont work because I still want to keep
> mcap_sum as an observation).. Does anybody has a suggestion to solve
> this quickly?
>
> gvkey    31dec2005   30nov2005   31oct2005
> 212782  6230              5642.7           5253.5   ...........

You'll never achieve these names as Stata variables are not allowed to
start with numbers.  -renvars- (install with -ssc install renvars-)
may be of use in renaming variables here though.


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/