

Hi guys,
I have a dataset with about 32000 observations, which is in long
format (see structure below). gvkey is the identifier for a firm
(about 600 different firms), datadate is the monthend value between
2002 and 2010, which of course repeats in the dataset (again, long
format) and mcap_sum is my observation, which is different for each
month and gvkey.
gvkey datadate mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
212782 30sep2005 5302.5
212782 31oct2005 5253.5
212782 30nov2005 5642.7
212782 31dec2005 6230
etc...
Well, I would like to transpose my dataset so it shows each month as a
variable and the observations are mcap_sums. My tries with reshape
failed miserably. (xpose wont work because I still want to keep
mcap_sum as an observation).. Does anybody has a suggestion to solve
this quickly?
gvkey 31dec2005 30nov2005 31oct2005
212782 6230 5642.7 5253.5 ...........
Best,
Kaspar
*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/

Administrator

On Mon, Aug 2, 2010 at 11:37 AM, Kaspar Dardas
< [hidden email]> wrote:
> I have a dataset with about 32000 observations, which is in long
> format (see structure below). gvkey is the identifier for a firm
> (about 600 different firms), datadate is the monthend value between
> 2002 and 2010, which of course repeats in the dataset (again, long
> format) and mcap_sum is my observation, which is different for each
> month and gvkey.
>
> gvkey datadate mcap_sum
> 212782 30jun2005 4946.9
> 212782 31jul2005 5042.1
> 212782 31aug2005 5145
> 212782 30sep2005 5302.5
> 212782 31oct2005 5253.5
> 212782 30nov2005 5642.7
> 212782 31dec2005 6230
> etc...
>
> Well, I would like to transpose my dataset so it shows each month as a
> variable and the observations are mcap_sums. My tries with reshape
> failed miserably.
Its useful and informative to post the code that "failed miserably"
and the output that is associated with it, that way people can see
what you've been trying and point out where you've gone wrong (all
recommended as good practice in the Statalist FAQ, linked from the
bottom of every message to the list).
In the absence of this I'd suggest
reshape wide mcap_sum, i(gvkey) j(datadate)
...but this may be complicated by the format of 'datadate' if it is
formatted as a date variable (as underneath the formatting which
presents it as you've listed it will be stored as a numeric of sorts).
> (xpose wont work because I still want to keep
> mcap_sum as an observation).. Does anybody has a suggestion to solve
> this quickly?
>
> gvkey 31dec2005 30nov2005 31oct2005
> 212782 6230 5642.7 5253.5 ...........
You'll never achieve these names as Stata variables are not allowed to
start with numbers. renvars (install with ssc install renvars)
may be of use in renaming variables here though.
And whats urgent for you isn't likely to be for anyone else on the list ;)
Neil

"... no scientific worker has a fixed level of significance at which
from year to year, and in all circumstances, he rejects hypotheses; he
rather gives his mind to each particular case in the light of his
evidence and his ideas."  Sir Ronald A. Fisher (1956)
Email  [hidden email]
Website  http://slack.ser.man.ac.uk/Photos  http://www.flickr.com/photos/slackline/*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/


<>
*************
clear*
inp gvkey str10 datadate mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
212782 30sep2005 5302.5
212782 31oct2005 5253.5
212782 30nov2005 5642.7
212782 31dec2005 6230
212783 30jun2005 496.9
212783 31jul2005 502.1
212783 31aug2005 515
212783 30sep2005 530.5
212783 31oct2005 523.5
212783 30nov2005 562.7
212783 31dec2005 630
end
li, noo sepby(gvkey)
reshape wide mcap, i(gvkey) j(datadate) string
foreach var of varlist mcap_sum*{
ren `var' `=subinstr("`var'", "mcap_sum", "date_", 1)'
}
l
*************
HTH
Martin
Ursprüngliche Nachricht
Von: [hidden email] [mailto: [hidden email]] Im Auftrag von Kaspar Dardas
Gesendet: Montag, 2. August 2010 13:37
An: [hidden email]
Betreff: st: Transposing datasets
Hi guys,
I have a dataset with about 32000 observations, which is in long
format (see structure below). gvkey is the identifier for a firm
(about 600 different firms), datadate is the monthend value between
2002 and 2010, which of course repeats in the dataset (again, long
format) and mcap_sum is my observation, which is different for each
month and gvkey.
gvkey datadate mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
212782 30sep2005 5302.5
212782 31oct2005 5253.5
212782 30nov2005 5642.7
212782 31dec2005 6230
etc...
Well, I would like to transpose my dataset so it shows each month as a
variable and the observations are mcap_sums. My tries with reshape
failed miserably. (xpose wont work because I still want to keep
mcap_sum as an observation).. Does anybody has a suggestion to solve
this quickly?
gvkey 31dec2005 30nov2005 31oct2005
212782 6230 5642.7 5253.5 ...........
Best,
Kaspar
*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/


<>
"And whats urgent for you isn't likely to be for anyone else on the list
;)"
Re histrionics, see NJC`s
http://www.stata.com/statalist/archive/200804/msg01286.html"You'll never achieve these names as Stata variables are not allowed to
start with numbers."
help f_strtoname comes to mind...
HTH
Martin
Ursprüngliche Nachricht
Von: [hidden email]
[mailto: [hidden email]] Im Auftrag von Neil Shephard
Gesendet: Montag, 2. August 2010 13:48
An: [hidden email]
Betreff: Re: st: Transposing datasets
On Mon, Aug 2, 2010 at 11:37 AM, Kaspar Dardas
< [hidden email]> wrote:
> I have a dataset with about 32000 observations, which is in long
> format (see structure below). gvkey is the identifier for a firm
> (about 600 different firms), datadate is the monthend value between
> 2002 and 2010, which of course repeats in the dataset (again, long
> format) and mcap_sum is my observation, which is different for each
> month and gvkey.
>
> gvkey datadate mcap_sum
> 212782 30jun2005 4946.9
> 212782 31jul2005 5042.1
> 212782 31aug2005 5145
> 212782 30sep2005 5302.5
> 212782 31oct2005 5253.5
> 212782 30nov2005 5642.7
> 212782 31dec2005 6230
> etc...
>
> Well, I would like to transpose my dataset so it shows each month as a
> variable and the observations are mcap_sums. My tries with reshape
> failed miserably.
Its useful and informative to post the code that "failed miserably"
and the output that is associated with it, that way people can see
what you've been trying and point out where you've gone wrong (all
recommended as good practice in the Statalist FAQ, linked from the
bottom of every message to the list).
In the absence of this I'd suggest
reshape wide mcap_sum, i(gvkey) j(datadate)
...but this may be complicated by the format of 'datadate' if it is
formatted as a date variable (as underneath the formatting which
presents it as you've listed it will be stored as a numeric of sorts).
> (xpose wont work because I still want to keep
> mcap_sum as an observation).. Does anybody has a suggestion to solve
> this quickly?
>
> gvkey 31dec2005 30nov2005 31oct2005
> 212782 6230 5642.7 5253.5 ...........
You'll never achieve these names as Stata variables are not allowed to
start with numbers. renvars (install with ssc install renvars)
may be of use in renaming variables here though.
And whats urgent for you isn't likely to be for anyone else on the list ;)
Neil

"... no scientific worker has a fixed level of significance at which
from year to year, and in all circumstances, he rejects hypotheses; he
rather gives his mind to each particular case in the light of his
evidence and his ideas."  Sir Ronald A. Fisher (1956)
Email  [hidden email]
Website  http://slack.ser.man.ac.uk/Photos  http://www.flickr.com/photos/slackline/*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/


<>
You can't name a new variable with a number as the first character (e.g., "31jun1980"). So, tostring your datadate var first:
*********!
clear
inp gvkey str20(datadate) mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
212782 30sep2005 5302.5
212782 31oct2005 5253.5
212782 30nov2005 5642.7
212782 31dec2005 6230
end
**set up data**
g datadate2 = date(datadate, "DMY")
format datadate2 %td
drop datadate
rename datadate2 datadate
//1. make date a string var//
tostring datadate, force replace u
//2. reshape wide using datadate//
g id = _n
reshape wide mcap_sum, i(id) j(datadate) string
ds mcap_*
//3. move all obs for a gvkey to one line//
foreach v in `r(varlist)' {
bys gvkey: egen `v'2 = max(`v')
drop `v'
rename `v'2 `v'
}
by gvkey: g o = 1 ==_n
keep if o==1
drop o
*********!
~ Eric
____________
______________________________
________________________________________
From: [hidden email] [ [hidden email]] on behalf of Kaspar Dardas [ [hidden email]]
Sent: Monday, August 02, 2010 6:37 AM
To: [hidden email]
Subject: st: Transposing datasets
Hi guys,
I have a dataset with about 32000 observations, which is in long
format (see structure below). gvkey is the identifier for a firm
(about 600 different firms), datadate is the monthend value between
2002 and 2010, which of course repeats in the dataset (again, long
format) and mcap_sum is my observation, which is different for each
month and gvkey.
gvkey datadate mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
212782 30sep2005 5302.5
212782 31oct2005 5253.5
212782 30nov2005 5642.7
212782 31dec2005 6230
etc...
Well, I would like to transpose my dataset so it shows each month as a
variable and the observations are mcap_sums. My tries with reshape
failed miserably. (xpose wont work because I still want to keep
mcap_sum as an observation).. Does anybody has a suggestion to solve
this quickly?
gvkey 31dec2005 30nov2005 31oct2005
212782 6230 5642.7 5253.5 ...........
Best,
Kaspar
*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/


<>
Why do we need another "id" variable? In Eric`s code, it is created via
*************
g id = _n
*************
Is "gvkey" not supposed to be the "id" variable?
HTH
Martin
Ursprüngliche Nachricht
Von: [hidden email]
[mailto: [hidden email]] Im Auftrag von Eric Booth
Gesendet: Montag, 2. August 2010 14:51
An: [hidden email]
Betreff: st: RE: Transposing datasets
<>
You can't name a new variable with a number as the first character (e.g.,
"31jun1980"). So, tostring your datadate var first:
*********!
clear
inp gvkey str20(datadate) mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
212782 30sep2005 5302.5
212782 31oct2005 5253.5
212782 30nov2005 5642.7
212782 31dec2005 6230
end
**set up data**
g datadate2 = date(datadate, "DMY")
format datadate2 %td
drop datadate
rename datadate2 datadate
//1. make date a string var//
tostring datadate, force replace u
//2. reshape wide using datadate//
g id = _n
reshape wide mcap_sum, i(id) j(datadate) string
ds mcap_*
//3. move all obs for a gvkey to one line//
foreach v in `r(varlist)' {
bys gvkey: egen `v'2 = max(`v')
drop `v'
rename `v'2 `v'
}
by gvkey: g o = 1 ==_n
keep if o==1
drop o
*********!
~ Eric
____________
______________________________
________________________________________
From: [hidden email]
[ [hidden email]] on behalf of Kaspar Dardas
[ [hidden email]]
Sent: Monday, August 02, 2010 6:37 AM
To: [hidden email]
Subject: st: Transposing datasets
Hi guys,
I have a dataset with about 32000 observations, which is in long
format (see structure below). gvkey is the identifier for a firm
(about 600 different firms), datadate is the monthend value between
2002 and 2010, which of course repeats in the dataset (again, long
format) and mcap_sum is my observation, which is different for each
month and gvkey.
gvkey datadate mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
212782 30sep2005 5302.5
212782 31oct2005 5253.5
212782 30nov2005 5642.7
212782 31dec2005 6230
etc...
Well, I would like to transpose my dataset so it shows each month as a
variable and the observations are mcap_sums. My tries with reshape
failed miserably. (xpose wont work because I still want to keep
mcap_sum as an observation).. Does anybody has a suggestion to solve
this quickly?
gvkey 31dec2005 30nov2005 31oct2005
212782 6230 5642.7 5253.5 ...........
Best,
Kaspar
*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/


<>
I misread in the OP what gvkey was...you're right, there's no need for "id". My post was delayedI had sent my email before yours came through so I hadn't intended for mine to be some kind of comment/alternate to your ppost, as yours was clearly better. ( Also, I learned something about using subinstr() in the rename command from your post, thanks )
~ Eric
________________________________________________________________________________
________________________________________________________________________________
From: [hidden email] [ [hidden email]] on behalf of Martin Weiss [ [hidden email]]
Sent: Monday, August 02, 2010 8:02 AM
To: [hidden email]
Subject: st: AW: RE: Transposing datasets
<>
Why do we need another "id" variable? In Eric`s code, it is created via
*************
g id = _n
*************
Is "gvkey" not supposed to be the "id" variable?
HTH
Martin
Ursprüngliche Nachricht
Von: [hidden email]
[mailto: [hidden email]] Im Auftrag von Eric Booth
Gesendet: Montag, 2. August 2010 14:51
An: [hidden email]
Betreff: st: RE: Transposing datasets
<>
You can't name a new variable with a number as the first character (e.g.,
"31jun1980"). So, tostring your datadate var first:
*********!
clear
inp gvkey str20(datadate) mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
212782 30sep2005 5302.5
212782 31oct2005 5253.5
212782 30nov2005 5642.7
212782 31dec2005 6230
end
**set up data**
g datadate2 = date(datadate, "DMY")
format datadate2 %td
drop datadate
rename datadate2 datadate
//1. make date a string var//
tostring datadate, force replace u
//2. reshape wide using datadate//
g id = _n
reshape wide mcap_sum, i(id) j(datadate) string
ds mcap_*
//3. move all obs for a gvkey to one line//
foreach v in `r(varlist)' {
bys gvkey: egen `v'2 = max(`v')
drop `v'
rename `v'2 `v'
}
by gvkey: g o = 1 ==_n
keep if o==1
drop o
*********!
~ Eric
____________
______________________________
________________________________________
From: [hidden email]
[ [hidden email]] on behalf of Kaspar Dardas
[ [hidden email]]
Sent: Monday, August 02, 2010 6:37 AM
To: [hidden email]
Subject: st: Transposing datasets
Hi guys,
I have a dataset with about 32000 observations, which is in long
format (see structure below). gvkey is the identifier for a firm
(about 600 different firms), datadate is the monthend value between
2002 and 2010, which of course repeats in the dataset (again, long
format) and mcap_sum is my observation, which is different for each
month and gvkey.
gvkey datadate mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
212782 30sep2005 5302.5
212782 31oct2005 5253.5
212782 30nov2005 5642.7
212782 31dec2005 6230
etc...
Well, I would like to transpose my dataset so it shows each month as a
variable and the observations are mcap_sums. My tries with reshape
failed miserably. (xpose wont work because I still want to keep
mcap_sum as an observation).. Does anybody has a suggestion to solve
this quickly?
gvkey 31dec2005 30nov2005 31oct2005
212782 6230 5642.7 5253.5 ...........
Best,
Kaspar
*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/


<>
"( Also, I learned something about using subinstr() in the rename command
from your post, thanks )"
Cheers! I love to work from first principles whenever possible, so my use of
subinstr() was not intended to detract from the appeal of NJC`s findit
renvars...
HTH
Martin
Ursprüngliche Nachricht
Von: [hidden email]
[mailto: [hidden email]] Im Auftrag von Eric Booth
Gesendet: Montag, 2. August 2010 15:11
An: [hidden email]
Betreff: st: RE: AW: RE: Transposing datasets
<>
I misread in the OP what gvkey was...you're right, there's no need for "id".
My post was delayedI had sent my email before yours came through so I
hadn't intended for mine to be some kind of comment/alternate to your post,
as yours was clearly better. ( Also, I learned something about using
subinstr() in the rename command from your post, thanks )
~ Eric
____________________________________________________________________________
____
____________________________________________________________________________
____
From: [hidden email]
[ [hidden email]] on behalf of Martin Weiss
[ [hidden email]]
Sent: Monday, August 02, 2010 8:02 AM
To: [hidden email]
Subject: st: AW: RE: Transposing datasets
<>
Why do we need another "id" variable? In Eric`s code, it is created via
*************
g id = _n
*************
Is "gvkey" not supposed to be the "id" variable?
HTH
Martin
Ursprüngliche Nachricht
Von: [hidden email]
[mailto: [hidden email]] Im Auftrag von Eric Booth
Gesendet: Montag, 2. August 2010 14:51
An: [hidden email]
Betreff: st: RE: Transposing datasets
<>
You can't name a new variable with a number as the first character (e.g.,
"31jun1980"). So, tostring your datadate var first:
*********!
clear
inp gvkey str20(datadate) mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
212782 30sep2005 5302.5
212782 31oct2005 5253.5
212782 30nov2005 5642.7
212782 31dec2005 6230
end
**set up data**
g datadate2 = date(datadate, "DMY")
format datadate2 %td
drop datadate
rename datadate2 datadate
//1. make date a string var//
tostring datadate, force replace u
//2. reshape wide using datadate//
g id = _n
reshape wide mcap_sum, i(id) j(datadate) string
ds mcap_*
//3. move all obs for a gvkey to one line//
foreach v in `r(varlist)' {
bys gvkey: egen `v'2 = max(`v')
drop `v'
rename `v'2 `v'
}
by gvkey: g o = 1 ==_n
keep if o==1
drop o
*********!
~ Eric
____________
______________________________
________________________________________
From: [hidden email]
[ [hidden email]] on behalf of Kaspar Dardas
[ [hidden email]]
Sent: Monday, August 02, 2010 6:37 AM
To: [hidden email]
Subject: st: Transposing datasets
Hi guys,
I have a dataset with about 32000 observations, which is in long
format (see structure below). gvkey is the identifier for a firm
(about 600 different firms), datadate is the monthend value between
2002 and 2010, which of course repeats in the dataset (again, long
format) and mcap_sum is my observation, which is different for each
month and gvkey.
gvkey datadate mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
212782 30sep2005 5302.5
212782 31oct2005 5253.5
212782 30nov2005 5642.7
212782 31dec2005 6230
etc...
Well, I would like to transpose my dataset so it shows each month as a
variable and the observations are mcap_sums. My tries with reshape
failed miserably. (xpose wont work because I still want to keep
mcap_sum as an observation).. Does anybody has a suggestion to solve
this quickly?
gvkey 31dec2005 30nov2005 31oct2005
212782 6230 5642.7 5253.5 ...........
Best,
Kaspar
*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/


<>
Nick himself advocated "first principles" in
http://www.stata.com/statalist/archive/201005/msg01165.html, btw...
HTH
Martin
Ursprüngliche Nachricht
Von: [hidden email]
[mailto: [hidden email]] Im Auftrag von Martin Weiss
Gesendet: Montag, 2. August 2010 15:20
An: [hidden email]
Betreff: st: AW: RE: AW: RE: Transposing datasets
<>
"( Also, I learned something about using subinstr() in the rename command
from your post, thanks )"
Cheers! I love to work from first principles whenever possible, so my use of
subinstr() was not intended to detract from the appeal of NJC`s findit
renvars...
HTH
Martin
Ursprüngliche Nachricht
Von: [hidden email]
[mailto: [hidden email]] Im Auftrag von Eric Booth
Gesendet: Montag, 2. August 2010 15:11
An: [hidden email]
Betreff: st: RE: AW: RE: Transposing datasets
<>
I misread in the OP what gvkey was...you're right, there's no need for "id".
My post was delayedI had sent my email before yours came through so I
hadn't intended for mine to be some kind of comment/alternate to your post,
as yours was clearly better. ( Also, I learned something about using
subinstr() in the rename command from your post, thanks )
~ Eric
____________________________________________________________________________
____
____________________________________________________________________________
____
From: [hidden email]
[ [hidden email]] on behalf of Martin Weiss
[ [hidden email]]
Sent: Monday, August 02, 2010 8:02 AM
To: [hidden email]
Subject: st: AW: RE: Transposing datasets
<>
Why do we need another "id" variable? In Eric`s code, it is created via
*************
g id = _n
*************
Is "gvkey" not supposed to be the "id" variable?
HTH
Martin
Ursprüngliche Nachricht
Von: [hidden email]
[mailto: [hidden email]] Im Auftrag von Eric Booth
Gesendet: Montag, 2. August 2010 14:51
An: [hidden email]
Betreff: st: RE: Transposing datasets
<>
You can't name a new variable with a number as the first character (e.g.,
"31jun1980"). So, tostring your datadate var first:
*********!
clear
inp gvkey str20(datadate) mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
212782 30sep2005 5302.5
212782 31oct2005 5253.5
212782 30nov2005 5642.7
212782 31dec2005 6230
end
**set up data**
g datadate2 = date(datadate, "DMY")
format datadate2 %td
drop datadate
rename datadate2 datadate
//1. make date a string var//
tostring datadate, force replace u
//2. reshape wide using datadate//
g id = _n
reshape wide mcap_sum, i(id) j(datadate) string
ds mcap_*
//3. move all obs for a gvkey to one line//
foreach v in `r(varlist)' {
bys gvkey: egen `v'2 = max(`v')
drop `v'
rename `v'2 `v'
}
by gvkey: g o = 1 ==_n
keep if o==1
drop o
*********!
~ Eric
____________
______________________________
________________________________________
From: [hidden email]
[ [hidden email]] on behalf of Kaspar Dardas
[ [hidden email]]
Sent: Monday, August 02, 2010 6:37 AM
To: [hidden email]
Subject: st: Transposing datasets
Hi guys,
I have a dataset with about 32000 observations, which is in long
format (see structure below). gvkey is the identifier for a firm
(about 600 different firms), datadate is the monthend value between
2002 and 2010, which of course repeats in the dataset (again, long
format) and mcap_sum is my observation, which is different for each
month and gvkey.
gvkey datadate mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
212782 30sep2005 5302.5
212782 31oct2005 5253.5
212782 30nov2005 5642.7
212782 31dec2005 6230
etc...
Well, I would like to transpose my dataset so it shows each month as a
variable and the observations are mcap_sums. My tries with reshape
failed miserably. (xpose wont work because I still want to keep
mcap_sum as an observation).. Does anybody has a suggestion to solve
this quickly?
gvkey 31dec2005 30nov2005 31oct2005
212782 6230 5642.7 5253.5 ...........
Best,
Kaspar
*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/


No arguments here.
Sometimes we all just want to get results quickly and accurately, and
care not a jot about logic. Sometimes understanding exactly what is
going on is the best prescription because it is all too likely that your
next such problem will not be identical and you will need to know how to
change the solution.
Nick
[hidden email]
Martin Weiss
Nick himself advocated "first principles" in
http://www.stata.com/statalist/archive/201005/msg01165.html, btw...
Martin Weiss
"( Also, I learned something about using subinstr() in the rename
command
from your post, thanks )"
Cheers! I love to work from first principles whenever possible, so my
use of
subinstr() was not intended to detract from the appeal of NJC`s
findit
renvars...
Eric Booth
( Also, I learned something about using
subinstr() in the rename command from your post, thanks )
*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/


Hey,
thanks guys. Indeed, I have stored datadate in %td format, which
caused my problems with the reshape command.
Now all works fine.
Best,
Kaspar
2010/8/2 Nick Cox < [hidden email]>:
> No arguments here.
>
> Sometimes we all just want to get results quickly and accurately, and
> care not a jot about logic. Sometimes understanding exactly what is
> going on is the best prescription because it is all too likely that your
> next such problem will not be identical and you will need to know how to
> change the solution.
>
> Nick
> [hidden email]
>
> Martin Weiss
>
> Nick himself advocated "first principles" in
> http://www.stata.com/statalist/archive/201005/msg01165.html, btw...
>
> Martin Weiss
>
> "( Also, I learned something about using subinstr() in the rename
> command
> from your post, thanks )"
>
> Cheers! I love to work from first principles whenever possible, so my
> use of
> subinstr() was not intended to detract from the appeal of NJC`s
> findit
> renvars...
>
> Eric Booth
>
> ( Also, I learned something about using
> subinstr() in the rename command from your post, thanks )
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search> * http://www.stata.com/support/statalist/faq> * http://www.ats.ucla.edu/stat/stata/>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/


<>
" Indeed, I have stored datadate in %td format, which
caused my problems with the reshape command."
The display format should not matter at all, I think, for reshape purposes. This code runs w/o problems:
*************
clear*
inp gvkey str10 datadate mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
212782 30sep2005 5302.5
212782 31oct2005 5253.5
212782 30nov2005 5642.7
212782 31dec2005 6230
212783 30jun2005 496.9
212783 31jul2005 502.1
212783 31aug2005 515
212783 30sep2005 530.5
212783 31oct2005 523.5
212783 30nov2005 562.7
212783 31dec2005 630
end
li, noo sepby(gvkey)
gen datadate2=date(datadate, "DMY")
form datadate2 %td
drop datadate
reshape wide mcap, i(gvkey) j(datadate2)
foreach var of varlist mcap_sum*{
ren `var' `=subinstr("`var'", "mcap_sum", "date_", 1)'
}
l
*************
HTH
Martin
Ursprüngliche Nachricht
Von: [hidden email] [mailto: [hidden email]] Im Auftrag von Kaspar Dardas
Gesendet: Montag, 2. August 2010 15:44
An: [hidden email]
Betreff: Re: st: RE: AW: AW: RE: AW: RE: Transposing datasets
Hey,
thanks guys. Indeed, I have stored datadate in %td format, which
caused my problems with the reshape command.
Now all works fine.
Best,
Kaspar
2010/8/2 Nick Cox < [hidden email]>:
> No arguments here.
>
> Sometimes we all just want to get results quickly and accurately, and
> care not a jot about logic. Sometimes understanding exactly what is
> going on is the best prescription because it is all too likely that your
> next such problem will not be identical and you will need to know how to
> change the solution.
>
> Nick
> [hidden email]
>
> Martin Weiss
>
> Nick himself advocated "first principles" in
> http://www.stata.com/statalist/archive/201005/msg01165.html, btw...
>
> Martin Weiss
>
> "( Also, I learned something about using subinstr() in the rename
> command
> from your post, thanks )"
>
> Cheers! I love to work from first principles whenever possible, so my
> use of
> subinstr() was not intended to detract from the appeal of NJC`s
> findit
> renvars...
>
> Eric Booth
>
> ( Also, I learned something about using
> subinstr() in the rename command from your post, thanks )
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search> * http://www.stata.com/support/statalist/faq> * http://www.ats.ucla.edu/stat/stata/>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/


all right, I made reference to Eric's approach...
2010/8/2 Martin Weiss < [hidden email]>:
>
> <>
>
>
> " Indeed, I have stored datadate in %td format, which
> caused my problems with the reshape command."
>
>
>
>
> The display format should not matter at all, I think, for reshape purposes. This code runs w/o problems:
>
>
> *************
> clear*
>
> inp gvkey str10 datadate mcap_sum
> 212782 30jun2005 4946.9
> 212782 31jul2005 5042.1
> 212782 31aug2005 5145
> 212782 30sep2005 5302.5
> 212782 31oct2005 5253.5
> 212782 30nov2005 5642.7
> 212782 31dec2005 6230
> 212783 30jun2005 496.9
> 212783 31jul2005 502.1
> 212783 31aug2005 515
> 212783 30sep2005 530.5
> 212783 31oct2005 523.5
> 212783 30nov2005 562.7
> 212783 31dec2005 630
> end
>
> li, noo sepby(gvkey)
>
> gen datadate2=date(datadate, "DMY")
> form datadate2 %td
>
> drop datadate
>
> reshape wide mcap, i(gvkey) j(datadate2)
>
> foreach var of varlist mcap_sum*{
> ren `var' `=subinstr("`var'", "mcap_sum", "date_", 1)'
> }
>
> l
> *************
>
>
>
> HTH
> Martin
>
>
> Ursprüngliche Nachricht
> Von: [hidden email] [mailto: [hidden email]] Im Auftrag von Kaspar Dardas
> Gesendet: Montag, 2. August 2010 15:44
> An: [hidden email]
> Betreff: Re: st: RE: AW: AW: RE: AW: RE: Transposing datasets
>
> Hey,
>
> thanks guys. Indeed, I have stored datadate in %td format, which
> caused my problems with the reshape command.
> Now all works fine.
>
> Best,
>
> Kaspar
>
>
>
>
>
> 2010/8/2 Nick Cox < [hidden email]>:
>> No arguments here.
>>
>> Sometimes we all just want to get results quickly and accurately, and
>> care not a jot about logic. Sometimes understanding exactly what is
>> going on is the best prescription because it is all too likely that your
>> next such problem will not be identical and you will need to know how to
>> change the solution.
>>
>> Nick
>> [hidden email]
>>
>> Martin Weiss
>>
>> Nick himself advocated "first principles" in
>> http://www.stata.com/statalist/archive/201005/msg01165.html, btw...
>>
>> Martin Weiss
>>
>> "( Also, I learned something about using subinstr() in the rename
>> command
>> from your post, thanks )"
>>
>> Cheers! I love to work from first principles whenever possible, so my
>> use of
>> subinstr() was not intended to detract from the appeal of NJC`s
>> findit
>> renvars...
>>
>> Eric Booth
>>
>> ( Also, I learned something about using
>> subinstr() in the rename command from your post, thanks )
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search>> * http://www.stata.com/support/statalist/faq>> * http://www.ats.ucla.edu/stat/stata/>>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search> * http://www.stata.com/support/statalist/faq> * http://www.ats.ucla.edu/stat/stata/>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search> * http://www.stata.com/support/statalist/faq> * http://www.ats.ucla.edu/stat/stata/>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/


Thanks to Neil for the mention of renvars by Jeroen Weesie and myself.
However, renvars is to be downloaded from the Stata Journal site, not SSC.
If people are unclear about where my stuff may be found, a package njc_stuff, which _is_ to be found at SSC, is dedicated to that purpose.
Nick
[hidden email]
Neil Shephard replied to Kaspar Dardas
> I have a dataset with about 32000 observations, which is in long
> format (see structure below). gvkey is the identifier for a firm
> (about 600 different firms), datadate is the monthend value between
> 2002 and 2010, which of course repeats in the dataset (again, long
> format) and mcap_sum is my observation, which is different for each
> month and gvkey.
>
> gvkey datadate mcap_sum
> 212782 30jun2005 4946.9
> 212782 31jul2005 5042.1
> 212782 31aug2005 5145
> 212782 30sep2005 5302.5
> 212782 31oct2005 5253.5
> 212782 30nov2005 5642.7
> 212782 31dec2005 6230
> etc...
>
> Well, I would like to transpose my dataset so it shows each month as a
> variable and the observations are mcap_sums. My tries with reshape
> failed miserably.
It's useful and informative to post the code that "failed miserably"
and the output that is associated with it; that way people can see
what you've been trying and point out where you've gone wrong (all
recommended as good practice in the Statalist FAQ, linked from the
bottom of every message to the list).
In the absence of this I'd suggest
reshape wide mcap_sum, i(gvkey) j(datadate)
...but this may be complicated by the format of 'datadate' if it is
formatted as a date variable (as underneath the formatting which
presents it as you've listed it will be stored as a numeric of sorts).
> (xpose wont work because I still want to keep
> mcap_sum as an observation).. Does anybody has a suggestion to solve
> this quickly?
>
> gvkey 31dec2005 30nov2005 31oct2005
> 212782 6230 5642.7 5253.5 ...........
You'll never achieve these names as Stata variables are not allowed to
start with numbers. renvars (install with ssc install renvars)
may be of use in renaming variables here though.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search* http://www.stata.com/support/statalist/faq* http://www.ats.ucla.edu/stat/stata/

