st: tips on reading large matrix from ASCII file?

st: tips on reading large matrix from ASCII file?

Jeph Herrin

I've used HLM6 to run a large number of mixed effects models,
each model producing an ASCII file which contains the fixed
effects and variance-covariance matrix. There are 60+ variables,
but for whatever reason HLM6 only writes 60 values per line,
so one ASCII file (with 65 terms) looks like this


  F1     F2     ..  ....  ...  F60
  F61    F62    F63    F64    F65           // 65 coeffs on two lines
  V1,1   V1,2   ..  ....  ...  V1,60
  V1,61  V1,62  V1,63  V1,64  V1,65         // then the 65x65 VC entries on
  V2,1   V2,2   ..  ....  ...  V2,60        //  2 x 65 lines
  V2,61  V2,62  V2,63  V2,64  V2,65
  .
  .
  .
  V65,1  V65,2  ..  ....  ...  V65,60       // last row of VC matrix on
  V65,61 V65,62 V65,63 V65,64 V65,65        //   two lines

(where the ASCII file doesn't have the comments).

Since this is a fairly rigid format that depends only on
N, the number of covariates, I thought it would be a small
matter to -infile- this with a -dct- file and store it as
a matrix. However, -infile- requires me to write out every
single variable name, and to modify the number of variables
in the -dct- file according to the number of covariates.
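
For what it's worth, free-format -infile- can avoid naming one variable per value. With a single variable, each whitespace-separated number becomes a new observation, so HLM6's wrapping after 60 values stops mattering. Below is a minimal sketch. It assumes the file is called hlm_model1.txt (a hypothetical name) and holds only the numbers in the layout above: k fixed effects, then the k x k VC matrix row by row. The value of k is recovered from the total count N = k + k^2.

```stata
* Sketch: read every number into one variable, then rebuild b and V.
* "hlm_model1.txt" is a hypothetical file name.
infile double x using "hlm_model1.txt", clear
* the file holds N = k + k^2 numbers; solve for k and check the count
local k = round((-1 + sqrt(1 + 4*_N)) / 2)
assert `k' + `k'^2 == _N
* first k observations: the fixed effects, as a 1 x k row vector
mkmat x in 1/`k', matrix(b)
matrix b = b'
* remaining observations: the VC matrix, stored row by row
capture matrix drop V
forvalues i = 1/`k' {
    local first = `k' + (`i' - 1)*`k' + 1
    local last  = `k' + `i'*`k'
    mkmat x in `first'/`last', matrix(r)
    * append row i (r is k x 1, so transpose it)
    matrix V = nullmat(V) \ r'
}
matrix drop r
```

Reading into a -double- keeps the precision of the printed values. The default float type would not.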

In the past, I have used Perl to parse these files, but
I'm doing this on a new box and figured that instead of reinstalling
Perl I'd try to sort it out in Stata. Is there an easier way to
convert this file to a matrix (actually a vector for the first
two lines and a matrix for the remainder)?
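
Another route that stays inside Stata is to let Mata read the file as text. Here is a minimal sketch under a few assumptions:

- The file is named hlm_model1.txt (hypothetical).
- It contains only blank-separated numbers in the layout shown, with no header lines. Tabs or headers would need extra handling.

All lines are joined before splitting, so the line wrapping is irrelevant. The same k = (-1 + sqrt(1 + 4n))/2 step recovers the number of terms.

```stata
* Sketch only: "hlm_model1.txt" is a hypothetical name; the file is assumed
* to hold k fixed effects, then the k x k VC matrix row by row.
mata:
    // cat() returns a column vector with one string per line of the file
    lines = cat("hlm_model1.txt")
    // join all lines, then split on blanks: the 60-per-line wrap disappears
    x = strtoreal(tokens(invtokens(lines')))
    n = cols(x)
    // the file holds n = k + k^2 values; solve the quadratic for k
    k = round((-1 + sqrt(1 + 4*n)) / 2)
    if (k + k^2 != n) _error(3498, "number of values is not k + k^2")
    st_matrix("b", x[|1 \ k|])                   // 1 x k row vector
    st_matrix("V", rowshape(x[|k+1 \ n|], k))    // k x k, filled row-wise
end
```

Because k is computed from the data, the same code should work for any number of covariates without editing.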

Thanks for any suggestions.

Jeph
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Re: st: tips on reading large matrix from ASCII file?

Jeph Herrin

Solved, but there may be better ways:

I used -file- to read the ASCII file a line at a time
and write out a -do- file which contained the necessary
lines to -matrix input- the values. This works beautifully
in that the process is entirely automated. I did have to
create row vectors and combine them with "\", as the whole
matrix was too big to input at once. Here's the top of the
automatically generated -do- file; I then just run the do
file to create the vector and VC matrices:


#delimit ;
matrix input b = (
-1.9488158      -0.0874626       0.1785559       0.0304908
-0.0104287      -0.0703224      -0.1402648      -0.1590217
-0.1926564      -0.3406200      -0.4178385       0.0257852
-0.0065606       0.0570689      -0.0954522       0.1280117
0.0992062       0.0927456       0.0836843       0.0516530
0.0455983       0.0205244      -0.0223471      -0.0603116
-0.0759259      -0.0762618      -0.0953297      -0.1421571
-0.1698843       0.3499212       0.2285282       0.2955863
0.0175301       0.7239078       0.2099990       0.2993378
0.3490949       1.0163712       0.2478369       0.3167427
-0.3050866      -0.2854428       1.6223277      -0.0351432
0.1986832       0.2789393       0.2540214      -0.0226724
-0.2104767      -0.0402748       0.0700727       0.0355059
0.0453931      -0.2231259       0.0576488       0.0559482
0.1819322       0.4569752       0.4768460       1.7106584
2.3341741       2.0369928      -0.0271400       0.0421754      -0.0549200
) ;

And so on.
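
A generator along these lines might look like the following. It is a hypothetical sketch, not the actual code used here, and it rests on these assumptions:

- k is known (here 65).
- Each logical row of the input is wrapped after 60 values, as in the layout of the original question.
- The file names hlm_model1.txt and make_bv.do are made up.

Writing one -matrix input- per VC row and stacking the rows with \ keeps every generated command short. That avoids the problem of the whole matrix being too big to input at once.

```stata
* Hypothetical sketch: read the HLM6 dump line by line with -file- and
* write a do-file that rebuilds b and V with -matrix input-.
local k 65
local per = ceil(`k'/60)            // physical lines per logical row
tempname in out
file open `in'  using "hlm_model1.txt", read text
file open `out' using "make_bv.do", write replace text

* fixed effects: glue the first `per' lines into one -matrix input-
local row
forvalues j = 1/`per' {
    file read `in' line
    local row `row' `line'
}
file write `out' "matrix input b = (`row')" _n

* VC matrix: one -matrix input- per row, stacked onto V with \
forvalues i = 1/`k' {
    local row
    forvalues j = 1/`per' {
        file read `in' line
        local row `row' `line'
    }
    file write `out' "matrix input r = (`row')" _n
    if `i' == 1 file write `out' "matrix V = r" _n
    else        file write `out' "matrix V = V \ r" _n
}
file write `out' "matrix drop r" _n
file close `in'
file close `out'

* run the generated do-file to create b and V
do make_bv.do
```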

cheers,
Jeph




Re: st: tips on reading large matrix from ASCII file?

Stas Kolenikov
Uhm... just an idea: re-run the analysis using -xtmixed- and -post- to
work out any summaries of any particular analysis that you need as you
go?.. Or is there some functionality not present in -xtmixed- that
only HLM offers?
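
The -post- idea might look something like this hedged sketch. It loops over models, keeps only the numbers of interest, and -post-s them to a results dataset as it goes. The following are assumptions for illustration only, not anything from this thread:

- It uses Stata's bangladesh example dataset.
- It fits -xtmelogit- with the laplace option (the outcome here is binary).
- The model list, the file name hlm_results, and the variables term, b and se are made up.

```stata
* Hypothetical sketch: fit several models and post selected results.
webuse bangladesh, clear
tempname h
postfile `h' str32 term double b double se using hlm_results, replace
foreach x in urban age {
    quietly xtmelogit c_use `x' || district:, laplace
    * _b[] and _se[] refer to the fixed-effects equation by default
    post `h' ("`x'") (_b[`x']) (_se[`x'])
}
postclose `h'
use hlm_results, clear
list
```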

As for your code, check to see what exactly

mat A = (3 -5)
mat li A

produces. Not quite what you expected; you'd need to put commas
between the matrix values.



--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.

RE: st: tips on reading large matrix from ASCII file?

Nick Cox
That's precisely why Jeph used -matrix input- not -matrix-, I presume.

Nick
[hidden email]

Stas Kolenikov

As for your code, check to see what exactly

mat A = (3 -5)
mat li A

produces. Not quite what you expected; you'd need to put commas
between the matrix values.

On 12/2/08, Jeph Herrin <[hidden email]> wrote:

>  I used -file- to read the ASCII file a line at a time
>  and write out a -do- file which contained the necessary
>  lines to -matrix input- the values.

< snip >


Re: st: tips on reading large matrix from ASCII file?

Jeph Herrin
Two things:

1. I ran the empty model (no predictors) using -xtmelogit-
    and it was still going after 24 hours. I have dozens of models
    to run; HLM6 did them all in less than 20 minutes total.
    Not sure if that counts as functionality.

2. As for my code, check to see what "exactly"

   mat input A = (3 -5)
   mat li A

produces. Not quite what you expected; the -input- qualifier
obviates the commas.
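
The difference is easy to check. Here is a small sketch, to be run from a do-file since it uses // comments. With plain -matrix-, the blank in (3 -5) is read as part of the expression 3 - 5. -matrix input- accepts only numbers and treats blanks as separators.

```stata
* -matrix- evaluates (3 -5) as the expression 3 - 5
matrix A = (3 -5)
matrix list A          // 1 x 1 matrix containing -2

* -matrix input- takes blank-separated numbers, so no commas are needed
matrix input B = (3 -5)
matrix list B          // 1 x 2 row vector: 3, -5

* with plain -matrix-, commas separate the elements
matrix C = (3, -5)
matrix list C
```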

cheers,
Jeph





Re: st: tips on reading large matrix from ASCII file?

Joseph Coveney
Just curious:  what algorithm are you having HLM use to fit the mixed effects
logistic model?  Could you get the same accuracy in the same run time with a
single integration point with -xtmelogit-?

Joseph Coveney

Jeph Herrin wrote:

> Two things:
>
> 1. I ran the empty model (no predictors) using -xtmelogit-
>    and it was still going after 24 hours. I have dozens of models
>    to run; HLM6 did them *all* in less than 20 minutes.
>    Not sure if that counts as functionality.



Re: st: tips on reading large matrix from ASCII file?

Jeph Herrin


I'm using an undocumented (pre-release, 64-bit) version
of HLM6, which I assume uses restricted ML (I'll have to
check with Richard Congdon).

Using a single integration point (i.e., the Laplace approximation)
for -xtmelogit-, the empty model took 2 hours 2 minutes to
converge.
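
For reference, -xtmelogit- requests a single integration point with the laplace option, which is equivalent to intpoints(1). Its default is adaptive quadrature with 7 points. The sketch below fits an empty random-intercept model both ways. It uses Stata's bangladesh example dataset, which is not the data discussed here. In Stata 13 and later the command is named meqrlogit, although xtmelogit still runs.

```stata
* Illustration on Stata's example data, not the data in this thread.
webuse bangladesh, clear
* default: adaptive Gauss-Hermite quadrature with 7 integration points
xtmelogit c_use || district:
* one integration point per level, i.e. the Laplacian approximation
xtmelogit c_use || district:, laplace
* the same fit, written as a number of integration points
xtmelogit c_use || district:, intpoints(1)
```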

I suppose the -xtmelogit- result is more accurate, especially
in estimating the variance terms, but for my purposes (at least
until I have a final set of models) the HLM results will do.

I think the problem is the low rate of my outcome variables,
on the order of 5% or less.

thanks,
Jeph

