Organizing non-ranked multiple responses by generating new variable


JD Wright

I am currently organizing a data set. One question in particular (see A22b below) has an "Other: specify" option that generated many different responses. The original data collectors coded these responses by assigning each one a number from 1 to 98; each time a respondent mentioned a new “other,” it was assigned the next available number.

The question at hand follows from a screening question about whether the respondent is religious:

Q. A21: Do you identify yourself with any religion? [bq1_a21_rel_ind]
1. Yes
0. No
8. Don't Know
9. Refused

If the respondent answered "Yes," a second question is asked:

Q. A22a. What religion? [BQ1_A22_REL_TYPE]
1. Buddhist
2. Christian
3. Hindu
4. Jewish
5. Muslim
6. Other (Specify): [this becomes a new variable in the data set: bq1_a22_othrel and is followed by a new variable for the coded response bq1_a22b_rel_cd1]
[There is no option "7"]
8. Don't Know
9. Refused

If option “6. Other (Specify)” was chosen, respondents were asked to fill in the blank. This generated 88 responses. Some “other” responses were grouped under shared codes such as “Catholic” and “Mormon,” but many were unique.

I went through these responses and further categorized them according to the original question Q. A22a, since many of the new answers (e.g., Adventist, Baptist) fit easily under the “Christian” category. This makes more sense than leaving them categorized under “Other.”

So my question is: what is the most efficient way to organize these?

Here is the current approach I am using, but it seems too repetitive and cumbersome:

**[I am generating a new variable in order to organize all of these multiple responses]
. generate bq1_a22b_rel_cd1_type = .

***Buddhist [Buddhism was mentioned in “Other” and coded as “03”
**So I proceeded to begin defining my new variable]
. replace bq1_a22b_rel_cd1_type = 1 if (bq1_a22_othrel==1 & bq1_a22b_rel_cd1==3 & bq1_a22b_rel_cd1!=.)

****Christian [Here is where it becomes more difficult
****because there are 30 different codes that can be categorized
*****as Christian]
. replace bq1_a22b_rel_cd1_type = 2 if (bq1_a22_othrel==1 & bq1_a22b_rel_cd1!=.) & (bq1_a22b_rel_cd1==67 | bq1_a22b_rel_cd1==43 | bq1_a22b_rel_cd1==13 [etc., i.e., using “ | bq1_a22b_rel_cd1==” with each of the remaining Christian codes])


[These are the codes for Christians

I also planned to continue this for the other categories of Q. A22a (Hindu, Jewish, Muslim, Other), using the remaining coded “other” responses.
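For illustration, the long chain of “ | bq1_a22b_rel_cd1==” conditions could probably be shortened with inlist(), which is true when its first argument equals any of the values that follow. This is only a sketch: it assumes bq1_a22b_rel_cd1_type has already been created by the generate line above, and it uses only the four codes quoted in this post (3, 13, 43, 67); the rest of the Christian list would have to be filled in from the codebook.

```stata
* Sketch only: assumes bq1_a22b_rel_cd1_type already exists (see generate above).
* Codes 3, 13, 43 and 67 are the ones quoted in the post; the remaining
* codes are not shown there and must be added from the codebook.

* Buddhist: a single code, so a plain equality test is enough
replace bq1_a22b_rel_cd1_type = 1 if bq1_a22_othrel == 1 & bq1_a22b_rel_cd1 == 3

* Christian: inlist() replaces the repeated "| var == code" comparisons
replace bq1_a22b_rel_cd1_type = 2 if bq1_a22_othrel == 1 & ///
    inlist(bq1_a22b_rel_cd1, 13, 43, 67)
```

inlist() returns false when the coded variable is missing, so the separate “!=.” check is not needed here. Stata limits inlist() to 250 numeric arguments, which is well above the 30 Christian codes.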

Once that was done, I planned to generate another new variable combining the answers from Q. A22a and Q. A22b, so that a single variable reflects the values of the original question Q. A22a.
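A minimal sketch of that combining step might look like the following. The name rel_combined and the value label rel_lbl are hypothetical, and the code assumes the A22a variable is stored exactly as BQ1_A22_REL_TYPE (Stata variable names are case-sensitive, so the capitalization may need adjusting).

```stata
* Sketch only: rel_combined and rel_lbl are hypothetical names.
* Assumes bq1_a22b_rel_cd1_type holds the recategorized "other" answers (1-6).

* Start from the closed-ended answer to Q. A22a
generate rel_combined = BQ1_A22_REL_TYPE

* Where "6. Other (Specify)" was chosen and the write-in was recategorized,
* overwrite the 6 with the recategorized value
replace rel_combined = bq1_a22b_rel_cd1_type ///
    if BQ1_A22_REL_TYPE == 6 & !missing(bq1_a22b_rel_cd1_type)

* Attach the original Q. A22a labels
label define rel_lbl 1 "Buddhist" 2 "Christian" 3 "Hindu" 4 "Jewish" ///
    5 "Muslim" 6 "Other" 8 "Don't Know" 9 "Refused"
label values rel_combined rel_lbl
```

Write-ins that were never recategorized stay at 6 (“Other”), which keeps the combined variable on the same scale as the original question.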

Note: I have also come across posts about egen, forvalues, etc. for organizing multiple responses or organizing data, but none addressed an example quite like this one, where there is no order or logic to the numbers assigned and they are not necessarily sequential.
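Because the codes follow no sequence, a command that accepts arbitrary value lists may fit better than a loop. One candidate is recode, sketched below under stated assumptions: rel_type_recoded is a hypothetical variable name, only the codes quoted in this post are used, and sending every remaining nonmissing code to 6 (“Other”) is an assumption that would only hold once all the other categories have their own rules.

```stata
* Sketch only: rel_type_recoded is a hypothetical name; the code lists are
* incomplete and must be extended from the codebook.
* Rules are applied in order, so the catch-all rule goes last.
recode bq1_a22b_rel_cd1            ///
    (3        = 1 "Buddhist")      ///
    (13 43 67 = 2 "Christian")     ///
    (nonmissing = 6 "Other"),      ///
    generate(rel_type_recoded)
```

With generate(), recode leaves the original coded variable untouched and attaches the quoted value labels to the new variable; missing values stay missing.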

My knowledge of Stata is obviously limited, so I am not even sure whether my initial inclination (generating a new variable, then replacing values) is a typical approach.

I would appreciate any guidance. Thank you, Jaime