STAT6128
Key Topics in Social Science: Measurement and Data
Computer Workshop 4 - Social Mobility
The data
The data we shall be using today come from the 2006 Programme for International Student Assessment (PISA). These data are designed to be cross-nationally comparable across a wide selection of developed nations. Today we shall focus on occupations. Recall from the lectures that occupation is the primary outcome of interest for sociologists. However, in PISA we cannot measure social mobility itself: PISA is cross-sectional, and therefore we do not have any information on children's eventual outcomes. Instead, we shall investigate the relationship between parental occupation and 15-year-old children's occupational expectations (what job they expect to have when they are 30 years old). So just for today, think of these expectations as if they were actual outcomes. (As an aside, some sociologists and economists claim that expectations mediate the link between social background and attainment during adulthood, so this type of analysis could actually be quite interesting for our understanding of intergenerational mobility.)
Start Stata. Create a do file like last week: use "version" to tell Stata which version you use, use the "cd" command to tell Stata where to open data files from and where to save the do file, and use the "use" command to open the Stata dataset PISA_IM, which you first need to download from Blackboard into the folder you name after the "cd" command.
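As a rough sketch, the top of such a do file might look like the lines below. The version number and the folder path are placeholders, not values given in this handout; replace them with your own Stata version and the folder where you saved PISA_IM.

```stata
* Sketch of a do-file header (version number and folder are placeholders)
version 12
cd "C:\STAT6128\Workshop4"
* Open the workshop dataset, discarding any data currently in memory
use PISA_IM, clear
```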
Like last week, write the bold command lines into your do file and the italic ones into the command window.
Country code and sample size
Once you have opened your data set type
label list Country
You receive the error message "value label Country not found". In other words, the data do not contain any information on which value refers to which country. Since the data set does not contain the country coding, I give it to you here:
Country code | Country name
208 | Denmark
276 | Germany
352 | Iceland
380 | Italy
410 | Korea
442 | Luxembourg
554 | New Zealand
616 | Poland
620 | Portugal
792 | Turkey
Type
tab Country
You see a table giving the 3-digit country code. Each number in this first column represents one country. The second column gives you the sample size per country, the third column the percentage of the sample per country.
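If you would rather see country names than codes, one option is to attach a value label built from the table above. The sketch below works on a labelled copy so that the original Country variable (and the output described in this handout) stays unchanged. The names country_lbl and Country_named are made up for this illustration. The /// line continuation only works inside a do file.

```stata
* Define a value label from the country-code table (label name is hypothetical)
label define country_lbl 208 "Denmark" 276 "Germany" 352 "Iceland" ///
    380 "Italy" 410 "Korea" 442 "Luxembourg" 554 "New Zealand" ///
    616 "Poland" 620 "Portugal" 792 "Turkey"
* Work on a copy so the original Country variable keeps showing codes
clonevar Country_named = Country
label values Country_named country_lbl
tab Country_named
```

Typing tab Country_named, nolabel shows the underlying numeric codes again.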
Measurement of Occupation (Ganzeboom Index)
As mentioned in the lecture, there are many different ways one can "measure" (or rank) occupations. The main method PISA uses is the Ganzeboom ISEI index of social class. This is a "continuous" measure of occupational prestige, and basically ranks occupations through their impact on people's income.
To begin, we use this as our measure of occupation. The three variables of interest are:
Father's occupation is labelled BFMJ
Mother's occupation is labelled BMMJ
Child's (expected) occupation is labelled BSMJ
Let us investigate BSMJ first. To find out more about the distribution of the variable BSMJ, type:
sum BSMJ, d
Something is wrong: more than the top 10% of the data is coded at a single value (99).
Normally, missing values in Stata are coded as "." and are therefore excluded by all commands. However, the original data were coded in SPSS, where missing values were given the value 99. Because the SPSS file was not transferred into Stata properly, these missing values appear in Stata as the ordinary data value 99.
Type
label list BSMJ
You see that the values 97 and 99 of this variable are labelled as missing values.
If the SPSS data had been transferred properly into Stata format, the missing values would be coded ".".
We will do that now ourselves.
Type
gen bsmj=BSMJ
(you generate a variable that has exactly the same values as your original BSMJ variable)
replace bsmj=. if BSMJ>96
Now type
sum bsmj, d
Compare this with the sum command beforehand. You see that if missing values are properly coded in Stata (with a ".") then Stata does not show them.
Sometimes you might want to see them though. In this case you can type
tab bsmj, m
The m here tells Stata you want to see the missing values. You see that 17% of values are missing for children's expected occupation.
The variables BFMJ and BMMJ also use the values 97 and 99 for missing values. Please try on your own to create variables bfmj and bmmj that have the missing values coded properly as ".". The solution is given on the next page.
gen bfmj=BFMJ
replace bfmj=. if BFMJ>96
gen bmmj=BMMJ
replace bmmj=. if BMMJ>96
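A quick sanity check on the recoding can be sketched as follows. It assumes, as the label list output suggests, that 97 and 99 are the only codes above 96 and that the original variables contain no other missing values. Under those assumptions, the counts from the loop should match the number of missing values that misstable reports for the new variables.

```stata
* Count observations carrying the 97/99 codes in each original variable
foreach v of varlist BSMJ BFMJ BMMJ {
    count if inlist(`v', 97, 99)
}
* Report how many values are now missing in the recoded copies
misstable summarize bsmj bfmj bmmj
```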
We now want to see how children's expected occupation is associated with their parents' occupation. As our measures are "continuous", we shall use OLS regression.
Firstly, we need to take into account PISA's complex sampling design. We covered this last week. The PISA survey design uses clustered sampling: first schools are selected and then students within schools. Clustering increases the standard error. We therefore need to tell Stata to take clustering into account.
Type:
svyset SCHOOLID [pw=W_FSTUWT]
This has set up the complex survey design. Now let us perform a regression, relating fathers' occupation to the child's expectation. We will estimate this model using all observations from all countries. Type:
svy: regress bsmj bfmj i.Country
The prefix i. before the variable Country indicates that this is a categorical variable. In this case, we have 10 countries (10 categories) in the variable Country. Hence Stata will create 9 dummy variables.
You should get something like the following output:
The table shows you that there are 788 schools in your data (Number of PSUs) and that the total sample size is 37,560 students.
Now interpret this table. Which country is the reference country? (Tip: look at the table with the country codes given beforehand)
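Stata chooses a default reference category for a factor variable, but a different one can be requested with the ib#. prefix. As an illustration only (the choice of Germany here is arbitrary and not part of the workshop exercise), the following would make the country coded 276 the reference:

```stata
* ib276. makes the country coded 276 (Germany) the base category
svy: regress bsmj bfmj ib276.Country
```

The coefficient on bfmj is unaffected by this choice; only the country dummies and the constant change, because they are now measured relative to a different country.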
The coefficient of interest is the one associated with bfmj. It is positive and statistically significant. This suggests that a 1 point increase in the father's Ganzeboom index is associated with a 0.234 point increase in the child's Ganzeboom index.
Remember, last week we talked in the lecture briefly about how to interpret regression results. The Ganzeboom index lacks a natural metric (scale). How could we give some more meaning to our results here? We could express the change in the Ganzeboom index in terms of standard deviations.
Find the standard deviations of bfmj and bsmj by typing:
svy: mean bfmj
estat sd
svy: mean bsmj
estat sd
You will receive the following results:
Variable | Mean | Standard deviation
bfmj | 42.73 | 15.86
bsmj | 60.59 | 16.81
Question:
If the father's Ganzeboom index increases by one standard deviation, by how many standard deviations will the child's index increase? You know that a 1 point increase in the father's index is associated with a 0.234 point increase in the child's index.
0.234*15.86=3.71
Hence if the father's index increases by one standard deviation (15.86 points), the child's index increases by 3.71 points. We can express the 3.71 points in standard deviations of the child's index:
3.71/16.81=0.22
Result:
If the father's Ganzeboom index increases by one standard deviation, the child's index increases by about 0.22 standard deviations.
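The same arithmetic can be done in Stata with the display command, using the coefficient and the two standard deviations reported above. This is simply a calculator-style sketch of the two steps: the first line gives the change in points (about 3.71) and the second the change in standard deviations (about 0.22).

```stata
* Effect of a one-SD increase in the father's index, in points of the child's index
display 0.234 * 15.86
* The same effect expressed in standard deviations of the child's index
display 0.234 * 15.86 / 16.81
```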
In conclusion, from an intergenerational mobility perspective, our regression results show that children of fathers with higher-ranking occupations enter (or at least "expect to enter") better jobs.
How does this vary across developed nations? To get a rough idea (and only this time ignoring the complex sampling design), type:
bysort Country: regress bsmj bfmj
To take the complex sampling design into account, type instead:
tab Country, gen(C)
forval i = 1(1)10 {
svy, subpop(C`i'): regress bsmj bfmj
}
This generates one dummy variable per country (named C1-C10), then uses a loop to run a svy: regress command for each country in turn.
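A variant of this loop, sketched below, avoids creating dummy variables and prints the country code above each regression so that the output is easier to match to the country table. It relies on levelsof to collect the distinct codes and on the if form of the subpop() option; the local macro names codes and c are arbitrary.

```stata
* Collect the distinct country codes into a local macro
levelsof Country, local(codes)
* Run the survey regression separately for each country subpopulation
foreach c of local codes {
    display as text "Country code: `c'"
    svy, subpop(if Country == `c'): regress bsmj bfmj
}
```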
This has reproduced the analysis for each individual country. Notice the relationship is weakest in Turkey (country 792) and Korea (country 410). It seems that the jobs children "expect" to enter in these countries are not strongly associated with their father's occupation. On the other hand, in Poland (country 616) the relationship is particularly strong.
Alternative measure of occupation
Perhaps in this case another way of measuring occupation may also be suitable.
The PISA dataset contains an alternative measure of occupation: 4-digit ISCO codes. ISCO is the ILO classification of occupations; for details, look at the following webpage:
http://www.ilo.org/public/english/bureau/stat/isco/index.htm
These data are very interesting because of their detail. Occupations are classified into over 300 categories. However, for today we will convert this into a binary measurement ("Professional" and "Non-Professional" jobs). In other words, we will examine the relationship between whether a child is expecting to enter a professional job and whether the child's parents have a professional job. (We could go further by using logistic regression to investigate this relationship. We will examine logistic regression in a later workshop.)
Let us start with this conversion. Create a variable called Student_Pro, which has the value 1 if the variable Student_Occ_ICSO is below 3000 (meaning the student aims to become a "Professional") and 0 if the value of Student_Occ_ICSO is 3000 or above. In addition, give the newly created variable Student_Pro a missing value "." if the value of Student_Occ_ICSO is 9999. First, try to create this variable Student_Pro yourself. If you do not manage, the code is given on the next page.
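gen Student_Pro=.
* Stata treats missing values as larger than any number, so exclude them explicitly
replace Student_Pro=0 if Student_Occ_ICSO>2999 & !missing(Student_Occ_ICSO)
replace Student_Pro=1 if Student_Occ_ICSO<3000
replace Student_Pro=. if Student_Occ_ICSO==9999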
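Now create the variables Father_Pro and Mother_Pro using the same specification (including the missing value for code 9999):
gen Father_Pro=.
replace Father_Pro=0 if Father_Occ_ICSO>2999 & !missing(Father_Occ_ICSO)
replace Father_Pro=1 if Father_Occ_ICSO<3000
replace Father_Pro=. if Father_Occ_ICSO==9999
gen Mother_Pro=.
replace Mother_Pro=0 if Mother_Occ_ICSO>2999 & !missing(Mother_Occ_ICSO)
replace Mother_Pro=1 if Mother_Occ_ICSO<3000
replace Mother_Pro=. if Mother_Occ_ICSO==9999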
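Before using the new variables, it can be worth checking that the coding behaves as intended. The sketch below lists the smallest and largest ISCO code within each category of the binary variable. If 9999 is treated as missing, the group coded 1 should only contain codes below 3000, the group coded 0 should range from 3000 upwards without reaching 9999, and 9999 should appear only in the missing group. The loop variable who is an arbitrary name.

```stata
* Min, max and count of the ISCO code within each category of the binary variable;
* the missing option adds a row for observations where the binary variable is missing
foreach who in Student Father Mother {
    tabstat `who'_Occ_ICSO, by(`who'_Pro) statistics(min max n) missing
}
```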
Now type the following:
svy: tabulate Father_Pro Student_Pro, row
svy: tabulate Mother_Pro Student_Pro, row
What do these results show?
Up to now, we have looked at all countries together. Now let's examine Poland and Korea separately.
Start with Korea. Type:
svy:tabulate Father_Pro Student_Pro if Country==410, row
svy:tabulate Mother_Pro Student_Pro if Country==410, row
Then do the same for Poland (code 616).
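With survey data, restricting the sample with if can affect the standard errors, because the dropped observations no longer contribute to the description of the design. The subpop() option, already used in the regression loop earlier, is the usual alternative. A sketch for the father tables in both countries:

```stata
* Korea (410) and Poland (616) as subpopulations of the full survey design
svy, subpop(if Country == 410): tabulate Father_Pro Student_Pro, row
svy, subpop(if Country == 616): tabulate Father_Pro Student_Pro, row
```

The row percentages should match those from the if version; any differences would show up in the standard errors and test statistics, which you can request with options such as se.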
What results do you find? Compare the tables.
Measurement Error
We shall finish this part of the workshop by briefly considering the role of measurement error. Firstly, recall from the lectures that children act as proxy respondents for their parents. That is, it is children who report their parents' education and occupation (not the parents themselves). Children may not always report this correctly.
For this set of countries, however, data have been collected from both the parent and the child (note this was not done for all countries, and was not done in the PISA 2000 or 2003 waves). We can therefore investigate how well children report their parents' occupation. In particular,
Parent_Report_Father_Occ_ICSO is fathers' reports of their own occupation
Parent_Report_Father_Pro is fathers' reports about whether they are a professional
Parent_Report_Mother_Occ_ICSO is mothers' reports of their own occupation
Parent_Report_Mother_Pro is mothers' reports about whether they are a professional
Let's consider whether children can accurately report if their mother or father is a professional. Type (all on one line):
tab Parent_Report_Father_Pro Father_Pro if Parent_Report_Father_Pro!=. & Father_Pro!=., col
Look at the main diagonal (top left to bottom right). If there were no measurement error, all observations would be in these cells. Instead, we can see some misclassification: some children report their father to be a professional when he is not (and vice versa). This is of course assuming that parents accurately report their own occupation...
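To put a single number on the agreement between the two reports, one possible sketch is shown below. It creates an indicator for whether the child's and the father's reports coincide (agree_father is a made-up variable name), tabulates it, and then computes Cohen's kappa with Stata's kap command. Like the tab command above, it ignores the survey design, so treat it as descriptive only.

```stata
* 1 if both reports agree, 0 if they differ, missing if either report is missing
gen byte agree_father = (Parent_Report_Father_Pro == Father_Pro) if !missing(Parent_Report_Father_Pro, Father_Pro)
tab agree_father
* Chance-corrected agreement between the two reports
kap Parent_Report_Father_Pro Father_Pro
```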