Name: [SOLVED] MATH70071 Applied Statistics Python

MATH70071 Applied Statistics

End-of-module assignment

Submission deadline:

12:00 (noon) on Friday, 13/12/2024

Preparing your assignment

1. Use the R Markdown template file in the Software folder of the MSc in Statistics 2024-25 Blackboard page to write your report. Your R code should be provided in the appendix; this should be produced automatically by the template provided. Ensure your submitted file has tidy and well-documented code chunks.

2. The report should be properly structured, and should be written using complete sentences. Marks are given both for the content of the report (correctness of code, numerical answers, etc.) and the quality of the presentation (clarity of plots, explanations, etc.). Two or three sentences are sufficient for the verbal/explanatory parts of questions; longer answers are likely to be less clear.

3. At the beginning of your report you must include this statement of originality:

“I, CID [YOUR CID], certify that this assessed coursework is my own work, unless otherwise acknowledged, and includes no plagiarism. I have not discussed my coursework with anyone else except when seeking clarification with the module lecturer via email or on MS Teams. I have not shared any code underlying my coursework with anyone else prior to submission.”

Submitting your assignment

1. Before the above deadline submit a single PDF report via Blackboard (with, as above, your R code included as an appendix).

2. The filename should be MScStatistics AppliedStatistics [YOUR CID].pdf so, e.g., MScStatistics AppliedStatistics 00123456.pdf.

Sociologists in Australia surveyed the public to assess the relationship between the perceived respect of different jobs and some objectively measurable attributes of these jobs. The results of the survey are in the table jobs.csv, which has these columns:

column name	parameter	meaning	units/values
	i	row number
job:	j	type of job
class:	c	class of job	bc, wc, prof
salary:	s	average annual salary of people doing this job	$1,000
education:	e	average education of people doing this job	years
frac men:	f	fraction of people doing this job who are men
respect:	r	average perceived respect of the job

The job classes correspond to broadly-accepted classifications: “blue-collar” (bc, e.g., a factory worker or builder); “white-collar” (wc, e.g., an office worker or accountant); and “professional” (prof, e.g., a statistics lecturer or an astrophysicist); s, e, f and r are all numerical quantities.

The overall aim is to use this survey data to obtain a quantitative understanding of whether and how the perceived respect of a job and/or its class are linked to objectively measurable quantities.

1. Plot r against each of the numerical quantities s, e, and f, indicating c by a different colour or symbol.

Based on these plots, summarize briefly i) the implications for what the perceived respect of a job might be linked to and ii) what, if any, features of the data-set might make the subsequent fitting/modelling difficult. (6 marks)

2. Considering the relationship between r and s alone, use the Stan package to fit the data-set using these two regression models:

Model 1: $r_i = \beta_0 + \beta_1 s_i + \epsilon_i$

and

Model 2: $r_i = \beta_0 + \beta_1 \log(s_i) + \epsilon_i$,

where $P(\epsilon_i \mid \sigma) = N(\epsilon_i; 0, \sigma^2)$, with $\sigma$ included in the fit as a parameter (i.e., along with $\beta_0$ and $\beta_1$). State what prior distribution you have assumed for $(\beta_0, \beta_1, \sigma)$ and your reasoning behind this choice.

Plot some posterior draws under the two models as curves against the data and comment on the quality of the fits under both models.

Calculate an approximate Bayes factor, $B_{1,2}$, as the ratio of the maximum likelihoods under each of the models. Is this consistent with the conclusion from the visual comparison? (10 marks)
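As an illustration of the maximum-likelihood ratio mentioned above, here is a minimal sketch in Python (the assignment itself asks for R and Stan, so this is not a substitute for the required analysis). It assumes Gaussian errors and a straight-line model in a single predictor, for which the maximised log-likelihood is $-\tfrac{n}{2}[\log(2\pi\hat\sigma^2) + 1]$ with $\hat\sigma^2 = \mathrm{RSS}/n$, so the ratio reduces to $(\mathrm{RSS}_2/\mathrm{RSS}_1)^{n/2}$. The function name `max_log_likelihood` and the data values are hypothetical and are not taken from jobs.csv.

```python
import math

def max_log_likelihood(x, y):
    """Maximised Gaussian log-likelihood of y = b0 + b1 * x + noise.

    b0 and b1 are the least-squares estimates and sigma^2 is set to its
    maximum-likelihood value RSS / n.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2_hat) + 1.0)

# Hypothetical salaries (in $1,000) and respect scores, for illustration only.
s = [5.0, 8.0, 12.0, 20.0, 35.0, 60.0]
r = [20.0, 31.0, 40.0, 52.0, 63.0, 74.0]

log_l1 = max_log_likelihood(s, r)                          # Model 1: linear in s
log_l2 = max_log_likelihood([math.log(v) for v in s], r)   # Model 2: linear in log(s)

log_b12 = log_l1 - log_l2   # work with logs to avoid overflow/underflow
b12 = math.exp(log_b12)
print(f"log B12 = {log_b12:.3f}, B12 = {b12:.3g}")
```

A value of `b12` well below 1 favours Model 2 and a value well above 1 favours Model 1. Because both models have the same number of parameters, this ratio carries no complexity penalty, which is one reason it is only an approximation to the Bayes factor.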

3. Use the glm function in R to fit the data using the model

$r_i = \beta_0 + \beta_1 \log(s_i) + \beta_2 e_i + \beta_3 f_i$,

which now includes all three numerical parameters in the regression.

Report the results of the fit and use the glm summaries to assess which, if any, of the coefficients/terms should be ignored. (10 marks)

4. For the subsequent questions perform the analysis assuming the available background knowledge $K$ can be encoded in a prior distribution of the form $P(\beta_0, \beta_1, \beta_2 \mid K) = N(\beta_1; 10, 10^2)\, N(\beta_2; 5, 5^2)$.

Explain qualitatively what information is being encoded by this prior.

Identify whether this is a proper or improper prior and what the implications are for i) parameter estimation and ii) model comparison. (5 marks)

5. Fit the data-set using the mcmc package with the model

$r_i = \beta_0 + \beta_1 e_i + \beta_2 \log(s_i) + \epsilon_i$,

both with i) a normal distribution of the form

$P(\epsilon_i \mid \sigma) = N(\epsilon_i; 0, \sigma^2) = \frac{1}{(2\pi)^{1/2} \sigma} e^{-\epsilon_i^2 / (2\sigma^2)}$

and ii) a scaled Cauchy distribution of the form

$P(\epsilon_i \mid \sigma) = \mathrm{Cauchy}(\epsilon_i; 0, \sigma) = \frac{1}{\pi \sigma (1 + \epsilon_i^2 / \sigma^2)}$,

in each case (again) including $\sigma$ as a parameter to be fit (i.e., along with $\beta_0$, $\beta_1$ and $\beta_2$). As in Question 2, state what prior distribution you have assumed for $\sigma$ and your reasoning behind this choice.
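The two error densities above can be written as log-densities, which is the form a sampler typically needs. The following is a minimal sketch in Python rather than R (the assignment asks for the R mcmc package), and it covers only the likelihood: the prior terms are deliberately left out. The names `normal_logpdf`, `cauchy_logpdf` and `log_likelihood` are hypothetical helpers introduced for illustration.

```python
import math

def normal_logpdf(eps, sigma):
    """log N(eps; 0, sigma^2)."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - eps ** 2 / (2.0 * sigma ** 2)

def cauchy_logpdf(eps, sigma):
    """log Cauchy(eps; 0, sigma), i.e. -log(pi * sigma * (1 + eps^2 / sigma^2))."""
    return -math.log(math.pi * sigma) - math.log1p(eps ** 2 / sigma ** 2)

def log_likelihood(params, e, s, r, logpdf):
    """Sum of log-densities of the residuals r_i - (b0 + b1 * e_i + b2 * log(s_i))."""
    b0, b1, b2, sigma = params
    if sigma <= 0.0:
        return -math.inf  # sigma must be positive
    total = 0.0
    for e_i, s_i, r_i in zip(e, s, r):
        residual = r_i - (b0 + b1 * e_i + b2 * math.log(s_i))
        total += logpdf(residual, sigma)
    return total

# A residual ten scale-widths from zero is far less improbable under the Cauchy model.
print(normal_logpdf(10.0, 1.0), cauchy_logpdf(10.0, 1.0))
```

Passing the density as the `logpdf` argument keeps the regression part identical for both error models, so any difference in the fits comes from the tails of the error distribution alone.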

Plot and compare i) the joint posterior distribution in β1 and β2 and ii) the marginal posterior distribution in σ under both the normal and Cauchy models. Simple scatter plots and histograms are acceptable to exhibit the results; for full marks show the 39.3% and 86.3% highest (posterior) density credible regions on the joint plots and the 68.3% and 95.4% highest (posterior) density credible intervals on the marginal plots.

Comment on the differences in the results under the two models, explaining the reason(s) for these differences and which of the two models should be preferred. For full marks demonstrate this result more quantitatively by, e.g., comparing simulated data from the two best-fit models to the actual data used for the fit. (17 marks)
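For the marginal plots, one common way to estimate a highest-density interval from posterior draws is to take the shortest interval that contains the required fraction of the sorted samples. The sketch below shows this in Python rather than R; it assumes a unimodal marginal posterior and handles only the one-dimensional case, not the joint regions. The function name `hpd_interval` and the simulated draws are hypothetical.

```python
import math
import random

def hpd_interval(samples, mass):
    """Shortest interval containing a fraction `mass` of the samples.

    This approximates the highest-density interval when the underlying
    distribution is unimodal.
    """
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must contain
    best_lo, best_hi = xs[0], xs[k - 1]
    for i in range(1, n - k + 1):
        lo, hi = xs[i], xs[i + k - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi

# Stand-in "posterior draws" from a standard normal, for illustration only.
random.seed(1)
draws = [random.gauss(0.0, 1.0) for _ in range(20000)]

lo68, hi68 = hpd_interval(draws, 0.683)
lo95, hi95 = hpd_interval(draws, 0.954)
print(f"68.3% interval: [{lo68:.2f}, {hi68:.2f}]")
print(f"95.4% interval: [{lo95:.2f}, {hi95:.2f}]")
```

For a skewed marginal such as that of σ, this interval generally differs from the equal-tailed interval obtained from quantiles, which is why the two should not be used interchangeably.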

6. You now find out that r is actually the percentage of people surveyed who say they respect a particular occupation, so must be between 0 and 100 (inclusive).

Explain why the models used above are inconsistent with this new information.

Devise and describe mathematically (e.g., by specifying the sampling distribution or likelihood) a modified regression model which would correctly handle this restriction on r.

For full marks implement this algorithm using any of the glm, mcmc or stan functionality, summarising the results with parameter estimates and uncertainties. (12 marks)

(total: 60 marks)
