School of Mathematics
MATH5714: Linear Regression, Robustness and Smoothing Practical: 2019
There have been a number of worked examples using R in the module. All of these are on MINERVA, and some of them may contain useful commands for this practical. If you have specific questions on R, please feel free to email me (or come to my office) for assistance. Alternatively, you may ask questions during the practical session, which will take place on Monday 2nd December. If you do not have any specific questions, you do not need to attend this session.
You should write up your practical using Word (or LaTeX), with all graphical and R output correctly incorporated. The total length should not exceed 12 pages (but it may be shorter).
NOTES:
A. You must hand in your solutions to my pigeon-hole (NOT Minerva) by 2pm (GMT) on Thursday, 12th December.
B. In accordance with policies of the School of Maths: For every period of 24 hours or part thereof that your assessment is overdue, you will lose 5% of the total marks available for the assessment.
C. If you have special mitigating circumstances that lead you to ask for an extension, you should make your request in the School of Maths Taught Student Office.
D. Within reason you may talk to your friends about this piece of work, but you should not send R code (or output) to each other, and your report must be only your own work.
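Note B is a ceiling rule. Every started 24-hour period costs 5% of the total marks, so being 1 hour late and being 24 hours late both cost 5%, and being 25 hours late costs 10%. The short Python sketch below makes the arithmetic explicit. The module itself uses R, the function name is hypothetical, and the sketch does not model any cap on the total deduction.

```python
import math

def late_penalty_percent(hours_late: float, per_period: float = 5.0) -> float:
    """Percentage of total marks lost under a 'per 24 hours or part thereof' rule (hypothetical helper)."""
    if hours_late <= 0:
        return 0.0
    # Any started 24-hour period counts in full, hence the ceiling.
    periods = math.ceil(hours_late / 24)
    return periods * per_period

print(late_penalty_percent(1), late_penalty_percent(24), late_penalty_percent(25))
```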
This practical is deliberately open-ended, with little guidance on how to proceed.
Q1 The Databank of the World Bank¹ collects data (indicators) every year on each country of the world in order to examine trends, relationships, the effect of policies, development, etc. Two of the variables (area and population) are given for 2010 in the data frame, which can be read in with the R command below (watch out for special characters in the URL if you copy and paste):
dd=read.table("http://www1.maths.leeds.ac.uk/ charles/math3714/area-populaton.txt", header=TRUE)
(i) Using appropriate transformations of the data, find a linear model which can describe the relationship between population (response) and area (explanatory).
(ii) Using appropriate diagnostics, confirm that your model is acceptable.
Guidance: In your answer, you only need to describe your final model, and ONE other model which you have examined, but deemed less appropriate.
(iii) Using your model, obtain a 95% confidence interval for the mean (expected) population of a country with an area of 250,000 km².
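The following is only a hedged starting point for Q1, not the required answer. It regresses log(population) on log(area) by least squares and forms the usual confidence interval for the mean response at x0 = log(250000): fitted value ± t_{n-2, 0.975} · s · sqrt(1/n + (x0 - x̄)²/Sxx). The log-log transformation is an assumption, the function names are hypothetical, and the code is Python rather than R. The t quantile is passed in; in R it is qt(0.975, n - 2). In R itself, lm() followed by predict(..., interval = "confidence") gives the same interval.

```python
import math

def fit_loglog(area, pop):
    """Least-squares fit of log(pop) = b0 + b1 * log(area) + error (hypothetical helper)."""
    x = [math.log(a) for a in area]
    y = [math.log(p) for p in pop]
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)  # residual variance on n - 2 df
    return {"b0": b0, "b1": b1, "s2": s2, "n": n, "xbar": xbar, "sxx": sxx, "resid": resid}

def mean_response_ci(fit, area0, t_crit):
    """CI for E[log(pop)] at area0; t_crit is the 0.975 quantile of t with n - 2 df."""
    x0 = math.log(area0)
    centre = fit["b0"] + fit["b1"] * x0
    se = math.sqrt(fit["s2"] * (1 / fit["n"] + (x0 - fit["xbar"]) ** 2 / fit["sxx"]))
    return centre - t_crit * se, centre + t_crit * se

# Synthetic illustration only (not the World Bank data): pop = 10 * area^0.8 * exp(noise).
area = [1e3, 1e4, 1e5, 1e6, 1e7]
noise = [0.1, -0.1, 0.0, 0.1, -0.1]
pop = [10 * a ** 0.8 * math.exp(e) for a, e in zip(area, noise)]
fit = fit_loglog(area, pop)
lo, hi = mean_response_ci(fit, 250_000, t_crit=3.182)  # t_{3, 0.975} is about 3.182
print(round(fit["b1"], 3), [round(math.exp(v)) for v in (lo, hi)])
```

Exponentiating the endpoints gives an interval for exp(E[log population]). If the log-scale errors are symmetric, that quantity is the median population rather than the mean. With normal errors, the mean is larger by a factor of exp(σ²/2). For part (ii), the returned residuals are the values you would plot against the fitted values and in a normal Q-Q plot.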
Q2 In this question we are going to consider many more variables in the database. Because there are so many missing entries, a set of variables and countries was selected such that there were no missing values. The file is in the same location as before, but with the file name worlddata-indicators.txt. Note that we now have only 149 countries.
We will take the response variable to be CO2 emissions per capita (CO2), which is column 15 of the data frame after reading it into R.
¹ You may want to check the meaning of the variables on the World Bank website: https://databank.worldbank.org/home.aspx

(i) With due consideration to transformations, interactions, model selection, model checking, variable selection, etc., obtain a model which is able to predict CO2 using the other variables.
(ii) Justify your choice by comparing at least two competing models. The comparison should take note of at least (a) model selection criteria, (b) diagnostics, and (c) interpretability.
(iii) Interpret the parameters in your preferred model.
Guidance: Remember, there is probably no ONE correct answer. The important thing is that you justify your approach.
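AIC is one common model selection criterion for part (ii)(a). For a Gaussian linear model with p coefficients and residual sum of squares RSS on n observations, AIC can be written, up to an additive constant, as n·log(RSS/n) + 2p. This is the form that R's extractAIC() (and hence step()) uses for lm fits. BIC replaces 2p with log(n)·p. The Python sketch below is illustrative only: the helper names and the synthetic data are hypothetical. For brevity it solves the normal equations directly, whereas R's lm() uses a QR decomposition, which is numerically safer. Lower values favour a model, but only among models fitted to the same response on the same data.

```python
import math

def ols_fit(X, y):
    """Return (coefficients, RSS) for y regressed on the columns of X (X includes an intercept column)."""
    p = len(X[0])
    # Build the normal equations (X'X) beta = X'y.
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    fitted = [sum(xij * bj for xij, bj in zip(row, beta)) for row in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return beta, rss

def aic_bic(X, y):
    """AIC and BIC up to an additive constant (the same constant for every model on the same data)."""
    n, p = len(y), len(X[0])
    _, rss = ols_fit(X, y)
    return n * math.log(rss / n) + 2 * p, n * math.log(rss / n) + math.log(n) * p

# Synthetic data: y depends on x1 only; x2 is unrelated to y.
x1 = list(range(10))
x2 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
noise = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1, 0.25, -0.15, 0.05, -0.15]
y = [1 + 2 * a + e for a, e in zip(x1, noise)]

models = {
    "intercept only": [[1.0] for _ in y],
    "x1": [[1.0, a] for a in x1],
    "x1 + x2": [[1.0, a, c] for a, c in zip(x1, x2)],
}
for name, X in models.items():
    aic, bic = aic_bic(X, y)
    print(f"{name:15s} AIC={aic:8.2f} BIC={bic:8.2f}")
```

A criterion like this covers only point (a). Diagnostics and interpretability still need to be weighed separately.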
Q3 In this last question we are going to fit nonparametric regression models to inflation data. The data file is inflation.txt (in the same place as previously) and consists of 3 columns: the first column is the country code, column 2 is to be treated as the explanatory variable (Inflation, GDP deflator (annual %)), and column 3 as the response (Inflation, consumer prices (annual %)).
You may find the code used in lectures to be useful for this question.
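The minimal Python sketch below is a reminder of what the two estimators compute. The lecture code is in R and may differ; the Gaussian kernel and the function names are assumptions. Nadaraya-Watson (NW) takes a kernel-weighted average of the y_i. Local linear (LL) fits a kernel-weighted straight line around x and reports its intercept, which amounts to using the weights w_i (S_2 - (x_i - x) S_1), where S_k = Σ w_i (x_i - x)^k. Software parameterises h differently. For example, R's ksmooth() scales its kernels so that their quartiles are at ±0.25 × bandwidth, so its bandwidth is not the same quantity as the standard deviation h of the Gaussian kernel used here.

```python
import math

def gauss_kernel(u):
    """Standard normal density, used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def nw_estimate(x0, xs, ys, h):
    """Nadaraya-Watson estimate at x0: kernel-weighted mean of the responses."""
    w = [gauss_kernel((xi - x0) / h) for xi in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def ll_estimate(x0, xs, ys, h):
    """Local linear estimate at x0: intercept of a kernel-weighted least-squares line centred at x0."""
    w = [gauss_kernel((xi - x0) / h) for xi in xs]
    s1 = sum(wi * (xi - x0) for wi, xi in zip(w, xs))
    s2 = sum(wi * (xi - x0) ** 2 for wi, xi in zip(w, xs))
    # Equivalent-kernel weights; their sum is S0*S2 - S1^2 > 0.
    a = [wi * (s2 - (xi - x0) * s1) for wi, xi in zip(w, xs)]
    return sum(ai * yi for ai, yi in zip(a, ys)) / sum(a)

# Synthetic check: on exactly linear data, LL reproduces the line while NW is pulled towards the interior near the edge.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 + 3 * x for x in xs]
for h in (1, 2, 5, 10):
    print(h, round(nw_estimate(4.2, xs, ys, h), 3), round(ll_estimate(4.2, xs, ys, h), 3))
```

For part (ii), evaluating nw_estimate(4.2, xs, ys, h) and ll_estimate(4.2, xs, ys, h) for h in (1, 2, 5, 10) on the real data would give the two rows of the 2 × 4 table.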
(i) Using the data (x_i, y_i) in the data frame, create a scatter plot of the data and add nonparametric regression lines which show the fitted value m̂(x) for x in the range (5, 50). Plot one graph which shows the Nadaraya-Watson estimates for smoothing parameters h = 1, 2, 5, 10, and a separate graph which shows the local linear estimates for the same four values of h. Comment on these graphs.
(ii) For each of the 8 estimates computed in part (i), find the predicted value m̂(x) when x = 4.2. Arrange these values in a suitable 2 × 4 table.
(iii) Using leave-one-out cross-validation, find the optimal choice of h in the range (0.7, 2.7) for the NW estimate, and in the range (1.7, 3.0) for the LL estimate. In a single plot, draw the cross-validation functions for both estimates as functions of h.
(iv) Replot the data, and draw on the fitted nonparametric regression lines corresponding to the optimal values of h. Comment on the fits.
Predict m̂(x) for x = 4.2 using the corresponding optimal values of h for the NW and LL estimates respectively. Which of these predictions do you think will be better, and why?
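Leave-one-out cross-validation scores each candidate h by CV(h) = (1/n) Σ (y_i - m̂_{-i}(x_i))², where m̂_{-i} is the estimate computed without observation i, and picks the minimiser over a grid. The hedged Python sketch below assumes a Gaussian-kernel Nadaraya-Watson estimator, redefined here so the block runs on its own. The grid spacing of 0.05, the helper names and the data are all illustrative assumptions. Any local linear estimator with the same (x0, xs, ys, h) signature could be passed instead of nw_estimate, with a grid covering (1.7, 3.0).

```python
import math

def nw_estimate(x0, xs, ys, h):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel (the normalising constant cancels)."""
    w = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def loocv_score(xs, ys, h, estimator):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        x_rest = xs[:i] + xs[i + 1:]
        y_rest = ys[:i] + ys[i + 1:]
        total += (ys[i] - estimator(xs[i], x_rest, y_rest, h)) ** 2
    return total / n

def choose_h(xs, ys, grid, estimator):
    """Return the grid value minimising the LOOCV score, plus all scores (for plotting CV(h))."""
    scores = [loocv_score(xs, ys, h, estimator) for h in grid]
    return grid[scores.index(min(scores))], scores

# Synthetic data only; with the real file, xs and ys would be columns 2 and 3 of inflation.txt.
xs = [0.5 * i for i in range(20)]
ys = [math.sin(x) + 0.1 * (-1) ** i for i, x in enumerate(xs)]
nw_grid = [round(0.7 + 0.05 * k, 2) for k in range(41)]  # 0.70, 0.75, ..., 2.70
h_opt, cv = choose_h(xs, ys, nw_grid, nw_estimate)
print(h_opt, round(min(cv), 4))
```

If the minimiser falls at an end of the grid, the true optimum may lie outside the stated range. Also, with widely spread x values and small h, the Gaussian weights can underflow to zero, which would make sum(w) zero in nw_estimate.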
