MATH38141 Regression Analysis – Coursework

This coursework accounts for 20% of the overall mark for this course and may take around 10 hours to complete. Please present your solution in the form of a report, which you should upload to Blackboard as a single file before the deadline. You can use R to perform your calculations, but you must show in the text (not as R code) the formulae you used for the calculations. Marks will be awarded for correct and accurate calculations and for their interpretation. Interpretations should be explained in words, referring to the context of the exercise, rather than by naming generic symbols only. High marks are less likely if the presentation of the results is unclear, too short, or unnecessarily long and confusing, or if any formulae used in the calculations are missing from the text.

Submit your solution as a single file to Blackboard by 6pm on Wednesday, November 22, 2023.

  1. Jane, an amateur cinema enthusiast, has decided to collect information on the success of her top 10 favourite Hollywood movies. The dataset contains the following 4 variables for each film:

    • BoxOffice – Box office net sales (money earned from ticket sales) in the first year, in millions of US dollars ($);

    • Production – Production costs, in millions of dollars;

    • Promotion – Promotional costs, in millions of dollars;

    • Books – Total book sales (money earned from the sales of the books the movie is based on), in millions of dollars.

      A tab-delimited text file with a table of the data, called films.txt, is available on Blackboard.
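Outside R, a tab-delimited file like this can be read with Python's standard `csv` module. The sketch below rests on several assumptions that the coursework does not state: the first row of the file holds column names (taken here to be `BoxOffice`, `Production`, `Promotion` and `Books`), and every column is numeric. The function name is our own, and the toy numbers are made up.

```python
import csv
import io


def read_tab_delimited(text):
    """Parse tab-delimited text whose first row holds column names.

    Returns a dict mapping each column name to a list of floats.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    names = reader.fieldnames
    columns = {name: [] for name in names}
    for row in reader:
        for name in names:
            columns[name].append(float(row[name]))
    return columns


# Toy input with made-up numbers, only to show the parsing; the header
# names are assumed to match the variable names used in this coursework.
toy = "BoxOffice\tProduction\tPromotion\tBooks\n10.0\t2.0\t1.0\t3.0\n12.0\t3.0\t1.5\t4.0\n"
films = read_tab_delimited(toy)
print(films["Production"])  # prints [2.0, 3.0]
```

Under the same assumptions, the real file could be read by opening films.txt with `newline=""` inside a `with` block and passing its contents to `read_tab_delimited`. The same function would also serve for greens.txt.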

      1. Draw scatterplots of BoxOffice against each of the other three variables. Describe any observable trends in your plots.

      2. Formulate a multiple linear regression model for the dataset, using BoxOffice as the response and the remaining three variables as regressors.

      3. Calculate the LSEs and construct 95% confidence intervals for all regression coefficients.

      4. Provide an interpretation of the estimated coefficients obtained in part 3.

      5. Calculate the R² statistic for the model and provide an interpretation of it.

      6. Jane argues that, when fitting a multiple linear regression model to the data with BoxOffice as the response and the other variables as the explanatory variables, the intercept term β₀ should be set to zero. Is this argument reasonable? Why?

        (7 marks)
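Parts 3 and 5 rest on the standard least-squares formulas. These are β̂ = (XᵀX)⁻¹Xᵀy, with σ̂² = RSS/(n − p) and se(β̂ⱼ) = √(σ̂² [(XᵀX)⁻¹]ⱼⱼ). The 95% intervals are β̂ⱼ ± t_{n−p}(0.975) se(β̂ⱼ), and R² = 1 − RSS/TSS. Below is a minimal pure-Python sketch of these formulas. The function names are our own, and Gauss-Jordan inversion stands in for the QR decomposition that R's `lm` uses. The t critical value is passed in rather than computed. For the films data, n = 10 and p = 4, so the critical value would be the upper 2.5% point of t with 6 degrees of freedom (about 2.447 from tables, or `qt(0.975, 6)` in R).

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_ols(regressors, y):
    """Least-squares fit of y on an intercept plus the given regressor columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in regressors] for i in range(n)]
    Xt = transpose(X)
    xtx_inv = inverse(matmul(Xt, X))
    # beta_hat = (X^T X)^{-1} X^T y, with y treated as an n x 1 column matrix
    beta = [row[0] for row in matmul(xtx_inv, matmul(Xt, [[v] for v in y]))]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ybar = sum(y) / n
    tss = sum((yi - ybar) ** 2 for yi in y)
    df = n - len(beta)                 # residual degrees of freedom, n - p
    sigma2 = rss / df                  # unbiased estimate of sigma^2
    se = [(sigma2 * xtx_inv[j][j]) ** 0.5 for j in range(len(beta))]
    return {"beta": beta, "se": se, "rss": rss, "tss": tss,
            "r2": 1.0 - rss / tss, "df": df}


def conf_intervals(fit, t_crit):
    """Intervals beta_j +/- t_crit * se(beta_j); t_crit is the upper 2.5% point of t_{n-p}."""
    return [(b - t_crit * s, b + t_crit * s) for b, s in zip(fit["beta"], fit["se"])]


# Toy example with made-up numbers (not the films data).
fit = fit_ols([[1, 2, 3, 4, 5]], [2, 4, 5, 4, 5])
print(fit["beta"], fit["r2"])  # approximately [2.2, 0.6] and 0.6
```

For the films model, `regressors` would be the Production, Promotion and Books columns, and `y` the BoxOffice column.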

        Excited about discovering more about her favourite films, Jane decides to test a theory: is the success of a film really linked to the success of the book it is based on, or is it enough to know how much money was invested in producing and advertising the film? To investigate whether Books also affects BoxOffice, Jane fits two multiple linear regression models to the BoxOffice data:

        • Model 1, with explanatory variables Production and Promotion;

        • Model 2, with explanatory variables Production, Promotion and Books.
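Writing the two models out makes the nesting explicit, because Model 1 is Model 2 with β₃ = 0. The following assumes independent N(0, σ²) errors. RSS₁ and RSS₂ are our own labels for the residual sums of squares of Models 1 and 2, and n is the number of films (n = 10 here). Under these assumptions, the extra-sum-of-squares test of H₀: β₃ = 0 uses the statistic F below.

```latex
\begin{aligned}
\text{Model 1:}\quad \mathrm{BoxOffice}_i &= \beta_0 + \beta_1\,\mathrm{Production}_i + \beta_2\,\mathrm{Promotion}_i + \varepsilon_i,\\
\text{Model 2:}\quad \mathrm{BoxOffice}_i &= \beta_0 + \beta_1\,\mathrm{Production}_i + \beta_2\,\mathrm{Promotion}_i + \beta_3\,\mathrm{Books}_i + \varepsilon_i,\\
F &= \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2)/1}{\mathrm{RSS}_2/(n-4)} \sim F_{1,\,n-4} \quad \text{under } H_0\colon \beta_3 = 0.
\end{aligned}
```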

      7. Decide which one is the reduced model. Then fill in the following ANOVA table to compare the nested models.

        Source                              s.s.   d.f.   m.s.   F-ratio
        Regression fitting reduced model    ?      ?
        Extra                               ?      ?      ?      ?
        Residual fitting full model         ?      ?      ?
        Total                               ?      ?

      8. Calculate the p-value associated with the significance of Books. Do you think Books should be included in the multiple linear regression model?

      9. Regressing BoxOffice on Books alone, test at the 5% level the significance of Books under this simple linear regression model. Does your conclusion contradict the one given in part 8? Comment.

      (4 marks)
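The ANOVA table and the p-value can be checked numerically. Python's standard library has no F distribution, so the sketch below evaluates the upper tail through the regularized incomplete beta function, P(F > f) = I_{d₂/(d₂ + d₁f)}(d₂/2, d₁/2). This is a swapped-in technique: the continued-fraction (modified Lentz) scheme popularized by Numerical Recipes. In R, `pf(f, d1, d2, lower.tail = FALSE)` returns the same quantity. The function names and the RSS values in the example are hypothetical.

```python
from math import exp, lgamma, log


def extra_ss_table(rss_reduced, rss_full, tss, n, p_reduced, p_full):
    """Entries of the extra-sum-of-squares ANOVA table for two nested models.

    p_reduced and p_full are the numbers of coefficients, intercept included.
    """
    df_extra = p_full - p_reduced
    df_res = n - p_full
    ss_extra = rss_reduced - rss_full      # reduction in RSS from the extra terms
    ms_extra = ss_extra / df_extra
    ms_res = rss_full / df_res
    return {
        "regression_reduced": (tss - rss_reduced, p_reduced - 1),
        "extra": (ss_extra, df_extra, ms_extra, ms_extra / ms_res),
        "residual_full": (rss_full, df_res, ms_res),
        "total": (tss, n - 1),
    }


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even and odd steps of the continued fraction
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise RuntimeError("continued fraction did not converge")


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f, d1, d2):
    """P(F > f) for F ~ F(d1, d2), via I_{d2/(d2 + d1 f)}(d2/2, d1/2)."""
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


# Hypothetical numbers, only to show the call pattern (n = 10 films,
# reduced model with 3 coefficients, full model with 4).
table = extra_ss_table(rss_reduced=20.0, rss_full=12.0, tss=50.0, n=10, p_reduced=3, p_full=4)
print(table["extra"])  # (8.0, 1, 8.0, 4.0)
print(f_upper_tail(table["extra"][3], 1, 6))
```

A t statistic with ν degrees of freedom satisfies t² ~ F(1, ν). So `f_upper_tail(t**2, 1, n - 2)` also gives the two-sided p-value for the slope test in the simple linear regression of part 9.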

  2. A dataset concerns the net sales of shops in various locations in the USA. It contains the following variables:

    • ANS: Annual net sales (in thousands of $);

    • NSF: Number of square feet (in thousands);

    • INV: Inventory, i.e. the total price of goods owned by the shop (in thousands of $);

    • ASA: Amount spent on advertising (in thousands of $);

    • SSD: Size of sales district (in thousands of families);

    • NCS: Number of competing stores in the district.

A tab-delimited text file with a table of the data, called greens.txt, is available on Blackboard.

A multiple linear regression model Λ is proposed to describe the relationship between the response variable ANS and the other 5 explanatory variables (NSF, INV, ASA, SSD, NCS).

A retail expert believes, however, that the variation in ANS can be adequately explained by the variable INV alone, and hence proposes a simple linear regression model ω for the data.
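Here is one conventional way to write the two models. It makes the usual assumptions of independent normal errors with mean 0 and constant variance σ², with the regressors treated as fixed. The coefficient labels β and γ are our own choice.

```latex
\begin{aligned}
\Lambda:\quad \mathrm{ANS}_i &= \beta_0 + \beta_1\,\mathrm{NSF}_i + \beta_2\,\mathrm{INV}_i + \beta_3\,\mathrm{ASA}_i + \beta_4\,\mathrm{SSD}_i + \beta_5\,\mathrm{NCS}_i + \varepsilon_i,\\
\omega:\quad \mathrm{ANS}_i &= \gamma_0 + \gamma_1\,\mathrm{INV}_i + \varepsilon_i,\\
&\varepsilon_1, \dots, \varepsilon_n \ \text{independent}, \quad \varepsilon_i \sim N(0, \sigma^2).
\end{aligned}
```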

  1. Specify the models Λ and ω, and state the model assumptions clearly.

  2. Calculate the residual sums of squares from fitting Λ and ω, respectively.

  3. Explain why, in part 2, the residual sum of squares of Λ is not larger than that of ω.

  4. Under model Λ, test at the 10% significance level whether the regression coefficients of ASA and SSD are 15 and 10, respectively, and explain your conclusions.

  5. Suppose that we want to compare how well two new shops in two locations will perform:

    shop   NSF   INV   ASA    SSD    NCS
    1      3.0   500   5.0    5.0    5
    2      5.0   500   10.0   10.0   10

    Calculate the predicted difference in annual net sales between shop 2 and shop 1. Do we predict the two shops to perform significantly differently at a 5% significance level?

  6. It is suggested that the relationship between ANS and INV depends on the number of competing stores in the district, i.e. on NCS.

    1. Propose a new model Λ1 to reflect this suggestion, making sure model ω is nested within Λ1. Exclude the other regressor variables (i.e. NSF, ASA and SSD).

    2. Carry out a hypothesis test to compare ω against Λ1 and make conclusions.

    3. Based on the fitted model Λ1, plot four fitted regression lines on the same diagram to display the relationships between ANS and INV for the four values of NCS of 0, 4, 8 and 12, respectively.

Comment on the changes in the relationship between ANS and INV for these four different amounts of competition.

(9 marks)
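For parts 4 to 6 of question 2, the usual test statistics are functions of three quantities from a least-squares fit: the estimates β̂, the matrix (XᵀX)⁻¹, and σ̂² = RSS/(n − p). The sketch below uses function names of our own and computes three things:

- the t statistic for a linear combination aᵀβ (part 5, where a is the difference of the two shops' regressor vectors, so the intercept cancels);
- the F statistic for two simultaneous constraints Cβ = c (part 4);
- the intercept and slope of ANS on INV at a fixed NCS (part 6). This assumes Λ1 is the interaction model ANS = β₀ + β₁INV + β₂NCS + β₃INV·NCS + ε, which is one common choice and not the only one.

Critical values would still come from t or F tables.

```python
def quad_form(u, m, v):
    """Compute u^T M v for vectors u, v and a matrix M stored as lists of rows."""
    return sum(u[i] * m[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))


def linear_combination_t(beta, xtx_inv, sigma2, a):
    """Estimate a^T beta, its standard error and the t statistic for H0: a^T beta = 0."""
    est = sum(ai * bi for ai, bi in zip(a, beta))
    se = (sigma2 * quad_form(a, xtx_inv, a)) ** 0.5
    return est, se, est / se


def two_constraint_f(beta, xtx_inv, sigma2, C, c):
    """F statistic for H0: C beta = c with exactly two constraints (rows of C).

    F = (C b - c)^T [C (X^T X)^{-1} C^T]^{-1} (C b - c) / (2 sigma2), ~ F(2, n - p) under H0.
    """
    d = [sum(cij * bj for cij, bj in zip(row, beta)) - ci for row, ci in zip(C, c)]
    m = [[quad_form(C[i], xtx_inv, C[j]) for j in range(2)] for i in range(2)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    m_inv = [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]
    return quad_form(d, m_inv, d) / (2.0 * sigma2)


def lines_by_ncs(b, ncs_values):
    """Intercept and slope of ANS against INV at each NCS value, assuming the
    interaction model ANS = b0 + b1*INV + b2*NCS + b3*INV*NCS + error."""
    b0, b1, b2, b3 = b
    return {ncs: (b0 + b2 * ncs, b1 + b3 * ncs) for ncs in ncs_values}


# Assuming coefficient order (intercept, NSF, INV, ASA, SSD, NCS) under model Λ:
#   part 5: a = [0.0, 2.0, 0.0, 5.0, 5.0, 5.0]  (shop 2 minus shop 1)
#   part 4: C = [[0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 0]], c = [15, 10]
# Made-up coefficients below, only to show the output shape.
print(lines_by_ncs([10.0, 2.0, -1.0, -0.25], [0, 4, 8, 12]))
# {0: (10.0, 2.0), 4: (6.0, 1.0), 8: (2.0, 0.0), 12: (-2.0, -1.0)}
```

Each (intercept, slope) pair returned by `lines_by_ncs` defines one of the four fitted lines asked for in part 6.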

[Total: 20 marks]
