1 Notes on Time Series Models and Forecasting
These notes cover some relatively simple tools to help you understand and forecast
macroeconomic data. We will discuss stochastic processes, with a focus on
autoregressive (AR) models: how to work with these processes, how to estimate
their parameters, and how to use them to make forecasts.
2 Stochastic Processes
We will use the concept of a stochastic process to develop our approach to making
forecasts. The formal apparatus for making forecasts involves the expectations
operator. We describe both below.
2.1 Preliminaries
A stochastic process generates a sequence of random variables, indexed by time.
If {y_t} is a stochastic process, its sample path, or realization, is an assignment
to each date t of a possible value for y_t. Thus, a realization of {y_t} is a sequence
of real numbers, indexed by time. For example, suppose we have the variable
GNP, which is measured annually, and we have values of GNP for 50 years. Its
sample path is those 50 data points for GNP.
2.2 Understanding the Expectations Operator
We will use the expectations operator throughout these notes. It will help us
mathematically express how we make forecasts and how we evaluate the moments of
random variables. When applied to a random variable, the expectations operator
returns the expected value of that random variable. The expected value is determined
by the random variable's stochastic process, which we discuss below. The expectations
operator is denoted E. It is a linear operator. For the random variable y and a
constant α, this linearity means that E(αy) = αE(y). That is, because the
expectations operator is linear, we can pull multiplicative constants outside the operator.
The expectations operator can also be used to find the expected value of
a function of a random variable. For example, take a mean-zero random
variable y, which has a constant variance denoted σ²_y. Then we have
that E(y²) = σ²_y.
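As a quick numerical check of this last point, here is a minimal simulation sketch (not part of the original notes; the distribution, sample size, and variable names are illustrative assumptions). The average squared value of a mean-zero sample approximates its variance.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma_y = 2.0

# One million mean-zero draws with standard deviation sigma_y
y = rng.normal(loc=0.0, scale=sigma_y, size=1_000_000)

print(np.mean(y**2), sigma_y**2)  # E(y^2) is approximately sigma_y^2
```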
2.3 Introduction to Stochastic Processes
Note that we only observe one particular time series sample path, or realization,
of the stochastic process {y_t}. Ideally, we would like to observe many different
realizations of the stochastic process. If we did get a chance to see many
realizations (in particular, suppose that the number of realizations went to infinity),
then we would form the expected value of the random variable y at date t as:
E(y_t) = lim_{N→∞} (1/N) Σ_{i=1}^{N} y_{it}
This is called the ensemble mean. It is a theoretical concept, but it is useful for
fully understanding what a stochastic process is. Of course, we only see a single
realization of U.S. GNP; we don't get a chance to see other realizations. In
some cases, however, the time series average of a single realization is a consistent
estimate of the mean, and is given by:
ȳ = (1/T) Σ_{t=1}^{T} y_t
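To make the distinction between the ensemble mean and the time average concrete, here is a minimal simulation sketch (not from the original notes; the AR(1) process, parameter values, and variable names are illustrative assumptions). It compares the average across many simulated realizations at a single date with the time average of one realization.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, phi = 5000, 200, 0.7          # number of realizations, sample length, AR(1) coefficient

# Simulate N realizations of a stationary AR(1): y_t = phi * y_{t-1} + eps_t
y = np.zeros((N, T))
eps = rng.normal(size=(N, T))
for t in range(1, T):
    y[:, t] = phi * y[:, t - 1] + eps[:, t]

ensemble_mean = y[:, -1].mean()     # average across realizations at the last date
time_average = y[0, :].mean()       # average over time of a single realization

print(ensemble_mean, time_average)  # both should be close to the true mean of 0
```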
Our goal in forecasting is to predict the future value of a random variable
using information that is available today. That is, we will make a forecast of
the random variable y at a future date. Note that, for definitional purposes,
we will always define the very first observation, or data point, of the random
variable to be period number 1.
Next, suppose we want to forecast the random variable y one year ahead
using information available to us right now at date t. We then want to formulate
the following mathematical expression:
E_t y_{t+1} = E(y_{t+1} | I_t)
Note what this expression tells us. E_t y_{t+1} means we make a forecast of the
variable y_{t+1}. Note that we have subscripted the expectations operator with
a t. This means we are forming the expectation using information that is
available to us as of today (time t). On the right-hand side, I_t denotes the
information set that we use, which contains information up through today
(and may also include information from the past). Thus, this expression denotes
the mathematical prediction for the variable y one period into the future. We
will look at examples of this below.
In some cases, the unconditional mean of the random variable will be the
best predictor. But in other cases, we can make use of additional information to
predict the future, instead of just using the mean of the random variable. These
notes tell us how to identify and estimate the parameters of the best predictor
of a random variable.
2.4 Autocovariance
Autocovariance tells us how a random variable at one point in time is related to
the random variable at a different point in time. Consider a mean-zero sequence
of random variables {x_t}. The j-th autocovariance is given by:
γ_{jt} = E(x_t x_{t-j})
Thus, autocovariance is just the covariance between a random variable at
different points in time. Specifically, it tells us the extent to which the random
variable today and the random variable j periods ago tend to be statistically
related. If j = 1, then this formula tells us how the random variable is related
between today and yesterday. Notice that if j = 0, then the 0th autocovariance
is just the variance: E(x_t x_t) = E(x_t²). The autocovariance can be estimated
from a sample of data, and is given by:
γ̂_j = (1/(T-j)) Σ_{t=j+1}^{T} x_t x_{t-j}
Note how we construct this autocovariance. We need to adjust the sample
length to take into consideration that we will be calculating relationships be-
tween current and past values of the random variable. To see this, suppose that
j = 1. In that case, we begin with observation number 2, which allows us to
connect the random variable in period 2 with the random variable in period 1,
which as you recall from above is the first period. Suppose that j = 2. Then we
begin in period 3, which allows us to connect the random variable in period 3
with the random variable in period 1.
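The sample autocovariance formula above translates directly into code. The sketch below is illustrative only (the function and variable names are assumptions, not from the notes); it presumes the series has already been demeaned, consistent with the mean-zero assumption above.

```python
import numpy as np

def sample_autocov(x, j):
    """j-th sample autocovariance of a mean-zero series x, using 1/(T-j) scaling."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    # Sum x_t * x_{t-j} for t = j+1, ..., T (0-based indices j, ..., T-1)
    return np.sum(x[j:] * x[:T - j]) / (T - j)

# Example: white noise should have autocovariances near zero for j >= 1
rng = np.random.default_rng(1)
x = rng.normal(size=500)
print([round(sample_autocov(x, j), 3) for j in range(4)])
```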
Why are autocovariances important?
Autocovariance measures the covariance between a variable at two different
points in time. If there is a relationship between a variable at different points
in time, then we can potentially use current and past values of the random
variable to predict its future.
Forecasting: If we know the statistical relationship between a variable
at different points in time, this can help us forecast that variable in the future.
For example, suppose output is higher than normal today, and that when output
is higher than average today, it also tends to be higher than average tomorrow. This
will lead us to forecast output tomorrow to be higher than average. The details of
how we make forecasts will be considered later.
Economic Modeling: Our economic models summarize the behavior of
economic variables. If variables have large autocovariances, then some mecha-
nism is causing persistence in these variables, and our models should explain
that persistence through preferences, technologies, policies, or shocks. On the
other hand, if variables have zero autocovariances, then the variables have no
persistence, and our models should explain this as well.
2.5 Stationarity
2.5.1 Covariance Stationarity (Weak Stationarity)
If neither the mean of a random variable nor the autocovariances of the random
variable depend on the calendar date, then the stochastic process is called co-
variance stationary. This is also called a weakly stationary stochastic process.
The two terms are interchangeable. Technically, these requirements are given
by:
E(y_t) = μ
E[(y_t - μ)(y_{t-j} - μ)] = γ_j
Note that the calendar date affects neither the mean nor the autocovariances
in the expressions above.
Example 1: Suppose {y_t} is a mean-zero process, with all autocovariances
equal to zero, and with a constant variance denoted σ². Verify that this
process is covariance stationary.
Example 2: Suppose y_t = t + ε_t, where t = 1, 2, 3, ... and ε_t is a normal
random variable with mean 0 and variance σ². Show that this process is not
covariance stationary.
Exercise: Show that for a covariance stationary (mean-zero) process, the following
property holds: E(y_t y_{t-j}) = E(y_t y_{t+j}).
(Note that since we define γ_j as E(y_t y_{t-j}), we define γ_{-j} as E(y_t y_{t+j}).)
2.5.2 Strict Stationarity
For what we will focus on in this class, covariance (weak) stationarity is what is
required. I will spend just a bit of time discussing strict stationarity. A process
is strictly stationary if the joint distribution for the stochastic process does not
depend on time. Note that a process that is strictly stationary with finite second
moments must be covariance stationary. Since many issues we are interested in
don't require strict stationarity, we will hereafter refer to a stationary time series
as one that is covariance stationary.
2.5.3 Autocorrelation
Just as it is useful to normalize covariances by dividing them by the respective
variables' standard deviations, it is also useful to normalize autocovariances.
The j-th autocorrelation is denoted ρ_j = γ_j/γ_0 and, for a mean-zero process,
is given by:

ρ_j = E(y_t y_{t-j}) / √( E(y_t²) E(y_{t-j}²) )

Note that ρ_0 = 1. Thus, autocorrelation tells us the correlation
between a variable at two different points in time.
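In code, the sample autocorrelation is just the sample autocovariance divided by the sample variance. The short sketch below is illustrative only (all names and the white noise example are assumptions, not from the notes).

```python
import numpy as np

def sample_autocorr(x, j):
    """j-th sample autocorrelation of a mean-zero series x: gamma_hat_j / gamma_hat_0."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    gamma_j = np.sum(x[j:] * x[:T - j]) / (T - j)
    gamma_0 = np.sum(x * x) / T
    return gamma_j / gamma_0

rng = np.random.default_rng(2)
x = rng.normal(size=500)                 # white noise example
print(round(sample_autocorr(x, 0), 3))   # 1 by construction
print(round(sample_autocorr(x, 1), 3))   # near 0 for white noise
```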
2.6 The White Noise Process
White noise is a serially uncorrelated process. That is, all of the autocovariances
are zero. Consider the process {ε_t} with zero mean. Its first and
second moments are given by:

E(ε_t) = 0
E(ε_t²) = σ²
E(ε_t ε_τ) = 0 for τ ≠ t

Note that this latter feature implies that the autocovariances are zero. For
the white noise process, the best prediction of the future value of the random
variable is just its mean value.
The white noise process is key because it is the building block for most of
the other stochastic processes that we are interested in.
The importance of white noise?
In studying certain classes of models, including rational expectations models,
we will see that the change in the prices of some assets should be white noise. We
will also be interested in understanding how a macroeconomic variable responds
to a completely unanticipated change in an exogenous variable. This is called
impulse response function analysis. We will discuss this later in the course, time
permitting.
2.7 Moving Average Processes
Recall the white noise process {ε_t}. We now use this process to build a process
in which one can use previous values of the variable to forecast future values.
The first such process is the moving average (MA) process. We first construct
the MA(1) process:

y_t = μ + ε_t + θ ε_{t-1},   E(ε) = 0

This is called a moving average process because the process is a weighted
average of a white noise process.
The term ε_t is often called an innovation. Note that it is a mean-zero
process. We will also assume that ε has constant variance: E(ε²) = σ².
The unconditional expectation of this process is:

E(y_t) = μ + E(ε_t) + θ E(ε_{t-1}) = μ

The variance of y is a function of the variance of ε:
E(y_t - μ)² = E(ε_t²) + θ² E(ε_{t-1}²) = (1 + θ²) σ²
The first autocovariance is given by:

E[(y_t - μ)(y_{t-1} - μ)] = θ σ²

To see this, we simply expand the expression as follows:

E[(y_t - μ)(y_{t-1} - μ)] = E[(ε_t + θ ε_{t-1})(ε_{t-1} + θ ε_{t-2})]

Expanding this further, we see that there are four terms in this expression,
as follows:

E(ε_t ε_{t-1})
θ E(ε_t ε_{t-2})
θ E(ε_{t-1} ε_{t-1})
θ² E(ε_{t-1} ε_{t-2})

Because ε is a white noise process, there is only one non-zero term among
these four terms, which is θ E(ε_{t-1} ε_{t-1}). Note that this is equal to θ σ².
Exercise: Verify that all other autocovariances are 0, and verify that this is
a covariance stationary process.
The MA(1) process has non-zero autocorrelation at lag 1. It is given by:

ρ_1 = θ σ² / [(1 + θ²) σ²] = θ / (1 + θ²)

The magnitude of this coefficient depends on the value of the parameter θ.
But note that its maximum value is 0.5 (attained at θ = 1).
How do shocks today affect this random variable today and into the future?
By construction, a one-unit shock to ε_t today changes the random variable y_t
by the same amount. Tomorrow, this shock affects y_{t+1} by the factor θ. But after
that, the shock today has no effect on the random variable of interest.
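As a check on these formulas, here is a minimal simulation sketch (not part of the original notes; the value of θ, the sample size, and the names used are illustrative assumptions). It simulates a long MA(1) realization and compares the sample lag-1 autocorrelation with θ/(1 + θ²).

```python
import numpy as np

rng = np.random.default_rng(3)
theta, T = 0.6, 200_000

eps = rng.normal(size=T + 1)
y = eps[1:] + theta * eps[:-1]        # MA(1) with mu = 0: y_t = eps_t + theta * eps_{t-1}

y_dm = y - y.mean()
rho1_sample = np.sum(y_dm[1:] * y_dm[:-1]) / np.sum(y_dm * y_dm)
rho1_theory = theta / (1 + theta**2)

print(round(rho1_sample, 4), round(rho1_theory, 4))   # should be close
```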
The q-th order MA process is denoted MA(q), and is given by:

y_t = μ + ε_t + Σ_{i=1}^{q} θ_i ε_{t-i}
Note that its variance is given by:
γ_0 = (1 + Σ_{i=1}^{q} θ_i²) σ²
Exercise: Verify that the variance is given by this formula.
For example, for the MA(2) process the first autocorrelation, ρ_1, is given by:

ρ_1 = (θ_1 + θ_1 θ_2) / (1 + θ_1² + θ_2²)
Note that in these higher-order MA processes, a shock today affects
y for more periods. In particular, a shock today affects y for q periods
into the future, that is, for as many periods as the order of the process.
We will need one more assumption to talk about well-defined MA processes
when q is infinite. In this case, we will assume what is called square-summability.
This is a technical requirement:
Σ_{j=0}^{∞} θ_j² < ∞

If the process is square-summable, then it is covariance stationary.

2.8 Autoregressive (Markov) Processes

Autoregressive, or Markov, processes are stochastic processes in which the random variable is related to lagged values of itself. The first-order process, or AR(1) process, is given by:

y_t = α + φ y_{t-1} + ε_t

Assume that ε is a white noise process, with constant variance and mean zero.
We will focus on stationary AR processes. If |φ| < 1, then the process is covariance stationary. To see this, solve the difference equation using backwards substitution, which yields:

y_t = α/(1 - φ) + Σ_{i=0}^{∞} φ^i ε_{t-i}

Note that by solving this difference equation backwards, we have re-written it as an MA(∞). This is called the moving average representation of the AR(1). This process is covariance stationary provided the following restriction holds:

Σ_{i=0}^{∞} φ^i = 1/(1 - φ) < ∞

The mean of this process is:

E(y_t) = α/(1 - φ)

The variance is:

γ_0 = σ² · 1/(1 - φ²)

The j-th autocovariance is:

γ_j = σ² · φ^j/(1 - φ²)

The j-th autocorrelation is thus:

ρ_j = γ_j/γ_0 = φ^j

Exercise: Show that the unconditional mean of y_t is given by α/(1 - φ), and that the variance is given by σ² · 1/(1 - φ²).

The second-order autoregressive process is:

y_t = α + φ_1 y_{t-1} + φ_2 y_{t-2} + ε_t

Recall from the study of difference equations that this equation is stable provided that the roots of:

1 - φ_1 z - φ_2 z² = 0

lie outside the unit circle. If this is satisfied, then the process is stationary. Note that econometric software programs will check this for you. Note that the equation above is a quadratic equation, with two solutions (two roots). Thus we need both of those roots to be greater than one in absolute value (that is, outside the unit circle) for the process to be stationary.

The importance of AR processes?
Statistically, almost ALL economic time series are well approximated by low-order AR processes. Behaviorally, many of the dynamic economic models we use in economics can be represented, or well approximated, as linear autoregressive processes.

2.8.1 Higher-Order AR Processes

The p-th order AR process is given by:

y_t = α + Σ_{i=1}^{p} φ_i y_{t-i} + ε_t

It is stationary provided that the roots of

1 - φ_1 z - φ_2 z² - ... - φ_p z^p = 0

all lie outside the unit circle. The autocovariances and autocorrelations are solved for analogously to the second-order case.

2.9 ARMA Processes

ARMA processes contain both autoregressive and moving average components. The ARMA(p, q) process is:

y_t = α + Σ_{i=1}^{p} φ_i y_{t-i} + ε_t + Σ_{j=1}^{q} θ_j ε_{t-j}

Stationarity requires the usual assumption on the roots of the p-th order AR polynomial (that is, they all lie outside the unit circle):

1 - φ_1 z - φ_2 z² - ... - φ_p z^p = 0
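To connect the AR(1) formulas above to data, here is a small simulation sketch (not from the original notes; the parameter values and names are illustrative assumptions). It simulates a stationary AR(1) and compares the sample mean and lag-j autocorrelations with α/(1 - φ) and φ^j.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, phi, T = 1.0, 0.8, 100_000

# Simulate y_t = alpha + phi * y_{t-1} + eps_t, starting from the unconditional mean
y = np.empty(T)
y[0] = alpha / (1 - phi)
eps = rng.normal(size=T)
for t in range(1, T):
    y[t] = alpha + phi * y[t - 1] + eps[t]

print(y.mean(), alpha / (1 - phi))               # sample mean vs. theoretical mean

x = y - y.mean()
for j in (1, 2, 3):
    rho_j = np.sum(x[j:] * x[:-j]) / np.sum(x * x)
    print(j, round(rho_j, 3), round(phi**j, 3))  # sample vs. theoretical autocorrelation
```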
3 Principles of Forecasting

We now discuss forecasting using current and past values of variables or their innovations. Define the collection of this information to be X.
First we define the forecast error:

e_{t+1} = y_{t+1} - ŷ_{t+1|t}

The mean squared forecast error is:

E(y_{t+1} - ŷ_{t+1|t})²

One possibility in forecasting is to minimize the mean squared forecast error. If we have linear models, such as:

ŷ_{t+1|t} = β′X_t

then the forecast that minimizes mean squared error is the linear projection of y on X, which satisfies the following:

E[(y_{t+1} - β′X_t) X_t′] = 0

Note that this is analogous to least squares in regression analysis: the difference is that the linear projection involves the population moments, while least squares involves the sample moments.

3.1 Forecasting an AR(1) Process

Let's forecast an AR(1) process, which is very straightforward. Note that the big-picture idea here is that we exploit the fact that the future is related to the past in order to make forecasts. The process is given by:

y_{t+1} = φ_1 y_t + ε_{t+1},   E(ε) = 0

Since our best forecast of the term ε_{t+1} is 0, our forecast is given by:

E_t y_{t+1} = φ_1 y_t

The one-period forecast error is given by:

e_{1,t+1} = y_{t+1} - φ_1 y_t = ε_{t+1}

The one-period forecast error variance is given by:

Var(y_{t+1} - φ_1 y_t) = σ²

Note that we can construct a forecast interval. Assuming that the data are normally distributed, we can construct a 95% confidence interval around our one-period forecast as:

E_t y_{t+1} ± Z_{0.025} σ   (where Z_{0.025} ≈ 1.96)

For two periods, we have the following:

y_{t+2} = φ_1 y_{t+1} + ε_{t+2}

and our forecast is given by:

E_t y_{t+2} = φ_1 E_t y_{t+1} = φ_1² y_t

The two-period forecast error is given by:

e_{2,t+2} = y_{t+2} - φ_1² y_t = ε_{t+2} + φ_1 ε_{t+1}

The two-period forecast error variance is given by:

Var(y_{t+2} - φ_1² y_t) = (1 + φ_1²) σ²

Thus, the N-period forecast is given by:

E_t y_{t+N} = φ_1^N y_t

Note that as N gets large, the forecast converges to the unconditional mean, which is zero here.
Practice exercise: derive the formula for the variance of the N-period forecast error.
If we have a non-zero mean, then it is easy to incorporate that component:

y_t = α + φ_1 y_{t-1} + ε_t

Note that we also have:

y_{t+1} = α + φ_1 y_t + ε_{t+1}

Now, let's form the forecast as follows:

E_t y_{t+1} = α + φ_1 y_t

3.2 Forecasting an AR(2) Process

The stationary AR(2) process is:

y_t = α + φ_1 y_{t-1} + φ_2 y_{t-2} + ε_t,   E(ε) = 0

The one-period forecast is given by:

E_t y_{t+1} = α + φ_1 y_t + φ_2 y_{t-1}

Let's look at a two-period forecast for the AR(2). Recall that we have:

y_{t+2} = α + φ_1 y_{t+1} + φ_2 y_t + ε_{t+2},   E(ε) = 0

To construct the best two-period forecast, we begin as follows:

E_t y_{t+2} = α + φ_1 E_t y_{t+1} + φ_2 y_t

Substituting in from above for the forecast of y_{t+1}, we get:

E_t y_{t+2} = α + φ_1 (α + φ_1 y_t + φ_2 y_{t-1}) + φ_2 y_t

Let's look at the 3-period-ahead forecast:

E_t y_{t+3} = α + φ_1 E_t y_{t+2} + φ_2 E_t y_{t+1}

Now, substitute in and we get:

E_t y_{t+3} = α + φ_1 (α + φ_1 (α + φ_1 y_t + φ_2 y_{t-1}) + φ_2 y_t) + φ_2 (α + φ_1 y_t + φ_2 y_{t-1})

Note that as the forecast horizon gets longer, we get more and more terms in the forecasting equation, but forecasting software packages will do this work for you so that you don't have to! Note also that the same approach can be used for higher-order AR processes. You just write out the forecasting equation, and then evaluate each term on the right-hand side, substituting in where necessary.

4 Estimating the Parameters of AR Models

OLS works well for estimating AR models. The idea is to treat the lagged variables as explanatory variables (the "X" variables that you have learned about in econometrics). We illustrate this with the AR(1). Minimizing the sum of squared residuals:

min_{α,φ} Σ_{t=2}^{T} (y_t - α - φ y_{t-1})²

implies:

[α̂_ols, φ̂_ols]′ = [ T-1, Σ y_{t-1} ; Σ y_{t-1}, Σ y²_{t-1} ]^{-1} [ Σ y_t ; Σ y_{t-1} y_t ]

where the sums run over t = 2, ..., T. The estimate of the innovation variance, σ̂²_ols, is given by:

σ̂²_ols = Σ_{t=2}^{T} (y_t - α̂_ols - φ̂_ols y_{t-1})² / (T - 1)

Note that we lose one observation from the dataset because the right-hand-side variable in the regression is lagged one period. For the AR(2), we lose two observations, and so on. Thus for the AR(p) model, we estimate the parameters using OLS, keeping in mind that we will lose p observations from the dataset.
For statistical inference, we can use the same tools that you have used previously in econometrics. Specifically, one can use the t-test to test the significance of the stationary AR parameters.
The estimation of MA models is more complex, so we will leave that for you to learn next quarter.
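The OLS formulas above are easy to implement directly. The sketch below is illustrative only (simulated data, assumed parameter values and names, and no claim about any particular software package): it estimates an AR(1) by regressing y_t on a constant and y_{t-1}, then iterates the forecast E_t y_{t+N}.

```python
import numpy as np

rng = np.random.default_rng(5)
alpha_true, phi_true, T = 0.5, 0.7, 500

# Simulate an AR(1) sample
y = np.empty(T)
y[0] = alpha_true / (1 - phi_true)
for t in range(1, T):
    y[t] = alpha_true + phi_true * y[t - 1] + rng.normal()

# OLS: regress y_t on a constant and y_{t-1} (we lose one observation)
X = np.column_stack([np.ones(T - 1), y[:-1]])
alpha_hat, phi_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]
resid = y[1:] - alpha_hat - phi_hat * y[:-1]
sigma2_hat = np.sum(resid**2) / (T - 1)

# Iterate the forecast: E_t y_{t+N} = alpha_hat + phi_hat * E_t y_{t+N-1}
forecast = y[-1]
for _ in range(8):
    forecast = alpha_hat + phi_hat * forecast

print(alpha_hat, phi_hat, sigma2_hat, forecast)
```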
5 Diagnostic Statistics

Typically, when we fit AR models for forecasting, we will not know the data generating process. Therefore we will have to make an initial guess regarding the type of model, and then test this guess. This can be boiled down as follows:
(1) Guess the model (start with an AR(1))
(2) Estimate the parameters
(3) Assess the adequacy of the model

5.1 Guessing the Type of Model

An important principle to keep in mind is simplicity: it is typically better to consider simple models over complicated models. Thus, start from an AR(1), and then test whether it is adequate. If it is, then the residuals will be white noise.

Testing for residual autocorrelation
If you have estimated a decent model for the process, then the residuals from the model should be white noise. In other words, there should be no autocorrelation in those residuals. A simple approach is to graph the autocorrelations of the residuals and visually inspect them to see if there is substantial autocorrelation. A formal statistical test for white noise is the Ljung-Box test. It is given by:

Q = T (T + 2) Σ_{r=1}^{P} ρ̂_r² / (T - r)

where T is the number of observations, P is the number of autocorrelations being tested, and r is the order of the autocorrelation. Under the null hypothesis of white noise, the test statistic Q is distributed as a χ² random variable with P - p - q degrees of freedom, where p is the order of the AR component of the model and q is the order of the MA component of the model. A useful approach is to pick P = 6, that is, 6 autocorrelations.

6 Achieving Stationarity: Part 1

Economic time series often violate our assumption of covariance stationarity. In particular, their mean is typically changing over time. Thus, the average value of GDP in the U.S. in the 1990s is much higher than the average value of U.S. GDP 100 years ago.
For the time being, we will deal with this type of nonstationarity simply by using stationarity-inducing transformations of the data. We will now consider two of these transformations. But before we develop these transformations, a preliminary transformation to use is to take logs of the time series, unless they are already expressed as rates (e.g., interest rates). This is useful, since the time series typically are growing, and it is also a useful way of dealing with certain types of heteroskedasticity.

6.1 First-Differencing

The first approach we will consider is to take first differences. Thus, after taking logs, simply define a new variable, Δy_t, as:

Δy_t = y_t - y_{t-1}

Given that we have logged the variable, note that this transformation measures the growth rate of the variable. This type of transformation almost always induces stationarity for processes that have means (in log levels) that change over time in a systematic way (e.g., trends).
To understand this, note that the log-difference transformation of a variable represents that variable in terms of its growth rates: log-differencing real GNP yields the growth rate of GNP. Most growth rates of economic variables are stationary.

6.2 Removing Deterministic Trend Components

An alternative approach to inducing stationarity for processes that grow over time is to remove a deterministic trend from their logged values. Removing a linear trend means taking the residuals from the following regression:

u_t = y_t - α - βt

In addition to removing linear trends, one may also add quadratic, cubic, etc. terms to this regression. In practice, including these higher-order terms is not commonly done.
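As a closing illustration, here is a minimal sketch of both transformations (illustrative only; the simulated "levels" series and all names are assumptions, not from the notes). It takes logs, then computes first differences and the residuals from a regression on a constant and a linear time trend.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 200
t = np.arange(T)

# A trending "levels" series (a stand-in for something like real GNP)
levels = 100 * np.exp(0.02 * t + 0.05 * rng.normal(size=T))
y = np.log(levels)

# Transformation 1: log first differences (approximate growth rates)
dy = y[1:] - y[:-1]

# Transformation 2: residuals from regressing log levels on a constant and trend
X = np.column_stack([np.ones(T), t])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ beta

print(dy.mean(), u.mean())   # detrended residuals have mean (essentially) zero
```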