Time Series Analysis In Theory

  • A regular time series is a function from integers to real numbers: \(y_t = f(t)\).
  • Many useful time series can be specified using linear difference equations like \(y_t = k_1y_{t-1} + k_2y_{t-2} + \dots + k_ny_{t-n}\)
  • This recurrence relation has a characteristic equation (and matrix representation), whose roots (or matrix eigenvalues) can be used to write closed-form solutions like \(y_t = a\lambda^t\).
  • Any time series combining exponential growth/decay and sinusoidal components can be modeled by a linear difference equation or its matrix representation.
Fig. 1. Possible regimes for a 2nd-order linear difference equation with complex eigenvalues

So, maybe a couple of blog posts on time series analysis, starting with a little time series theory, then a practical time series analysis, and culminating in applying deep learning sequence models like transformers to time series. First, let’s refresh on the basics of time series analysis.

0. What is a time series?

A regular time series, where values are generated at evenly-spaced time intervals, is a function from integers to real numbers. It can be defined as a function from \(\mathbb{N} \mapsto \mathbb{R}\) if it starts at 0, or \(\mathbb{Z} \mapsto \mathbb{R}\) if it extends infinitely into the past.

Just like \(y(x)=\sin (\frac{\pi}{k} x)\) is a real-valued function, we can have \(y(t) = \sin (\frac \pi k t)\) where \(t \in \mathbb{N}\).

When we build our models, we try to limit the form of our functions to something general and complex enough to fit the processes we encounter, yet simple enough that we can parameterize it to fit our data.

A simple but highly useful model is a process governed by linear difference equations. Let’s use \(y_t\) interchangeably with \(y(t)\).

\[y_t = k_1y_{t-1} + k_2y_{t-2} + k_3 y_{t-3} + \dots + k_ny_{t-n}\]

where the \(k\)s are constants. Because we have \(n\) terms like \(k_iy_{t-i}\), this is an \(n\)th-order difference equation.

This simple construct can generate combinations of exponential trends and periodic functions, which can model many real-world processes.
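Before deriving closed forms, it helps to see the recurrence in action. Here's a minimal simulator; the `simulate` helper and the example values are mine, for illustration, not from the original post:

```python
# A minimal simulator for an n-th order linear difference equation
# y_t = k_1*y_{t-1} + k_2*y_{t-2} + ... + k_n*y_{t-n},
# given the coefficients k and the first n values as initial conditions.

def simulate(k, y_init, steps):
    """Iterate the recurrence for `steps` values beyond the initial ones."""
    y = list(y_init)
    n = len(k)
    for _ in range(steps):
        y.append(sum(k[i] * y[-1 - i] for i in range(n)))
    return y

# Example: y_t = 1.1 * y_{t-1} with y_0 = 1 gives 10% exponential growth.
print(simulate([1.1], [1], 4))  # ≈ [1, 1.1, 1.21, 1.331, 1.4641]
```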

1. First-order linear difference equations

The simplest linear difference equation is a first-order equation:

\[y_t = k_1y_{t-1}\]

The difference equation alone is not enough to pin down unique values of \(y_t\) for every \(t\); we also need some input. The difference equation, or recurrence relation, models the system dynamics, which then needs initial conditions to determine a unique time series trajectory.

For instance, we can say \(k_1 = 1.1\) and \(y_0 = 1\). Then \(y_1=1.1\), \(y_2 = 1.21\), and so forth. \(y_t\) grows by 10% at each timestep from the previous timestep, and we have exponential growth.

If \(k_1 = 0.9\), then \(y_1=0.9\), \(y_2 = 0.81\), and so forth. \(y_t\) declines by 10% at each timestep from the previous timestep, and we have exponential decay.

We can solve a first-order difference equation more generally. Given \(y_t = k_1y_{t-1}\) and an initial condition \(y_0\), let's solve for a closed-form expression for \(y_t\) that depends only on \(t\), \(k_1\), and \(y_0\), and doesn't depend recursively on prior values of \(y_t\). Based on our exploratory analysis, let's guess that the solution is \(y_t=a \lambda^ t\) for some \(a\) and some \(\lambda\), and solve for \(a\) and \(\lambda\).

\[y_t=a \lambda ^ t\] \[\iff k_1y_{t-1} = a \lambda ^ t\]

Sub in the \(a \lambda^{t}\) expression for any \(y_t\)s:

\[\iff k_1 a \lambda ^ {t-1} = a \lambda ^ t\]

Divide by \(a\) to get rid of the \(a\)s:

\[\iff k_1 \lambda ^ {t-1} = \lambda ^ t\] \[\iff k_1 = \frac{\lambda ^ t}{\lambda ^ {t-1}} = \lambda\]

That gives us the value of \(\lambda = k_1\). We now have:

\[y_t= a k_1^t\]

To get the value of \(a\), we plug in our known initial condition \(y_0\):

\[y_0 = a \lambda^ 0 = a\]

So \(a = y_0\), \(\lambda = k_1\) and our closed-form solution is:

\[y_t = y_0 k_1^t\]
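We can sanity-check this closed form numerically; a quick sketch, with arbitrary example values:

```python
# Check the closed form y_t = y_0 * k1**t against direct iteration
# of the recurrence y_t = k1 * y_{t-1}.

k1, y0 = 0.9, 1.0
y = y0
for t in range(1, 11):
    y = k1 * y                           # one step of the recurrence
    assert abs(y - y0 * k1**t) < 1e-12   # closed form agrees at every t
print("closed form matches recursion for t = 1..10")
```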

To repeat what we did:

We have: a difference equation relating \(y_t\) to previous \(y_{t-j}\)s.

We want: A closed-form expression for \(y_t\) as a function of \(t\), without any recurrence relation.

1. Guess the form of a solution: \(y_t = a \lambda ^ t\).

2. Solve for \(\lambda\)(s) after substituting each \(y_{t-j}\) with \(a \lambda ^ {t-j}\).

3. Plug in initial conditions to solve for \(a\).

This may be a painful elaboration of the obvious. But this approach lets us solve arbitrary \(n\)th-order difference equations.

Here are plots for different values of \(k_1\) a/k/a \(\lambda\):

And a table:

| \(\lambda\) | Regime | Response to input noise |
|---|---|---|
| \(\lambda > 1\) | Exponential growth | |
| \(\lambda = 1\) | Neutrally stable at \(y_0\) | Unit root, random walk |
| \(0 < \lambda < 1\) | Exponential decay | |
| \(\lambda = 0\) | Stable at 0 | Input noise |
| \(-1 < \lambda < 0\) | Oscillating exponential decay | |
| \(\lambda = -1\) | Oscillating between \(y_0\) and \(-y_0\) | Oscillating random walk |
| \(\lambda < -1\) | Oscillating exponential growth | |
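The regimes in the table are easy to reproduce by iterating the recurrence; a short sketch (the `first_order` helper and sample values are mine):

```python
# Iterate y_t = lam * y_{t-1} from y_0 = 1 for one sample value of lam
# in each regime of the table.

def first_order(lam, y0=1.0, steps=6):
    y = [y0]
    for _ in range(steps):
        y.append(lam * y[-1])
    return y

for lam in (1.1, 1.0, 0.9, 0.0, -0.9, -1.0, -1.1):
    print(lam, [round(v, 3) for v in first_order(lam)])
```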

2. Second-order linear difference equations

Let’s apply a similar process to 2nd-order difference equations of the form \(y_t = k_1 y_{t-1} + k_2y_{t-2}\). Things are about to get more interesting. Let’s start with a numerical example:

\[y_t = 3 y_{t-1} - 2y_{t-2}\]

1. Guess the form of a solution: \(a \lambda^{t}\).

2. Solve for \(\lambda(s)\):

\[3 y_{t-1} - 2y_{t-2} = a \lambda^{t}\]

Sub in the \(a \lambda^{t}\) expression for any \(y_t\)s:

\[\iff 3 a \lambda^{t-1} - 2 a \lambda^{t-2} = a \lambda^{t}\]

Divide by \(a\) to get rid of the \(a\)s:

\[\iff 3 \lambda^{t-1} - 2 \lambda^{t-2} = \lambda^{t}\]

Divide by \(\lambda^{t-2}\) (the lowest-degree \(\lambda\) term; this is the same as multiplying by \(\lambda^{-t+2}\)):

\[\iff 3 \lambda^{t-1} \cdot \lambda^{-t+2} - 2 \lambda^{t-2} \cdot \lambda^{-t+2}= \lambda^{t} \cdot \lambda^{-t+2}\] \[\iff 3 \lambda - 2 = \lambda^{2}\] \[\iff -\lambda^{2} + 3 \lambda - 2 = 0\]

The left side is the characteristic polynomial of our 2nd-degree difference equation. We can solve this equation using the quadratic formula:

\[\lambda = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{-3 \pm \sqrt{3^2 - 4 \cdot (-1) \cdot (-2)}}{2 \cdot (-1)} = \frac{-3 \pm \sqrt{1}}{-2} = \{1, 2\}\]

Let’s try a few possible solutions as an exercise:

| \(a\lambda^t\) | \(y_0\) | \(y_1\) | Time series |
|---|---|---|---|
| \(1^t\) | 1 | 1 | 1, 1, 1, 1, … |
| \(2 \cdot 1^t\) | 2 | 2 | 2, 2, 2, 2, … |
| \(2^t\) | 1 | 2 | 1, 2, 4, 8, … |
| \(3 \cdot 2^t\) | 3 | 6 | 3, 6, 12, 24, … |
| ? | 2 | 3 | 2, 3, 5, 9, … |

What’s up with the last line? We can plug arbitrary initial conditions into the difference equation; they don’t need to follow any \(\lambda^t\) pattern. How do we account for that?

3. Plug in initial conditions to solve for \(a\)s

When we have 2 roots \(\{\lambda_1, \lambda_2\}\), then our final answer can be any linear combination of the roots raised to the power of \(t\):

\[y_t = a_1 \lambda_1^t + a_2 \lambda_2^t\]

where we can solve for \(a_1\) and \(a_2\) using our initial conditions \(y_0\) and \(y_1\). For \(y_0 = 2\) and \(y_1=3\):

\[y_0 = a_1 \lambda_1^0 + a_2 \lambda_2^0 = a_1 + a_2= 2\] \[y_1 = a_1 \lambda_1^1 + a_2 \lambda_2^1 = a_1 + 2 a_2= 3\]

Which we can solve by substitution or Cramer’s rule to get \(a_1 = 1\) and \(a_2 = 1\), and arrive at our final answer for initial conditions \(y_0 = 2\) and \(y_1=3\):

\[y_t = 1^t + 2^t = 2^t +1\]
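We can confirm this closed form against the recurrence directly; a quick sketch:

```python
# Verify that y_t = 2**t + 1 satisfies y_t = 3*y_{t-1} - 2*y_{t-2}
# with initial conditions y_0 = 2, y_1 = 3.

y = [2, 3]
for t in range(2, 10):
    y.append(3 * y[-1] - 2 * y[-2])

assert y == [2**t + 1 for t in range(10)]
print(y)  # [2, 3, 5, 9, 17, 33, 65, 129, 257, 513]
```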

We did the same thing as before: 1) Guess the form of the solution: \(a \lambda^{t}\); 2) Solve for \(\lambda\).

But a 2nd-order difference equation has a 2nd-degree characteristic polynomial, and 2 roots. We have 2 \(a\)s and 2 initial conditions.

By the principle of superposition, if \(a_1\lambda_1 ^ t\) is a potential solution and \(a_2\lambda_2 ^ t\) is a potential solution, then any linear combination of them is a potential solution. So we 3) solve for \(a_1\) and \(a_2\) numerically, using our initial conditions to create a system of 2 linear equations in 2 unknowns.

As an exercise, try to find the closed form of the Fibonacci sequence, where \(y_0=0\), \(y_1=1\), and \(y_t=y_{t-1} + y_{t-2}\). First, find the roots \(\{\lambda_1, \lambda_2\}\) of the characteristic equation \(-\lambda^2 + \lambda + 1 = 0\). Then find \(a_1\) and \(a_2\) such that \(y_t = a_1 \lambda_1^t + a_2 \lambda_2^t\) satisfies the initial conditions for \(t=0\) and \(t=1\). (The solution is in TSA.ipynb)
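If you'd like to check your answer numerically, the same 3-step recipe can be carried out with numpy; a sketch (the full worked derivation is in TSA.ipynb):

```python
import numpy as np

# Numeric check for the Fibonacci exercise: find the roots of the
# characteristic polynomial -x^2 + x + 1, solve for the a's from the
# initial conditions y_0 = 0, y_1 = 1, and compare the closed form
# against the recurrence y_t = y_{t-1} + y_{t-2}.

lam = np.roots([-1, 1, 1])                 # roots of -x^2 + x + 1 = 0
A = np.vander(lam, 2, increasing=True).T   # rows: [lam**0], [lam**1]
a = np.linalg.solve(A, [0, 1])             # fit initial conditions y_0, y_1

fib = [0, 1]
for _ in range(10):
    fib.append(fib[-1] + fib[-2])

closed = [(a * lam**t).sum().real for t in range(12)]
assert np.allclose(closed, fib)
```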

3. Higher-order linear difference equations

The method we just elaborated can be generalized to linear difference equations of any order:

  • An \(n\)th-order linear difference equation has \(n\) terms and needs \(n\) initial conditions for a unique solution:
\[y_t = k_1y_{t-1} + k_2y_{t-2} + k_3 y_{t-3} + \dots + k_ny_{t-n}\]
  • The characteristic polynomial is an \(n\)th-degree polynomial:
\[-\lambda^n + k_1 \lambda^{n-1} + k_2 \lambda^{n-2} + \dots + k_{n-1} \lambda + k_n = 0\]
  • The characteristic polynomial has \(n\) roots (possibly equal, possibly complex) by the fundamental theorem of algebra.

  • The solution to the difference equation is a linear combination of each root raised to the \(t\)th power.

\[y_t = a_1 \lambda_1^t + a_2 \lambda_2^t + \dots + a_n \lambda_n^t\]
  • We solve for the \(a\)s by plugging in the \(n\) initial conditions to form a system of \(n\) linear equations with \(n\) unknowns.
\[y_0 = a_1 \lambda_1^0 + a_2 \lambda_2^0 + \dots + a_n \lambda_n^0\] \[y_1 = a_1 \lambda_1^1 + a_2 \lambda_2^1 + \dots + a_n \lambda_n^1\] \[\vdots\] \[y_{n-1} = a_1 \lambda_1^{n-1} + a_2 \lambda_2^{n-1} + \dots + a_n \lambda_n^{n-1}\]
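The whole recipe can be sketched in a few lines of numpy. This assumes the roots are distinct (repeated roots need extra \(t \lambda^t\) terms), and the `closed_form` helper name is mine, not from the post:

```python
import numpy as np

# The 3-step recipe: given coefficients k_1..k_n and initial conditions
# y_0..y_{n-1}, recover the parameters of the closed form
# y_t = a_1*lam_1**t + ... + a_n*lam_n**t. Assumes distinct roots.

def closed_form(k, y_init):
    """Return (roots, a) such that y_t = sum(a_i * roots_i**t)."""
    n = len(k)
    # Characteristic polynomial: -x^n + k_1 x^(n-1) + ... + k_n = 0
    roots = np.roots([-1] + list(k))
    # n linear equations: row t is [lam_1**t, ..., lam_n**t]
    V = np.array([roots**t for t in range(n)])
    a = np.linalg.solve(V, np.asarray(y_init, dtype=complex))
    return roots, a

# Example from above: y_t = 3 y_{t-1} - 2 y_{t-2}, y_0 = 2, y_1 = 3,
# which should recover y_t = 2**t + 1.
roots, a = closed_form([3, -2], [2, 3])
yt = lambda t: (a * roots**t).sum().real
print([round(yt(t)) for t in range(6)])  # [2, 3, 5, 9, 17, 33]
```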

4. Complex roots

You may be asking yourself: an \(n\)th-degree polynomial can have up to \(n\) real roots, but if it has only \(m\) real roots, the remaining \(n-m\) roots are complex (and come in conjugate pairs). What happens if some of the roots are complex?

Consider this difference equation:

\[y_t = 1.95 y_{t-1} - y_{t-2}\] \[y_0 = 1\] \[y_1 = 1.95\]

The characteristic equation is \(-\lambda^2 + 1.95 \lambda - 1 = 0\).

The roots are:

\[\lambda = \frac{-1.95 \pm \sqrt{1.95^2 -4}}{-2}\]

We can see that the expression under the square root (the discriminant) is negative, so the roots of the polynomial are complex. Let’s plot the evolution of the time series.
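Before plotting, we can verify numerically that the roots are complex and have modulus 1, which predicts a bounded, constant-amplitude oscillation; a sketch using numpy:

```python
import numpy as np

# The roots of -x^2 + 1.95x - 1 = 0 are a complex conjugate pair; their
# modulus tells us whether the oscillation grows, decays, or stays bounded.

roots = np.roots([-1, 1.95, -1])
print(roots)          # a conjugate pair, roughly 0.975 +/- 0.222j
print(np.abs(roots))  # ≈ [1. 1.] -- constant amplitude

# Iterating the recurrence from y_0 = 1, y_1 = 1.95 shows a pure sinusoid:
y = [1, 1.95]
for _ in range(40):
    y.append(1.95 * y[-1] - y[-2])
print(min(y), max(y))  # bounded: the series oscillates without blowing up
```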