Lag Plot

NIST/SEMATECH Section 1.3.3.15 Lag Plot

[Figure: lag plot of random data. Horizontal axis: Y(i-1); vertical axis: Y(i); both axes span -3 to 3.]

What It Is

A lag plot is a scatter plot that checks whether a dataset or time series is random. For a lag of $k$, each point is plotted as $(Y_{i-k}, Y_i)$, where the horizontal axis shows the lagged value and the vertical axis shows the current value. The most commonly used lag is 1. Random data should not exhibit any identifiable structure; non-random structure indicates that the underlying data are not random.

For lag 1, the horizontal axis shows $Y_{i-1}$ and the vertical axis shows $Y_i$. Each point represents two successive observations. The plot exploits the human eye's pattern recognition: random data produce a structureless "shotgun" cloud, while any departure from randomness produces a recognizable geometric shape. Since the axes are exactly $Y_{i-1}$ and $Y_i$, an autoregressive fit can be performed as a linear regression directly from the lag plot.
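The pairing described above can be sketched in a few lines of NumPy. The helper name `lag_pairs` and the five sample values are hypothetical, introduced only for illustration; this is a minimal sketch, not a reference implementation.

```python
import numpy as np


def lag_pairs(y, k=1):
    """Return (x, v) with x[j] = Y_{i-k} and v[j] = Y_i for a lag-k plot.

    `lag_pairs` is an illustrative helper, not part of any library.
    """
    y = np.asarray(y, dtype=float)
    if not 1 <= k < y.size:
        raise ValueError("lag k must satisfy 1 <= k < len(y)")
    # Dropping the last k values gives the lagged series;
    # dropping the first k values gives the current series.
    return y[:-k], y[k:]


# Five made-up observations, in the order they were collected.
y = [10.0, 12.0, 11.0, 15.0, 14.0]
x1, v1 = lag_pairs(y, k=1)  # pairs (10,12), (12,11), (11,15), (15,14)
x2, v2 = lag_pairs(y, k=2)  # pairs (10,11), (12,15), (11,14)
```

A series of N observations yields N - k points at lag k, which is one reason large lags are less informative for short series.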

Questions This Plot Answers

  • Are the data random?
  • Is there serial correlation in the data?
  • What is a suitable model for the data?
  • Are there outliers in the data?

Why It Matters

The lag plot provides an instant visual diagnostic for randomness that requires no distributional assumptions. A single glance reveals whether successive observations are independent, which is critical because non-independence invalidates standard error formulas, makes confidence intervals too narrow, and causes control charts to generate false alarms. When the lag plot reveals autocorrelation, the next step is to estimate the parameters of the autoregressive model $Y_i = A_0 + A_1 Y_{i-1} + E_i$ using linear regression directly from the lag plot axes.

When to Use a Lag Plot

Use a lag plot as a simple and powerful diagnostic for detecting non-randomness in time series data. It complements the autocorrelation plot by providing a direct visual impression of how successive observations are related. The lag plot is particularly useful for detecting non-linear dependencies that might not be captured by linear autocorrelation, such as clustered or oscillating patterns. Inasmuch as randomness is an underlying assumption for most statistical estimation and testing techniques, the lag plot should be a routine tool for researchers.

How to Interpret a Lag Plot

For random data, the lag plot appears as a structureless cloud of points with no discernible pattern, indicating that knowing the current value $Y_{i-1}$ provides no information about the next value $Y_i$. A strong positive linear pattern along the diagonal indicates positive autocorrelation, meaning high values tend to follow high values and low values follow low values. Moderate autocorrelation produces a noisy elliptical cluster along the diagonal with a restricted but still wide range of possible next values. Strong autocorrelation produces a tight linear band along the diagonal, making prediction possible from one observation to the next. A tight elliptical or circular loop pattern indicates a sinusoidal model. The lag plot is also valuable for outlier detection: each data point appears twice in a lag-1 plot (once as $Y_i$ and once as $Y_{i-1}$), so apparent outliers in the lag plot can be traced back to specific data points in the original sequence.
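The outlier-tracing idea can be illustrated with simulated data. In the sketch below, the seed, the position `bad`, the injected value, and the 10-MAD flagging rule are all assumptions made for the example; the flagging rule in particular is a crude stand-in for visual inspection, not a recommended outlier test.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
y = rng.standard_normal(100)    # simulated random data
bad = 40                        # hypothetical position of a faulty reading
y[bad] = 12.0                   # inject an obvious outlier

# Lag-1 pairs: pair j is the point (y[j], y[j + 1]).
x, v = y[:-1], y[1:]

# Crude rule (assumption): flag a pair if either coordinate lies more
# than 10 median absolute deviations from the median of the series.
med = np.median(y)
mad = np.median(np.abs(y - med))
far = (np.abs(x - med) > 10 * mad) | (np.abs(v - med) > 10 * mad)
flagged_pairs = np.flatnonzero(far)

# Observation `bad` is the vertical coordinate of pair bad - 1 and the
# horizontal coordinate of pair bad, so one faulty value yields two
# stray points on the lag plot.
print(flagged_pairs)
```

Two flagged pairs with consecutive indices j and j + 1 point back to the single observation at position j + 1 in the original sequence.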

Examples

Random Data

A structureless, circular cloud of points centered on the plot with no discernible pattern (a "shotgun" pattern). One cannot infer from a current value $Y_{i-1}$ the next value $Y_i$. For example, a given value on the horizontal axis corresponds to virtually any value on the vertical axis. Such non-association is the essence of randomness.

Moderate Autocorrelation

Points cluster noisily along the diagonal, forming an elongated ellipse. This is the lag plot signature of moderate positive autocorrelation. For a known current value $Y_{i-1}$, the range of possible next values $Y_i$ is restricted but still broad, suggesting that prediction via an autoregressive model is possible but imprecise.

Strong Autocorrelation

Points form a tight linear band along the diagonal, indicating strong positive autocorrelation. If you know $Y_{i-1}$, you can make a strong guess as to what $Y_i$ will be. The recommended next step is to fit the autoregressive model $Y_i = A_0 + A_1 Y_{i-1} + E_i$ using linear regression directly from the lag plot.

Sinusoidal

Points form a tight elliptical loop, the lag plot signature of a single-cycle sinusoidal model. The lag plot also reveals outliers: points lying off the ellipse indicate suspect data values. Each raw data point appears twice in the lag plot (once as $Y_i$ and once as $Y_{i-1}$), so apparent outlier pairs can be traced back to a single faulty observation.

Assumptions and Limitations

The lag plot requires that the data be recorded in the order of collection. It is most effective at lag 1 for detecting first-order dependence, though lag plots can be generated for any lag to explore more complex patterns. The visual assessment is qualitative and should be supported by quantitative checks such as the lag-1 autocorrelation coefficient or the runs test.
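As a hedged illustration of the quantitative follow-up, the sketch below computes the sample lag-1 autocorrelation coefficient and the Z statistic of a runs test about the median (normal approximation). The function names `lag1_autocorr` and `runs_test_z`, the seed, and the simulated series are assumptions made for this example.

```python
import math
import numpy as np


def lag1_autocorr(y):
    """Sample lag-1 autocorrelation coefficient r_1 (illustrative helper)."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    return float(np.sum(d[:-1] * d[1:]) / np.sum(d * d))


def runs_test_z(y):
    """Z statistic of the runs test about the median (illustrative helper)."""
    y = np.asarray(y, dtype=float)
    med = np.median(y)
    above = y[y != med] > med  # drop values tied with the median
    n1 = int(above.sum())      # count above the median
    n2 = int(above.size - n1)  # count below the median
    runs = 1 + int(np.count_nonzero(above[1:] != above[:-1]))
    expected = 2.0 * n1 * n2 / (n1 + n2) + 1.0
    variance = (2.0 * n1 * n2 * (2.0 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1.0)))
    return (runs - expected) / math.sqrt(variance)


rng = np.random.default_rng(7)  # arbitrary seed
noise = rng.standard_normal(500)  # simulated random data
ar = np.zeros(500)                # simulated AR(1) data, coefficient 0.7
for i in range(1, 500):
    ar[i] = 0.7 * ar[i - 1] + rng.standard_normal()

r1_noise, r1_ar = lag1_autocorr(noise), lag1_autocorr(ar)
z_noise, z_ar = runs_test_z(noise), runs_test_z(ar)
print(f"noise: r1={r1_noise:.3f}, z={z_noise:.2f}")
print(f"AR(1): r1={r1_ar:.3f}, z={z_ar:.2f}")
```

For random data, r_1 should fall roughly within ±2/√N of zero and |Z| should rarely exceed 1.96; positively autocorrelated data produce fewer runs than expected, hence a strongly negative Z.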

See It In Action

This technique is demonstrated in the following case studies:

Reference: NIST/SEMATECH e-Handbook of Statistical Methods, Section 1.3.3.15

Formulas

Autoregressive Model

$$Y_i = A_0 + A_1\, Y_{i-1} + E_i$$

When the lag plot reveals autocorrelation, this model is fit using linear regression directly from the lag plot axes. The residual standard deviation will be much smaller than for the default constant model.
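The regression on the lag plot axes can be sketched with `np.polyfit`. The data here are simulated with an assumed coefficient of 0.9 and an arbitrary seed, so the numbers illustrate the comparison of residual standard deviations rather than any real dataset.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
n = 500
y = np.zeros(n)
for i in range(1, n):
    # Simulated AR(1) series; the coefficient 0.9 is an assumption.
    y[i] = 0.9 * y[i - 1] + rng.standard_normal()

x, v = y[:-1], y[1:]  # lag plot axes: Y_{i-1} and Y_i

# Least-squares line through the lag plot points: v ~ A0 + A1 * x.
# np.polyfit returns coefficients from highest degree down.
a1_hat, a0_hat = np.polyfit(x, v, deg=1)

resid_ar = v - (a0_hat + a1_hat * x)  # autoregressive model residuals
resid_const = y - y.mean()            # constant model residuals

sd_ar = resid_ar.std(ddof=2)          # two fitted parameters
sd_const = resid_const.std(ddof=1)    # one fitted parameter
print(f"A0={a0_hat:.3f}, A1={a1_hat:.3f}")
print(f"residual SD: AR={sd_ar:.3f}, constant={sd_const:.3f}")
```

With strongly autocorrelated data the autoregressive residual standard deviation comes out well below that of the constant model, which is the improvement the text describes.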

Default (Constant) Model

$$Y_i = A_0 + E_i$$

The null model assumes each observation is a constant plus random error. When autocorrelation is present, this model is inadequate and the autoregressive model above should be used.

Sinusoidal Model

$$Y_i = C + \alpha\sin(2\pi\omega\, t_i + \phi) + E_i$$

When the lag plot shows an elliptical loop, a sinusoidal model is appropriate. Here $C$ is the constant mean level, $\alpha$ is the amplitude, $\omega$ is the frequency (between 0 and 0.5 cycles per observation), and $\phi$ is the phase.
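Why a sinusoid traces an ellipse can be checked numerically. Assuming equally spaced times $t_i = i$ and no noise, write $u = Y_{i-1} - C$, $v = Y_i - C$, and $\delta = 2\pi\omega$; the angle-addition formula then gives $u^2 - 2uv\cos\delta + v^2 = \alpha^2\sin^2\delta$, which is the equation of an ellipse. The parameter values below are made up for illustration.

```python
import numpy as np

# All parameter values are assumptions chosen for the example.
C, alpha, omega, phi = 5.0, 2.0, 0.05, 0.3
t = np.arange(200)  # unit spacing, t_i = i
y = C + alpha * np.sin(2 * np.pi * omega * t + phi)  # noiseless sinusoid

u = y[:-1] - C             # centered lagged value
v = y[1:] - C              # centered current value
delta = 2 * np.pi * omega  # phase advance per observation

# Every lag-1 point should satisfy the ellipse equation
# u^2 - 2 u v cos(delta) + v^2 = (alpha sin(delta))^2.
lhs = u**2 - 2 * u * v * np.cos(delta) + v**2
rhs = (alpha * np.sin(delta)) ** 2
max_dev = float(np.max(np.abs(lhs - rhs)))
print(max_dev)  # deviation is at floating-point rounding level
```

Under these assumptions the ellipse becomes a circle at $\omega = 0.25$ and narrows toward the diagonal as $\omega$ approaches 0; adding the error term $E_i$ scatters the points around the loop.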

Python Example

import numpy as np
import matplotlib.pyplot as plt

# Generate AR(1) data to show serial dependence
rng = np.random.default_rng(42)
n = 200
y = np.zeros(n)
for t in range(1, n):
    # Each value is 0.7 times the previous value plus standard normal noise
    y[t] = 0.7 * y[t - 1] + rng.standard_normal()

# Lag plot: pair each observation with the one `lag` steps earlier
lag = 1
fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(y[:-lag], y[lag:], alpha=0.5, edgecolors="k", linewidths=0.5)
ax.set_xlabel(r"$Y_{i-1}$")
ax.set_ylabel(r"$Y_i$")
ax.set_title("Lag Plot (lag = 1)")
ax.set_aspect("equal")  # equal scaling so the diagonal sits at 45 degrees
plt.tight_layout()
plt.show()