Lag Plot
NIST/SEMATECH Section 1.3.3.15 Lag Plot
What It Is
A lag plot is a scatter plot that checks whether a dataset or time series is random. For a lag of $k$, each point is plotted as $(Y_{i-k}, Y_i)$, where the horizontal axis shows the lagged value and the vertical axis shows the current value. The most commonly used lag is 1. Random data should not exhibit any identifiable structure in the lag plot; non-random structure indicates that the underlying data are not random.
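Constructing the lag-$k$ coordinates amounts to pairing each observation with the one $k$ steps earlier. The helper below is a minimal sketch; the name `lag_pairs` is hypothetical and not part of any library. A series of length $N$ yields $N - k$ points.

```python
import numpy as np


def lag_pairs(y, k=1):
    """Return (horizontal, vertical) coordinates of a lag-k plot.

    Point number i (for i = k, ..., N-1) is (Y[i-k], Y[i]), so a series of
    length N gives N - k points.
    """
    y = np.asarray(y, dtype=float)
    if not 1 <= k < len(y):
        raise ValueError("lag k must satisfy 1 <= k < len(y)")
    return y[:-k], y[k:]


# Usage: a short series and its lag-2 coordinates
series = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
x_lag, y_now = lag_pairs(series, k=2)
print(list(zip(x_lag.tolist(), y_now.tolist())))
# prints [(3.0, 4.0), (1.0, 1.0), (4.0, 5.0), (1.0, 9.0)]
```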
For lag 1, the horizontal axis shows $Y_{i-1}$ and the vertical axis shows $Y_i$, so each point represents two successive observations. The plot exploits the human eye's pattern recognition: random data produce a structureless "shotgun" cloud, while any departure from randomness produces a recognizable geometric shape. Because the axes are exactly $Y_{i-1}$ and $Y_i$, an autoregressive model can be fit as a linear regression of $Y_i$ on $Y_{i-1}$, directly from the lag plot.
Questions This Plot Answers
- Are the data random?
- Is there serial correlation in the data?
- What is a suitable model for the data?
- Are there outliers in the data?
Why It Matters
The lag plot provides an instant visual diagnostic for randomness that requires no distributional assumptions. A single glance reveals whether successive observations appear to be dependent. This matters because standard error formulas assume independent observations: under positive autocorrelation they understate the true uncertainty, so confidence intervals come out too narrow and control charts generate false alarms. When the lag plot reveals autocorrelation, the next step is to estimate the parameters of an autoregressive model using linear regression directly on the lag plot axes.
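The effect on standard errors can be checked by simulation. The sketch below assumes an AR(1) process $Y_i = 0.7\,Y_{i-1} + E_i$ with standard normal errors (parameter values chosen only for illustration). It compares the actual spread of the sample mean across many replications with the usual $s/\sqrt{n}$ formula; the large-sample inflation factor for AR(1) data is $\sqrt{(1+\phi)/(1-\phi)}$, about 2.4 here.

```python
import numpy as np


def simulate_ar1(n, phi, rng):
    """AR(1) series Y[t] = phi * Y[t-1] + e[t] with standard normal e."""
    y = np.empty(n)
    y[0] = rng.standard_normal() / np.sqrt(1.0 - phi**2)  # start in stationarity
    for t in range(1, n):
        y[t] = phi * y[t - 1] + rng.standard_normal()
    return y


rng = np.random.default_rng(0)
n, phi, reps = 200, 0.7, 2000
means = np.empty(reps)
naive_se = np.empty(reps)
for r in range(reps):
    y = simulate_ar1(n, phi, rng)
    means[r] = y.mean()
    naive_se[r] = y.std(ddof=1) / np.sqrt(n)  # the usual iid formula s / sqrt(n)

actual_sd = means.std(ddof=1)    # true spread of the sample mean
typical_naive = naive_se.mean()  # what the iid formula typically reports
print(f"actual SD of the mean: {actual_sd:.3f}")
print(f"naive s/sqrt(n):       {typical_naive:.3f}")
print(f"ratio {actual_sd / typical_naive:.2f} vs large-n theory "
      f"{np.sqrt((1 + phi) / (1 - phi)):.2f}")
```

A ratio well above 1 means intervals built from $s/\sqrt{n}$ would be too narrow by roughly that factor.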
When to Use a Lag Plot
Use a lag plot as a simple and powerful diagnostic for detecting non-randomness in time series data. It complements the autocorrelation plot by providing a direct visual impression of how successive observations are related. The lag plot is particularly useful for detecting non-linear dependencies that might not be captured by linear autocorrelation, such as clustered or oscillating patterns. Inasmuch as randomness is an underlying assumption for most statistical estimation and testing techniques, the lag plot should be a routine tool for researchers.
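A non-linear dependence that linear autocorrelation misses can be illustrated with the logistic map $Y_i = 4\,Y_{i-1}(1 - Y_{i-1})$, an illustrative choice not taken from the handbook. Each value is a deterministic function of the previous one, yet the lag-1 correlation is near zero; in the lag plot the points trace a parabola, which a quadratic fit on the lag-plot axes recovers almost exactly.

```python
import numpy as np

# Illustrative assumption: a deterministic, purely non-linear lag-1 dependence.
n = 5000
x = np.empty(n)
x[0] = 0.1234567
for t in range(1, n):
    x[t] = 4.0 * x[t - 1] * (1.0 - x[t - 1])

prev, curr = x[:-1], x[1:]                 # lag-plot axes
r1 = np.corrcoef(prev, curr)[0, 1]         # linear (Pearson) association
coeffs = np.polyfit(prev, curr, deg=2)     # quadratic fit on the same axes
max_resid = np.max(np.abs(np.polyval(coeffs, prev) - curr))

print(f"lag-1 correlation: {r1:+.3f}")                      # close to zero
print(f"quadratic coefficients: {np.round(coeffs, 6)}")     # about [-4, 4, 0]
print(f"max |residual| of quadratic fit: {max_resid:.1e}")
```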
How to Interpret a Lag Plot
For random data, the lag plot appears as a structureless cloud of points with no discernible pattern, indicating that knowing the current value $Y_{i-1}$ provides no information about the next value $Y_i$. A linear pattern along the diagonal indicates positive autocorrelation: high values tend to follow high values, and low values tend to follow low values. Moderate autocorrelation produces a noisy elliptical cluster along the diagonal, with a restricted but still wide range of possible next values. Strong autocorrelation produces a tight linear band along the diagonal, making prediction from one observation to the next possible. A tight elliptical or circular loop indicates a sinusoidal model. The lag plot is also valuable for outlier detection: each interior data point appears twice in a lag-1 plot (once as $Y_{i-1}$ and once as $Y_i$), so apparent outliers in the lag plot can be traced back to specific data points in the original sequence.
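Tracing outliers relies on the index bookkeeping of the lag-1 plot: point $j$ is built from observations $j$ and $j+1$, so a single bad observation $m$ shows up in both point $m-1$ and point $m$. The sketch below flags points far from the centre of the plot and reports the observation shared by flagged points. The robust (median/MAD) standardization, the distance threshold of 6, and the function names are all illustrative assumptions, not part of the handbook's method.

```python
import numpy as np
from collections import Counter


def far_lag1_points(y, thresh=6.0):
    """Indices j of lag-1 points (y[j], y[j+1]) far from the robust centre."""
    y = np.asarray(y, dtype=float)
    med = np.median(y)
    mad = 1.4826 * np.median(np.abs(y - med))  # robust scale, ~sigma for normal data
    z = (y - med) / mad
    dist = np.hypot(z[:-1], z[1:])  # distance of each point in the standardized plane
    return np.flatnonzero(dist > thresh)


def shared_observations(points):
    """Observations used by at least two flagged lag-1 points.

    Point j uses observations j and j+1, so one bad interior observation m
    appears in both point m-1 and point m.
    """
    uses = Counter()
    for j in points:
        uses[int(j)] += 1
        uses[int(j) + 1] += 1
    return sorted(m for m, count in uses.items() if count >= 2)


rng = np.random.default_rng(1)
y = rng.standard_normal(200)
y[100] = 15.0  # hypothetical faulty reading injected for illustration

points = far_lag1_points(y)
print("flagged lag-plot points:", points.tolist())
print("traced observation(s):", shared_observations(points))
```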
Examples
Random Data
A structureless, circular cloud of points centered on the plot with no discernible pattern: a "shotgun" pattern. Knowing the current value $Y_{i-1}$ does not allow one to infer the next value $Y_i$; a given value on the horizontal axis corresponds to virtually any value on the vertical axis. Such non-association is the essence of randomness.
Moderate Autocorrelation
Points cluster noisily along the diagonal, forming an elongated ellipse. This is the lag plot signature of moderate positive autocorrelation. For a known current value $Y_{i-1}$, the range of possible next values $Y_i$ is restricted but still broad, suggesting that prediction via an autoregressive model is possible but imprecise.
Strong Autocorrelation
Points form a tight linear band along the diagonal, indicating strong positive autocorrelation. If you know $Y_{i-1}$, you can make a strong guess as to what $Y_i$ will be. The recommended next step is to fit the autoregressive model using linear regression directly on the lag plot axes.
Sinusoidal
Points form a tight elliptical loop, the lag plot signature of a single-cycle sinusoidal model. The lag plot also reveals outliers: points lying off the ellipse indicate suspect data values. Each raw data point appears twice in the lag plot (once as $Y_{i-1}$ and once as $Y_i$), so a pair of apparent outliers can be traced back to a single faulty observation.
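The four example patterns can be generated and summarized numerically. The sketch below assumes AR(1) coefficients of 0.5 (moderate) and 0.95 (strong) and a noisy sinusoid with frequency 0.1 cycles per observation; these values are illustrative, not taken from the handbook's datasets. It reports the Pearson correlation between the two lag-plot axes for each case.

```python
import numpy as np


def ar1(n, phi, rng):
    """AR(1) series Y[t] = phi * Y[t-1] + e[t], started at 0."""
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = phi * y[t - 1] + rng.standard_normal()
    return y


def lag1_corr(y):
    """Pearson correlation between the lag-plot axes Y[i-1] and Y[i]."""
    return np.corrcoef(y[:-1], y[1:])[0, 1]


rng = np.random.default_rng(7)
n = 1000
i = np.arange(n)
examples = {
    "random": rng.standard_normal(n),
    "moderate (phi = 0.5)": ar1(n, 0.5, rng),
    "strong (phi = 0.95)": ar1(n, 0.95, rng),
    "sinusoidal": np.sin(2 * np.pi * 0.1 * i) + 0.05 * rng.standard_normal(n),
}
for name, y in examples.items():
    print(f"{name:22s} r1 = {lag1_corr(y):+.2f}")
```

The sinusoid also has a high lag-1 correlation, so the single number cannot separate the loop from the band; that distinction comes from the shape of the lag plot itself.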
Assumptions and Limitations
The lag plot requires that the data be recorded in the order of collection. It is most effective at lag 1 for detecting first-order dependence, though lag plots can be generated for any arbitrary lag to explore more complex patterns. The visual assessment is qualitative and should be supported by quantitative tests such as the autocorrelation coefficient or the runs test.
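As a sketch of the quantitative follow-up, the code below computes the lag-$k$ autocorrelation coefficient $r_k = \sum (Y_i - \bar{Y})(Y_{i+k} - \bar{Y}) / \sum (Y_i - \bar{Y})^2$ and a runs test above and below the median, using the usual normal approximation for the number of runs. The function names and the simulated AR(1) series ($\phi = 0.7$) are assumptions for illustration; $\pm 1.96/\sqrt{N}$ is the common large-sample 95% band for $r_k$ under randomness.

```python
import numpy as np


def autocorr(y, k=1):
    """Lag-k autocorrelation coefficient r_k."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    return np.sum(d[:-k] * d[k:]) / np.sum(d * d)


def runs_test_z(y):
    """Runs test above/below the median; returns the standardized run count Z.

    Values equal to the median are dropped. Large negative Z means too few
    runs (e.g. positive autocorrelation); large positive Z means too many.
    """
    y = np.asarray(y, dtype=float)
    med = np.median(y)
    above = y[y != med] > med
    n1 = int(above.sum())
    n2 = len(above) - n1
    n = n1 + n2
    runs = 1 + int(np.count_nonzero(above[1:] != above[:-1]))
    expected = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n**2 * (n - 1))
    return (runs - expected) / np.sqrt(var)


# Usage on a simulated, positively autocorrelated series
rng = np.random.default_rng(3)
N = 500
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.7 * y[t - 1] + rng.standard_normal()

print(f"r1 = {autocorr(y):.2f}, 95% band = +/-{1.96 / np.sqrt(N):.3f}")
print(f"runs-test Z = {runs_test_z(y):.1f}")
```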
Reference: NIST/SEMATECH e-Handbook of Statistical Methods, Section 1.3.3.15
Formulas
Autoregressive Model
$$Y_i = A_0 + A_1 Y_{i-1} + E_i$$
where $E_i$ is the random error. When the lag plot reveals autocorrelation, this model is fit using linear regression directly on the lag plot axes. Its residual standard deviation will be much smaller than that of the default constant model.
Default (Constant) Model
$$Y_i = A_0 + E_i$$
The null model assumes each observation is a constant plus random error. When autocorrelation is present, this model is inadequate and the autoregressive model above should be used.
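Both models can be fit on the same data and their residual standard deviations compared. The sketch below simulates an AR(1) series with hypothetical parameters $A_0 = 2$ and $A_1 = 0.8$ (stationary mean 10). The constant model's least-squares estimate is the sample mean; the autoregressive model is an ordinary least-squares line of $Y_i$ on $Y_{i-1}$, that is, a straight-line fit to the lag plot, done here with `numpy.polyfit`.

```python
import numpy as np

# Hypothetical AR(1) parameters chosen for illustration: A_0 = 2, A_1 = 0.8
rng = np.random.default_rng(42)
n = 2000
y = np.empty(n)
y[0] = 10.0  # stationary mean A_0 / (1 - A_1)
for t in range(1, n):
    y[t] = 2.0 + 0.8 * y[t - 1] + rng.standard_normal()

# Default (constant) model Y_i = A_0 + E_i: least-squares A_0 is the sample mean
resid_const = y - y.mean()
sd_const = resid_const.std(ddof=1)

# Autoregressive model Y_i = A_0 + A_1 Y_{i-1} + E_i: regress the vertical
# lag-plot axis (Y_i) on the horizontal axis (Y_{i-1})
a1_hat, a0_hat = np.polyfit(y[:-1], y[1:], deg=1)
resid_ar = y[1:] - (a0_hat + a1_hat * y[:-1])
sd_ar = resid_ar.std(ddof=2)  # two fitted parameters

print(f"A_0 ~ {a0_hat:.2f}, A_1 ~ {a1_hat:.2f}")
print(f"residual SD, constant model: {sd_const:.2f}")
print(f"residual SD, AR model:       {sd_ar:.2f}")
```

With $A_1 = 0.8$ and unit error variance, the constant model's residual SD should be near $1/\sqrt{1 - 0.8^2} \approx 1.67$, while the AR model's should be near 1.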
Sinusoidal Model
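$$Y_i = C + \alpha \sin(2\pi \omega t_i + \phi) + E_i$$
When the lag plot shows an elliptical loop, a sinusoidal model is appropriate. Here $C$ is a constant, $\alpha$ is the amplitude, $\omega$ is the frequency (between 0 and 0.5 cycles per observation), $\phi$ is the phase, and $E_i$ is the random error.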
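Why a sinusoid gives an ellipse: assume the noise-free model $Y_i = C + \alpha\sin(2\pi\omega t_i + \phi)$ with unit spacing $t_i = i$ (an assumption). Write $u = Y_{i-1} - C$, $v = Y_i - C$, $\delta = 2\pi\omega$ and $\theta = 2\pi\omega t_{i-1} + \phi$. Then $v - u\cos\delta = \alpha\cos\theta\,\sin\delta$, and squaring and adding $u^2\sin^2\delta = \alpha^2\sin^2\theta\,\sin^2\delta$ gives $u^2 + v^2 - 2uv\cos\delta = \alpha^2\sin^2\delta$. Because $|\cos\delta| < 1$ for $0 < \omega < 0.5$, this is an ellipse. The sketch below checks the identity numerically for hypothetical parameter values.

```python
import numpy as np

C, alpha, omega, phi = 2.0, 1.5, 0.1, 0.3  # hypothetical parameter values
i = np.arange(200)
y = C + alpha * np.sin(2 * np.pi * omega * i + phi)  # noise-free sinusoid

u, v = y[:-1] - C, y[1:] - C  # centred lag-plot coordinates
delta = 2 * np.pi * omega
lhs = u**2 + v**2 - 2 * np.cos(delta) * u * v
rhs = (alpha * np.sin(delta)) ** 2
print(f"max deviation from the ellipse: {np.max(np.abs(lhs - rhs)):.1e}")
```

With noise added, points scatter around this ellipse, and points far from it are the suspect values mentioned in the sinusoidal example.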
Python Example
```python
import numpy as np
import matplotlib.pyplot as plt

# Generate AR(1) data to show serial dependence
rng = np.random.default_rng(42)
n = 200
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + rng.standard_normal()

lag = 1
fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(y[:-lag], y[lag:], alpha=0.5, edgecolors="k", linewidths=0.5)
ax.set_xlabel(r"$Y_{i-1}$")
ax.set_ylabel(r"$Y_i$")
ax.set_title("Lag Plot (lag = 1)")
ax.set_aspect("equal")
plt.tight_layout()
plt.show()
```