Autocorrelation Plot
NIST/SEMATECH Section 1.3.3.1 Autocorrelation Plot
What It Is
An autocorrelation plot displays the sample autocorrelation function of a dataset as a function of the lag. Each vertical bar in the plot represents the correlation between pairs of observations separated by that lag interval, providing a compact view of serial dependence across all relevant time offsets.
The vertical axis shows the autocorrelation coefficient, which normalizes the autocovariance at each lag by the variance. Horizontal reference lines mark the 95% significance bounds under the null hypothesis of white noise. The autocorrelation at lag 0 is always 1 by definition. See the formulas below for the precise definitions.
Questions This Plot Answers
- Are the data random?
- Is an observation related to adjacent observations?
- Is the time series white noise, sinusoidal, or autoregressive?
- What model best fits the observed time series?
- Is the simple constant-plus-error model valid?
- Is the standard error formula for sample means applicable?
Why It Matters
Randomness is one of the critical assumptions underlying the validity of engineering conclusions drawn from data. Most standard statistical tests depend on it, and common formulas such as the standard error of the mean become unreliable when serial dependence is present. Without randomness, the default univariate model breaks down, making parameter estimates and derived confidence intervals, control charts, and capability indices suspect. The autocorrelation plot is the primary tool for detecting this violation before flawed statistics lead to unsound engineering decisions.
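The effect on the standard error of the mean can be seen directly in a short simulation. This is a sketch, not part of the handbook: it assumes an AR(1) process with coefficient 0.7 and compares the true spread of the sample mean with the naive formula s/sqrt(n), which presumes independent observations.

```python
import numpy as np

# Simulate many AR(1) series (Y_t = 0.7 * Y_{t-1} + e_t) and compare the
# empirical spread of the sample mean with the naive standard error
# s / sqrt(n), which assumes independence.
rng = np.random.default_rng(0)
phi, n, reps = 0.7, 200, 2000

means, naive_se = [], []
for _ in range(reps):
    y = np.zeros(n)
    e = rng.standard_normal(n)
    for t in range(1, n):
        y[t] = phi * y[t - 1] + e[t]
    means.append(y.mean())
    naive_se.append(y.std(ddof=1) / np.sqrt(n))

print(f"empirical SD of the mean: {np.std(means):.3f}")
print(f"average naive SE:         {np.mean(naive_se):.3f}")
# For positively autocorrelated data the naive SE is a substantial
# underestimate, so confidence intervals built from it are too narrow.
```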
When to Use an Autocorrelation Plot
Use an autocorrelation plot when analyzing time-ordered data to determine whether successive observations are statistically independent. It is essential for validating the randomness assumption underlying many statistical procedures, including control charts, capability studies, and regression models. The plot is also used in the model identification stage for Box-Jenkins time series modeling. After fitting a model, the plot helps verify that residuals behave as white noise rather than retaining unexplained temporal structure.
How to Interpret an Autocorrelation Plot
The horizontal axis shows the lag value and the vertical axis shows the autocorrelation coefficient, which ranges from -1 to +1. A pair of horizontal reference lines, typically drawn at ±1.96/sqrt(N) where N is the sample size, marks the 95 percent significance bounds. For truly random data, approximately 95 percent of the autocorrelation values should fall within these bounds. The key diagnostic is the decay pattern of the autocorrelation function. Strong autocorrelation shows a lag-1 value near 1.0 with a smooth, nearly linear decline that crosses zero into negative values; the decreasing autocorrelation has little noise and provides high predictability. Moderate autocorrelation shows a lag-1 value around 0.75 with a gradual but noisier decline that reaches the significance bounds sooner. Both patterns indicate an autoregressive process, for which the recommended model is Y_t = A0 + A1*Y_{t-1} + E_t. A sinusoidal pattern, with alternating positive and negative spikes that do not decay toward zero, indicates periodic behavior in the data.
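The quantities plotted can be computed by hand. As a sketch, the helper below (the function name `sample_acf` is ours, not the handbook's) computes the sample autocorrelation as the lag-h autocovariance divided by the lag-0 autocovariance, together with the 95% bounds:

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelation r_h = C_h / C_0, using the biased
    (divide-by-N) autocovariance estimator."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    dev = y - y.mean()
    c0 = np.dot(dev, dev) / n
    return np.array([np.dot(dev[: n - h], dev[h:]) / (n * c0)
                     for h in range(max_lag + 1)])

# White-noise data: most lags should fall inside +/- 1.96 / sqrt(N).
rng = np.random.default_rng(1)
y = rng.standard_normal(500)
r = sample_acf(y, 20)
bound = 1.96 / np.sqrt(len(y))

print(r[0])                    # lag 0 is always 1 by definition
print(np.abs(r[1:]) < bound)   # mostly True for truly random data
```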
Examples
White Noise
All autocorrelation values fall within the 95% confidence bands, with no significant spikes at any lag. This is the signature of purely random data where each observation is independent of all others and there is no ability to predict the next value from the current one. With a 95% confidence interval, roughly one in twenty lags may still fall outside the bounds due to random fluctuations alone, so an isolated spike is not cause for concern. The constant-plus-error model is appropriate.
Strong Autocorrelation
The autocorrelation at lag 1 is high (only slightly less than 1) and declines slowly in a smooth, nearly linear fashion with little noise. The decay continues past zero into negative autocorrelation at higher lags. This pattern is the autocorrelation plot signature of strong autocorrelation, which provides high predictability if modeled properly. The recommended next step is to fit the autoregressive model Y_t = A0 + A1*Y_{t-1} + E_t using least squares or Box-Jenkins methods, then verify that the residuals are random. The residual standard deviation for this autoregressive model will be much smaller than the residual standard deviation for the default constant model.
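A minimal least-squares fit of the lag-1 autoregressive model illustrates the residual-standard-deviation comparison. This is a sketch assuming simulated AR(1) data with coefficient 0.9; in practice you would substitute your own series:

```python
import numpy as np

# Simulate strongly autocorrelated data: Y_t = 0.9 * Y_{t-1} + e_t
rng = np.random.default_rng(7)
n = 500
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.9 * y[t - 1] + rng.standard_normal()

# Fit Y_t = A0 + A1 * Y_{t-1} + E_t by least squares.
X = np.column_stack([np.ones(n - 1), y[:-1]])   # columns for A0, A1
coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
resid = y[1:] - X @ coef

const_sd = y.std(ddof=1)    # residual sd of the default constant model
ar_sd = resid.std(ddof=1)   # residual sd of the AR(1) model
print(f"A1 estimate: {coef[1]:.2f}")
print(f"constant model sd: {const_sd:.2f}, AR(1) residual sd: {ar_sd:.2f}")
```

The AR(1) residual standard deviation comes out far below the constant model's, which is exactly the predictability gain the plot signature promises.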
Moderate Autocorrelation
The autocorrelation at lag 1 is moderately high (approximately 0.75) and decreases gradually over successive lags. The decay is generally linear but with significant noise — the bars are visibly jagged compared to the smooth decline of strong autocorrelation. This pattern indicates data with moderate serial dependence, providing moderate predictability if modeled properly. The same autoregressive model applies, but the initial spike is smaller and the decay is faster, reaching the significance bounds sooner than in the strong case.
Sinusoidal Model
Sinusoidal models produce oscillating autocorrelation values that repeat at regular lag intervals. The alternating positive and negative spikes fail to decay toward zero. This non-decaying behavior is the key signature that distinguishes periodic data from an autoregressive process, where oscillations would diminish over successive lags. The spacing of the peaks reveals the period, and the amplitude of the oscillation indicates the strength of the cyclic component.
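The non-decaying oscillation is easy to reproduce. As a sketch, the block below (using an illustrative pure sine wave with a period of 20 observations) computes the sample ACF directly and checks its behavior at multiples of the period:

```python
import numpy as np

# A pure sinusoid with period 20: its sample ACF oscillates with the same
# period and does not decay toward zero the way an AR process would.
n, period = 400, 20
t = np.arange(n)
y = np.sin(2 * np.pi * t / period)

dev = y - y.mean()
c0 = np.dot(dev, dev) / n
acf = np.array([np.dot(dev[: n - h], dev[h:]) / (n * c0)
                for h in range(101)])

print(acf[period])        # large positive: peaks repeat every `period` lags
print(acf[period // 2])   # large negative: half a period out of phase
print(acf[5 * period])    # still large at high lags: no decay
```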
Assumptions and Limitations
The autocorrelation plot assumes the data are uniformly spaced in time or sequence order. It requires a reasonably large sample size, typically at least 50 observations, to produce reliable estimates at moderate lags. The significance bounds assume normality and independence under the null hypothesis, so they serve as approximate guides rather than exact critical values. Note that autocorrelation measures only linear dependence; data with zero autocorrelation at all lags may still be non-random (e.g., non-linear dependence).
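The linear-dependence limitation can be demonstrated with a textbook-style counterexample (ours, not from the handbook): the series Y_t = e_t * e_{t-1} is uncorrelated at every lag, yet its squares are clearly autocorrelated, so the series is dependent but the ACF of the raw data cannot see it.

```python
import numpy as np

def acf1(x):
    """Lag-1 sample autocorrelation."""
    d = x - x.mean()
    return np.dot(d[:-1], d[1:]) / np.dot(d, d)

rng = np.random.default_rng(3)
e = rng.standard_normal(5001)
y = e[1:] * e[:-1]   # each value shares a factor with its neighbor

print(f"lag-1 ACF of y:    {acf1(y):+.3f}")     # near zero
print(f"lag-1 ACF of y**2: {acf1(y**2):+.3f}")  # clearly positive
```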
See It In Action
This technique is demonstrated in the case studies accompanying the handbook.
Reference: NIST/SEMATECH e-Handbook of Statistical Methods, Section 1.3.3.1
Formulas
Autocorrelation Coefficient
The autocorrelation at lag h is the autocovariance at lag h divided by the variance (autocovariance at lag 0), giving a value between -1 and +1.
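Written out, with C_h denoting the autocovariance at lag h as defined below:

```latex
r_h = \frac{C_h}{C_0}, \qquad -1 \le r_h \le 1
```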
Lag-Zero Identity
The autocorrelation at lag 0 is always 1 by definition, since the autocovariance at lag 0 equals the variance.
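In symbols:

```latex
r_0 = \frac{C_0}{C_0} = 1
```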
Autocovariance Function
The autocovariance at lag h measures the average product of deviations from the mean for observations separated by h time steps.
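In symbols, for a series Y_1, ..., Y_N with sample mean Ȳ:

```latex
C_h = \frac{1}{N}\sum_{t=1}^{N-h}\bigl(Y_t - \bar{Y}\bigr)\bigl(Y_{t+h} - \bar{Y}\bigr),
\qquad
C_0 = \frac{1}{N}\sum_{t=1}^{N}\bigl(Y_t - \bar{Y}\bigr)^2
```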
95% Significance Bounds
Under the null hypothesis of white noise, approximately 95% of autocorrelation values should fall within these bounds. The bounds are computed as ±1.96/sqrt(N), where 1.96 is the 97.5th percentile of the standard normal distribution and N is the sample size.
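In symbols, for significance level α = 0.05:

```latex
\pm \frac{z_{1-\alpha/2}}{\sqrt{N}} = \pm \frac{1.96}{\sqrt{N}}
```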
Python Example
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Generate AR(1) data: Y_t = 0.7 * Y_{t-1} + e_t
rng = np.random.default_rng(42)
n = 200
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + rng.standard_normal()

fig, ax = plt.subplots(figsize=(8, 4))
plot_acf(y, lags=40, ax=ax, title="Autocorrelation Plot (AR(1) Process)")
ax.set_xlabel("Lag")
ax.set_ylabel("Autocorrelation")
plt.tight_layout()
plt.show()