Autocorrelation Plot
NIST/SEMATECH Section 1.3.3.1 Autocorrelation Plot
What It Is
An autocorrelation plot displays the sample autocorrelation function of a dataset as a function of the lag. Each vertical bar in the plot represents the correlation between pairs of observations separated by that lag interval, providing a compact view of serial dependence across all relevant time offsets.
The vertical axis shows the autocorrelation coefficient, which normalizes the autocovariance at each lag by the variance. Horizontal reference lines mark the 95% significance bounds under the null hypothesis of white noise. The autocorrelation at lag 0 is always 1 by definition. See the formulas below for the precise definitions.
Questions This Plot Answers
- Are the data random?
- Is an observation related to adjacent observations?
- Is the time series white noise, sinusoidal, or autoregressive?
- What model best fits the observed time series?
- Is the simple constant-plus-error model valid?
- Is the standard error formula for sample means applicable?
Why It Matters
Randomness is one of the critical assumptions underlying the validity of engineering conclusions drawn from data. Most standard statistical tests depend on it, and common formulas such as the standard error of the mean become unreliable when serial dependence is present. Without randomness, the default univariate model breaks down, making parameter estimates and derived confidence intervals, control charts, and capability indices suspect. The autocorrelation plot is the primary tool for detecting this violation before flawed statistics lead to unsound engineering decisions.
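The effect on the standard error of the mean can be seen directly in a short simulation. This is a sketch, not part of the handbook: it assumes an AR(1) process with coefficient 0.7 and compares the true spread of the sample mean with the naive formula s/sqrt(n), which presumes independent observations.

```python
import numpy as np

# Simulate many AR(1) series (Y_t = 0.7 * Y_{t-1} + e_t) and compare the
# empirical spread of the sample mean with the naive standard error
# s / sqrt(n), which assumes independence.
rng = np.random.default_rng(0)
phi, n, reps = 0.7, 200, 2000

means, naive_se = [], []
for _ in range(reps):
    y = np.zeros(n)
    e = rng.standard_normal(n)
    for t in range(1, n):
        y[t] = phi * y[t - 1] + e[t]
    means.append(y.mean())
    naive_se.append(y.std(ddof=1) / np.sqrt(n))

print(f"empirical SD of the mean: {np.std(means):.3f}")
print(f"average naive SE:         {np.mean(naive_se):.3f}")
# For positively autocorrelated data the naive SE is a substantial
# underestimate, so confidence intervals built from it are too narrow.
```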
When to Use an Autocorrelation Plot
Use an autocorrelation plot when analyzing time-ordered data to determine whether successive observations are statistically independent. It is essential for validating the randomness assumption underlying many statistical procedures, including control charts, capability studies, and regression models. The plot is also used in the model identification stage for Box-Jenkins time series modeling. After fitting a model, the plot helps verify that residuals behave as white noise rather than retaining unexplained temporal structure.
How to Interpret an Autocorrelation Plot
The horizontal axis shows the lag value and the vertical axis shows the autocorrelation coefficient, which ranges from -1 to +1. A pair of horizontal reference lines, typically drawn at ±1.96/sqrt(N) where N is the sample size, marks the 95 percent significance bounds. For truly random data, approximately 95 percent of the autocorrelation values should fall within these bounds. The key diagnostic is the decay pattern of the autocorrelation function. Strong autocorrelation shows a lag-1 value near 1.0 with a smooth, nearly linear decline that crosses zero into negative values; the decreasing autocorrelation has little noise and provides high predictability. Moderate autocorrelation shows a lag-1 value around 0.75 with a gradual but noisier decline that reaches the significance bounds sooner. Both patterns indicate an autoregressive process, for which the recommended model is Y_t = A0 + A1*Y_{t-1} + E_t. A sinusoidal pattern, with alternating positive and negative spikes that do not decay toward zero, indicates periodic behavior in the data.
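The quantities plotted can be computed by hand. As a sketch, the helper below (the function name `sample_acf` is ours, not the handbook's) computes the sample autocorrelation as the lag-h autocovariance divided by the lag-0 autocovariance, together with the 95% bounds:

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelation r_h = C_h / C_0, using the biased
    (divide-by-N) autocovariance estimator."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    dev = y - y.mean()
    c0 = np.dot(dev, dev) / n
    return np.array([np.dot(dev[: n - h], dev[h:]) / (n * c0)
                     for h in range(max_lag + 1)])

# White-noise data: most lags should fall inside +/- 1.96 / sqrt(N).
rng = np.random.default_rng(1)
y = rng.standard_normal(500)
r = sample_acf(y, 20)
bound = 1.96 / np.sqrt(len(y))

print(r[0])                    # lag 0 is always 1 by definition
print(np.abs(r[1:]) < bound)   # mostly True for truly random data
```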
Examples
White Noise
All autocorrelation values fall within the 95% confidence bands, with no significant spikes at any lag. This is the signature of purely random data where each observation is independent of all others and there is no ability to predict the next value from the current one. With a 95% confidence interval, roughly one in twenty lags may still fall outside the bounds due to random fluctuations alone, so an isolated spike is not cause for concern. The constant-plus-error model is appropriate.
Strong Autocorrelation
The autocorrelation at lag 1 is high (only slightly less than 1) and declines slowly in a smooth, nearly linear fashion with little noise. The decay continues past zero into negative autocorrelation at higher lags. This pattern is the autocorrelation plot signature of strong autocorrelation, which provides high predictability if modeled properly. The recommended next step is to fit the autoregressive model Y_t = A0 + A1*Y_{t-1} + E_t using least squares or Box-Jenkins methods, then verify that the residuals are random. The residual standard deviation for this autoregressive model will be much smaller than the residual standard deviation for the default constant model.
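A minimal least-squares fit of the lag-1 autoregressive model illustrates the residual-standard-deviation comparison. This is a sketch assuming simulated AR(1) data with coefficient 0.9; in practice you would substitute your own series:

```python
import numpy as np

# Simulate strongly autocorrelated data: Y_t = 0.9 * Y_{t-1} + e_t
rng = np.random.default_rng(7)
n = 500
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.9 * y[t - 1] + rng.standard_normal()

# Fit Y_t = A0 + A1 * Y_{t-1} + E_t by least squares.
X = np.column_stack([np.ones(n - 1), y[:-1]])   # columns for A0, A1
coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
resid = y[1:] - X @ coef

const_sd = y.std(ddof=1)    # residual sd of the default constant model
ar_sd = resid.std(ddof=1)   # residual sd of the AR(1) model
print(f"A1 estimate: {coef[1]:.2f}")
print(f"constant model sd: {const_sd:.2f}, AR(1) residual sd: {ar_sd:.2f}")
```

The AR(1) residual standard deviation comes out far below the constant model's, which is exactly the predictability gain the plot signature promises.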
Moderate Autocorrelation
The autocorrelation at lag 1 is moderately high (approximately 0.75) and decreases gradually over successive lags. The decay is generally linear but with significant noise — the bars are visibly jagged compared to the smooth decline of strong autocorrelation. This pattern indicates data with moderate serial dependence, providing moderate predictability if modeled properly. The same autoregressive model applies, but the initial spike is smaller and the decay is faster, reaching the significance bounds sooner than in the strong case.
Sinusoidal Model
Sinusoidal models produce oscillating autocorrelation values that repeat at regular lag intervals. The alternating positive and negative spikes fail to decay toward zero. This non-decaying behavior is the key signature that distinguishes periodic data from an autoregressive process, where oscillations would diminish over successive lags. The spacing of the peaks reveals the period, and the amplitude of the oscillation indicates the strength of the cyclic component.
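The non-decaying oscillation is easy to reproduce. As a sketch, the block below (using an illustrative pure sine wave with a period of 20 observations) computes the sample ACF directly and checks its behavior at multiples of the period:

```python
import numpy as np

# A pure sinusoid with period 20: its sample ACF oscillates with the same
# period and does not decay toward zero the way an AR process would.
n, period = 400, 20
t = np.arange(n)
y = np.sin(2 * np.pi * t / period)

dev = y - y.mean()
c0 = np.dot(dev, dev) / n
acf = np.array([np.dot(dev[: n - h], dev[h:]) / (n * c0)
                for h in range(101)])

print(acf[period])        # large positive: peaks repeat every `period` lags
print(acf[period // 2])   # large negative: half a period out of phase
print(acf[5 * period])    # still large at high lags: no decay
```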
Assumptions and Limitations
The autocorrelation plot assumes the data are uniformly spaced in time or sequence order. It requires a reasonably large sample size, typically at least 50 observations, to produce reliable estimates at moderate lags. The significance bounds assume normality and independence under the null hypothesis, so they serve as approximate guides rather than exact critical values. Note that autocorrelation measures only linear dependence; data with zero autocorrelation at all lags may still be non-random (e.g., non-linear dependence).
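The linear-dependence limitation can be demonstrated with a textbook-style counterexample (ours, not from the handbook): the series Y_t = e_t * e_{t-1} is uncorrelated at every lag, yet its squares are clearly autocorrelated, so the series is dependent but the ACF of the raw data cannot see it.

```python
import numpy as np

def acf1(x):
    """Lag-1 sample autocorrelation."""
    d = x - x.mean()
    return np.dot(d[:-1], d[1:]) / np.dot(d, d)

rng = np.random.default_rng(3)
e = rng.standard_normal(5001)
y = e[1:] * e[:-1]   # each value shares a factor with its neighbor

print(f"lag-1 ACF of y:    {acf1(y):+.3f}")     # near zero
print(f"lag-1 ACF of y**2: {acf1(y**2):+.3f}")  # clearly positive
```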
See It In Action
This technique is demonstrated in the case studies accompanying the handbook.
Reference: NIST/SEMATECH e-Handbook of Statistical Methods, Section 1.3.3.1
Formulas
Autocorrelation Coefficient
The autocorrelation at lag h is the autocovariance at lag h divided by the variance (autocovariance at lag 0), giving a value between -1 and +1.
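Written out, with C_h denoting the autocovariance at lag h as defined below:

```latex
r_h = \frac{C_h}{C_0}, \qquad -1 \le r_h \le 1
```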
Lag-Zero Identity
The autocorrelation at lag 0 is always 1 by definition, since the autocovariance at lag 0 equals the variance.
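In symbols:

```latex
r_0 = \frac{C_0}{C_0} = 1
```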
Autocovariance Function
The autocovariance at lag h measures the average product of deviations from the mean for observations separated by h time steps.
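In symbols, for a series Y_1, ..., Y_N with sample mean Ȳ:

```latex
C_h = \frac{1}{N}\sum_{t=1}^{N-h}\bigl(Y_t - \bar{Y}\bigr)\bigl(Y_{t+h} - \bar{Y}\bigr),
\qquad
C_0 = \frac{1}{N}\sum_{t=1}^{N}\bigl(Y_t - \bar{Y}\bigr)^2
```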
95% Significance Bounds
Under the null hypothesis of white noise, approximately 95% of autocorrelation values should fall within these bounds. The bounds are computed as ±1.96/sqrt(N), where 1.96 is the 97.5th percentile of the standard normal distribution and N is the sample size.
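In symbols, for significance level α = 0.05:

```latex
\pm \frac{z_{1-\alpha/2}}{\sqrt{N}} = \pm \frac{1.96}{\sqrt{N}}
```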
Python Example
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Generate AR(1) data: Y_t = 0.7 * Y_{t-1} + e_t
rng = np.random.default_rng(42)
n = 200
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + rng.standard_normal()

fig, ax = plt.subplots(figsize=(8, 4))
plot_acf(y, lags=40, ax=ax, title="Autocorrelation Plot (AR(1) Process)")
ax.set_xlabel("Lag")
ax.set_ylabel("Autocorrelation")
plt.tight_layout()
plt.show()