Autocorrelation

NIST/SEMATECH Section 1.3.5.12 Autocorrelation

What It Is

The autocorrelation coefficient measures the linear dependence between observations separated by a fixed number of time steps (the lag). It is the correlation of a time series with a lagged version of itself, normalized to lie between -1 and +1.

When to Use It

Use autocorrelation to test whether successive measurements are statistically independent or exhibit serial correlation. It is fundamental for validating the independence assumption underlying many statistical procedures, including control charts, regression, and hypothesis tests. Significant autocorrelation at early lags indicates that the process has memory, which must be accounted for in any subsequent analysis.

How to Interpret

Plot the autocorrelation coefficients as a function of lag to create a correlogram. If the data are random, about 95% of the autocorrelations at nonzero lags should fall within the significance bounds (plus or minus 1.96/sqrt(n)), so an occasional marginal excursion is expected by chance alone. A slowly decaying autocorrelation function suggests a trend or drift in the data. A single spike at lag 1 followed by near-zero values suggests a first-order moving-average (MA(1)) process, whereas a first-order autoregressive (AR(1)) process produces autocorrelations that decay geometrically with lag. Periodic spikes (e.g., at lags 12, 24, ...) indicate seasonality. The pattern of autocorrelations guides the selection of appropriate time-series models such as AR, MA, or ARMA processes.
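The trend and seasonality patterns described above can be illustrated with a minimal sketch. It assumes two synthetic, noise-free series (a linear drift and a sine wave with a period of 12 steps) and uses hypothetical helper names `acf` and `text_correlogram`, which are not taken from any library. The correlogram is drawn as text bars so that no plotting package is needed.

```python
import math

def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag (overall mean, total sum of squares)."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / denom
            for k in range(1, max_lag + 1)]

def text_correlogram(r, bound, width=20):
    """Render one bar per lag; '*' marks values outside +/- bound."""
    lines = []
    for k, rk in enumerate(r, start=1):
        bar = "#" * round(abs(rk) * width)
        sign = "-" if rk < 0 else "+"
        flag = "*" if abs(rk) > bound else " "
        lines.append(f"lag {k:2d} {rk:+.2f} {flag} {sign}{bar}")
    return "\n".join(lines)

n = 120
trend = [0.5 * i for i in range(n)]                            # steady drift
seasonal = [math.sin(2 * math.pi * i / 12) for i in range(n)]  # period of 12 steps
bound = 1.96 / math.sqrt(n)                                    # approximate 95% bounds

print("trend:")
print(text_correlogram(acf(trend, 12), bound))
print("seasonal:")
print(text_correlogram(acf(seasonal, 24), bound))
```

In this sketch the trend series gives bars that shrink only slowly with lag, while the seasonal series gives positive peaks at lags 12 and 24 and a negative trough at lag 6. Real data would add noise on top of these shapes.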

Assumptions and Limitations

The autocorrelation coefficient assumes the time series is weakly stationary, meaning that its mean and variance do not change over time and that the covariance between two observations depends only on the lag separating them. The significance bounds assume the null hypothesis of independent, identically distributed observations. For non-stationary data, differencing or detrending should be applied before computing autocorrelations.
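As a rough illustration of why differencing matters, the sketch below simulates a random walk (a standard example of a non-stationary series, chosen here as an assumption and not taken from the handbook) and compares the lag-1 autocorrelation before and after first differencing. The helper names `autocorr` and `difference` and the seed are illustrative.

```python
import random

def autocorr(x, k):
    """Sample autocorrelation at lag k, normalized by the total sum of squares."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    denom = sum(d * d for d in dev)
    return sum(dev[i] * dev[i + k] for i in range(n - k)) / denom

def difference(x):
    """First differences x[i+1] - x[i]; the result is one element shorter."""
    return [b - a for a, b in zip(x, x[1:])]

# Random walk: cumulative sum of independent Gaussian steps (seed is arbitrary).
rng = random.Random(42)
walk, level = [], 0.0
for _ in range(500):
    level += rng.gauss(0.0, 1.0)
    walk.append(level)

diffed = difference(walk)  # recovers the independent steps

print("lag-1 autocorrelation, raw walk:   ", autocorr(walk, 1))
print("lag-1 autocorrelation, differenced:", autocorr(diffed, 1))
```

The raw walk should show a lag-1 value close to 1, while the differenced series should be close to 0. Differencing a series that is already stationary can introduce negative lag-1 correlation of its own, so it is worth checking the correlogram again afterwards.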

Reference: NIST/SEMATECH e-Handbook, Section 1.3.5.12

Formulas

Autocorrelation Coefficient at Lag k

$$r_k = \frac{\sum_{i=1}^{n-k}(x_i - \bar{x})(x_{i+k} - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

The sample autocorrelation at lag k, normalized by the total sum of squared deviations so that r_0 = 1.
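A direct translation of this formula into code might look like the following minimal sketch. The function name `autocorr` and the error handling are illustrative choices, not part of the handbook.

```python
def autocorr(x, k):
    """Sample autocorrelation r_k at lag k (0 <= k < n)."""
    n = len(x)
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < n")
    mean = sum(x) / n
    dev = [xi - mean for xi in x]      # deviations from the overall mean
    denom = sum(d * d for d in dev)    # total sum of squared deviations
    if denom == 0:
        raise ValueError("constant series: autocorrelation is undefined")
    # Numerator: products of deviations k steps apart, over the n - k available pairs.
    numer = sum(dev[i] * dev[i + k] for i in range(n - k))
    return numer / denom

series = [1.0, 2.0, 3.0, 4.0, 5.0]
print(autocorr(series, 0))  # 1.0, since numerator and denominator coincide at lag 0
print(autocorr(series, 1))  # 0.4
```

Because the numerator has only n - k terms while the denominator always has n, the estimate shrinks toward zero at large lags, which is one reason correlograms are usually read only at lags small relative to n.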

Approximate 95% Significance Bounds

$$\pm \frac{1.96}{\sqrt{n}}$$

Under the null hypothesis of white noise, the sample autocorrelation at any nonzero lag is approximately normally distributed with mean zero and standard error 1/sqrt(n), so about 95% of the values are expected to fall within these bounds.
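One way to apply these bounds is sketched below, assuming simulated Gaussian white noise as the test series. The names `significance_bound` and `lags_outside_bounds`, the seed, and the choice of 20 lags are illustrative.

```python
import math
import random

def autocorr(x, k):
    """Sample autocorrelation at lag k, normalized by the total sum of squares."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    denom = sum(d * d for d in dev)
    return sum(dev[i] * dev[i + k] for i in range(n - k)) / denom

def significance_bound(n, z=1.96):
    """Half-width of the approximate bounds; z = 1.96 corresponds to 95%."""
    return z / math.sqrt(n)

def lags_outside_bounds(x, max_lag, z=1.96):
    """Lags 1..max_lag whose sample autocorrelation exceeds the bound in magnitude."""
    bound = significance_bound(len(x), z)
    return [k for k in range(1, max_lag + 1) if abs(autocorr(x, k)) > bound]

rng = random.Random(0)                              # arbitrary seed for repeatability
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]   # simulated white noise

print("bound:", significance_bound(len(noise)))
print("lags outside bounds:", lags_outside_bounds(noise, 20))
```

With 20 lags checked at the 95% level, roughly one lag is expected to fall outside the bounds by chance even for genuinely random data, so a single marginal excursion is weak evidence of serial correlation.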