Box-Cox Normality Plot

NIST/SEMATECH Section 1.3.3.6 Box-Cox Normality Plot

[Figure, four panels: "Original Data (Right-Skewed)" (histogram, Frequency vs. Value); "Box-Cox Normality (λ = -0.2)" (PPCC vs. Lambda); "Transformed Data (T(Y), λ = -0.2)" (histogram, Frequency vs. T(Y)); "Normal Prob. Plot (Transformed)" (T(Y) vs. Theoretical Quantiles).]
A Box-Cox normality plot identifies the optimal power transformation to make a dataset approximately normally distributed. It evaluates the normality of the transformed data across a range of lambda values and selects the one that yields the best fit to a normal distribution.

What It Is

The procedure applies the Box-Cox power transformation $T(Y) = (Y^{\lambda} - 1)/\lambda$ for a range of $\lambda$ values and evaluates the normality of each transformed dataset using the probability plot correlation coefficient (PPCC). The optimal $\lambda$ maximizes the PPCC. The $\lambda = 0$ case uses $\ln(Y)$, which is the limit of the transformation as $\lambda \to 0$. The plot may include a confidence interval around the peak to indicate the range of $\lambda$ values that produce comparable normality. If the interval includes $\lambda = 1$, no transformation is necessary. Common special cases: $\lambda = 0.5$ (square root), $\lambda = 0$ (log), $\lambda = -1$ (reciprocal).
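The search over $\lambda$ described above can be sketched by hand. This is a minimal illustration, not the handbook's own code: the helper name `ppcc_curve`, the synthetic lognormal sample, and the 81-point grid are assumptions made for the example. It uses `scipy.stats.boxcox` with a fixed `lmbda` for the transformation and the correlation returned by `scipy.stats.probplot` as the PPCC.

```python
import numpy as np
from scipy import stats


def ppcc_curve(y, lambdas):
    """Return the normal PPCC of the Box-Cox-transformed data for each lambda.

    Hypothetical helper written for this example.
    """
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    ppcc = []
    for lam in lambdas:
        # With a fixed lmbda, stats.boxcox returns only the transformed array.
        transformed = stats.boxcox(y, lmbda=lam)
        # probplot returns ((theoretical, ordered), (slope, intercept, r));
        # r is the correlation of the normal probability plot.
        _, (_, _, r) = stats.probplot(transformed, dist="norm")
        ppcc.append(r)
    return np.array(ppcc)


# Synthetic right-skewed sample (an assumption, for illustration only)
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=2.0, sigma=0.8, size=200)

lambdas = np.linspace(-2, 2, 81)
ppcc = ppcc_curve(sample, lambdas)
best_lambda = lambdas[np.argmax(ppcc)]
print(f"best lambda = {best_lambda:.2f}, PPCC = {ppcc.max():.4f}")
```

Plotting `ppcc` against `lambdas` gives the Box-Cox normality plot; the grid point with the largest PPCC is the estimate of the optimal $\lambda$.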

Questions This Plot Answers

  • Is there a transformation that will normalize my data?
  • What is the optimal value of the transformation parameter?

Why It Matters

Many statistical procedures (t-tests, ANOVA, capability indices) assume normally distributed data. When the raw data are skewed, the Box-Cox normality plot identifies the power transformation that best achieves normality, providing a systematic, data-driven alternative to ad hoc log or square root transforms.

When to Use a Box-Cox Normality Plot

Use a Box-Cox normality plot when the data are skewed or otherwise non-normal and normality is required for downstream statistical analysis, such as t-tests, ANOVA, or control chart construction. The method automates the search for an appropriate power transformation, saving the analyst from trial-and-error experimentation with logs, square roots, and reciprocals. It is especially common in process capability studies where normality is a prerequisite for calculating capability indices.

How to Interpret a Box-Cox Normality Plot

The horizontal axis shows the range of $\lambda$ values tested, typically from $-2$ to $+2$, and the vertical axis shows the corresponding normality measure, often the probability plot correlation coefficient (PPCC) or the Shapiro-Wilk statistic. The value of $\lambda$ that maximizes the normality measure is the optimal transformation. If the optimal $\lambda$ is near $1$, no transformation is needed. Common interpretable values include $\lambda = 0.5$ (square root), $\lambda = 0$ (log), and $\lambda = -1$ (reciprocal). If the peak is broad, several transformations yield comparably normal results, and the simplest interpretable value should be chosen. A sharply peaked curve indicates that the normality of the transformed data is highly sensitive to the choice of $\lambda$.
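One way to act on a broad peak is to check whether an interpretable $\lambda$ scores almost as well as the grid optimum. The sketch below is only an illustration of that idea: the helper `pick_interpretable_lambda`, the candidate order (no transformation first, then log, square root, reciprocal), and the tolerance of 0.002 are assumptions, not values taken from the handbook.

```python
import numpy as np
from scipy import stats


def pick_interpretable_lambda(lmbdas, ppcc,
                              candidates=(1.0, 0.0, 0.5, -1.0), tol=0.002):
    """Return the first candidate whose PPCC is within `tol` of the peak.

    Falls back to the grid optimum when no candidate qualifies.
    Hypothetical helper; candidate order and tolerance are assumptions.
    """
    lmbdas = np.asarray(lmbdas, dtype=float)
    ppcc = np.asarray(ppcc, dtype=float)
    peak = ppcc.max()
    for cand in candidates:
        # PPCC at the candidate, linearly interpolated on the tested grid
        cand_ppcc = np.interp(cand, lmbdas, ppcc)
        if peak - cand_ppcc <= tol:
            return cand
    return float(lmbdas[np.argmax(ppcc)])


# Synthetic right-skewed sample (an assumption, for illustration only)
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=2.0, sigma=0.8, size=200)

# Without a plot argument, boxcox_normplot only returns the curve.
lmbdas, ppcc = stats.boxcox_normplot(sample, -2, 2, N=81)
chosen = pick_interpretable_lambda(lmbdas, ppcc)
print(f"grid optimum = {lmbdas[np.argmax(ppcc)]:.2f}, chosen lambda = {chosen}")
```

A tighter tolerance makes the helper behave like a plain argmax; a looser one favors the simpler candidates. The right setting depends on the sample size and on how the transformed data will be used.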

Assumptions and Limitations

The data must be strictly positive for the Box-Cox family of transformations to be applied over the full range of $\lambda$. Outliers can strongly influence the optimal $\lambda$ and should be investigated before relying on the transformation. The Box-Cox approach finds the best power transformation for normality but cannot make fundamentally multi-modal data normal: a power transformation is monotonic, so it preserves the ordering of the observations, and well-separated modes remain separated after it is applied.
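The multi-modality limitation can be seen numerically. The following sketch compares the best achievable PPCC for two synthetic samples; both samples and their parameters are assumptions chosen for illustration, with the bimodal modes placed far apart on purpose.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical samples: one right-skewed and unimodal, one with two
# well-separated modes. Both are strictly positive, as Box-Cox requires.
samples = {
    "unimodal": rng.lognormal(mean=2.0, sigma=0.8, size=400),
    "bimodal": np.concatenate([rng.normal(10, 1, 200), rng.normal(30, 1, 200)]),
}

best_ppcc = {}
for name, sample in samples.items():
    # PPCC for 81 lambda values between -2 and 2 (no plot is drawn)
    lmbdas, ppcc = stats.boxcox_normplot(sample, -2, 2, N=81)
    best_ppcc[name] = ppcc.max()
    print(f"{name}: best lambda = {lmbdas[np.argmax(ppcc)]:.2f}, "
          f"max PPCC = {ppcc.max():.4f}")
```

For the skewed sample the peak PPCC approaches 1, while for the bimodal sample it stays well below that for every $\lambda$ on the grid. A low maximum is a signal to look at the data's structure rather than to keep searching for a transformation.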

Reference: NIST/SEMATECH e-Handbook of Statistical Methods, Section 1.3.3.6

Formulas

Box-Cox Transformation for Normality

$$T(Y) = \begin{cases} \dfrac{Y^{\lambda} - 1}{\lambda} & \lambda \neq 0 \\[6pt] \ln(Y) & \lambda = 0 \end{cases}$$

The Box-Cox power transformation applied to the response variable Y. The optimal lambda is chosen by maximizing the probability plot correlation coefficient (PPCC) of the transformed data against normal order statistics.
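As a quick check of the piecewise formula, it can be written directly in NumPy and compared with `scipy.special.boxcox`. The function name `box_cox` and the test values are assumptions made for this sketch.

```python
import numpy as np
from scipy import special


def box_cox(y, lam):
    """Apply T(Y) = (Y**lam - 1) / lam, or ln(Y) when lam == 0.

    Hypothetical helper mirroring the formula above.
    """
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam


y = np.array([0.5, 1.0, 2.0, 4.0])  # arbitrary positive values
for lam in (-1.0, 0.0, 0.5, 1.0):
    ours = box_cox(y, lam)
    reference = special.boxcox(y, lam)
    print(f"lambda = {lam:+.1f}: matches SciPy = {np.allclose(ours, reference)}")
```

Note that $\lambda = 1$ gives $Y - 1$, a pure shift that leaves the shape of the distribution unchanged, which is why an optimal $\lambda$ near 1 means no transformation is needed.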

Python Example

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Generate right-skewed sample data
rng = np.random.default_rng(42)
data = rng.lognormal(mean=2, sigma=0.8, size=200)

# Box-Cox normality plot: PPCC for lambda values between -2 and 2
fig, ax = plt.subplots(figsize=(10, 5))
lmbdas, ppcc = stats.boxcox_normplot(data, la=-2, lb=2, plot=ax)

# Find and mark the optimal lambda (grid point with the largest PPCC)
optimal_lambda = lmbdas[np.argmax(ppcc)]
ax.axvline(optimal_lambda, color='r', linestyle='--',
           label=f'Optimal lambda = {optimal_lambda:.2f}')
ax.set_title("Box-Cox Normality Plot")
ax.legend()
plt.tight_layout()
plt.show()
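Once the optimal $\lambda$ is known, the transformation can be applied and its effect checked. The sketch below is one possible follow-up, not part of the handbook: it regenerates the same synthetic data, uses `scipy.stats.boxcox_normmax` with `method="pearsonr"` (which maximizes the probability plot correlation with a numerical optimizer rather than a grid), and compares the PPCC before and after.

```python
import numpy as np
from scipy import stats

# Same synthetic right-skewed data as in the example above
rng = np.random.default_rng(42)
data = rng.lognormal(mean=2, sigma=0.8, size=200)

# Lambda that maximizes the normal probability plot correlation
lam = stats.boxcox_normmax(data, method="pearsonr")

# Apply the Box-Cox transformation with that fixed lambda
transformed = stats.boxcox(data, lmbda=lam)

# PPCC of the raw and transformed data against normal quantiles
_, (_, _, r_before) = stats.probplot(data, dist="norm")
_, (_, _, r_after) = stats.probplot(transformed, dist="norm")

print(f"lambda = {lam:.3f}")
print(f"PPCC before = {r_before:.4f}, after = {r_after:.4f}")
```

The value returned by `boxcox_normmax` may differ slightly from the grid optimum read off the plot, since the plot evaluates only a finite set of $\lambda$ values.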