Box-Cox Normality Plot
NIST/SEMATECH Section 1.3.3.6 Box-Cox Normality Plot
What It Is
A Box-Cox normality plot identifies the optimal power transformation to make a dataset approximately normally distributed. It evaluates the normality of the transformed data across a range of lambda values and selects the one that yields the best fit to a normal distribution.
The procedure transforms the data as $T(Y) = (Y^{\lambda} - 1)/\lambda$ for a range of $\lambda$ values and evaluates the normality of each transformed dataset using the probability plot correlation coefficient (PPCC). The optimal $\lambda$ maximizes the PPCC. The case $\lambda = 0$ uses $T(Y) = \ln(Y)$ by convention. The plot typically includes a confidence interval around the peak to indicate the range of $\lambda$ values that produce comparable normality. If the interval includes $\lambda = 1$, no transformation is necessary. Common special cases: $\lambda = 0.5$ (square root), $\lambda = 0$ (log), $\lambda = -1$ (reciprocal).
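The procedure described above can be sketched by hand as follows. This is a minimal illustration, not the handbook's implementation: the helper names `box_cox` and `ppcc`, the lognormal sample, and the 81-point grid are all assumptions made for the example. The PPCC is taken from the correlation that `scipy.stats.probplot` reports for a normal probability plot.

```python
import numpy as np
from scipy import stats


def box_cox(y, lam):
    """Box-Cox transform; lam == 0 uses the natural log by convention."""
    y = np.asarray(y, dtype=float)
    if np.isclose(lam, 0.0):
        return np.log(y)
    return (y**lam - 1.0) / lam


def ppcc(y, lam):
    """Correlation of the normal probability plot of the transformed data."""
    # probplot returns ((theoretical quantiles, ordered data), (slope, intercept, r))
    _, (_, _, r) = stats.probplot(box_cox(y, lam), dist="norm")
    return r


# Illustrative right-skewed sample (assumed data)
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=2.0, sigma=0.8, size=200)

# Evaluate the PPCC on a grid of lambda values and keep the maximizer
lambdas = np.linspace(-2, 2, 81)
scores = np.array([ppcc(sample, lam) for lam in lambdas])
best_lambda = lambdas[np.argmax(scores)]
print(f"lambda maximizing the PPCC: {best_lambda:.2f}")
```

Because the sample is lognormal, the maximizing value should land near 0 (the log transform), though the exact value depends on the random draw.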
Questions This Plot Answers
- Is there a transformation that will normalize my data?
- What is the optimal value of the transformation parameter?
Why It Matters
Many statistical procedures (t-tests, ANOVA, capability indices) assume normally distributed data. When the raw data are skewed, the Box-Cox normality plot identifies the power transformation that best achieves normality, providing a systematic, data-driven alternative to ad hoc log or square root transforms.
When to Use a Box-Cox Normality Plot
Use a Box-Cox normality plot when the data are skewed or otherwise non-normal and normality is required for downstream statistical analysis, such as t-tests, ANOVA, or control chart construction. The method automates the search for an appropriate power transformation, saving the analyst from trial-and-error experimentation with logs, square roots, and reciprocals. It is especially common in process capability studies where normality is a prerequisite for calculating capability indices.
How to Interpret a Box-Cox Normality Plot
The horizontal axis shows the range of $\lambda$ values tested, typically from $-2$ to $2$, and the vertical axis shows the corresponding normality measure, often the probability plot correlation coefficient (PPCC) or the Shapiro-Wilk statistic. The value of $\lambda$ that maximizes the normality measure is the optimal transformation. If the optimal $\lambda$ is near $1$, no transformation is needed. Common interpretable values include $0.5$ (square root), $0$ (log), and $-1$ (reciprocal). If the peak is broad, several transformations yield comparably normal results, and the simplest interpretable value should be chosen. A sharply peaked curve indicates that the normality of the transformed data is highly sensitive to the choice of $\lambda$.
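One way to act on a broad peak is sketched below. It is only an illustration: the `tolerance` of 0.002 is a hypothetical threshold for "comparably normal" rather than a formal confidence interval, and the sample data are assumed. The sketch collects the $\lambda$ values whose PPCC is within that tolerance of the maximum and checks whether any interpretable value falls inside that range.

```python
import numpy as np
from scipy import stats

# Illustrative right-skewed sample (assumed data)
rng = np.random.default_rng(1)
data = rng.lognormal(mean=2.0, sigma=0.8, size=200)

# PPCC over a grid of lambda values; with plot=None nothing is drawn
lmbdas, ppcc = stats.boxcox_normplot(data, -2, 2, N=161)
peak_lambda = lmbdas[np.argmax(ppcc)]

# Hypothetical tolerance: lambdas this close to the peak PPCC count as comparable
tolerance = 0.002
near_optimal = lmbdas[ppcc >= ppcc.max() - tolerance]
low, high = near_optimal.min(), near_optimal.max()

# Interpretable values named in the text, plus 1 for "no transformation"
interpretable = {-1.0: "reciprocal", 0.0: "log", 0.5: "square root", 1.0: "none"}
candidates = [lam for lam in interpretable if low <= lam <= high]

# Prefer the interpretable value closest to the peak; otherwise keep the peak
if candidates:
    chosen = min(candidates, key=lambda lam: abs(lam - peak_lambda))
else:
    chosen = peak_lambda

print(f"peak lambda: {peak_lambda:.3f}, comparable range: [{low:.3f}, {high:.3f}]")
print(f"chosen lambda: {chosen}")
```

A narrow range with no interpretable value inside it corresponds to the sharply peaked case, where the grid maximizer itself is the safer choice.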
Assumptions and Limitations
The data must be strictly positive for the Box-Cox family of transformations to be applied over the full range of $\lambda$. Outliers can strongly influence the optimal $\lambda$ and should be investigated before relying on the transformation. The Box-Cox approach finds the best power transformation for normality but cannot make fundamentally multi-modal data normal, since no power transformation can turn a bimodal distribution into a unimodal one.
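The first two limitations can be probed with a short sketch. The helper `optimal_lambda`, the sample, and the artificial outlier (20 times the sample maximum) are assumptions for illustration. The helper rejects non-positive data up front and then grid-searches the PPCC with `scipy.stats.boxcox_normplot`.

```python
import numpy as np
from scipy import stats


def optimal_lambda(y, la=-2, lb=2, n_grid=161):
    """Grid-search the lambda that maximizes the PPCC (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    lmbdas, ppcc = stats.boxcox_normplot(y, la, lb, N=n_grid)
    return lmbdas[np.argmax(ppcc)]


# Illustrative sample, then the same sample with one artificial extreme value
rng = np.random.default_rng(7)
clean = rng.lognormal(mean=2.0, sigma=0.5, size=100)
with_outlier = np.append(clean, clean.max() * 20)

print(f"optimal lambda, clean data:   {optimal_lambda(clean):.2f}")
print(f"optimal lambda, with outlier: {optimal_lambda(with_outlier):.2f}")
```

Comparing the two printed values gives a quick sense of how much a single extreme point moves the estimate for a given dataset.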
Reference: NIST/SEMATECH e-Handbook of Statistical Methods, Section 1.3.3.6
Formulas
Box-Cox Transformation for Normality
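$$
T(Y) =
\begin{cases}
\dfrac{Y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\ln(Y), & \lambda = 0
\end{cases}
$$
This is the Box-Cox power transformation applied to the response variable $Y$. The optimal $\lambda$ is chosen by maximizing the probability plot correlation coefficient (PPCC) of the transformed data against normal order statistics.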
Python Example
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Generate right-skewed sample data
rng = np.random.default_rng(42)
data = rng.lognormal(mean=2, sigma=0.8, size=200)

# Box-Cox normality plot
fig, ax = plt.subplots(figsize=(10, 5))
lmbdas, ppcc = stats.boxcox_normplot(data, la=-2, lb=2, plot=ax)

# Find and mark the optimal lambda
optimal_lambda = lmbdas[np.argmax(ppcc)]
ax.axvline(optimal_lambda, color='r', linestyle='--', label=f'Optimal lambda = {optimal_lambda:.2f}')
ax.set_title("Box-Cox Normality Plot")
ax.legend()
plt.tight_layout()
plt.show()
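Once a value of lambda has been chosen, the transformation still has to be applied to the data. The sketch below is one possible follow-up, not part of the handbook: it uses `scipy.stats.boxcox_normmax` with `method="pearsonr"` to maximize the PPCC directly instead of reading it off a grid, applies the transformation with `scipy.stats.boxcox`, and uses the Shapiro-Wilk statistic as a rough before-and-after check. The sample data match the example above.

```python
import numpy as np
from scipy import stats

# Same right-skewed sample as in the plotting example
rng = np.random.default_rng(42)
data = rng.lognormal(mean=2, sigma=0.8, size=200)

# Lambda that maximizes the PPCC, found by numerical optimization
lam = stats.boxcox_normmax(data, method="pearsonr")

# With lmbda supplied, boxcox returns only the transformed array
transformed = stats.boxcox(data, lmbda=lam)

# Shapiro-Wilk W statistic before and after (closer to 1 is more normal)
w_raw, p_raw = stats.shapiro(data)
w_bc, p_bc = stats.shapiro(transformed)
print(f"lambda = {lam:.3f}")
print(f"Shapiro-Wilk W: raw = {w_raw:.4f}, transformed = {w_bc:.4f}")
```

Note that calling `stats.boxcox(data)` without `lmbda` estimates lambda by maximum likelihood rather than by the PPCC, so the two estimates can differ slightly.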