Box-Cox Linearity Plot

NIST/SEMATECH Section 1.3.3.5 Box-Cox Linearity Plot

[Figure: three panels. Left: Original Data (Y vs X), R² = 0.9425. Center: Box-Cox Linearity plot of Correlation vs Lambda (optimum at λ = 2.0). Right: Transformed (Y vs X^2.0), R² = 0.9996.]

What It Is

A Box-Cox linearity plot helps identify the optimal power transformation of the predictor (X) variable that maximizes the linear correlation between a response Y and a predictor X. It evaluates a range of power transformation exponents and displays the correlation coefficient as a function of the transformation parameter lambda.

The procedure evaluates $X^{\lambda}$ for a range of $\lambda$ values (typically $-2$ to $+2$) and computes the Pearson correlation between $Y$ and $X^{\lambda}$ at each $\lambda$. The $\lambda = 0$ case uses $\ln(X)$ by convention. The plot displays correlation vs. $\lambda$, and the peak identifies the optimal transformation; when the correlations are negative, the most extreme (minimum) value plays that role. Common special cases: $\lambda = 1$ (no transform), $0.5$ (square root), $0$ (log), $-1$ (reciprocal).

Questions This Plot Answers

  • Would a suitable transformation improve my linear fit?
  • What is the optimal value of the transformation parameter?

Why It Matters

Many bivariate relationships are non-linear in their original scale but become linear after a power transformation of the predictor. The Box-Cox linearity plot automates the search for this transformation, turning a difficult non-linear modeling problem into a simple linear regression. Note that the Box-Cox transformation can also be applied to the response variable Y to satisfy error assumptions such as normality and constant variance; that usage is covered by the Box-Cox normality plot.

When to Use a Box-Cox Linearity Plot

Use a Box-Cox linearity plot when a scatter plot suggests a curvilinear relationship between a predictor and a response and a linear model is desired. The technique finds the value of lambda that maximizes the correlation between the response and the transformed predictor, effectively straightening the relationship. This is particularly useful in regression analysis when the analyst wants to apply a simple linear model but the raw data violate the linearity assumption.
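
As a rough illustration of the straightening effect described above, the sketch below fits a straight line to synthetic data before and after transforming the predictor and compares the two R² values. The data, the helper name `r_squared`, and the choice of λ = 0.5 are assumptions made for this example; they are not taken from the NIST handbook.

```python
import numpy as np

def r_squared(x, y):
    """R^2 of an ordinary least-squares straight-line fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals**2)
    ss_tot = np.sum((y - np.mean(y))**2)
    return 1.0 - ss_res / ss_tot

# Synthetic data (assumption): Y depends on sqrt(X), so lambda = 0.5 straightens it
rng = np.random.default_rng(0)
x = rng.uniform(1, 50, size=200)
y = 3.0 + 2.0 * np.sqrt(x) + rng.normal(0, 0.1, size=200)

lam = 0.5
x_transformed = (x**lam - 1) / lam  # Box-Cox transform of the predictor

r2_raw = r_squared(x, y)
r2_transformed = r_squared(x_transformed, y)
print(f"R^2 raw: {r2_raw:.4f}, R^2 transformed: {r2_transformed:.4f}")
```

In practice λ would come from the linearity plot rather than being fixed in advance; it is hard-coded here only to keep the comparison short.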

How to Interpret a Box-Cox Linearity Plot

The horizontal axis shows the range of $\lambda$ values tested, typically from $-2$ to $+2$, and the vertical axis shows the corresponding correlation coefficient between $Y$ and the transformed $X$. The peak of the curve identifies the optimal $\lambda$ value; if the correlations are negative, read the minimum instead. Common interpretable values include $\lambda = 1$ (no transformation needed), $0.5$ (square root), $0$ (log transform by convention), and $-1$ (reciprocal). If the curve is relatively flat near the peak, multiple transformations give similar results and the analyst may choose the most interpretable one. A sharply peaked curve indicates that the linearity of the relationship is highly sensitive to the choice of transformation.
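
One way to act on a flat peak is to snap the numerical optimum to a nearby interpretable value when little correlation is lost. The sketch below is a minimal illustration of that idea; the function names, the candidate list, and the tolerance of 0.005 are hypothetical choices, not a rule from the source.

```python
import numpy as np

def boxcox_x(x, lam):
    """Box-Cox transform of a strictly positive predictor (log at lambda = 0)."""
    x = np.asarray(x, dtype=float)
    if abs(lam) < 1e-10:
        return np.log(x)
    return (x**lam - 1.0) / lam

def abs_corr(x, y, lam):
    """Absolute Pearson correlation between y and the transformed x."""
    return abs(np.corrcoef(y, boxcox_x(x, lam))[0, 1])

def pick_interpretable_lambda(x, y,
                              candidates=(-1.0, -0.5, 0.0, 0.5, 1.0, 2.0),
                              tolerance=0.005):
    """Return (best_lambda, chosen_lambda).

    best_lambda maximizes |r| on a grid over [-2, 2]. chosen_lambda is the
    candidate nearest to best_lambda whose |r| is within `tolerance` of the
    best; if no candidate qualifies, best_lambda is returned unchanged.
    """
    grid = np.linspace(-2.0, 2.0, 401)
    scores = np.array([abs_corr(x, y, lam) for lam in grid])
    best_lambda = float(grid[np.argmax(scores)])
    best_score = float(scores.max())
    close_enough = [c for c in candidates
                    if best_score - abs_corr(x, y, c) <= tolerance]
    if not close_enough:
        return best_lambda, best_lambda
    chosen = min(close_enough, key=lambda c: abs(c - best_lambda))
    return best_lambda, chosen
```

A tighter tolerance keeps the result closer to the numerical optimum; a looser one favors interpretability. Which trade-off is appropriate depends on the analysis.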

Assumptions and Limitations

The Box-Cox linearity plot requires the predictor values to be strictly positive: $\ln(X)$ and negative powers are undefined at zero, and fractional powers of negative numbers are not real. It assumes a monotonic relationship between the predictor and response; if the relationship is not monotonic, no power transformation will produce linearity. The method targets linearity only and does not address heteroscedasticity or non-normality of residuals.
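
The two input requirements above can be screened before building the plot. The following sketch is one possible pre-check, not part of the NIST procedure: the helper names are hypothetical, the rank-correlation threshold of 0.8 is an arbitrary assumption, and the rank computation does not average ties.

```python
import numpy as np

def rank_correlation(x, y):
    """Spearman-style rank correlation (ties are ranked in arbitrary order)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]

def check_inputs(x, y, min_abs_rank_corr=0.8):
    """Return a list of warnings; an empty list means no problem was detected."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    problems = []
    if np.any(x <= 0):
        problems.append("X contains non-positive values")
    # A weak rank correlation suggests the relationship is not monotonic
    if abs(rank_correlation(x, y)) < min_abs_rank_corr:
        problems.append("relationship between X and Y does not look monotonic")
    return problems
```

For example, `check_inputs(x, y)` could be called before the lambda search, and the search skipped or the data re-examined if the returned list is not empty.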

Reference: NIST/SEMATECH e-Handbook of Statistical Methods, Section 1.3.3.5

Formulas

Box-Cox Transformation

$$T_{\lambda}(X) = \begin{cases} \dfrac{X^{\lambda} - 1}{\lambda} & \lambda \neq 0 \\[6pt] \ln(X) & \lambda = 0 \end{cases}$$

The Box-Cox family of power transformations applied to the predictor variable $X$. The $\lambda = 0$ case uses the natural logarithm, which is the limit of $(X^{\lambda} - 1)/\lambda$ as $\lambda \to 0$.
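
The limiting case can be checked with a short series expansion, assuming $X > 0$ so that $\ln X$ is defined:

```latex
\begin{aligned}
X^{\lambda} &= e^{\lambda \ln X}
  = 1 + \lambda \ln X + \frac{(\lambda \ln X)^{2}}{2!} + \cdots \\
\frac{X^{\lambda} - 1}{\lambda} &= \ln X + \frac{\lambda (\ln X)^{2}}{2!} + \cdots \\
\lim_{\lambda \to 0} \frac{X^{\lambda} - 1}{\lambda} &= \ln X
\end{aligned}
```

Every term after the first carries at least one factor of $\lambda$, so only $\ln X$ survives in the limit, and the transformation is continuous in $\lambda$ at zero.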

Linearity Measure

$$r(\lambda) = \text{corr}\!\left(Y,\; T_{\lambda}(X)\right)$$

The Pearson correlation between the response $Y$ and the transformed predictor is computed for each $\lambda$ value. The $\lambda$ that maximizes $r(\lambda)$, or $|r(\lambda)|$ when the relationship is decreasing, yields the most linear relationship.
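
The raw power $X^{\lambda}$ and the Box-Cox form $(X^{\lambda} - 1)/\lambda$ differ only by an affine rescaling, and the magnitude of the Pearson correlation is unchanged by such a rescaling. The two forms therefore give the same $|r|$; the sign differs when $\lambda < 0$, because dividing by a negative $\lambda$ reverses the ordering. The sketch below checks this numerically; the function names and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def corr_raw_power(x, y, lam):
    """Correlation between y and the raw power x**lam (log at lambda = 0)."""
    t = np.log(x) if lam == 0 else x**lam
    return np.corrcoef(y, t)[0, 1]

def corr_boxcox(x, y, lam):
    """Correlation between y and the Box-Cox form (x**lam - 1) / lam."""
    t = np.log(x) if lam == 0 else (x**lam - 1.0) / lam
    return np.corrcoef(y, t)[0, 1]

# Synthetic, increasing relationship (assumption for the demonstration)
x = np.linspace(1.0, 20.0, 50)
y = 3.0 + 2.0 * np.sqrt(x)

for lam in (-1.0, 0.5, 2.0):
    r_raw = corr_raw_power(x, y, lam)
    r_bc = corr_boxcox(x, y, lam)
    print(f"lambda={lam:+.1f}  raw r={r_raw:+.4f}  Box-Cox r={r_bc:+.4f}")
```

Searching on $|r|$ gives the same optimal $\lambda$ with either form, which is why the two conventions can be used interchangeably when locating the optimum.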

Python Example

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Generate a nonlinear relationship: Y = sqrt(X) + noise
rng = np.random.default_rng(42)
X = rng.uniform(1, 50, size=100)
Y = np.sqrt(X) + rng.normal(0, 0.5, size=100)

# Evaluate the correlation for a range of lambda values
lambdas = np.linspace(-2, 2, 201)
correlations = []
for lam in lambdas:
    if abs(lam) < 1e-10:
        T = np.log(X)  # lambda = 0: natural log by convention
    else:
        T = (X**lam - 1) / lam
    r, _ = stats.pearsonr(Y, T)
    correlations.append(r)

# Use |r| so a decreasing relationship (negative r) is handled as well
optimal_idx = np.argmax(np.abs(correlations))
optimal_lambda = lambdas[optimal_idx]

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(lambdas, correlations, 'b-', linewidth=2)
ax.axvline(optimal_lambda, color='r', linestyle='--',
           label=f'Optimal lambda = {optimal_lambda:.2f}')
ax.set_xlabel("Lambda")
ax.set_ylabel("Correlation r(lambda)")
ax.set_title("Box-Cox Linearity Plot")
ax.legend()
plt.tight_layout()
plt.show()
```