6-Plot

NIST/SEMATECH Section 1.3.3.33 6-Plot

[Figure: 6-plot of a fitted model (R² = 1.0000). Panels: Y & Predicted vs X; Residuals vs X; Residuals vs Predicted; Lag of Residuals (Y(i) vs Y(i-1)); Residual Histogram (frequency vs residual); Residual Normal Probability Plot (ordered response vs N(0,1) order statistic medians).]

What It Is

The 6-plot is a regression diagnostic display with six panels for validating a fitted $Y$ versus $X$ model. The six panels are: (1) response and predicted values versus the independent variable, (2) residuals versus the independent variable, (3) residuals versus predicted values, (4) lag plot of residuals, (5) histogram of residuals, and (6) normal probability plot of residuals. It is distinct from the 4-plot, which is a univariate diagnostic.

The six panels are arranged in a $2 \times 3$ grid (2 rows, 3 columns). Row 1: $Y$ and $\hat{Y}$ vs $X$ (with fit overlay), residuals vs $X$ (checking for non-linearity), residuals vs $\hat{Y}$ (checking for heteroscedasticity). Row 2: lag plot of residuals (checking for serial correlation), histogram of residuals (checking for symmetry and normality), normal probability plot of residuals (a more sensitive graphical check of normality). Each panel checks a specific regression assumption.

Questions This Plot Answers

  • Are the residuals approximately normally distributed with a fixed location and scale?
  • Are there outliers?
  • Is the fit adequate?
  • Do the residuals suggest a better fit?

Why It Matters

The 6-plot is the comprehensive regression diagnostic: if any of the six panels shows a problem (non-linearity, non-constant variance, non-independence, non-normality), the regression model needs revision. It prevents the common mistake of reporting regression results without validating the assumptions that make those results meaningful.

When to Use a 6-Plot

Use the 6-plot after fitting any $Y$ versus $X$ model to assess its validity. The fit can be linear, non-linear, LOWESS, spline, or any other fit that uses a single independent variable. The first three panels check whether the functional form is correct and whether the residuals show systematic patterns. The last three panels check the standard assumptions on the residuals: independence (lag plot) and approximate normality (histogram and normal probability plot).
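Because the display needs only the fitted values, the residuals are computed the same way whatever the fitting method. The following is a minimal sketch, assuming an exponential model fitted with SciPy's `curve_fit`; the model function `exp_model`, the simulated data, and the starting values are illustrative choices, not part of the handbook.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_model(x, a, b):
    """Hypothetical non-linear model used only for illustration."""
    return a * np.exp(b * x)

# Simulated data (assumed): exponential trend plus normal noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 80)
y = exp_model(x, 2.0, 0.3) + rng.normal(0, 0.5, x.size)

# Fit the model, then form the two quantities every panel is built from
popt, _ = curve_fit(exp_model, x, y, p0=(1.0, 0.2))
y_hat = exp_model(x, *popt)   # predicted values
residuals = y - y_hat         # residuals for panels 2 through 6
```

Any other single-predictor fit can be substituted, as long as it yields one predicted value per observation.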

How to Interpret a 6-Plot

Panel 1 ($Y$ and $\hat{Y}$ vs $X$) overlays the raw data with the fitted curve to visually assess goodness of fit. Panel 2 (residuals vs $X$) should show a random scatter with no systematic pattern; curvature suggests the model form is wrong. Panel 3 (residuals vs $\hat{Y}$) should also be structureless; a fan shape indicates non-constant variance. Panel 4 (lag plot of residuals) checks for serial correlation; structure indicates non-independence. Panel 5 (histogram of residuals) should be approximately bell-shaped. Panel 6 (normal probability plot of residuals) should be approximately linear; deviations suggest non-normal errors.
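The visual checks above can be paired with rough numeric summaries. The sketch below is one possible set of companions, not a method from the handbook: the helper `panel_summaries` and the particular statistics are assumptions chosen for illustration. For a well-behaved fit, the first three values should be near 0 and the last near 1.

```python
import numpy as np
from scipy import stats

def panel_summaries(x, y, y_hat):
    """Illustrative numeric companions to panels 2, 3, 4, and 6."""
    res = y - y_hat
    # Panel 2: correlation of residuals with centered x^2 flags curvature
    curvature = np.corrcoef((x - x.mean()) ** 2, res)[0, 1]
    # Panel 3: correlation of |residual| with fitted values flags a fan shape
    fan = np.corrcoef(y_hat, np.abs(res))[0, 1]
    # Panel 4: lag-1 autocorrelation of residuals in collection order
    lag1 = np.corrcoef(res[:-1], res[1:])[0, 1]
    # Panel 6: correlation coefficient of the normal probability plot
    _, (_, _, ppcc) = stats.probplot(res, dist="norm")
    return {"curvature": curvature, "fan": fan, "lag1": lag1, "ppcc": ppcc}

# Simulated linear data with constant-variance normal errors (assumed)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, x.size)
slope, intercept = np.polyfit(x, y, 1)
summary = panel_summaries(x, y, slope * x + intercept)
```

These numbers do not replace the plots; they only help decide which panel deserves a closer look.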

Examples

Good Fit

Panel 1 shows the fit tracking the data closely. Panels 2-3 show residuals scattered randomly with no pattern. Panel 4 shows a structureless cloud. Panels 5-6 confirm approximately normal residuals. All regression assumptions are satisfied.

Non-Linear Misspecification

Panel 1 shows the linear fit missing a curved pattern in the data. Panels 2-3 show systematic curvature in the residuals. The model form needs to include a quadratic or higher-order term.
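A small simulation illustrates this case. The sketch below assumes data generated from a quadratic trend (the coefficients and noise level are arbitrary) and compares residuals from a straight-line fit with those from a quadratic fit, using `np.polyfit` as the fitting routine.

```python
import numpy as np

# Simulated data (assumed): quadratic trend plus normal noise
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 120)
y = 1.0 + 0.5 * x + 0.3 * x**2 + rng.normal(0, 1.0, x.size)

# Misspecified straight-line fit versus a fit with a quadratic term
res_lin = y - np.polyval(np.polyfit(x, y, 1), x)
res_quad = y - np.polyval(np.polyfit(x, y, 2), x)

def curvature(x, res):
    """Correlation of residuals with centered x^2 (illustrative measure)."""
    return np.corrcoef((x - x.mean()) ** 2, res)[0, 1]

curv_lin = curvature(x, res_lin)    # far from 0: curved residual pattern
curv_quad = curvature(x, res_quad)  # essentially 0: curvature absorbed
rss_lin = np.sum(res_lin**2)
rss_quad = np.sum(res_quad**2)
```

Plotting `res_lin` against `x` would show the U-shaped pattern described above, while `res_quad` should scatter randomly about zero.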

Non-Constant Variance

Panels 2-3 show a fan shape in the residuals with spread increasing for larger fitted values. This heteroscedasticity invalidates standard error estimates. A variance-stabilizing transformation or weighted regression is needed.
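The fan shape and one remedy can be sketched with simulated data. This example assumes the error standard deviation is proportional to $X$, which is known here only because the data are simulated; the helper `spread_ratio` is a hypothetical summary, and weighted least squares via the `w` argument of `np.polyfit` is just one of the options mentioned above.

```python
import numpy as np

# Simulated data (assumed): error standard deviation grows with x
rng = np.random.default_rng(2)
x = np.linspace(1, 20, 200)
y = 5.0 + 2.0 * x + rng.normal(0, 0.3 * x)

def spread_ratio(fitted, res):
    """Residual std in the upper half of fitted values over the lower half."""
    order = np.argsort(fitted)
    half = res.size // 2
    return res[order[half:]].std() / res[order[:half]].std()

# Ordinary least squares: the residual spread fans out
fit_ols = np.polyval(np.polyfit(x, y, 1), x)
ratio_ols = spread_ratio(fit_ols, y - fit_ols)

# Weighted least squares with weights 1/sigma, taking sigma proportional to x
fit_wls = np.polyval(np.polyfit(x, y, 1, w=1.0 / x), x)
std_res = (y - fit_wls) / x          # residuals scaled by the assumed sigma
ratio_wls = spread_ratio(fit_wls, std_res)
```

A ratio well above 1 for the ordinary fit corresponds to the fan shape in panels 2-3; a ratio near 1 for the scaled residuals suggests the weighting has stabilized the spread.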

Assumptions and Limitations

The 6-plot assumes a regression model has been fit to bivariate $(X, Y)$ data. The residual-based panels (4 through 6) assume the model has been correctly specified in terms of the response-predictor relationship. The lag plot panel is most informative when the data have a natural ordering (e.g., time of collection). The 6-plot is a screening tool for model validation, not a formal hypothesis test.

Reference: NIST/SEMATECH e-Handbook of Statistical Methods, Section 1.3.3.33

Formulas

General Regression Model

$$Y_i = f(X_i) + E_i$$

This is the underlying model that the 6-plot validates: each response $Y_i$ is the sum of a deterministic function of the predictor, $f(X_i)$, and a random error $E_i$. Panels 1–3 check whether $f$ is correctly specified and whether the error variance is constant; panels 4–6 check whether $E_i$ is independent and approximately normal.
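The errors themselves are never observed, so the panels work with residuals from the fitted function. Written out, with $\hat{f}$ for the fitted function and $R_i$ for the residual (notation assumed here for illustration, not taken from the handbook):

```latex
\hat{Y}_i = \hat{f}(X_i), \qquad R_i = Y_i - \hat{Y}_i, \qquad i = 1, \dots, n
```

The lag plot then shows $R_i$ against $R_{i-1}$ for $i = 2, \dots, n$, and the histogram and normal probability plot summarize the distribution of the $R_i$.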

Python Example

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# Generate bivariate data with linear relationship
rng = np.random.default_rng(42)
n = 100
x = np.linspace(5, 50, n) + rng.normal(0, 1, n)
y = 3.2 * x + 15 + rng.normal(0, 6, n)
# Fit linear model
result = stats.linregress(x, y)
y_fit = result.slope * x + result.intercept
residuals = y - y_fit
fig, axes = plt.subplots(2, 3, figsize=(16, 10))
# Panel 1: Y and Y-hat vs X (Row 1, Col 1)
axes[0, 0].scatter(x, y, alpha=0.5, s=20, label='Data')
x_line = np.linspace(x.min(), x.max(), 100)
axes[0, 0].plot(x_line, result.slope * x_line + result.intercept,
                'r-', linewidth=2, label='Fit')
axes[0, 0].set_title("Y and Predicted vs X")
axes[0, 0].set_xlabel("X")
axes[0, 0].set_ylabel("Y")
axes[0, 0].legend(fontsize=8)
# Panel 2: Residuals vs X (Row 1, Col 2)
axes[0, 1].scatter(x, residuals, alpha=0.5, s=20)
axes[0, 1].axhline(0, color='red', linestyle='--')
axes[0, 1].set_title("Residuals vs X")
axes[0, 1].set_xlabel("X")
axes[0, 1].set_ylabel("Residual")
# Panel 3: Residuals vs Predicted (Row 1, Col 3)
axes[0, 2].scatter(y_fit, residuals, alpha=0.5, s=20)
axes[0, 2].axhline(0, color='red', linestyle='--')
axes[0, 2].set_title("Residuals vs Predicted")
axes[0, 2].set_xlabel("Predicted")
axes[0, 2].set_ylabel("Residual")
# Panel 4: Lag plot of residuals (Row 2, Col 1)
axes[1, 0].scatter(residuals[:-1], residuals[1:], alpha=0.5, s=20)
axes[1, 0].set_title("Lag Plot of Residuals")
axes[1, 0].set_xlabel("RES(i-1)")
axes[1, 0].set_ylabel("RES(i)")
# Panel 5: Histogram of residuals (Row 2, Col 2)
axes[1, 1].hist(residuals, bins=15, density=True,
                alpha=0.7, color='steelblue', edgecolor='white')
axes[1, 1].set_title("Residual Histogram")
axes[1, 1].set_xlabel("Residual")
axes[1, 1].set_ylabel("Density")
# Panel 6: Normal probability plot of residuals (Row 2, Col 3)
stats.probplot(residuals, dist="norm", plot=axes[1, 2])
axes[1, 2].set_title("Normal Probability Plot")
plt.suptitle("6-Plot \u2014 Regression Diagnostic", y=1.01)
plt.tight_layout()
plt.show()