6-Plot
NIST/SEMATECH Section 1.3.3.33 6-Plot
What It Is
The 6-plot is a regression diagnostic display with six panels for validating a fitted Y versus X model. The six panels are: (1) response and predicted values versus the independent variable, (2) residuals versus the independent variable, (3) residuals versus predicted values, (4) lag plot of residuals, (5) histogram of residuals, and (6) normal probability plot of residuals. It is distinct from the 4-plot, which is a univariate diagnostic.
The six panels are arranged in a grid (2 rows, 3 columns). Row 1: Y and predicted Y vs X (with fit overlay), residuals vs X (checking for non-linearity), residuals vs predicted Y (checking for heteroscedasticity). Row 2: lag plot of residuals (checking for serial correlation), histogram of residuals (checking for symmetry and normality), normal probability plot of residuals (providing a sensitive normality check). Each panel tests a specific regression assumption.
Questions This Plot Answers
- Are the residuals approximately normally distributed with a fixed location and scale?
- Are there outliers?
- Is the fit adequate?
- Do the residuals suggest a better fit?
Why It Matters
The 6-plot is the comprehensive regression diagnostic: if any of the six panels shows a problem (non-linearity, non-constant variance, non-independence, non-normality), the regression model needs revision. It prevents the common mistake of reporting regression results without validating the assumptions that make those results meaningful.
When to Use a 6-Plot
Use the 6-plot after fitting any Y versus X model to assess its validity. The fit can be linear, non-linear, LOWESS, spline, or any other fit that uses a single independent variable. The first three panels check whether the functional form is correct and whether the residuals show systematic patterns. The last three panels check the standard assumptions on the residuals: independence (lag plot) and approximate normality (histogram and normal probability plot).
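Because every panel is drawn from the same few arrays (X, Y, the predicted values, and the residuals), the fitting method only changes how the predictions are produced. The following is a minimal sketch under that assumption, using NumPy; `six_plot_inputs` is a hypothetical helper name introduced here for illustration, and the quadratic `np.polyfit` model merely stands in for whatever fit is being validated.

```python
import numpy as np

def six_plot_inputs(x, y, predict):
    """Collect the arrays the six panels are drawn from (hypothetical helper).

    `predict` is any callable mapping x to fitted values, so the same
    function works for linear, polynomial, spline, or smoother fits.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y_hat = predict(x)
    residuals = y - y_hat
    return {
        "x": x,                      # panels 1 and 2
        "y": y,                      # panel 1
        "predicted": y_hat,          # panels 1 and 3
        "residuals": residuals,      # panels 2, 3, 5, 6
        "lag_pairs": (residuals[:-1], residuals[1:]),  # panel 4
    }

# Illustrative data: a quadratic trend plus noise (made-up values).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x**2 + rng.normal(0, 1, x.size)

# Any single-predictor fit can be plugged in; here a degree-2 polynomial.
coeffs = np.polyfit(x, y, deg=2)
parts = six_plot_inputs(x, y, lambda t: np.polyval(coeffs, t))
print(parts["residuals"][:5])
```

Swapping the lambda for a spline or smoother prediction function leaves the rest of the display unchanged, which is why the 6-plot applies to any single-predictor fit.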
How to Interpret a 6-Plot
Panel 1 (Y and predicted Y vs X) overlays the raw data with the fitted curve to visually assess goodness of fit. Panel 2 (residuals vs X) should show a random scatter with no systematic pattern; curvature suggests the model form is wrong. Panel 3 (residuals vs predicted Y) should also be structureless; a fan shape indicates non-constant variance. Panel 4 (lag plot of residuals) checks for serial correlation; structure indicates non-independence. Panel 5 (histogram of residuals) should be approximately bell-shaped. Panel 6 (normal probability plot of residuals) should be approximately linear; deviations suggest non-normal errors.
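The "approximately linear" reading of panel 6 can be given a rough numeric companion: the correlation coefficient of the points on the normal probability plot. The sketch below assumes SciPy is available and uses simulated residuals (made-up values, not handbook data). It is offered as an aid to reading the panel, not as a formal test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
normal_resid = rng.normal(0, 1, 200)
skewed_resid = rng.exponential(1.0, 200) - 1.0  # centered, but right-skewed

# With the default fit=True and no plot argument, probplot returns
# ((theoretical quantiles, ordered values), (slope, intercept, r)),
# where r measures how close the plotted points are to a straight line.
_, (_, _, r_normal) = stats.probplot(normal_resid, dist="norm")
_, (_, _, r_skewed) = stats.probplot(skewed_resid, dist="norm")

print(f"normal residuals: r = {r_normal:.3f}")
print(f"skewed residuals: r = {r_skewed:.3f}")
```

A value of r close to 1 corresponds to a nearly straight panel 6; the skewed residuals give a visibly lower r, matching the bend that would appear in the plot.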
Examples
Good Fit
Panel 1 shows the fit tracking the data closely. Panels 2-3 show residuals scattered randomly with no pattern. Panel 4 shows a structureless cloud. Panels 5-6 confirm approximately normal residuals. None of the six panels gives evidence against the regression assumptions.
Non-Linear Misspecification
Panel 1 shows the linear fit missing a curved pattern in the data. Panels 2-3 show systematic curvature in the residuals. The model form needs to include a quadratic or higher-order term.
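The curvature described here can be reproduced on simulated data. The sketch below assumes NumPy and made-up coefficients: it fits a straight line and then a quadratic to data with a true quadratic trend, and summarizes the leftover curvature as the correlation between the residuals and a centered squared term. That correlation is an illustrative stand-in for what panel 2 shows visually, not a handbook statistic.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 100)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(0, 1, x.size)

# Centered squared term: large at both ends of the x range, small in the middle.
curvature = (x - x.mean()) ** 2

corr_by_degree = {}
sd_by_degree = {}
for deg in (1, 2):
    coeffs = np.polyfit(x, y, deg)
    res = y - np.polyval(coeffs, x)
    corr_by_degree[deg] = np.corrcoef(res, curvature)[0, 1]
    sd_by_degree[deg] = res.std(ddof=deg + 1)  # residual SD with fit degrees of freedom removed
    print(f"degree {deg}: residual SD = {sd_by_degree[deg]:.2f}, "
          f"corr with curvature term = {corr_by_degree[deg]:.2f}")
```

For the straight-line fit the residuals follow the squared term closely, which is the U-shaped pattern seen in panels 2 and 3. Once the quadratic term is in the model, least squares makes the residuals orthogonal to it, so the correlation drops to zero up to rounding and the residual spread shrinks.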
Non-Constant Variance
Panels 2-3 show a fan shape in the residuals with spread increasing for larger fitted values. This heteroscedasticity invalidates standard error estimates. A variance-stabilizing transformation or weighted regression is needed.
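A fan shape and its remedy by weighted regression can be sketched numerically. The example below assumes NumPy, simulated data, and an error standard deviation that is known to be proportional to X; in practice that structure would have to be estimated. `spread_ratio` is a hypothetical helper that compares residual spread between the upper and lower halves of the fitted values.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(1, 50, 200)
sigma = 0.5 * x                       # assumed: error SD proportional to x
y = 10.0 + 3.0 * x + rng.normal(0, sigma)

def spread_ratio(values, order_by):
    """SD of `values` in the upper half of `order_by` over SD in the lower half."""
    idx = np.argsort(order_by)
    half = len(idx) // 2
    return values[idx[half:]].std(ddof=1) / values[idx[:half]].std(ddof=1)

# Ordinary least squares: residual spread grows with the fitted value.
ols_fit = np.polyval(np.polyfit(x, y, 1), x)
ols_ratio = spread_ratio(y - ols_fit, ols_fit)

# Weighted least squares: np.polyfit expects w = 1/sigma (not 1/sigma**2).
wls_fit = np.polyval(np.polyfit(x, y, 1, w=1.0 / sigma), x)
# Residuals scaled by sigma should have roughly constant spread.
wls_ratio = spread_ratio((y - wls_fit) / sigma, wls_fit)

print(f"OLS residual spread ratio (upper/lower half): {ols_ratio:.2f}")
print(f"WLS scaled-residual spread ratio:             {wls_ratio:.2f}")
```

A ratio well above 1 for the unweighted fit corresponds to the fan in panel 3; a ratio near 1 for the scaled residuals indicates that the weighting has absorbed the changing variance.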
Assumptions and Limitations
The 6-plot assumes a regression model has been fit to bivariate data. The residual-based panels (4 through 6) assume the model has been correctly specified in terms of the response-predictor relationship. The lag plot panel is most informative when the data have a natural ordering (e.g., time of collection). The 6-plot is a screening tool for model validation, not a formal hypothesis test.
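The dependence of the lag plot on a natural ordering can be illustrated with simulated data. The sketch below assumes NumPy and errors that follow a first-order autoregressive process in collection order (the coefficient 0.8 is an arbitrary choice). It compares the lag-1 correlation of the residuals in collection order with the same residuals after a random shuffle.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(0, 10, n)             # predictor values, in collection order

# Serially correlated errors in collection order (assumed AR(1), phi = 0.8).
errors = np.zeros(n)
shocks = rng.normal(0, 1, n)
for i in range(1, n):
    errors[i] = 0.8 * errors[i - 1] + shocks[i]
y = 5.0 + 2.0 * x + errors

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

def lag1_corr(r):
    """Correlation between RES(i-1) and RES(i), the two axes of the lag plot."""
    return np.corrcoef(r[:-1], r[1:])[0, 1]

in_order = lag1_corr(residuals)
shuffled = lag1_corr(rng.permutation(residuals))

print(f"lag-1 correlation, collection order: {in_order:.2f}")
print(f"lag-1 correlation, shuffled order:   {shuffled:.2f}")
```

The serial dependence is visible only when the residuals are kept in collection order; once the order is lost, the lag plot has nothing to detect.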
Reference: NIST/SEMATECH e-Handbook of Statistical Methods, Section 1.3.3.33
Formulas
General Regression Model
The underlying model that the 6-plot validates is $Y_i = f(X_i) + E_i$: each response $Y_i$ is the sum of a deterministic function $f$ of the predictor $X_i$ and a random error $E_i$. The 6-plot checks whether $f$ is correctly specified (panels 1-3) and whether the errors $E_i$ satisfy the assumptions of constant variance (panels 2-3), independence (panel 4), and normality (panels 5-6).
Python Example
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Generate bivariate data with linear relationship
rng = np.random.default_rng(42)
n = 100
x = np.linspace(5, 50, n) + rng.normal(0, 1, n)
y = 3.2 * x + 15 + rng.normal(0, 6, n)

# Fit linear model
result = stats.linregress(x, y)
y_fit = result.slope * x + result.intercept
residuals = y - y_fit

fig, axes = plt.subplots(2, 3, figsize=(16, 10))

# Panel 1: Y and Y-hat vs X (Row 1, Col 1)
axes[0, 0].scatter(x, y, alpha=0.5, s=20, label='Data')
x_line = np.linspace(x.min(), x.max(), 100)
axes[0, 0].plot(x_line, result.slope * x_line + result.intercept,
                'r-', linewidth=2, label='Fit')
axes[0, 0].set_title("Y and Predicted vs X")
axes[0, 0].set_xlabel("X")
axes[0, 0].set_ylabel("Y")
axes[0, 0].legend(fontsize=8)

# Panel 2: Residuals vs X (Row 1, Col 2)
axes[0, 1].scatter(x, residuals, alpha=0.5, s=20)
axes[0, 1].axhline(0, color='red', linestyle='--')
axes[0, 1].set_title("Residuals vs X")
axes[0, 1].set_xlabel("X")
axes[0, 1].set_ylabel("Residual")

# Panel 3: Residuals vs Predicted (Row 1, Col 3)
axes[0, 2].scatter(y_fit, residuals, alpha=0.5, s=20)
axes[0, 2].axhline(0, color='red', linestyle='--')
axes[0, 2].set_title("Residuals vs Predicted")
axes[0, 2].set_xlabel("Predicted")
axes[0, 2].set_ylabel("Residual")

# Panel 4: Lag plot of residuals (Row 2, Col 1)
axes[1, 0].scatter(residuals[:-1], residuals[1:], alpha=0.5, s=20)
axes[1, 0].set_title("Lag Plot of Residuals")
axes[1, 0].set_xlabel("RES(i-1)")
axes[1, 0].set_ylabel("RES(i)")

# Panel 5: Histogram of residuals (Row 2, Col 2)
axes[1, 1].hist(residuals, bins=15, density=True, alpha=0.7,
                color='steelblue', edgecolor='white')
axes[1, 1].set_title("Residual Histogram")
axes[1, 1].set_xlabel("Residual")
axes[1, 1].set_ylabel("Density")

# Panel 6: Normal probability plot of residuals (Row 2, Col 3)
stats.probplot(residuals, dist="norm", plot=axes[1, 2])
axes[1, 2].set_title("Normal Probability Plot")

plt.suptitle("6-Plot \u2014 Regression Diagnostic", y=1.01)
plt.tight_layout()
plt.show()