Conditioning Plot
NIST/SEMATECH Section 1.3.3.26.12 Conditioning Plot
What It Is
A conditioning plot, also called a coplot or subset plot, is a plot of two variables conditional on the value of a third variable (called the conditioning variable). The conditioning variable may be either a variable that takes on only a few discrete values or a continuous variable that is divided into a limited number of subsets.
Given the variables X, Y, and Z, the conditioning plot is formed by dividing the values of Z into k groups. These groups may be formed in several ways: there may be a natural grouping of the data, the data may be divided into several equal-sized groups, the grouping may be determined by clusters in the data, and so on. The page is divided into r rows and c columns, where rc ≥ k. Each cell holds a single scatter plot with Y on the vertical axis and X on the horizontal axis, using only the points in the corresponding group.
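Two of the grouping options described above, plus the sizing of the panel grid, can be sketched as follows. This is a minimal sketch assuming NumPy. The helper names `condition_groups` and `grid_shape` are hypothetical, the near-square layout rule is only one reasonable choice, and clustering-based grouping is not shown.

```python
import numpy as np


def condition_groups(z, k):
    """Assign each value of the conditioning variable z to one of k groups.

    If z has at most k distinct values, each distinct value becomes its own
    group (a natural grouping). Otherwise z is split at its quantiles into k
    groups of roughly equal size. Returns integer group labels starting at 0.
    """
    z = np.asarray(z)
    levels = np.unique(z)
    if levels.size <= k:
        # Natural grouping: one group per distinct value, ordered by value
        return np.searchsorted(levels, z)
    # Equal-count grouping: the interior quantiles act as cut points
    cuts = np.quantile(z, np.linspace(0, 1, k + 1)[1:-1])
    return np.searchsorted(cuts, z, side="right")


def grid_shape(k):
    """Choose rows and columns with rows * cols >= k, keeping the grid near square."""
    cols = int(np.ceil(np.sqrt(k)))
    rows = int(np.ceil(k / cols))
    return rows, cols
```

With heavily tied values of Z, the equal-count split can produce unequal or even empty groups, so the group sizes are worth checking before plotting.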
Questions This Plot Answers
- Is there a relationship between two variables?
- If there is a relationship, does the nature of the relationship depend on the value of a third variable?
- Are groups in the data similar?
- Are there outliers in the data?
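The second question can be given a rough numeric companion by fitting a straight line within each group and comparing the slopes. This is a sketch that assumes the relationship is roughly linear within each group and that every group contains at least two distinct X values. The function name `slopes_by_group` is hypothetical, and the comparison is descriptive, not a formal test for interaction.

```python
import numpy as np


def slopes_by_group(x, y, groups):
    """Least-squares slope of y on x within each group, in sorted group order."""
    x, y, groups = map(np.asarray, (x, y, groups))
    slopes = []
    for g in np.unique(groups):
        mask = groups == g
        slope, _intercept = np.polyfit(x[mask], y[mask], 1)
        slopes.append(slope)
    return np.array(slopes)


# Noise-free illustration: the slope of y on x grows with the group label
x = np.tile(np.arange(5.0), 3)
groups = np.repeat([0, 1, 2], 5)
y = (1.0 + groups) * x + 2.0
print(np.round(slopes_by_group(x, y, groups), 6))
```

Slopes that drift steadily from group to group, as they do here, are the numeric counterpart of panels whose fitted lines tilt more and more steeply across the grid.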
Why It Matters
The conditioning plot is one of the most direct graphical methods for detecting interactions in continuous data. It shows whether a two-variable relationship is the same everywhere or changes with the value of a third variable, a question that is fundamental to regression modeling and process understanding.
When to Use a Conditioning Plot
Use a conditioning plot when exploring how a bivariate relationship changes across levels of a third variable, a question that is central to detecting interactions and confounding in multivariate data. One limitation of the scatterplot matrix is that it cannot show interaction effects with another variable; this is the strength of the conditioning plot. It is also useful for displaying scatter plots for groups in the data. Although these groups can also be plotted on a single plot with different plot symbols, it can often be visually easier to distinguish the groups using the conditioning plot.
How to Interpret a Conditioning Plot
Each panel in the conditioning plot shows a scatter plot of the two primary variables for the subset of data falling within a particular range of the conditioning variable. The analyst examines whether the pattern or strength of the relationship changes across panels. If the scatter plots look similar across all panels, the relationship appears stable, which suggests no interaction with the conditioning variable. If the pattern changes, the conditioning variable is influencing the bivariate relationship. Overlaying a fitted curve, such as a lowess smooth, on each panel can make such changes easier to see. Because the panels are ordered by the conditioning variable, systematic changes across the grid are informative.
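A lowess-style overlay can be approximated with a single pass of locally weighted linear regression using tricube weights. This is a simplified stand-in assuming NumPy only: it omits the robustness iterations of full LOWESS, the name `local_linear_smooth` is hypothetical, and the small widening of the neighbourhood radius is an illustrative choice so that tied distances never leave a degenerate fit. It also assumes each neighbourhood contains at least two distinct X values.

```python
import numpy as np


def local_linear_smooth(x, y, frac=0.5, n_eval=50):
    """One-pass locally weighted linear fit with tricube weights.

    For each evaluation point x0, the ceil(frac * n) nearest observations are
    weighted by (1 - (d / h)**3)**3, where h is slightly larger than the
    largest neighbour distance, and a weighted straight line is fitted; its
    value at x0 is the smoothed estimate.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = max(3, int(np.ceil(frac * x.size)))  # neighbours used in each local fit
    x_eval = np.linspace(x.min(), x.max(), n_eval)
    y_eval = np.empty(n_eval)
    for j, x0 in enumerate(x_eval):
        d = np.abs(x - x0)
        idx = np.argsort(d)[:k]
        # Widen the radius a little so the farthest neighbours keep a small weight
        h = d[idx].max() * 1.01
        w = (1 - (d[idx] / h) ** 3) ** 3
        # np.polyfit weights multiply the unsquared residuals, so pass sqrt(w)
        coeffs = np.polyfit(x[idx], y[idx], 1, w=np.sqrt(w))
        y_eval[j] = np.polyval(coeffs, x0)
    return x_eval, y_eval
```

For each panel, the returned pair can be passed straight to the panel's `plot` method in place of a straight-line fit. Because a local linear fit reproduces an exactly linear relationship, any bending in the overlay reflects the data rather than the smoother.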
Assumptions and Limitations
The conditioning plot requires that the conditioning variable have enough distinct values to form meaningful groups, and that each group contain enough points for its scatter plot to show a pattern. Although the basic concept is simple, there are numerous alternatives in the details: the type of fitted curve overlay (linear, quadratic, or lowess), the axis labeling scheme (alternating labels across rows and columns works well when axis limits are common), whether panels are connected or separated by gaps, and the extension of the concept to plot formats beyond scatter plots.
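Two of the layout alternatives listed above, connected panels and alternating axis labels on common axis limits, can be sketched with matplotlib. This is one possible arrangement under stated assumptions: the function name `coplot_grid` is hypothetical, and the specific alternation rule (even columns labelled at the bottom, odd columns at the top; even rows on the left, odd rows on the right) is an illustrative choice.

```python
import matplotlib.pyplot as plt


def coplot_grid(rows, cols):
    """Create a rows x cols grid of touching panels with common axis limits.

    Shared axes give every panel the same limits, and zero spacing connects
    the panels. Tick labels alternate around the outside of the grid: x labels
    on the bottom edge for even columns and the top edge for odd columns, y
    labels on the left edge for even rows and the right edge for odd rows.
    """
    fig, axes = plt.subplots(
        rows, cols, sharex=True, sharey=True, squeeze=False,
        gridspec_kw={"wspace": 0, "hspace": 0},
    )
    for j in range(cols):
        odd = j % 2 == 1
        axes[-1, j].tick_params(axis="x", labelbottom=not odd)
        axes[0, j].tick_params(axis="x", top=True, labeltop=odd)
    for i in range(rows):
        odd = i % 2 == 1
        axes[i, 0].tick_params(axis="y", labelleft=not odd)
        axes[i, -1].tick_params(axis="y", right=True, labelright=odd)
    return fig, axes
```

Each panel can then be filled with `axes[i, j].scatter(...)` for its group; separated panels are obtained by passing nonzero `wspace` and `hspace` values instead.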
Reference: NIST/SEMATECH e-Handbook of Statistical Methods, Section 1.3.3.26.12
Python Example
import numpy as np
import matplotlib.pyplot as plt

# Generate data with an interaction effect
rng = np.random.default_rng(42)
n = 200
x = rng.uniform(0, 10, n)
z = rng.uniform(0, 30, n)  # conditioning variable
# Slope of Y vs X changes with Z
y = (1 + 0.2 * z) * x + 10 + rng.normal(0, 3, n)

# Split Z into 3 equal-count conditioning intervals
z_cuts = np.percentile(z, [0, 100 / 3, 200 / 3, 100])
labels = [f'Z: {z_cuts[i]:.0f}-{z_cuts[i+1]:.0f}' for i in range(3)]

fig, axes = plt.subplots(1, 3, figsize=(15, 4), sharey=True)
for i in range(3):
    # Half-open intervals so no point appears in two panels; the last interval is closed
    upper = (z <= z_cuts[i + 1]) if i == 2 else (z < z_cuts[i + 1])
    mask = (z >= z_cuts[i]) & upper
    axes[i].scatter(x[mask], y[mask], alpha=0.6, s=30)
    # Fit a straight line to the points in this panel only
    coeffs = np.polyfit(x[mask], y[mask], 1)
    x_line = np.linspace(0, 10, 50)
    axes[i].plot(x_line, np.polyval(coeffs, x_line), 'r-', linewidth=2)
    axes[i].set_title(labels[i])
    axes[i].set_xlabel("X")
    axes[i].grid(True, alpha=0.3)

axes[0].set_ylabel("Y")
plt.suptitle("Conditioning Plot: Y vs X | Z", y=1.02)
plt.tight_layout()
plt.show()