Q-Q Plot
NIST/SEMATECH Section 1.3.3.24 Q-Q Plot
What It Is
A quantile-quantile (Q-Q) plot is a graphical technique for determining if two data sets come from populations with a common distribution. It plots the quantiles of one data set against the quantiles of another data set, with a reference line indicating identical distributions.
For two-sample comparison, the quantiles of dataset 1 are plotted against the quantiles of dataset 2. If the sample sizes differ, the quantiles of the smaller sample are plotted against linearly interpolated quantiles of the larger sample. If the distributions are identical, the points fall on the identity line. A linear pattern with a slope different from 1 indicates a scale difference; a linear pattern shifted from the identity line indicates a location difference. Curvature indicates a shape difference.
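The unequal-sample-size case can be sketched as follows. This is a minimal illustration, not a reference implementation: the helper name `qq_pairs` is hypothetical, and the plotting-position convention `(i - 0.5) / n` is an assumption chosen for the sketch (other conventions exist).

```python
import numpy as np

def qq_pairs(x, y):
    """Return matched quantile pairs (qx, qy) for two samples (illustrative helper)."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    x_is_small = len(x) <= len(y)
    small, large = (x, y) if x_is_small else (y, x)
    n, m = len(small), len(large)

    # Cumulative probabilities attached to each ordered observation
    # (assumed convention: (i - 0.5) / n).
    p_small = (np.arange(1, n + 1) - 0.5) / n
    p_large = (np.arange(1, m + 1) - 0.5) / m

    # Use the smaller sample's order statistics directly, and linearly
    # interpolate the larger sample at the same probabilities.
    q_large = np.interp(p_small, p_large, large)
    return (small, q_large) if x_is_small else (q_large, small)

rng = np.random.default_rng(0)
qx, qy = qq_pairs(rng.normal(size=200), rng.normal(size=150))
print(len(qx), len(qy))  # both equal the smaller sample size
```

Because the smaller sample's probabilities always lie inside the range covered by the larger sample's probabilities, the interpolation never has to extrapolate.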
Questions This Plot Answers
- Do two data sets come from populations with a common distribution?
- Do two data sets have common location and scale?
- Do two data sets have similar distributional shapes?
- Do two data sets have similar tail behavior?
Why It Matters
The Q-Q plot is one of the most informative graphical tools for comparing two distributions because it is sensitive to differences in location, scale, and shape simultaneously. It is particularly effective at detecting subtle tail differences that histograms and summary statistics miss, which makes it well suited to two-sample comparison and model validation.
When to Use a Q-Q Plot
Use a Q-Q plot when you have two data samples and want to determine whether they come from populations with the same distribution. The Q-Q plot can simultaneously detect differences in location, scale, symmetry, and tail behavior. For example, if the two data sets differ only by a shift in location, the points will lie along a straight line displaced from the reference line. The Q-Q plot provides more insight into the nature of distributional differences than analytical methods such as the chi-square or Kolmogorov-Smirnov 2-sample tests. It is similar to a probability plot, but in a probability plot one of the data samples is replaced with the quantiles of a theoretical distribution.
How to Interpret a Q-Q Plot
If the two distributions being compared are identical, the Q-Q plot points will fall on the identity line. A linear pattern with a slope different from 1 indicates that the distributions have the same shape but different scales. A linear pattern shifted from the identity line indicates a location difference. Curvature in the Q-Q plot indicates a difference in distributional shape, such as one distribution being more skewed or heavy-tailed than the other. Departures at the extremes of the plot highlight differences in the tails, which may not be apparent from histograms alone. The Q-Q plot is particularly effective at detecting subtle tail differences that formal tests might miss.
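One way to make the slope and shift readings concrete is to fit a straight line through the Q-Q points. The sketch below is illustrative only: the helper `qq_line_fit`, the choice of 99 interior quantiles, and the simulated normal samples are all assumptions made for the example, not part of the technique's definition.

```python
import numpy as np

def qq_line_fit(x, y, n_quantiles=99):
    """Least-squares line through the Q-Q points: qy ~ intercept + slope * qx."""
    probs = np.linspace(0.01, 0.99, n_quantiles)
    qx = np.quantile(x, probs)
    qy = np.quantile(y, probs)
    slope, intercept = np.polyfit(qx, qy, deg=1)  # highest degree first
    return slope, intercept

rng = np.random.default_rng(0)
base = rng.normal(loc=0.0, scale=1.0, size=10_000)
shifted = rng.normal(loc=2.0, scale=1.0, size=10_000)  # location difference only
scaled = rng.normal(loc=0.0, scale=3.0, size=10_000)   # scale difference only

slope_shift, intercept_shift = qq_line_fit(base, shifted)
slope_scale, intercept_scale = qq_line_fit(base, scaled)
print(f"shifted: slope={slope_shift:.2f}, intercept={intercept_shift:.2f}")
print(f"scaled:  slope={slope_scale:.2f}, intercept={intercept_scale:.2f}")
```

For the shifted pair the fitted slope should be close to 1 with an intercept near the shift; for the scaled pair the slope should be close to the ratio of the two scales. A fitted line summarizes only linear patterns, so curvature still has to be judged from the plot itself.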
Assumptions and Limitations
The Q-Q plot assumes both datasets are drawn from continuous distributions. When comparing two samples, the sample sizes need not be equal; linear interpolation is used to match quantiles. The visual assessment is inherently subjective and should be accompanied by quantitative tests when formal decisions are required. For very small samples, the plot may show scatter around the reference line even when the distributions match.
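The small-sample caveat can be illustrated with a short simulation. This is a rough sketch under stated assumptions: both samples are drawn from the same standard normal distribution, the sample sizes 10 and 1000 are arbitrary, and the helper `mean_qq_deviation` is a hypothetical name introduced here.

```python
import numpy as np

def mean_qq_deviation(n, rng, trials=200):
    """Average distance of Q-Q points from the identity line for matching distributions."""
    deviations = []
    for _ in range(trials):
        # Equal sample sizes, so the sorted values are the matched quantiles.
        a = np.sort(rng.normal(size=n))
        b = np.sort(rng.normal(size=n))
        deviations.append(np.mean(np.abs(a - b)))
    return float(np.mean(deviations))

rng = np.random.default_rng(1)
dev_small = mean_qq_deviation(10, rng)
dev_large = mean_qq_deviation(1000, rng)
print(f"n=10: {dev_small:.3f}   n=1000: {dev_large:.3f}")
```

Even though the two populations are identical in every trial, the points for the small samples sit noticeably farther from the reference line on average, which is why scatter in a small-sample Q-Q plot should not be over-interpreted.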
Reference: NIST/SEMATECH e-Handbook of Statistical Methods, Section 1.3.3.24
Formulas
Quantile Matching
For two-sample comparison, the quantiles $Q_1(p)$ and $Q_2(p)$ of the two datasets are evaluated at matching cumulative probabilities $p$ and plotted against each other. If the distributions are identical, the points fall on the identity line $y = x$.
Hazen Plotting Position
The Hazen plotting position assigns cumulative probability $p_i = (i - 0.5)/n$ to the $i$-th ordered observation out of $n$ total. This symmetric formula avoids probabilities of exactly 0 or 1.
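The Hazen positions are simple to compute directly. The sketch below uses a hypothetical helper `hazen_positions` and assumes NumPy 1.22 or later, where `np.quantile` accepts `method="hazen"`.

```python
import numpy as np

def hazen_positions(n):
    """Cumulative probability (i - 0.5) / n for i = 1..n."""
    i = np.arange(1, n + 1)
    return (i - 0.5) / n

p = hazen_positions(5)
print(p)  # 0.1, 0.3, 0.5, 0.7, 0.9

# Symmetry: the k-th position from the bottom and from the top sum to 1,
# and no position is exactly 0 or 1.
assert np.allclose(p + p[::-1], 1.0)
assert p.min() > 0.0 and p.max() < 1.0

# With the matching quantile definition, the quantiles at the Hazen
# positions are the ordered observations themselves.
sample = np.array([12.0, 7.0, 9.5, 15.0, 8.0])
q = np.quantile(sample, p, method="hazen")
assert np.allclose(q, np.sort(sample))
```

The last check is the reason this convention pairs naturally with a Q-Q plot: the smaller sample's points can be plotted at its own sorted values, and only the larger sample needs interpolation.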
Python Example
import numpy as np
import matplotlib.pyplot as plt

# Generate two samples from different distributions
rng = np.random.default_rng(42)
sample1 = rng.normal(loc=50, scale=10, size=200)
sample2 = rng.normal(loc=55, scale=15, size=150)

# Compute quantiles at matching percentiles
n_quantiles = min(len(sample1), len(sample2))
probs = np.linspace(0, 1, n_quantiles + 2)[1:-1]
q1 = np.quantile(sample1, probs)
q2 = np.quantile(sample2, probs)

# Create Q-Q plot
fig, ax = plt.subplots(figsize=(8, 8))
ax.scatter(q1, q2, alpha=0.5, color='steelblue', s=15)
lims = [min(q1.min(), q2.min()), max(q1.max(), q2.max())]
ax.plot(lims, lims, 'r--', linewidth=1, label='y = x')
ax.set_xlabel("Sample 1 Quantiles")
ax.set_ylabel("Sample 2 Quantiles")
ax.set_title("Two-Sample Q-Q Plot")
ax.legend()
ax.set_aspect('equal')
plt.tight_layout()
plt.show()