Q-Q Plot

NIST/SEMATECH Section 1.3.3.24 Q-Q Plot

[Figure: Q-Q Plot (Batch 1 vs Batch 2). x-axis: Batch 1 Quantiles; y-axis: Batch 2 Quantiles.]
A quantile-quantile (Q-Q) plot is a graphical technique for determining if two data sets come from populations with a common distribution. It plots the quantiles of one data set against the quantiles of another data set, with a $y = x$ reference line indicating identical distributions.

What It Is

For two-sample comparison, the quantiles of dataset 1 are plotted against the quantiles of dataset 2. If sample sizes differ, the quantiles of the smaller sample are plotted against linearly interpolated quantiles of the larger sample. If the distributions are identical, points fall on the $y = x$ identity line. A linear pattern with slope $\neq 1$ indicates a scale difference; a linear pattern shifted from $y = x$ indicates a location difference. Curvature indicates a shape difference.
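The unequal-size case can be sketched as follows. This is a minimal illustration assuming NumPy; the helper name `qq_pairs`, the choice of plotting positions $(i - 0.5)/n$, and the use of `np.interp` for the linear interpolation are assumptions made here, not details prescribed by the handbook.

```python
import numpy as np

def qq_pairs(a, b):
    """Return matched quantile pairs (qa, qb) for two samples of any sizes.

    Hypothetical helper: the smaller sample keeps its own order statistics,
    and the larger sample is linearly interpolated at the same probabilities.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    n = min(a.size, b.size)
    p = (np.arange(1, n + 1) - 0.5) / n  # assumed plotting positions

    def quantiles(x):
        # Probabilities attached to the sorted values of x itself
        px = (np.arange(1, x.size + 1) - 0.5) / x.size
        return np.interp(p, px, x)

    return quantiles(a), quantiles(b)

rng = np.random.default_rng(0)
qa, qb = qq_pairs(rng.normal(size=200), rng.normal(size=150))
print(qa.shape, qb.shape)  # (150,) (150,)
```

Plotting `qb` against `qa` then gives one point per observation of the smaller sample.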

Questions This Plot Answers

  • Do two data sets come from populations with a common distribution?
  • Do two data sets have common location and scale?
  • Do two data sets have similar distributional shapes?
  • Do two data sets have similar tail behavior?

Why It Matters

The Q-Q plot is one of the most informative graphical tools for comparing two distributions because it is sensitive to differences in location, scale, and shape simultaneously. It is particularly effective at detecting subtle tail differences that histograms and summary statistics miss, which makes it valuable for two-sample comparison and model validation.

When to Use a Q-Q Plot

Use a Q-Q plot when you have two data samples and want to determine whether they come from populations with the same distribution. The Q-Q plot can simultaneously detect differences in location, scale, symmetry, and tail behavior. For example, if the two data sets differ only by a shift in location, the points will lie along a straight line displaced from the $y = x$ reference line. The Q-Q plot provides more insight into the nature of distributional differences than analytical methods such as the chi-square or Kolmogorov-Smirnov two-sample tests. It is similar to a probability plot, but in a probability plot one of the data samples is replaced with the quantiles of a theoretical distribution.

How to Interpret a Q-Q Plot

If the two distributions being compared are identical, the Q-Q plot points will fall on the $y = x$ identity line. A linear pattern with slope $\neq 1$ indicates that the distributions have the same shape but different scales. A linear pattern shifted from the identity line indicates a location difference. Curvature in the Q-Q plot indicates a difference in distributional shape, such as one distribution being more skewed or heavy-tailed than the other. Departures at the extremes of the plot highlight differences in the tails, which may not be apparent from histograms alone. The Q-Q plot is particularly effective at detecting subtle tail differences that formal tests might miss.
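One way to put rough numbers on these patterns is to fit a straight line through the Q-Q points. The sketch below assumes NumPy and uses an ordinary least-squares fit via `np.polyfit`, which is an illustrative choice rather than part of the handbook's procedure; the sample parameters are made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=50, scale=10, size=500)
y = rng.normal(loc=55, scale=15, size=500)  # shifted and more spread out

# Equal sample sizes, so the sorted values are already matched quantiles.
qx, qy = np.sort(x), np.sort(y)

# Least-squares line qy ~ intercept + slope * qx (illustrative choice).
slope, intercept = np.polyfit(qx, qy, deg=1)

# Residuals from the fitted line; a systematic pattern in them
# (not just noise) would point to curvature, i.e. a shape difference.
resid = qy - (intercept + slope * qx)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, "
      f"max |resid|={np.abs(resid).max():.2f}")
```

A slope well away from 1 suggests a scale difference, and a nonzero intercept with slope near 1 suggests a pure location shift. The fit is only a summary; the plot itself remains the primary evidence.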

Assumptions and Limitations

The Q-Q plot assumes both datasets are drawn from continuous distributions. When comparing two samples, the sample sizes need not be equal; linear interpolation is used to match quantiles. The visual assessment is inherently subjective and should be accompanied by quantitative tests when formal decisions are required. For very small samples, the plot may show scatter around the reference line even when the distributions match.
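As one example of a quantitative companion to the plot, the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the two empirical CDFs. The sketch below computes only that statistic with NumPy; `ks_statistic` is a hypothetical helper name, and obtaining a p-value would require a statistics library (for instance `scipy.stats.ks_2samp`).

```python
import numpy as np

def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of a and b."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])  # the gap can only change at data points
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(2)
same = ks_statistic(rng.normal(size=300), rng.normal(size=300))
shifted = ks_statistic(rng.normal(size=300), rng.normal(loc=1.0, size=300))
print(f"same distribution: D={same:.3f}; shifted by 1: D={shifted:.3f}")
```

The statistic condenses the comparison into a single number, so it cannot say whether a difference lies in location, scale, or the tails; that is what the Q-Q plot adds.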

Reference: NIST/SEMATECH e-Handbook of Statistical Methods, Section 1.3.3.24

Formulas

Quantile Matching

$$\text{Plot}\; Q_1(p_i) \;\text{vs}\; Q_2(p_i) \;\text{for percentiles}\; p_i$$

For two-sample comparison, the quantiles $Q_1(p_i)$ and $Q_2(p_i)$ of both datasets at matching cumulative probabilities $p_i$ are plotted against each other. If the distributions are identical, points fall on the $y = x$ identity line.
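To see why a location or scale difference still produces a straight line, suppose (as an illustrative assumption) that the second population is a shifted and rescaled copy of the first. The constants $a$ and $b$ are introduced here for the illustration only.

```latex
X_2 \overset{d}{=} a + b\,X_1,\quad b > 0
\qquad\Longrightarrow\qquad
Q_2(p_i) = a + b\,Q_1(p_i) \quad \text{for every } p_i .
```

With $Q_1$ on the horizontal axis and $Q_2$ on the vertical axis, the points then lie on a line with slope $b$ and intercept $a$; the case $a = 0$, $b = 1$ recovers the $y = x$ identity line.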

Hazen Plotting Position

$$p_i = \frac{i - 0.5}{N}$$

The Hazen plotting position assigns cumulative probability $p_i$ to the $i$-th ordered observation out of $N$ total. This symmetric formula avoids probabilities of exactly 0 or 1.
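A small check of the formula, assuming NumPy; `hazen_positions` is a hypothetical helper name used only for this illustration.

```python
import numpy as np

def hazen_positions(n):
    """Hazen plotting positions p_i = (i - 0.5) / n for i = 1, ..., n."""
    i = np.arange(1, n + 1)
    return (i - 0.5) / n

p = hazen_positions(5)
print(p)  # [0.1 0.3 0.5 0.7 0.9]
```

The positions are symmetric about 0.5 and stay strictly between 0 and 1, so the smallest and largest observations never map to an undefined extreme quantile.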

Python Example

import numpy as np
import matplotlib.pyplot as plt
# Generate two normal samples that differ in location, scale, and size
rng = np.random.default_rng(42)
sample1 = rng.normal(loc=50, scale=10, size=200)
sample2 = rng.normal(loc=55, scale=15, size=150)
# Compute quantiles at matching Hazen plotting positions p_i = (i - 0.5) / N,
# with N taken as the smaller of the two sample sizes
n_quantiles = min(len(sample1), len(sample2))
probs = (np.arange(1, n_quantiles + 1) - 0.5) / n_quantiles
q1 = np.quantile(sample1, probs)
q2 = np.quantile(sample2, probs)
# Create Q-Q plot
fig, ax = plt.subplots(figsize=(8, 8))
ax.scatter(q1, q2, alpha=0.5, color='steelblue', s=15)
lims = [min(q1.min(), q2.min()), max(q1.max(), q2.max())]
ax.plot(lims, lims, 'r--', linewidth=1, label='y = x')
ax.set_xlabel("Sample 1 Quantiles")
ax.set_ylabel("Sample 2 Quantiles")
ax.set_title("Two-Sample Q-Q Plot")
ax.legend()
ax.set_aspect('equal')
plt.tight_layout()
plt.show()