
Normal Probability Plot

NIST/SEMATECH Section 1.3.3.21 Normal Probability Plot

[Figure: normal probability plot of normal data. Horizontal axis: Normal N(0,1) Order Statistic Medians (-3 to 3). Vertical axis: Ordered Response (-3 to 3).]

What It Is

A normal probability plot displays the sorted data values on the vertical axis against the expected normal order statistics (theoretical quantiles) on the horizontal axis. If the data follow a normal distribution, the points fall approximately along a straight reference line, with departures from linearity indicating specific types of non-normality.

The ordered data values Y_{(1)} ≤ Y_{(2)} ≤ ... ≤ Y_{(N)} are plotted on the vertical axis against the normal order statistic medians M_i = Φ⁻¹(U_i) on the horizontal axis, where Φ⁻¹ is the inverse standard normal CDF (the normal percent point function) and the U_i are the uniform order statistic medians computed via the Filliben approximation. If the data are normal, the points fall approximately on a straight line whose slope estimates the standard deviation and whose intercept estimates the mean. The correlation coefficient of the fitted line can be compared to a table of critical values to provide a formal test of normality. The normal probability plot is a special case of the general probability plot, in which the normal percent point function is replaced by the percent point function of any desired distribution.
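As a minimal sketch of that last point, `scipy.stats.probplot` accepts a `dist` argument, so the same construction can be pointed at a distribution other than the normal. The data below are hypothetical: exact exponential quantiles rather than measurements, chosen so that the outcome does not depend on a random seed.

```python
import numpy as np
from scipy import stats

# Hypothetical data: exact quantiles of a standard exponential distribution
# at evenly spaced probabilities (an idealized, noise-free "sample").
n = 100
p = (np.arange(1, n + 1) - 0.5) / n
data = stats.expon.ppf(p)

# probplot returns ((theoretical quantiles, ordered data), (slope, intercept, r)).
_, (_, _, r_norm) = stats.probplot(data, dist="norm")
_, (_, _, r_expon) = stats.probplot(data, dist="expon")

print(f"r against normal quantiles:      {r_norm:.4f}")
print(f"r against exponential quantiles: {r_expon:.4f}")
```

The correlation is higher against the exponential quantiles, which is the sense in which the general probability plot asks "which distribution makes these points straightest?".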

Questions This Plot Answers

  • Are the data normally distributed?
  • What is the nature of the departure from normality (skewed, short tails, long tails)?

Why It Matters

Normality is the most frequently tested distributional assumption in statistics. The normal probability plot is more sensitive than the histogram for detecting departures from normality because it magnifies tail behavior, which is exactly where non-normality has the greatest impact on statistical inference (confidence intervals, hypothesis tests, capability indices).

When to Use a Normal Probability Plot

Use a normal probability plot as the primary graphical tool for assessing whether a dataset is consistent with a normal distribution. Normality is a foundational assumption for many statistical procedures, including t-tests, ANOVA, regression inference, and capability analysis, so it is worth checking before relying on any of them.

How to Interpret a Normal Probability Plot

Points that follow the reference line closely indicate that the data are consistent with a normal distribution. For short-tailed distributions, the first few points depart above the fitted line and the last few points depart below it. For long-tailed distributions, this pattern is reversed: the first few points depart below the line and the last few depart above. Both short- and long-tailed data may also show an S-shaped pattern in the middle. Right-skewed data produce a concave (quadratic) pattern in which all points fall below a line connecting the first and last points; left-skewed data produce the mirror pattern with all points above. The correlation coefficient of the fitted line can be compared to a table of critical values to provide a formal test of normality.
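The patterns described above can be checked numerically. The sketch below is illustrative only: instead of random samples it uses exact quantiles of a uniform (short-tailed), a Laplace (long-tailed), and a lognormal (right-skewed) distribution, so the signs of the departures are not blurred by sampling noise. The helper name `end_residuals` and the choice of looking at two points per end are assumptions made for this example.

```python
import numpy as np
from scipy import stats

n = 100
# Normal order statistic medians for a sample of size n; fit=False returns
# only (theoretical quantiles, ordered data), so the placeholder data is ignored.
osm, _ = stats.probplot(np.zeros(n), dist="norm", fit=False)
p = stats.norm.cdf(osm)  # the corresponding uniform order statistic medians

def end_residuals(y, k=2):
    """Fit a least-squares line to (osm, y); return residuals of the first and last k points."""
    slope, intercept = np.polyfit(osm, y, 1)
    resid = y - (slope * osm + intercept)
    return resid[:k], resid[-k:]

short_lo, short_hi = end_residuals(stats.uniform.ppf(p))   # short tails
long_lo, long_hi = end_residuals(stats.laplace.ppf(p))     # long tails

# Right skew: compare interior points with the chord joining the end points.
skew = stats.lognorm.ppf(p, s=1.0)
chord = skew[0] + (skew[-1] - skew[0]) * (osm - osm[0]) / (osm[-1] - osm[0])
below_chord = skew[1:-1] < chord[1:-1]

print("short tails: first above line:", (short_lo > 0).all(), "| last below:", (short_hi < 0).all())
print("long tails:  first below line:", (long_lo < 0).all(), "| last above:", (long_hi > 0).all())
print("right skew:  all interior points below chord:", below_chord.all())
```

With real data the same signs appear only approximately, which is why the plot is read as a whole rather than point by point.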

Examples

Normal Data

Points follow the reference line closely from end to end with only minor random scatter. The correlation coefficient of the fitted line is close to 1.0. This confirms that the normal distribution provides a good model for the data.
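How close to 1.0 is close enough depends on the sample size. The published critical-value tables are not reproduced here; as a stand-in, the sketch below approximates a 5% critical value by simulation. This is a swapped-in technique, not the table lookup described above, and the sample size, number of replications, and seed are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_sim = 50, 5000  # arbitrary sample size and number of simulated samples

# Normal order statistic medians for samples of size n.
osm, _ = stats.probplot(np.zeros(n), dist="norm", fit=False)

# Simulate normal samples, sort each row, and correlate each row with osm.
samples = np.sort(rng.normal(size=(n_sim, n)), axis=1)
xc = osm - osm.mean()
yc = samples - samples.mean(axis=1, keepdims=True)
r = (yc @ xc) / (np.linalg.norm(yc, axis=1) * np.linalg.norm(xc))

# Lower 5% point of r under normality: an observed r below this value
# would be unusual for normal data of this size.
crit_05 = np.quantile(r, 0.05)
print(f"approximate 5% critical value for n={n}: {crit_05:.4f}")
```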

Short Tails

The middle of the data shows an S-shaped pattern. The first few points depart above the fitted line and the last few points depart below the fitted line. This indicates a distribution with shorter tails than the normal. A Tukey Lambda PPCC plot can help identify an appropriate distributional family.

Long Tails

The middle of the data may show a mild S-shaped pattern. The first few points depart below the fitted line and the last few points depart above the fitted line -- the opposite direction from the short-tailed case. This indicates a distribution with longer tails than the normal (e.g., double exponential). A Tukey Lambda PPCC plot can help identify an appropriate distributional family.
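A rough sketch of the idea behind a Tukey Lambda PPCC plot: compute the probability plot correlation coefficient for a range of shape values λ and look for the maximum. In the Tukey-Lambda family, λ near 0.14 is approximately normal, larger values have shorter tails, and smaller values have longer tails. The grid, the helper name `best_lambda`, and the use of exact uniform and Laplace quantiles as stand-in data are assumptions for this example; SciPy also provides `scipy.stats.ppcc_plot` and `scipy.stats.ppcc_max` for the same purpose.

```python
import numpy as np
from scipy import stats

n = 200
osm, _ = stats.probplot(np.zeros(n), dist="norm", fit=False)
p = stats.norm.cdf(osm)  # uniform order statistic medians for size n

def best_lambda(y, lambdas):
    """Return the lambda whose Tukey-Lambda quantiles correlate best with sorted y."""
    y_sorted = np.sort(y)
    ppcc = [np.corrcoef(stats.tukeylambda.ppf(p, lam), y_sorted)[0, 1] for lam in lambdas]
    return float(lambdas[int(np.argmax(ppcc))])

lambdas = np.round(np.arange(-1.0, 1.51, 0.05), 2)  # illustrative grid

short_tailed = stats.uniform.ppf(p)   # idealized short-tailed data
long_tailed = stats.laplace.ppf(p)    # idealized long-tailed data

lam_short = best_lambda(short_tailed, lambdas)
lam_long = best_lambda(long_tailed, lambdas)
print(f"short-tailed data: best lambda = {lam_short:.2f}")
print(f"long-tailed data:  best lambda = {lam_long:.2f}")
```

The short-tailed data peak well above 0.14 and the long-tailed data below it, which is how the PPCC plot points toward a distributional family.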

Right Skewed

Points show a strongly non-linear, concave pattern in which all points fall below a reference line drawn between the first and last points. This is the signature of a significantly right-skewed data set. A right-skewed distribution such as the Weibull or lognormal may be more appropriate.
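One way to check the lognormal suggestion, sketched here under the assumption that the data are strictly positive: if the data are lognormal, their logarithms are normal, so the normal probability plot of the log-transformed values should straighten out. The sample below is simulated and the seed is arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=1.0, size=200)  # simulated right-skewed sample

# Correlation coefficient of the normal probability plot, before and after the log transform.
_, (_, _, r_raw) = stats.probplot(data, dist="norm")
_, (_, _, r_log) = stats.probplot(np.log(data), dist="norm")

print(f"r for raw data: {r_raw:.4f}")
print(f"r for log data: {r_log:.4f}")
```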

Assumptions and Limitations

The normal probability plot is a graphical technique and does not by itself yield a formal p-value for normality; a formal test requires comparing the correlation coefficient of the fitted line with tabulated critical values. It is most effective for moderate to large sample sizes, as small samples may not produce a clear linear pattern even when drawn from a normal distribution. The plotting positions (theoretical quantiles) are typically computed using the Filliben, Hazen, or Blom formula, and the choice can slightly affect the visual impression for small samples. This implementation uses the Filliben approximation for uniform order statistic medians.
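To get a feel for how much the plotting-position formula matters, the sketch below compares the normal quantiles from the Hazen positions (i - 0.5)/N and the Blom positions (i - 0.375)/(N + 0.25) with the Filliben-based quantiles that `scipy.stats.probplot` documents using. The function name `normal_quantiles` and the two sample sizes are choices made for this illustration.

```python
import numpy as np
from scipy import stats

def normal_quantiles(n, method):
    """Normal quantiles at the plotting positions of the named formula."""
    i = np.arange(1, n + 1)
    if method == "filliben":
        # probplot with fit=False returns (theoretical quantiles, ordered data).
        osm, _ = stats.probplot(np.zeros(n), dist="norm", fit=False)
        return osm
    if method == "hazen":
        return stats.norm.ppf((i - 0.5) / n)
    if method == "blom":
        return stats.norm.ppf((i - 0.375) / (n + 0.25))
    raise ValueError(f"unknown method: {method}")

# Mean absolute gap between each formula and Filliben, for a small and a larger sample.
mean_gap = {}
for n in (10, 200):
    ref = normal_quantiles(n, "filliben")
    mean_gap[n] = {
        m: float(np.mean(np.abs(normal_quantiles(n, m) - ref)))
        for m in ("hazen", "blom")
    }
    print(n, mean_gap[n])
```

The average gap shrinks as N grows, consistent with the choice mattering mainly for small samples.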

See It In Action

This technique is demonstrated in the following case studies:

Reference: NIST/SEMATECH e-Handbook of Statistical Methods, Section 1.3.3.21

Formulas

Uniform Order Statistic Medians (Filliben)

$$U_i = \begin{cases} 1 - U_N & i = 1 \\ \dfrac{i - 0.3175}{N + 0.365} & i = 2, \ldots, N-1 \\ 0.5^{1/N} & i = N \end{cases}$$

The Filliben approximation for uniform order statistic medians, which serve as the intermediate step in computing the normal order statistic medians.

Normal Order Statistic Medians

$$M_i = \Phi^{-1}(U_i)$$

The normal order statistic medians are obtained by applying the inverse standard normal CDF to the uniform order statistic medians. These serve as the horizontal axis values.
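The two formulas above translate directly into a few lines of NumPy. This is a minimal sketch, with the function name `uniform_order_statistic_medians` chosen for the example; the result is compared against the theoretical quantiles returned by `scipy.stats.probplot`, which documents using Filliben's estimate.

```python
import numpy as np
from scipy import stats

def uniform_order_statistic_medians(n):
    """Filliben approximation to the medians of the uniform order statistics."""
    i = np.arange(1, n + 1)
    u = (i - 0.3175) / (n + 0.365)   # interior points, i = 2, ..., N-1
    u[-1] = 0.5 ** (1.0 / n)         # i = N
    u[0] = 1.0 - u[-1]               # i = 1
    return u

n = 20
u = uniform_order_statistic_medians(n)
m = stats.norm.ppf(u)  # normal order statistic medians M_i

# Cross-check against probplot's theoretical quantiles (placeholder data is ignored).
osm, _ = stats.probplot(np.zeros(n), dist="norm", fit=False)
print("matches scipy.stats.probplot:", np.allclose(m, osm))
```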

Parameter Estimates from Fitted Line

$$\hat{\sigma} = \text{slope}, \quad \hat{\mu} = \text{intercept}$$

When the data are normally distributed and the points fall on a straight line, the slope of the fitted line estimates the standard deviation and the intercept estimates the mean.
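A quick numerical check of this, using simulated data with hypothetical true parameters (mean 10, standard deviation 2) and an arbitrary seed; `scipy.stats.probplot` returns the least-squares slope, intercept, and correlation coefficient alongside the plotted coordinates.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
mu, sigma = 10.0, 2.0                      # hypothetical true parameters
data = rng.normal(mu, sigma, 500)

# (osm, osr) are the plotted coordinates; (slope, intercept, r) describe the fitted line.
(osm, osr), (slope, intercept, r) = stats.probplot(data, dist="norm")

print(f"slope     = {slope:.3f}   sample std  = {data.std(ddof=1):.3f}")
print(f"intercept = {intercept:.3f}   sample mean = {data.mean():.3f}")
print(f"r         = {r:.4f}")
```

Because the normal order statistic medians are symmetric about zero, the intercept coincides with the sample mean; the slope is close to, but not identical to, the sample standard deviation.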

Python Example

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(42)

# Four NIST examples (1.3.3.21.1-4)
normal = rng.normal(0, 1, 200)

# Short tails: Tukey-Lambda (lambda=1.1), sampled through its quantile function
u = rng.uniform(0.001, 0.999, 500)
lam = 1.1
short_tail = (u**lam - (1 - u)**lam) / lam

# Long tails: double exponential (Laplace)
long_tail = rng.laplace(0, 1, 500)

# Right skewed: lognormal
right_skew = rng.lognormal(0, 1, 200)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
datasets = [
    (normal, "Normal Data"),
    (short_tail, "Short-Tailed Data"),
    (long_tail, "Long-Tailed Data"),
    (right_skew, "Right Skewed Data"),
]

for ax, (data, title) in zip(axes.flat, datasets):
    # probplot draws the ordered data against the normal quantiles
    # and adds the least-squares reference line to the given axes
    stats.probplot(data, dist='norm', plot=ax)
    ax.set_title(title)
    ax.set_xlabel("Normal N(0,1) Order Statistic Medians")
    ax.set_ylabel("Ordered Response")
    # The first line object holds the data markers; recolor them
    ax.get_lines()[0].set_markerfacecolor('steelblue')
    ax.get_lines()[0].set_markeredgecolor('steelblue')

plt.suptitle("Normal Probability Plot (NIST 1.3.3.21)", y=1.02)
plt.tight_layout()
plt.show()