Probability Plot
NIST/SEMATECH Section 1.3.3.22 Probability Plot
What It Is
A probability plot is a graphical technique for assessing whether a dataset follows a specified theoretical distribution by plotting the ordered data values against the quantiles of the theoretical distribution. It generalizes the normal probability plot to any distributional family, including Weibull, lognormal, exponential, and others.
The ordered response values are plotted on the vertical axis against order statistic medians for the hypothesized distribution on the horizontal axis. The order statistic medians are computed by applying the percent point function (inverse CDF) of the hypothesized distribution to the uniform order statistic medians. If the data follow the hypothesized distribution, the points form a straight line whose intercept and slope estimate the location and scale parameters. This technique generalizes to any distribution for which the percent point function can be computed. Comparing probability plots across several candidate distributions (normal, Weibull, lognormal, exponential, etc.) identifies the best-fitting family by selecting the one with the highest probability plot correlation coefficient.
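The family comparison described above can be sketched in a few lines. This is a minimal illustration rather than a prescribed procedure: the simulated Weibull sample, the candidate list, and the fixed shape parameter of 2 are all assumptions made for the example. It relies on `scipy.stats.probplot`, which returns the correlation of the fitted line along with its slope and intercept.

```python
import numpy as np
from scipy import stats

# Illustrative data (assumption): a simulated Weibull sample with shape 2.
rng = np.random.default_rng(0)
data = rng.weibull(a=2.0, size=200)

# Candidate families (assumption). Shape parameters, where needed, are fixed
# here rather than estimated.
candidates = {
    "normal": ("norm", ()),
    "exponential": ("expon", ()),
    "Weibull (shape=2)": ("weibull_min", (2.0,)),
}

ppcc = {}
for label, (dist, sparams) in candidates.items():
    # probplot returns ((osm, osr), (slope, intercept, r)); r is the
    # probability plot correlation coefficient for this family.
    _, (slope, intercept, r) = stats.probplot(data, sparams=sparams, dist=dist)
    ppcc[label] = r

# The family with the highest PPCC is the best-fitting candidate.
best = max(ppcc, key=ppcc.get)
for label, r in ppcc.items():
    print(f"{label:20s} PPCC = {r:.4f}")
print("Highest PPCC:", best)
```

When a family has a shape parameter that is not known in advance, it has to be chosen or estimated before the plot is drawn; fixing it, as done here, is the simplest option.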
Questions This Plot Answers
- Does a given distribution provide a good fit to the data?
- What distribution best fits the data?
- What are good estimates of the location and scale parameters?
Why It Matters
Choosing the correct distributional model is essential for reliability prediction, process capability analysis, and simulation. The probability plot provides a visual goodness-of-fit assessment for any hypothesized distribution whose percent point function can be computed, which makes it a versatile tool for distribution identification. For location-scale families, the intercept and slope of the fitted line estimate the location and scale parameters.
When to Use a Probability Plot
Use a probability plot when evaluating whether data are consistent with a hypothesized distribution, a question that arises in reliability analysis, process capability studies, and statistical modeling. The probability plot is one of the most versatile graphical tools in statistics because it can assess fit to any continuous distribution, not just the normal. It also provides visual estimates of distribution parameters: the intercept and slope of the fitted line correspond to the location and scale parameters of the distribution.
How to Interpret a Probability Plot
If the data follow the hypothesized distribution, the points will fall approximately along a straight line. Systematic departures from the line indicate that the hypothesized distribution does not fit the data well. The nature of the departure provides diagnostic information: S-shaped departures suggest a different tail weight, concave or convex patterns suggest skewness mismatch, and clusters or gaps in the points suggest discreteness or contamination. For location-scale families, the fitted line directly provides parameter estimates, with the intercept estimating the location parameter and the slope estimating the scale parameter.
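As a rough illustration of reading parameter estimates off the fitted line, the sketch below draws a normal sample with a known location and scale and compares them with the intercept and slope returned by `scipy.stats.probplot`. The sample, its true parameters, and the seed are assumptions chosen only for the example.

```python
import numpy as np
from scipy import stats

# Simulated normal data with known parameters (assumed for illustration).
rng = np.random.default_rng(1)
true_loc, true_scale = 10.0, 2.0
data = rng.normal(loc=true_loc, scale=true_scale, size=500)

# Fit a least-squares line to (theoretical quantiles, ordered data).
(osm, osr), (slope, intercept, r) = stats.probplot(data, dist="norm")

# For a location-scale family, the intercept estimates the location
# parameter and the slope estimates the scale parameter.
print(f"location estimate (intercept): {intercept:.3f}  (true {true_loc})")
print(f"scale estimate (slope):        {slope:.3f}  (true {true_scale})")
print(f"correlation of the fit:        {r:.4f}")
```

These are graphical estimates from a least-squares line, so they will generally differ slightly from maximum likelihood estimates computed on the same data.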
Examples
Good Fit
Points follow the fitted line closely across the entire range, with only minor random scatter. The probability plot correlation coefficient is close to 1.0. The hypothesized distribution provides a good model for the data.
S-Shaped Departure
Points form an S-curve around the reference line, with systematic departures at both tails. This indicates the data have a different tail weight than the hypothesized distribution. Try a distribution with heavier or lighter tails.
Concave Departure
Points show a consistent concave curvature, bowing away from the reference line. This indicates a skewness mismatch: the data are more skewed than the hypothesized distribution. Try a more skewed distributional family.
Assumptions and Limitations
The probability plot requires that the analyst specify the family of distributions to evaluate. If the family is incorrect, the plot will show systematic departures regardless of parameter choices. Multiple probability plots for different families can be compared to find the best fit. The visual assessment should be supplemented with a quantitative goodness-of-fit test such as the Anderson-Darling test when a formal decision is needed.
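One way to pair the visual check with a formal test is sketched below using `scipy.stats.anderson`, here testing a deliberately skewed sample against the normal family. The exponential sample and the 5% significance level are assumptions for the example. The sketch reads the tabulated critical values from the result; some recent SciPy releases may emit a warning about this default behavior.

```python
import numpy as np
from scipy import stats

# Illustrative data (assumption): a clearly non-normal, right-skewed sample.
rng = np.random.default_rng(2)
skewed = rng.exponential(scale=1.0, size=500)

# Anderson-Darling test of the hypothesis that the data are normal.
result = stats.anderson(skewed, dist="norm")

# Locate the critical value tabulated for the 5% significance level.
levels = np.asarray(result.significance_level)
idx = int(np.argmin(np.abs(levels - 5.0)))
crit_5 = result.critical_values[idx]

# Reject normality when the statistic exceeds the critical value.
reject_normality = bool(result.statistic > crit_5)
print(f"A^2 = {result.statistic:.3f}, 5% critical value = {crit_5:.3f}")
print("Reject normality at the 5% level:", reject_normality)
```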
Reference: NIST/SEMATECH e-Handbook of Statistical Methods, Section 1.3.3.22
Formulas
Order Statistic Medians
The order statistic medians for the hypothesized distribution are computed by applying the percent point function G (inverse CDF) to the uniform order statistic medians U_i. These form the horizontal axis of the probability plot.
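In symbols, the relationship described above can be written as follows, where N_i denotes the i-th order statistic median of the hypothesized distribution and n (a symbol introduced here) is the sample size:

```latex
N_i = G(U_i), \qquad i = 1, \dots, n
```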
Uniform Order Statistic Medians (Filliben)
The uniform order statistic medians U_i are computed with the Filliben approximation. These probabilities are then transformed through the percent point function of the hypothesized distribution to obtain the theoretical quantiles for the horizontal axis.
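Written out, the Filliben approximation for a sample of size n (n is a symbol introduced here) takes the following piecewise form, as given in the NIST/SEMATECH handbook:

```latex
U_i =
\begin{cases}
1 - U_n, & i = 1 \\[4pt]
\dfrac{i - 0.3175}{n + 0.365}, & i = 2, 3, \dots, n - 1 \\[4pt]
0.5^{1/n}, & i = n
\end{cases}
```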
Probability Plot Correlation Coefficient
The probability plot correlation coefficient (PPCC) is the correlation between the ordered data values Y_(i) and the order statistic medians N_i. A PPCC close to 1.0 indicates that the hypothesized distribution provides a good fit. Comparing PPCC values across candidate distributions identifies the best-fitting family.
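The three quantities in this section can be computed by hand in a few lines. The sketch below is one possible implementation, not a reference one: the helper names `filliben_medians` and `ppcc` are hypothetical, and the normal sample is simulated purely for illustration. The hand-computed value is compared with the correlation reported by `scipy.stats.probplot`.

```python
import numpy as np
from scipy import stats

def filliben_medians(n):
    """Filliben approximation to the uniform order statistic medians U_i."""
    i = np.arange(1, n + 1)
    u = (i - 0.3175) / (n + 0.365)   # interior points, i = 2, ..., n-1
    u[-1] = 0.5 ** (1.0 / n)         # U_n
    u[0] = 1.0 - u[-1]               # U_1 = 1 - U_n
    return u

def ppcc(data, dist):
    """Correlation between ordered data Y_(i) and medians N_i = G(U_i)."""
    y = np.sort(np.asarray(data, dtype=float))
    n_i = dist.ppf(filliben_medians(len(y)))  # G is the percent point function
    return np.corrcoef(y, n_i)[0, 1]

# Illustrative data (assumption): a simulated standard normal sample.
rng = np.random.default_rng(3)
sample = rng.normal(size=100)

r_manual = ppcc(sample, stats.norm)
r_scipy = stats.probplot(sample, dist="norm")[1][2]
print(f"manual PPCC = {r_manual:.6f}, probplot r = {r_scipy:.6f}")
```

Passing a frozen distribution such as `stats.weibull_min(2.0)` to `ppcc` would evaluate a family with a fixed shape parameter in the same way.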
Python Example
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Generate Weibull-distributed sample data (shape=2, scale=1)
rng = np.random.default_rng(42)
data = rng.weibull(a=2.0, size=100)

# Compare probability plots for different distributions
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Weibull probability plot (correct distribution)
res1 = stats.probplot(data, sparams=(2.0,), dist='weibull_min', plot=axes[0])
axes[0].set_title("Weibull Prob Plot (Good Fit)")

# Normal probability plot (wrong distribution)
res2 = stats.probplot(data, dist='norm', plot=axes[1])
axes[1].set_title("Normal Prob Plot (Poor Fit)")

# Exponential probability plot (wrong distribution)
res3 = stats.probplot(data, dist='expon', plot=axes[2])
axes[2].set_title("Exponential Prob Plot (Poor Fit)")

# Relabel the axes and recolor the data markers (the first line on each axis)
for ax in axes:
    ax.set_xlabel("Theoretical Quantiles")
    ax.set_ylabel("Ordered Values")
    ax.get_lines()[0].set_markerfacecolor('steelblue')
    ax.get_lines()[0].set_markeredgecolor('steelblue')

plt.tight_layout()
plt.show()