PPCC Plot
NIST/SEMATECH Section 1.3.3.23 PPCC Plot
What It Is
A probability plot correlation coefficient (PPCC) plot displays the correlation coefficient from a probability plot as a function of a distribution shape parameter. For each candidate value of the shape parameter, a probability plot is constructed and the correlation between the ordered data and the theoretical quantiles is computed, yielding a curve whose peak identifies the best-fitting distribution.
The resulting (λ, r) pairs, where λ is the candidate shape parameter and r is the corresponding correlation coefficient, are plotted to form the PPCC curve. The λ at the peak gives the best-fit shape parameter, and the height of the peak gives the goodness-of-fit measure. For the Tukey-Lambda family: λ = -1 corresponds approximately to Cauchy (very heavy-tailed), λ = 0 is exactly logistic, λ = 0.14 corresponds approximately to normal, λ = 0.5 is U-shaped, and λ = 1 is exactly uniform.
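The procedure described above can be sketched directly in a few lines. This is a minimal illustration rather than a reference implementation: the helper names `filliben_medians` and `ppcc_curve` are hypothetical, and the sample is synthetic normal data chosen only so that the peak should land near the normal member of the Tukey-Lambda family.

```python
import numpy as np
from scipy.stats import tukeylambda


def filliben_medians(n):
    """Filliben's approximation to the uniform order statistic medians."""
    m = (np.arange(1, n + 1) - 0.3175) / (n + 0.365)
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m


def ppcc_curve(data, lambdas):
    """Correlation between ordered data and Tukey-Lambda quantiles per lambda."""
    x = np.sort(np.asarray(data, dtype=float))
    medians = filliben_medians(x.size)
    r = np.empty(len(lambdas))
    for k, lam in enumerate(lambdas):
        q = tukeylambda.ppf(medians, lam)  # theoretical quantiles
        r[k] = np.corrcoef(q, x)[0, 1]     # probability plot correlation
    return r


# Synthetic example (an assumption for illustration): a normal sample
rng = np.random.default_rng(0)
sample = rng.normal(size=500)
lambdas = np.linspace(-1.0, 1.0, 81)
r = ppcc_curve(sample, lambdas)
best_lambda = lambdas[np.argmax(r)]
print(f"peak at lambda = {best_lambda:.3f}, PPCC = {r.max():.4f}")
```

Plotting `r` against `lambdas` gives the PPCC curve; the location and height of its maximum are the two quantities read off the plot.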
Questions This Plot Answers
- What is the best-fit member within a distributional family?
- Does the best-fit member provide a good fit?
- Does this distributional family provide a good fit compared to other distributions?
- How sensitive is the choice of the shape parameter?
Why It Matters
The PPCC plot provides a systematic, quantitative method for selecting the best distribution from a parametric family. Rather than subjectively comparing multiple probability plots, the PPCC plot reduces distribution selection to finding a single peak, making the process both more efficient and more reproducible. However, when the peak is broad, multiple distributions may fit nearly equally well, and the analyst should use judgement when selecting among them. A recommended approach is to first perform a coarse search over a wide range of shape parameters, then refine the search around the peak to obtain a more precise estimate.
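The coarse-then-refine search could look like the sketch below. It relies on `scipy.stats.probplot`, which returns the correlation coefficient as part of its fit result; the function names `ppcc_at` and `coarse_to_fine`, the grid sizes, and the search range are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import probplot


def ppcc_at(data, lam):
    """PPCC of the data against the Tukey-Lambda member with shape lam."""
    _, (_, _, r) = probplot(data, sparams=(lam,), dist="tukeylambda", fit=True)
    return r


def coarse_to_fine(data, lo=-2.0, hi=2.0, n_coarse=41, n_fine=41):
    """Locate the PPCC peak on a coarse grid, then refine around it."""
    coarse = np.linspace(lo, hi, n_coarse)
    r_coarse = np.array([ppcc_at(data, lam) for lam in coarse])
    center = coarse[np.argmax(r_coarse)]
    step = coarse[1] - coarse[0]
    # Refine within one coarse step on either side of the coarse peak
    fine = np.linspace(max(lo, center - step), min(hi, center + step), n_fine)
    r_fine = np.array([ppcc_at(data, lam) for lam in fine])
    k = int(np.argmax(r_fine))
    return fine[k], r_fine[k]


# Synthetic normal sample, used here only as an illustration
rng = np.random.default_rng(1)
sample = rng.normal(size=400)
lam_hat, r_hat = coarse_to_fine(sample)
print(f"refined lambda = {lam_hat:.3f}, PPCC = {r_hat:.4f}")
```

A two-stage grid keeps the number of probability plots small while still resolving the peak to roughly one fortieth of the coarse step.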
When to Use a PPCC Plot
Use a PPCC plot when the goal is to identify which member of a distribution family best fits the data, or to estimate the optimal value of a shape parameter. The technique is particularly powerful for the Tukey-Lambda family of distributions, where the shape parameter controls tail heaviness and the PPCC plot can distinguish between short-tailed, normal, and long-tailed distributions. It provides a data-driven method for distribution selection that is more systematic than visual inspection of multiple probability plots.
How to Interpret a PPCC Plot
The horizontal axis shows the shape parameter value and the vertical axis shows the corresponding probability plot correlation coefficient. The peak of the curve identifies the optimal shape parameter, and the height of the peak indicates the overall goodness of fit. A high peak close to 1.0 indicates an excellent fit. For the Tukey-Lambda PPCC plot: λ = -1 corresponds approximately to a Cauchy distribution, λ = 0 corresponds to the logistic distribution, λ = 0.14 corresponds approximately to the normal distribution, λ = 0.5 yields a U-shaped distribution, and λ = 1 is exactly uniform. If the optimal λ is less than 0.14, a long-tailed distribution such as the double exponential or logistic is a better choice; if it is greater than 0.14, a short-tailed distribution such as the beta or uniform is more appropriate. The width of the peak also carries information: a broad peak suggests the data are compatible with a range of distributions, while a narrow peak indicates strong evidence for a specific shape.
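The reading rules above can be turned into a small summary helper. This is only a sketch: `summarize_peak` is a hypothetical name, the tolerance `tol` used to measure how broad the peak is has no standard value and is chosen arbitrarily here, and the heavy-tailed Student t sample is synthetic.

```python
import numpy as np
from scipy.stats import ppcc_plot


def summarize_peak(svals, ppcc, tol=0.001, normal_lambda=0.14):
    """Report peak location, height, tail class, and near-optimal lambda range."""
    k = int(np.argmax(ppcc))
    # Lambdas whose PPCC is within tol of the maximum (a rough peak width)
    near = svals[ppcc >= ppcc[k] - tol]
    return {
        "best_lambda": float(svals[k]),
        "peak_ppcc": float(ppcc[k]),
        "tail": "long-tailed" if svals[k] < normal_lambda else "short-tailed",
        "plateau": (float(near.min()), float(near.max())),
    }


# Synthetic heavy-tailed sample (Student t, 3 degrees of freedom)
rng = np.random.default_rng(3)
sample = rng.standard_t(3, size=1000)

# ppcc_plot returns the shape values and the PPCC at each one
svals, ppcc = ppcc_plot(sample, -1.0, 1.0, N=81)
summary = summarize_peak(svals, ppcc)
print(summary)
```

A wide `plateau` interval corresponds to the broad-peak case, where several family members fit about equally well.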
Assumptions and Limitations
The PPCC plot assumes that the data are a random sample from a continuous distribution. It requires a distribution family parameterized by a shape parameter, which limits its applicability to families with such structure. The correlation coefficient as a goodness-of-fit measure is most sensitive to departures in the center of the distribution and somewhat less sensitive to tail behavior compared to formal tests like Anderson-Darling.
Reference: NIST/SEMATECH e-Handbook of Statistical Methods, Section 1.3.3.23
Formulas
Probability Plot Correlation Coefficient
The PPCC at shape parameter λ is the Pearson correlation between the ordered data and the theoretical quantiles from the candidate distribution. A value close to 1 indicates an excellent fit.
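Written out, the definition takes the following form. The notation is an assumption introduced here for illustration: x_(i) are the ordered data, Q(p; λ) is the quantile function of the candidate distribution, and m_i are the uniform order statistic medians in Filliben's approximation.

```latex
r(\lambda) =
  \frac{\sum_{i=1}^{n} \bigl(x_{(i)} - \bar{x}\bigr)\bigl(q_i(\lambda) - \bar{q}(\lambda)\bigr)}
       {\sqrt{\sum_{i=1}^{n} \bigl(x_{(i)} - \bar{x}\bigr)^{2}
              \,\sum_{i=1}^{n} \bigl(q_i(\lambda) - \bar{q}(\lambda)\bigr)^{2}}},
\qquad q_i(\lambda) = Q(m_i; \lambda)

m_i =
\begin{cases}
  1 - 0.5^{1/n}, & i = 1, \\[4pt]
  \dfrac{i - 0.3175}{n + 0.365}, & 1 < i < n, \\[8pt]
  0.5^{1/n}, & i = n.
\end{cases}
```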
Optimal Shape Parameter
The optimal shape parameter is the value of λ that maximizes the probability plot correlation coefficient. The PPCC curve is plotted across a range of λ values to visually identify this peak.
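Besides reading the peak off the plot, the maximizer can be found numerically. SciPy provides `scipy.stats.ppcc_max` for this; the sketch below uses its default Tukey-Lambda family and default bracket, with a synthetic normal sample as an illustrative assumption, so the result would be expected to fall in the neighbourhood of the normal member of the family.

```python
import numpy as np
from scipy.stats import ppcc_max

# Synthetic normal sample (an assumption for illustration)
rng = np.random.default_rng(7)
sample = rng.normal(loc=10.0, scale=2.0, size=2000)

# The bracket is a starting interval for the one-dimensional optimizer;
# the Tukey-Lambda family is the default distribution.
best_lambda = ppcc_max(sample, brack=(0.0, 1.0), dist="tukeylambda")
print(f"lambda maximizing the PPCC: {best_lambda:.3f}")
```

Because the correlation is invariant to location and scale, the mean and standard deviation of the sample do not affect the estimated shape parameter.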
Tukey-Lambda Quantile Function
The percent point function (quantile function) of the Tukey-Lambda distribution is Q(p; λ) = (p^λ - (1 - p)^λ) / λ for λ ≠ 0. When λ = 0, the quantile function reduces to that of the logistic distribution, Q(p; 0) = ln(p / (1 - p)). The theoretical quantiles are computed by applying this function to the Filliben uniform order statistic medians.
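As a quick check of the quantile function, the sketch below implements the standard Tukey-Lambda percent point function by hand and compares it with SciPy's `tukeylambda.ppf` and `logistic.ppf`. The function name `tukey_lambda_ppf` is hypothetical.

```python
import numpy as np
from scipy.stats import logistic, tukeylambda


def tukey_lambda_ppf(p, lam):
    """Tukey-Lambda quantile function, with the lam = 0 logistic limit."""
    p = np.asarray(p, dtype=float)
    if lam == 0:
        return np.log(p / (1.0 - p))
    return (p**lam - (1.0 - p) ** lam) / lam


p = np.linspace(0.01, 0.99, 99)

# Agreement with SciPy for a generic shape value
print(np.allclose(tukey_lambda_ppf(p, 0.14), tukeylambda.ppf(p, 0.14)))
# lam = 0 reproduces the standard logistic quantiles
print(np.allclose(tukey_lambda_ppf(p, 0.0), logistic.ppf(p)))
# lam = 1 gives a straight line in p, i.e. a uniform distribution on (-1, 1)
print(np.allclose(tukey_lambda_ppf(p, 1.0), 2.0 * p - 1.0))
```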
Python Example
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ppcc_plot, tukeylambda

# Generate data from Tukey-Lambda distribution (lambda=-0.5, long-tailed)
rng = np.random.default_rng(42)
uniform_samples = rng.uniform(0.005, 0.995, size=200)
data = tukeylambda.ppf(uniform_samples, -0.5)

# Draw the PPCC curve for shape parameters between -2 and 2
fig, ax = plt.subplots(figsize=(8, 5))
ppcc_plot(data, -2, 2, plot=ax)
ax.set_xlabel("Shape Parameter (lambda)")
ax.set_ylabel("Correlation Coefficient")
ax.set_title("PPCC Plot (Tukey-Lambda Family)")
plt.tight_layout()
plt.show()