PPCC Plot

NIST/SEMATECH Section 1.3.3.23 PPCC Plot

[Figure: PPCC Plot. Horizontal axis: Shape Parameter (λ), from -2 to 2. Vertical axis: PPCC, from 0.2 to 1. Peak annotated at λ = 0.140.]

A probability plot correlation coefficient (PPCC) plot displays the correlation coefficient from a probability plot as a function of a distribution shape parameter. For each candidate value of the shape parameter, a probability plot is constructed and the correlation between the ordered data and the theoretical quantiles is computed, yielding a curve whose peak identifies the best-fitting distribution.

What It Is

For each candidate shape parameter $\lambda$, a probability plot is constructed and the correlation between the ordered data and the theoretical quantiles is computed. The resulting $(\lambda, \text{correlation})$ pairs are plotted to form the PPCC curve. The $\lambda$ at the peak gives the best-fit shape parameter, and the height of the peak gives the goodness-of-fit measure. For the Tukey-Lambda family: $\lambda \approx -1$ corresponds to Cauchy (very heavy-tailed), $\lambda = 0$ is exactly logistic, $\lambda \approx 0.14$ corresponds to normal, $\lambda = 0.5$ is U-shaped, and $\lambda = 1$ is exactly uniform.

Questions This Plot Answers

  • What is the best-fit member within a distributional family?
  • Does the best-fit member provide a good fit?
  • Does this distributional family provide a good fit compared to other distributions?
  • How sensitive is the choice of the shape parameter?

Why It Matters

The PPCC plot provides a systematic, quantitative method for selecting the best distribution from a parametric family. Rather than subjectively comparing multiple probability plots, the PPCC plot reduces distribution selection to finding a single peak, making the process both more efficient and more reproducible. However, when the peak is broad, multiple distributions may fit nearly equally well, and the analyst should use judgement when selecting among them. A recommended approach is to first perform a coarse search over a wide range of shape parameters, then refine the search around the peak to obtain a more precise estimate.
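
The coarse-then-refine search described above can be sketched with SciPy, assuming `scipy.stats.ppcc_plot` for the grid pass and `scipy.stats.ppcc_max` for the refinement. The sample, the grid size of 41 points, and the choice of bracket (one grid step either side of the coarse peak) are illustrative assumptions, not part of the handbook's procedure.

```python
import numpy as np
from scipy.stats import ppcc_max, ppcc_plot, tukeylambda

# Illustrative long-tailed sample (lambda = -0.5); any 1-D data would do.
rng = np.random.default_rng(1)
data = tukeylambda.ppf(rng.uniform(0.005, 0.995, size=500), -0.5)

# Coarse pass: 41 equally spaced shape values on [-2, 2] (step 0.1).
# With plot=None (the default) nothing is drawn; only the arrays are returned.
svals, ppcc_vals = ppcc_plot(data, -2, 2, N=41)
k = int(np.argmax(ppcc_vals))

# Refinement: start the optimizer from the grid points around the coarse peak.
lo = svals[max(k - 1, 0)]
hi = svals[min(k + 1, len(svals) - 1)]
lam_hat = ppcc_max(data, brack=(lo, hi), dist="tukeylambda")

print(f"coarse peak at lambda = {svals[k]:.2f}, refined estimate = {lam_hat:.3f}")
```

The two-point bracket only gives the optimizer a starting interval, so the refined estimate is not strictly constrained to lie between `lo` and `hi`.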

When to Use a PPCC Plot

Use a PPCC plot when the goal is to identify which member of a distribution family best fits the data, or to estimate the optimal value of a shape parameter. The technique is particularly powerful for the Tukey-Lambda family of distributions, where the shape parameter $\lambda$ controls tail heaviness and the PPCC plot can distinguish between short-tailed, normal, and long-tailed distributions. It provides a data-driven method for distribution selection that is more systematic than visual inspection of multiple probability plots.

How to Interpret a PPCC Plot

The horizontal axis shows the shape parameter value and the vertical axis shows the corresponding probability plot correlation coefficient. The peak of the curve identifies the optimal shape parameter, and the height of the peak indicates the overall goodness of fit. A high peak close to 1.0 indicates an excellent fit. For the Tukey-Lambda PPCC plot: $\lambda \approx -1$ corresponds to a Cauchy distribution, $\lambda = 0$ corresponds to the logistic distribution, $\lambda \approx 0.14$ corresponds to the normal distribution, $\lambda = 0.5$ yields a U-shaped distribution, and $\lambda = 1$ is exactly uniform. If the optimal $\lambda$ is less than 0.14, a long-tailed distribution such as the double exponential or logistic is a better choice; if greater than 0.14, a short-tailed distribution such as the beta or uniform is more appropriate. The width of the peak also carries information: a broad peak suggests the data are compatible with a range of distributions, while a narrow peak indicates strong evidence for a specific shape.
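
One possible way to read these quantities off a computed curve is sketched below. The tolerance `tol` used to define "near the peak" is a hypothetical choice made for illustration only; the text does not prescribe a threshold, and the width measure assumes the curve has a single peak.

```python
import numpy as np
from scipy.stats import ppcc_plot

# Illustrative sample drawn from a normal distribution.
rng = np.random.default_rng(2)
data = rng.normal(size=300)

# PPCC curve on a grid of 161 shape values over [-2, 2] (step 0.025).
svals, r = ppcc_plot(data, -2, 2, N=161)
peak = int(np.argmax(r))
lam_peak, r_peak = svals[peak], r[peak]

# Peak width: range of lambda whose PPCC is within `tol` of the maximum.
tol = 0.002  # hypothetical tolerance, not taken from the handbook
near_peak = svals[r >= r_peak - tol]
width = near_peak.max() - near_peak.min()

# Tail reading relative to the normal reference value lambda = 0.14.
tail = "longer-tailed than normal" if lam_peak < 0.14 else "normal or shorter-tailed"

print(f"peak: lambda = {lam_peak:.3f}, PPCC = {r_peak:.4f} ({tail})")
print(f"lambda values within {tol} of the peak span a width of {width:.3f}")
```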

Assumptions and Limitations

The PPCC plot assumes that the data are a random sample from a continuous distribution. It requires a distribution family parameterized by a shape parameter, which limits its applicability to families with such structure. The correlation coefficient as a goodness-of-fit measure is most sensitive to departures in the center of the distribution and somewhat less sensitive to tail behavior compared to formal tests like Anderson-Darling.

Reference: NIST/SEMATECH e-Handbook of Statistical Methods, Section 1.3.3.23

Formulas

Probability Plot Correlation Coefficient

$$\text{PPCC}(\lambda) = \operatorname{Corr}\!\bigl(X_{(i)},\; M_{(i)}(\lambda)\bigr)$$

The PPCC at shape parameter $\lambda$ is the Pearson correlation between the ordered data $X_{(i)}$ and the theoretical quantiles $M_{(i)}(\lambda)$ from the candidate distribution. A value close to 1 indicates an excellent fit.
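
As a minimal sketch of this definition, the PPCC for a single $\lambda$ can be obtained from `scipy.stats.probplot`, which returns the theoretical quantiles, the ordered data, and the correlation of the fitted line. The helper name `ppcc_at` and the sample are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import probplot

def ppcc_at(data, lam):
    """Return (r, r_check) for a Tukey-Lambda probability plot at shape lam.

    r is the correlation reported by probplot; r_check recomputes the
    Pearson correlation directly from the plotted pairs.
    """
    (theoretical, ordered), (_slope, _intercept, r) = probplot(
        data, sparams=(lam,), dist="tukeylambda"
    )
    r_check = np.corrcoef(theoretical, ordered)[0, 1]
    return r, r_check

# Illustrative normal sample: lambda near 0.14 should fit better than lambda = 1.
rng = np.random.default_rng(0)
sample = rng.normal(size=500)

r_normal, r_normal_check = ppcc_at(sample, 0.14)
r_uniform, _ = ppcc_at(sample, 1.0)
print(f"PPCC(0.14) = {r_normal:.4f}, PPCC(1.0) = {r_uniform:.4f}")
```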

Optimal Shape Parameter

$$\hat{\lambda} = \underset{\lambda}{\arg\max}\;\text{PPCC}(\lambda)$$

The optimal shape parameter $\hat{\lambda}$ is the value of $\lambda$ that maximizes the probability plot correlation coefficient. The PPCC curve is plotted across a range of $\lambda$ values to visually identify this peak.

Tukey-Lambda Quantile Function

$$Q(p;\,\lambda) = \begin{cases} \dfrac{p^{\lambda} - (1-p)^{\lambda}}{\lambda} & \lambda \neq 0 \\[6pt] \ln\!\dfrac{p}{1-p} & \lambda = 0 \end{cases}$$

This is the percent point function (quantile function) of the Tukey-Lambda distribution. When $\lambda = 0$ it reduces to the quantile function of the logistic distribution. The theoretical quantiles $M_{(i)}(\lambda)$ are computed by applying this function to the Filliben uniform order statistic medians.
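
The quantile function and the resulting theoretical quantiles can be sketched directly in NumPy. The function names below are illustrative, and the order statistic medians use Filliben's approximation as stated in the comments (the same approximation SciPy's probability plots are documented to use); treat this as a sketch under those assumptions rather than a reference implementation.

```python
import numpy as np
from scipy.stats import logistic, tukeylambda

def filliben_medians(n):
    """Filliben's approximation to the uniform order statistic medians:
    m_n = 0.5**(1/n), m_1 = 1 - m_n, m_i = (i - 0.3175) / (n + 0.365) otherwise."""
    i = np.arange(1, n + 1)
    m = (i - 0.3175) / (n + 0.365)
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m

def tukey_lambda_quantile(p, lam):
    """Tukey-Lambda percent point function Q(p; lambda)."""
    p = np.asarray(p, dtype=float)
    if lam == 0:
        return np.log(p / (1.0 - p))  # logistic quantile function
    return (p**lam - (1.0 - p) ** lam) / lam

def theoretical_quantiles(n, lam):
    """M_(i)(lambda): the quantile function applied to the Filliben medians."""
    return tukey_lambda_quantile(filliben_medians(n), lam)

m = filliben_medians(50)
q_normal_like = theoretical_quantiles(50, 0.14)  # approximately normal shape
q_logistic = theoretical_quantiles(50, 0.0)      # exactly logistic
q_uniform = theoretical_quantiles(50, 1.0)       # uniform on (-1, 1)
```

At $\lambda = 1$ the formula simplifies to $Q(p; 1) = 2p - 1$, which is the quantile function of a uniform distribution on $(-1, 1)$.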

Python Example

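import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ppcc_plot, tukeylambda

# Generate data from a Tukey-Lambda distribution (lambda=-0.5, long-tailed).
# Uniform draws are restricted to [0.005, 0.995] before applying the quantile
# function, which keeps the most extreme tail values out of the sample.
rng = np.random.default_rng(42)
uniform_samples = rng.uniform(0.005, 0.995, size=200)
data = tukeylambda.ppf(uniform_samples, -0.5)

# Compute and draw the PPCC curve for shape values between -2 and 2.
# ppcc_plot uses the Tukey-Lambda family by default.
fig, ax = plt.subplots(figsize=(8, 5))
ppcc_plot(data, -2, 2, plot=ax)

# Override the default axis labels and title.
ax.set_xlabel("Shape Parameter (lambda)")
ax.set_ylabel("Correlation Coefficient")
ax.set_title("PPCC Plot (Tukey-Lambda Family)")
plt.tight_layout()
plt.show()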