Fatigue Life of Aluminum Alloy Specimens

NIST/SEMATECH Section 1.4.2.9 Fatigue Life of Aluminum Alloy Specimens

Background and Data

This case study applies exploratory data analysis to 101 measurements of fatigue life (thousands of cycles until rupture) of rectangular strips of 6061-T6 aluminum sheeting, subjected to periodic loading with maximum stress of 21,000 psi, as reported by Birnbaum and Saunders (1958). The dataset is referred to as BIRNSAUN.DAT in the NIST archive.

The dataset originates from NIST/SEMATECH Section 1.4.2.9. Unlike the other case studies in this collection, the fatigue life study focuses specifically on probabilistic model selection: determining which probability distribution best describes the dispersion of measured lifetimes for reliability prediction and warranty estimation.

Dataset

BIRNSAUN.DAT

Observations: 101

Variable: Fatigue life (thousands of cycles)

Birnbaum & Saunders (1958), aluminum fatigue life

NIST source description

Source: Birnbaum, Z. W. and Saunders, S. C. (1958), "A Statistical Model for Life-Length of Materials", Journal of the American Statistical Association, 53(281), pp. 151–160. Response variable = fatigue life (thousands of cycles until rupture) of rectangular strips of 6061-T6 aluminum sheeting, subjected to periodic loading with maximum stress of 21,000 psi. Number of observations = 101.

Preview data

#	Value
1	370
2	1016
3	1235
4	1419
5	1567
6	1820
7	706
8	1018
9	1238
10	1420
... 91 more rows


Test Underlying Assumptions

Goals

The goals of this analysis are:

Determine if the fatigue life data can be adequately modeled with a known probability distribution for reliability prediction
Assess the fixed-location assumption: is the process mean stable over the run?
Assess the fixed-variation assumption: is the process variability constant?
Assess the randomness assumption: are the observations independent?
Determine which probability distribution (Normal, Gamma, Weibull, or Birnbaum-Saunders) best describes the fatigue life dispersion

Graphical Output and Interpretation

4-Plot Overview

The 4-plot provides a comprehensive view of the fatigue life data. The run sequence plot should show stable location and variation, the lag plot should show no dependence, and the histogram reveals the right-skewed distribution.

Four-plot diagnostic layout for the fatigue life dataset (run sequence, lag, histogram, normal probability).

The assumptions are addressed by the four diagnostic plots:

The run sequence plot (upper left) shows 101 observations fluctuating around a stable mean with no systematic drift or shifts — the fixed-location and fixed-variation assumptions appear satisfied.
The lag plot (upper right) shows a structureless scatter cloud, consistent with independent observations — the randomness assumption appears satisfied.
The histogram (lower left) is right-skewed, with most observations clustering between 800 and 1,600 thousand cycles and a long upper tail extending past 2,000 — the distribution is not symmetric and not bell-shaped.
The normal probability plot (lower right) shows deviations from the theoretical straight line, particularly in the tails — the normality assumption is questionable.

Graphically, three of the four assumptions appear to hold: fixed location, fixed variation, and randomness. The distributional assumption requires further investigation, because the right-skewed shape suggests that alternative distributions (Gamma, Weibull, Birnbaum-Saunders) may fit better than the normal.

Box Plot

The initial graphical analysis uses dot charts, box plots, histograms, and kernel density estimates to characterize the distribution. The box plot suggests the largest measured value may be an outlier.

Box plot of fatigue life measurements showing the interquartile range, median, whiskers, and a potential outlier at the high end near 2,440 thousand cycles.

Run Sequence Plot

The run sequence plot shows 101 observations fluctuating around the mean of approximately 1401 thousand cycles. The location and variation appear stable across the run, with no systematic drift or shifts.

Run sequence plot of 101 fatigue life measurements showing stable location and variation around a mean of approximately 1401 thousand cycles.

Lag Plot

The lag plot at lag 1 shows a structureless scatter cloud, consistent with independent observations. The slight elongation in the scatter reflects the skewed distribution rather than serial dependence.

Lag-1 plot showing a structureless scatter cloud, consistent with independent observations. The elongation reflects the right-skewed distribution rather than serial dependence.

Histogram

The data range from slightly above 350 to slightly below 2,500 thousand cycles. The histogram is right-skewed, with most observations clustering between 800 and 1,600 thousand cycles and a long upper tail extending past 2,000.

Histogram with KDE overlay revealing a right-skewed distribution. Most observations cluster between 800 and 1,600 thousand cycles with a long upper tail past 2,000.

Normal Probability Plot

The normal probability plot (Q-Q plot) shows the data deviating from the theoretical straight line, particularly in the tails. When compared against an envelope of 99 simulated Gaussian samples of the same size, the data Q-Q plot falls within the envelope, suggesting the Gaussian model cannot be firmly rejected on graphical grounds alone.

Normal probability plot of the fatigue life data. Deviation from linearity in the tails suggests departure from a Gaussian model, though the effect is moderate.
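The envelope comparison described above can be sketched as follows. This is a minimal illustration, not the procedure used to produce the figure: the function names, the seed, and the choice to standardize both the data and each simulated sample are assumptions. Only the first ten preview values are used here, so the full 101-value file would be needed to repeat the actual check.

```python
import random
import statistics


def standardize(values):
    """Return sorted values rescaled to sample mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return sorted((v - mean) / sd for v in values)


def gaussian_envelope(n, n_sim=99, seed=0):
    """Pointwise min and max of each order statistic over n_sim standardized Gaussian samples."""
    rng = random.Random(seed)
    sims = [standardize([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(n_sim)]
    lower = [min(sim[i] for sim in sims) for i in range(n)]
    upper = [max(sim[i] for sim in sims) for i in range(n)]
    return lower, upper


def points_outside(data, lower, upper):
    """Count standardized order statistics of `data` that fall outside the envelope."""
    z = standardize(data)
    return sum(1 for zi, lo, hi in zip(z, lower, upper) if zi < lo or zi > hi)


# First ten preview values only (illustrative subset of the dataset).
preview = [370, 1016, 1235, 1419, 1567, 1820, 706, 1018, 1238, 1420]
lower, upper = gaussian_envelope(len(preview))
print("points outside envelope:", points_outside(preview, lower, upper))
```

A count of zero means every ordered observation lies inside the simulated band, which is the sense in which the Gaussian model cannot be rejected graphically.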

Autocorrelation Plot

The autocorrelation plot is consistent with independent fatigue life measurements: all autocorrelation coefficients fall within the 95% confidence bands.

Autocorrelation plot of the fatigue life measurements. All coefficients fall within the 95% confidence bands, consistent with independence.

Spectral Plot

The spectral plot shows a flat spectrum with no dominant peaks, consistent with independent observations.

Spectral plot showing a flat spectrum with no dominant peaks, consistent with independent observations.

Distribution Comparison

Weibull Probability Plot

The Weibull probability plot tests the fit of the 3-parameter Weibull model (location=181, shape=3.43, scale=1357) by subtracting the location parameter from the data before plotting. If the shifted data follow a 2-parameter Weibull distribution, the points should fall near a straight line.

Weibull probability plot of location-shifted fatigue life data (x - 181). The 3-parameter Weibull model with location=181, shape=3.43, scale=1357 uses the NIST MLE estimates. Points near the reference line indicate a reasonable Weibull fit.

The points follow the reference line reasonably well, with some deviation in the upper tail. The Weibull model captures the general shape of the distribution but the AIC/BIC analysis (below) shows it does not outperform the simpler Gaussian model for this dataset.
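To make the construction of the Weibull probability plot concrete, the sketch below computes the plotted coordinates. For a Weibull model with shape $k$ and scale $\lambda$, $\ln(-\ln(1-F)) = k \ln(x - \text{location}) - k \ln \lambda$, so the points fall on a line with slope $k$. The plotting positions (Benard's median-rank approximation) and the function names are assumptions; the original plot may use different positions.

```python
import math


def weibull_plot_coords(data, location):
    """Return (ln(x - location), ln(-ln(1 - p))) pairs for a Weibull probability plot."""
    shifted = sorted(x - location for x in data)
    if shifted[0] <= 0:
        raise ValueError("all observations must exceed the location parameter")
    n = len(shifted)
    coords = []
    for i, x in enumerate(shifted, start=1):
        p = (i - 0.3) / (n + 0.4)  # Benard's median-rank approximation (assumed)
        coords.append((math.log(x), math.log(-math.log(1.0 - p))))
    return coords


def fit_line(coords):
    """Least-squares slope and intercept through the plot coordinates."""
    n = len(coords)
    mx = sum(u for u, _ in coords) / n
    my = sum(v for _, v in coords) / n
    sxx = sum((u - mx) ** 2 for u, _ in coords)
    sxy = sum((u - mx) * (v - my) for u, v in coords)
    slope = sxy / sxx
    return slope, my - slope * mx


# First ten preview values only (illustrative subset of the dataset).
preview = [370, 1016, 1235, 1419, 1567, 1820, 706, 1018, 1238, 1420]
slope, intercept = fit_line(weibull_plot_coords(preview, location=181))
print(f"slope from ten preview points: {slope:.2f}")
print(f"implied scale: {math.exp(-intercept / slope):.0f}")
```

The slope estimates the shape and $\exp(-\text{intercept}/\text{slope})$ estimates the scale. With only ten points these are rough and need not match the maximum-likelihood values quoted above.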

Gamma Probability Plot

The gamma probability plot compares sorted data values against theoretical quantiles from a Gamma distribution with shape $\alpha = 11.85$ and scale $\beta = 118.2$. These are the NIST MLE estimates, with the scale obtained from the reported rate of 0.00846 as $\beta = 1/0.00846$.

Gamma probability plot of fatigue life data with shape=11.85, scale=118.2 (NIST MLE estimates). Points near the reference line indicate a good gamma fit.

The gamma probability plot shows a good fit with points closely following the reference line across the full range of the data. The gamma distribution provides a reasonable model for this right-skewed dataset, though the formal AIC/BIC comparison determines whether it outperforms the Gaussian model.

Four candidate distributions are compared by overlaying their fitted probability densities on a non-parametric kernel density estimate of the data:

Normal (Gaussian): mean 1401, standard deviation 389
Gamma: shape 11.85, rate 0.00846
Birnbaum-Saunders: shape 0.310, scale 1337
3-parameter Weibull: location 181, shape 3.43, scale 1357

All four fitted densities approximate the non-parametric estimate reasonably well, though the comparison is limited by the sample size of 101.
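One quick sanity check on the four parameter sets is to compare the mean and standard deviation each one implies. The sketch below uses the standard moment formulas for each family; the dictionary layout and function name are illustrative assumptions.

```python
import math

# Fitted parameters as listed above.
normal = {"mean": 1401.0, "sd": 389.0}
gamma = {"shape": 11.85, "rate": 0.00846}
birnbaum_saunders = {"shape": 0.310, "scale": 1337.0}
weibull3 = {"location": 181.0, "shape": 3.43, "scale": 1357.0}


def implied_moments():
    """Mean and standard deviation implied by each fitted parameter set."""
    out = {"Normal": (normal["mean"], normal["sd"])}

    # Gamma with shape a and rate r: mean = a / r, sd = sqrt(a) / r.
    a, r = gamma["shape"], gamma["rate"]
    out["Gamma"] = (a / r, math.sqrt(a) / r)

    # Birnbaum-Saunders with shape a and scale b:
    # mean = b (1 + a^2 / 2), variance = (a b)^2 (1 + 5 a^2 / 4).
    a, b = birnbaum_saunders["shape"], birnbaum_saunders["scale"]
    out["Birnbaum-Saunders"] = (b * (1 + a * a / 2), a * b * math.sqrt(1 + 1.25 * a * a))

    # 3-parameter Weibull: mean = loc + scale * Gamma(1 + 1/k),
    # variance = scale^2 * (Gamma(1 + 2/k) - Gamma(1 + 1/k)^2).
    loc, k, s = weibull3["location"], weibull3["shape"], weibull3["scale"]
    g1, g2 = math.gamma(1 + 1 / k), math.gamma(1 + 2 / k)
    out["Weibull (3-param)"] = (loc + s * g1, s * math.sqrt(g2 - g1 * g1))
    return out


for name, (mean, sd) in implied_moments().items():
    print(f"{name:20s} mean = {mean:7.1f}   sd = {sd:6.1f}")
```

All four implied means should be close to the sample mean of 1401, which is one reason the fitted densities look so similar near the center. The families differ mainly in their tails.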

Quantitative Output and Interpretation

Summary Statistics

Statistic	Value
Mean	1401
Std Dev	389
Median	1340
Min	370
Max	2440
Range	2070
n	101

The mean exceeds the median, consistent with the right-skewed shape visible in the histogram. All values are positive, representing thousands of cycles to failure for the aluminum coupons.

Location Test

The location test fits a linear regression of the response $Y$ against the run-order index $X = 1, 2, \ldots, N$ and tests whether the slope is significantly different from zero.

$$H_0\!: B_1 = 0 \quad \text{vs.} \quad H_a\!: B_1 \neq 0$$

Statistic	Value
Test statistic $t$	2.5628
Critical value $t_{0.975,\,99}$	1.984
Significance level $\alpha$	0.05

Conclusion: The test statistic $t = 2.563$ exceeds the critical value of 1.984, so the null hypothesis of constant location is formally rejected at the 5% level. However, the run sequence plot shows no visually apparent trend or shift. The rejection may reflect the influence of a few extreme right-tail values on the regression slope rather than genuine systematic drift. Because the primary focus is distribution fitting rather than process monitoring, the location is treated as approximately fixed for practical purposes.
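The location test statistic can be computed with ordinary least squares on the run-order index. The sketch below is a plain-Python illustration with an assumed function name. The final lines apply the decision rule to the reported statistic and critical value rather than recomputing them from the data file.

```python
import math


def slope_t_statistic(y):
    """t statistic for the slope of y regressed on run order 1..N."""
    n = len(y)
    x = range(1, n + 1)
    x_bar = (n + 1) / 2
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance uses N - 2 degrees of freedom (slope and intercept estimated).
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope / se_slope


# Decision rule with the values reported in the table above.
t_reported, t_critical = 2.5628, 1.984
print("reject constant location:", abs(t_reported) > t_critical)
```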

Variation Test

Bartlett’s test divides the data, in run order, into $k = 4$ intervals of approximately equal length and tests whether their variances are homogeneous.

$$H_0\!: \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 \quad \text{vs.} \quad H_a\!: \text{at least one } \sigma_i^2 \text{ differs}$$

Statistic	Value
Test statistic $T$	0.9490
Degrees of freedom	$k - 1 = 3$
Critical value $\chi^2_{0.95,\,3}$	7.815

Conclusion: The test statistic of 0.949 is well below the critical value of 7.815, so we fail to reject $H_0$: the variances are not significantly different across the four quarters. The constant-variation assumption is satisfied.
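A sketch of the Bartlett statistic is given below. The way the 101 observations are split into four consecutive groups (sizes differing by at most one) is an assumption, since the exact grouping is not stated; the function names are illustrative. The result is compared against the chi-square critical value with $k - 1$ degrees of freedom.

```python
import math
import statistics


def split_into_groups(y, k=4):
    """Split y, in run order, into k consecutive groups whose sizes differ by at most one."""
    n = len(y)
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [y[bounds[i]:bounds[i + 1]] for i in range(k)]


def bartlett_statistic(groups):
    """Bartlett's test statistic for homogeneity of variances across groups."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    variances = [statistics.variance(g) for g in groups]
    # Pooled variance weights each group variance by its degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction


# Toy example: two groups with very different spreads give a large statistic.
print(round(bartlett_statistic([[1, 2, 3], [0, 10, 20]]), 3))
```

With the full data loaded into a list `y`, the statistic would be `bartlett_statistic(split_into_groups(y, 4))`.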

Randomness Tests

Two complementary tests assess whether the observations are independent.

Runs test — tests whether the sequence of values above and below the median was produced randomly.

$$H_0\!: \text{sequence is random} \quad \text{vs.} \quad H_a\!: \text{sequence is not random}$$

Statistic	Value
Test statistic $Z$	-3.4995
Critical value $Z_{1-\alpha/2}$	1.96

Conclusion: $|Z| = 3.50$ exceeds 1.96, so the runs test rejects $H_0$. The negative $Z$ indicates fewer runs than expected, suggesting some clustering of values above and below the median.
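The runs test described above can be sketched as a Wald-Wolfowitz test about the median with a normal approximation. This particular variant, the handling of values equal to the median (dropped), and the function name are assumptions. Other runs-test definitions exist and would give different numbers.

```python
import math
import statistics


def runs_test_z(y):
    """Runs test about the median; returns the normal-approximation Z statistic."""
    median = statistics.median(y)
    signs = [v > median for v in y if v != median]  # drop ties with the median
    n_above = sum(signs)
    n_below = len(signs) - n_above
    # A new run starts whenever consecutive observations switch sides of the median.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n_above + n_below
    expected = 2 * n_above * n_below / n + 1
    variance = 2 * n_above * n_below * (2 * n_above * n_below - n) / (n * n * (n - 1))
    return (runs - expected) / math.sqrt(variance)


# Clustered values give few runs and a negative Z; alternating values give a positive Z.
print(round(runs_test_z([1, 1, 1, 1, 2, 2, 2, 2]), 3))
print(round(runs_test_z([1, 2, 1, 2, 1, 2, 1, 2]), 3))
```

The sign convention matches the interpretation in the text: a negative value means fewer runs than a random sequence would produce.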

Lag-1 autocorrelation — measures the linear dependence between consecutive observations.

Statistic	Value
$r_1$	0.108
Critical value $2/\sqrt{N}$	0.199

Conclusion: The lag-1 autocorrelation of 0.108 is within the critical bounds of $\pm 0.199$, and the autocorrelation plot shows all coefficients within the 95% confidence bands. A runs test rejection combined with a non-significant lag-1 autocorrelation suggests that any clustering is mild and does not amount to strong linear dependence between consecutive observations. The randomness assumption is treated as approximately satisfied, given the consistent graphical evidence.
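The lag-1 autocorrelation and its approximate 95% bound can be computed as follows. This is a minimal sketch with assumed function names, using the usual estimator that divides the lag-1 cross-product by the total sum of squares.

```python
import math


def lag1_autocorrelation(y):
    """Sample lag-1 autocorrelation coefficient r_1."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    numer = sum((a - mean) * (b - mean) for a, b in zip(y, y[1:]))
    return numer / denom


def autocorrelation_bound(n):
    """Approximate 95% bound 2 / sqrt(N) for a single autocorrelation coefficient."""
    return 2 / math.sqrt(n)


bound = autocorrelation_bound(101)
print(f"bound for N = 101: {bound:.3f}")
print("reported r_1 within bound:", abs(0.108) < bound)
```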

Distribution Tests

The probability plot correlation coefficient (PPCC) and Anderson-Darling test assess whether the data follow a normal distribution.

Test	Statistic	Critical Value	Result
PPCC (normal)	0.997	0.987	Fail to reject
Anderson-Darling	$A^2 = 0.2068$	0.787	Fail to reject

Conclusion: Both tests fail to reject normality. The PPCC of 0.997 exceeds the critical value of 0.987, and the Anderson-Darling statistic of $A^2 = 0.207$ is well below the critical value of 0.787. Despite the visual right-skewness in the histogram, the formal tests do not reject the Gaussian model at the 5% level. This result is consistent with the AIC/BIC model comparison below, which also favors the Gaussian.
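The normal PPCC is the correlation between the sorted data and the normal quantiles of the expected order-statistic positions. The sketch below uses Filliben's approximation for the uniform order statistic medians, which is an assumption about how the reported value was obtained. The function names are illustrative.

```python
import math
from statistics import NormalDist


def filliben_medians(n):
    """Filliben's approximation to the medians of uniform order statistics."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1 / n)
    m[0] = 1 - m[-1]
    return m


def normal_ppcc(data):
    """Correlation between sorted data and the matching standard normal quantiles."""
    y = sorted(data)
    nd = NormalDist()
    q = [nd.inv_cdf(p) for p in filliben_medians(len(y))]
    n = len(y)
    my, mq = sum(y) / n, sum(q) / n
    syy = sum((v - my) ** 2 for v in y)
    sqq = sum((v - mq) ** 2 for v in q)
    syq = sum((a - my) * (b - mq) for a, b in zip(y, q))
    return syq / math.sqrt(syy * sqq)


# First ten preview values only (illustrative subset of the dataset).
preview = [370, 1016, 1235, 1419, 1567, 1820, 706, 1018, 1238, 1420]
print(f"PPCC of ten preview points: {normal_ppcc(preview):.3f}")
```

Values close to 1 indicate that the normal probability plot is nearly linear. The critical value of 0.987 quoted above applies to the full sample of 101, not to this ten-point subset.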

Outlier Detection

Grubbs’ test tests whether the most extreme observation is a statistical outlier.

Statistic	Value
Test statistic $G$	2.6553
Critical value ($n = 101$, $\alpha = 0.05$)	3.388

Conclusion: $G = 2.655$ is below the critical value of 3.388, so we fail to reject — no significant outliers are detected at the 5% level. The extreme value of 2,440 thousand cycles, flagged as a potential outlier in the box plot, does not reach statistical significance.
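The Grubbs statistic itself is simple to compute: it is the largest absolute deviation from the sample mean, in units of the sample standard deviation. The sketch below uses an assumed function name and takes the critical value from the table above rather than deriving it.

```python
import statistics


def grubbs_statistic(y):
    """Largest absolute deviation from the mean divided by the sample standard deviation."""
    mean = statistics.fmean(y)
    sd = statistics.stdev(y)
    return max(abs(v - mean) for v in y) / sd


# Decision rule with the values reported in the table above.
g_reported, g_critical = 2.6553, 3.388
print("outlier detected:", g_reported > g_critical)
```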

Test Summary

Assumption	Test	Statistic	Critical Value	Result
Fixed location	Regression on run order	$t = 2.563$	1.984	Marginal reject
Fixed variation	Bartlett’s test	$T = 0.949$	7.815	Fail to reject
Randomness	Runs test	$Z = {-3.500}$	1.96	Reject
Randomness	Autocorrelation lag-1	$r_1 = 0.108$	0.199	Fail to reject
Normality	Anderson-Darling	$A^2 = 0.207$	0.787	Fail to reject
Normality	PPCC	$r = 0.997$	0.987	Fail to reject
Outliers	Grubbs’ test	$G = 2.655$	3.388	Fail to reject

The variation, normality, and outlier assumptions are clearly satisfied. The location test marginally rejects and the runs test rejects, though the lag-1 autocorrelation does not. The graphical evidence from the run sequence and autocorrelation plots supports approximate stability and independence. As this case study focuses primarily on distribution identification rather than process control, the distributional assumption — supported by both the Anderson-Darling and PPCC tests — is the central quantitative finding and motivates the AIC/BIC model comparison below.

Model Selection

Model comparison using Akaike’s Information Criterion (AIC) and Bayesian Information Criterion (BIC) yields:

Model	AIC	BIC
Gaussian	1495	1501
Weibull (3-param)	1498	1505
Gamma	1499	1504
Birnbaum-Saunders	1507	1512

The Gaussian model has the lowest AIC and BIC. Converting to posterior probabilities (assuming equal prior probabilities), the Gaussian model has 76% probability, Gamma 16%, Weibull 7.4%, and Birnbaum-Saunders 0.27%.
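One common way to turn information criteria into approximate posterior model probabilities is to weight each model by $\exp(-\mathrm{BIC}/2)$ and normalize. The sketch below applies this to the rounded BIC values in the table. That this is the conversion behind the reported percentages is an assumption, and the function name is illustrative.

```python
import math

# Rounded BIC values from the table above.
bic = {
    "Gaussian": 1501,
    "Weibull (3-param)": 1505,
    "Gamma": 1504,
    "Birnbaum-Saunders": 1512,
}


def bic_weights(bic_values):
    """Approximate posterior model probabilities under equal priors."""
    best = min(bic_values.values())
    # Subtract the minimum first so the exponentials stay well scaled.
    raw = {m: math.exp(-0.5 * (b - best)) for m, b in bic_values.items()}
    total = sum(raw.values())
    return {m: w / total for m, w in raw.items()}


for model, weight in sorted(bic_weights(bic).items(), key=lambda kv: -kv[1]):
    print(f"{model:20s} {100 * weight:6.2f}%")
```

Because the table values are rounded to whole numbers, the weights computed this way come out at roughly 73%, 16%, 10%, and 0.3%. They reproduce the ordering and the approximate size of the reported probabilities but not their exact digits.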

Prediction Intervals

Using the selected Gaussian model, the 0.1st percentile of the fitted distribution (the value exceeded with probability 99.9%) is approximately 198 thousand cycles. Bootstrap analysis based on 5,000 resamples yields a 95% confidence interval for this percentile ranging from 40 to 366 thousand cycles, reflecting the substantial uncertainty inherent in estimating extreme percentiles from 101 observations.
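The percentile and its bootstrap interval can be sketched as below. The resampling scheme (nonparametric resampling with the Gaussian refitted to each resample, then a percentile interval) is an assumption about the method. The demonstration data are synthetic draws from the fitted Gaussian, used only as a stand-in for the real file, so the printed interval will not match the reported (40, 366).

```python
import random
from statistics import NormalDist, fmean, stdev


def gaussian_percentile(mean, sd, p=0.001):
    """p-th quantile of a Gaussian with the given mean and standard deviation."""
    return mean + sd * NormalDist().inv_cdf(p)


def bootstrap_percentile_ci(data, p=0.001, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for the fitted Gaussian quantile."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)  # sample with replacement
        estimates.append(gaussian_percentile(fmean(resample), stdev(resample), p))
    estimates.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Point estimate from the reported mean and standard deviation.
print(f"0.1st percentile: {gaussian_percentile(1401, 389):.0f} thousand cycles")

# Synthetic stand-in for the 101 observations (NOT the real data).
rng = random.Random(42)
stand_in = [rng.gauss(1401, 389) for _ in range(101)]
low, high = bootstrap_percentile_ci(stand_in)
print(f"bootstrap 95% interval on stand-in data: ({low:.0f}, {high:.0f})")
```

The point estimate from the rounded mean and standard deviation should land close to the reported 198 thousand cycles. The interval is wide because the 0.1st percentile sits about three standard deviations below the mean, where small changes in the fitted standard deviation move the estimate substantially.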

Interpretation

The run sequence plot shows 101 observations fluctuating around a stable mean with no visually apparent trend, the lag plot displays a structureless scatter cloud, and the autocorrelation plot shows all coefficients within the 95% confidence bands. The quantitative tests present a mixed picture. Bartlett’s test does not reject constant variation ($T = 0.949$, well below the critical value of 7.815), and the lag-1 autocorrelation of $r_1 = 0.108$ is within the $\pm 0.199$ bounds. However, the location regression test rejects at the 5% level ($t = 2.563$ vs. a critical value of 1.984), possibly because right-tail extreme values influence the regression slope rather than because of genuine systematic drift. The runs test also rejects randomness ($|Z| = 3.500$, exceeding the 1.96 threshold), indicating fewer runs than expected, although the non-significant lag-1 autocorrelation suggests that any serial dependence is weak. The fixed-variation assumption is satisfied, while the fixed-location and randomness assumptions hold only approximately, judged mainly on the graphical evidence. The distributional assumption is the central analytical challenge for this dataset.

The histogram displays a right-skewed shape with most observations between 800 and 1,600 thousand cycles and a long upper tail extending past 2,000, suggesting that the normal distribution may be inadequate. Three probability plots compare candidate models: the normal probability plot shows moderate tail deviations, the Weibull probability plot provides a reasonable fit with some upper-tail departure, and the gamma probability plot tracks the reference line closely across the full data range. Despite this visual evidence favoring skewed alternatives, the AIC/BIC model comparison favors the Gaussian model (AIC 1495, BIC 1501) over the Gamma (AIC 1499, BIC 1504), 3-parameter Weibull (AIC 1498, BIC 1505), and Birnbaum-Saunders (AIC 1507, BIC 1512). Converting to posterior probabilities with equal priors, the Gaussian receives 76%, Gamma 16%, Weibull 7.4%, and Birnbaum-Saunders 0.27%. The Anderson-Darling test ($A^2 = 0.207$, critical value 0.787) and PPCC ($r = 0.997$, critical value 0.987) both fail to reject normality, which is consistent with this choice. The counterintuitive finding is that visual right-skewness does not translate to a better fit from skewed distributions when evaluated by formal information criteria.

The selected Gaussian model yields a 0.1st percentile (B0.1 life) estimate of 198 thousand cycles, the fatigue life exceeded with 99.9% probability under the fitted model. Bootstrap analysis with 5,000 resamples produces a 95% confidence interval of (40, 366) thousand cycles, and the width of this interval reflects the substantial uncertainty inherent in estimating extreme-tail percentiles from 101 observations. The pedagogical lesson of this case study is that visual impressions of distributional shape, however compelling, should be checked against formal evidence (information criteria such as AIC and BIC, and goodness-of-fit measures such as Anderson-Darling and PPCC) before choosing an alternative to the normal distribution.

Conclusions

The quantitative test battery does not reject constant process variation (Bartlett’s test $T = 0.949$), while the location test rejects at the 5% level ($t = 2.563$) and the runs test flags non-randomness ($Z = {-3.500}$). The analysis treats both rejections as mild departures rather than evidence of genuine process instability, since the run sequence and autocorrelation plots show no trend or serial dependence. The core finding is that AIC/BIC model comparison favors the normal distribution (posterior probability 76%) over the gamma (16%), Weibull (7.4%), and Birnbaum-Saunders (0.27%) alternatives, and the Anderson-Darling and PPCC tests both fail to reject normality. The 0.1st percentile (B0.1 life) under the Gaussian model is 198 thousand cycles with a bootstrap 95% CI of (40, 366) thousand cycles; the wide interval underscores the uncertainty in extreme-tail estimation from moderate sample sizes. This case study demonstrates that visual impressions of non-normality should be verified by formal model selection criteria, and that comparing multiple candidate distributions is essential before assuming a particular distributional form based on application domain conventions.