Fatigue Life of Aluminum Alloy Specimens
NIST/SEMATECH Section 1.4.2.9 Fatigue Life of Aluminum Alloy Specimens
Background and Data
This case study applies exploratory data analysis to 101 measurements of fatigue life (thousands of cycles until rupture) of rectangular strips of 6061-T6 aluminum sheeting, subjected to periodic loading with maximum stress of 21,000 psi, as reported by Birnbaum and Saunders (1958). The dataset is referred to as BIRNSAUN.DAT in the NIST archive.
The dataset originates from NIST/SEMATECH Section 1.4.2.9. Unlike the other case studies in this collection, the fatigue life study focuses specifically on probabilistic model selection: determining which probability distribution best describes the dispersion of measured lifetimes for reliability prediction and warranty estimation.
Dataset
Birnbaum & Saunders (1958), aluminum fatigue life
NIST source description
Source: Birnbaum, Z. W. and Saunders, S. C. (1958), "A Statistical Model for Life-Length of Materials", Journal of the American Statistical Association, 53(281), pp. 151–160. Response variable = fatigue life (thousands of cycles until rupture) of rectangular strips of 6061-T6 aluminum sheeting, subjected to periodic loading with maximum stress of 21,000 psi. Number of observations = 101.
Preview data
| # | Value |
|---|---|
| 1 | 370 |
| 2 | 1016 |
| 3 | 1235 |
| 4 | 1419 |
| 5 | 1567 |
| 6 | 1820 |
| 7 | 706 |
| 8 | 1018 |
| 9 | 1238 |
| 10 | 1420 |
| ... 91 more rows | |
Test Underlying Assumptions
Goals
The goals of this analysis are:
- Determine if the fatigue life data can be adequately modeled with a known probability distribution for reliability prediction
- Assess the fixed-location assumption: is the process mean stable over the run?
- Assess the fixed-variation assumption: is the process variability constant?
- Assess the randomness assumption: are the observations independent?
- Determine which probability distribution (Normal, Gamma, Weibull, or Birnbaum-Saunders) best describes the fatigue life dispersion
Graphical Output and Interpretation
4-Plot Overview
The 4-plot provides a comprehensive view of the fatigue life data. The run sequence plot should show stable location and variation, the lag plot should show no dependence, and the histogram reveals the right-skewed distribution.
The assumptions are addressed by the four diagnostic plots:
- The run sequence plot (upper left) shows 101 observations fluctuating around a stable mean with no systematic drift or shifts — the fixed-location and fixed-variation assumptions appear satisfied.
- The lag plot (upper right) shows a structureless scatter cloud, consistent with independent observations — the randomness assumption appears satisfied.
- The histogram (lower left) is right-skewed, with most observations clustering between 800 and 1,600 thousand cycles and a long upper tail extending past 2,000 — the distribution is not symmetric and not bell-shaped.
- The normal probability plot (lower right) shows deviations from the theoretical straight line, particularly in the tails — the normality assumption is questionable.
Three of four assumptions hold: fixed location, fixed variation, and randomness. The distributional assumption requires further investigation — the right-skewed shape suggests that alternative distributions (Gamma, Weibull, Birnbaum-Saunders) may provide a better fit than the normal.
Box Plot
The initial graphical analysis uses dot charts, box plots, histograms, and kernel density estimates to characterize the distribution. The box plot suggests the largest measured value may be an outlier.
Run Sequence Plot
The run sequence plot shows 101 observations fluctuating around the mean of approximately 1401 thousand cycles. The location and variation appear stable across the run, with no systematic drift or shifts.
Lag Plot
The lag plot at lag 1 shows a structureless scatter cloud, consistent with independent observations. The slight elongation in the scatter reflects the skewed distribution rather than serial dependence.
Histogram
The data range from slightly above 350 to slightly below 2,500 thousand cycles. The histogram is right-skewed, with most observations clustering between 800 and 1,600 thousand cycles and a long upper tail extending past 2,000.
Normal Probability Plot
The normal probability plot (Q-Q plot) shows the data deviating from the theoretical straight line, particularly in the tails. When compared against an envelope of 99 simulated Gaussian samples of the same size, the data Q-Q plot falls within the envelope, suggesting the Gaussian model cannot be firmly rejected on graphical grounds alone.
Autocorrelation Plot
The autocorrelation plot confirms that the fatigue life measurements are independent. All autocorrelation coefficients fall within the 95% confidence bands.
Spectral Plot
The spectral plot shows a flat spectrum with no dominant peaks, consistent with independent observations.
Distribution Comparison
Weibull Probability Plot
The Weibull probability plot tests the fit of the 3-parameter Weibull model (location=181, shape=3.43, scale=1357) by subtracting the location parameter from the data before plotting. If the shifted data follow a 2-parameter Weibull distribution, the points should fall near a straight line.
The points follow the reference line reasonably well, with some deviation in the upper tail. The Weibull model captures the general shape of the distribution but the AIC/BIC analysis (below) shows it does not outperform the simpler Gaussian model for this dataset.
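The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the original plot: the function names are hypothetical, and the median-rank plotting positions (Benard's approximation) are an assumed choice, since the source does not say which plotting positions were used.

```python
import math

def weibull_plot_coords(data, location=0.0):
    """Return (x, y) coordinates for a Weibull probability plot.

    x = ln(t - location) for the sorted data and y = ln(-ln(1 - p)),
    where p is a plotting position. For Weibull data the points fall
    near a straight line whose slope equals the shape parameter.
    """
    shifted = sorted(t - location for t in data)
    if shifted[0] <= 0:
        raise ValueError("all observations must exceed the location parameter")
    n = len(shifted)
    # Assumed plotting positions: Benard's median-rank approximation.
    p = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    xs = [math.log(t) for t in shifted]
    ys = [math.log(-math.log(1.0 - pi)) for pi in p]
    return xs, ys

def least_squares_line(xs, ys):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx
```

Applied to the fatigue data with `location=181`, a slope near the fitted shape of 3.43 and an intercept near -3.43·ln(1357) would indicate agreement with the fitted 3-parameter Weibull model.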
Gamma Probability Plot
The gamma probability plot compares sorted data values against theoretical quantiles from a Gamma distribution with shape 11.85 and scale of approximately 118.2 (NIST MLE estimates, converted from the rate parameterization: scale = 1/rate = 1/0.00846 ≈ 118.2).
The gamma probability plot shows a good fit with points closely following the reference line across the full range of the data. The gamma distribution provides a reasonable model for this right-skewed dataset, though the formal AIC/BIC comparison determines whether it outperforms the Gaussian model.
Four candidate distributions are compared by overlaying their fitted probability densities on a non-parametric kernel density estimate of the data:
- Normal (Gaussian): mean 1401, standard deviation 389
- Gamma: shape 11.85, rate 0.00846
- Birnbaum-Saunders: shape 0.310, scale 1337
- 3-parameter Weibull: location 181, shape 3.43, scale 1357
All four fitted densities approximate the non-parametric estimate reasonably well, though the comparison is limited by the sample size of 101.
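For readers who want to reproduce the overlay, the four fitted densities can be evaluated directly from the parameters listed above. The sketch below uses only the Python standard library; the function names are illustrative, and the Birnbaum-Saunders density is written in its usual shape/scale form, which is assumed to match the parameterization quoted in the list.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def gamma_pdf(x, shape, rate):
    if x <= 0:
        return 0.0
    # Work on the log scale to avoid overflow in rate**shape and x**(shape - 1).
    log_f = (shape * math.log(rate) + (shape - 1.0) * math.log(x)
             - rate * x - math.lgamma(shape))
    return math.exp(log_f)

def birnbaum_saunders_pdf(x, shape, scale):
    if x <= 0:
        return 0.0
    a = math.sqrt(x / scale)
    b = math.sqrt(scale / x)
    z = (a - b) / shape          # standard normal argument of the BS CDF
    return (a + b) / (2.0 * shape * x) * normal_pdf(z, 0.0, 1.0)

def weibull3_pdf(x, location, shape, scale):
    t = (x - location) / scale
    if t <= 0:
        return 0.0
    return (shape / scale) * t ** (shape - 1.0) * math.exp(-(t ** shape))

# Fitted parameters as quoted in the list above.
FITTED = {
    "Normal": lambda x: normal_pdf(x, 1401, 389),
    "Gamma": lambda x: gamma_pdf(x, 11.85, 0.00846),
    "Birnbaum-Saunders": lambda x: birnbaum_saunders_pdf(x, 0.310, 1337),
    "Weibull (3-param)": lambda x: weibull3_pdf(x, 181, 3.43, 1357),
}

for name, pdf in FITTED.items():
    print(f"{name:>18}: density at 1400 = {pdf(1400):.6f}")
```

Evaluating each function on a grid of fatigue-life values gives the curves to draw over the kernel density estimate.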
Quantitative Output and Interpretation
Summary Statistics
| Statistic | Value |
|---|---|
| Mean | 1401 |
| Std Dev | 389 |
| Median | 1340 |
| Min | 370 |
| Max | 2440 |
| Range | 2070 |
| n | 101 |
The mean exceeds the median, consistent with the right-skewed shape visible in the histogram. All values are positive, representing thousands of cycles to failure for the aluminum coupons.
Location Test
The location test fits a linear regression of the response against the run-order index and tests whether the slope is significantly different from zero.
| Statistic | Value |
|---|---|
| Test statistic | 2.5628 |
| Critical value | 1.984 |
| Significance level | 0.05 |
Conclusion: The test statistic of 2.5628 exceeds the critical value of 1.984, formally rejecting the null hypothesis of constant location at the 5% level. However, the run sequence plot shows no visually apparent trend or shift. The marginal rejection likely reflects the influence of a few extreme values in the right tail on the regression slope rather than a genuine systematic drift. Given the primary focus on distribution fitting rather than process monitoring, the location is treated as approximately fixed for practical purposes.
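The location test can be sketched as an ordinary least-squares fit of the response on the run index 1..n, followed by the usual t-statistic for the slope. The function name below is hypothetical, and the sketch assumes the test statistic in the table is this slope t-statistic (the critical value of 1.984 is consistent with a two-sided 5% t critical value at n - 2 = 99 degrees of freedom).

```python
import math

def slope_t_statistic(y):
    """t-statistic for the slope of y regressed on run order 1..n."""
    n = len(y)
    x = range(1, n + 1)
    mx = (n + 1) / 2.0
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares around the fitted line.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope / se_slope
```

The null hypothesis of constant location is rejected at the 5% level when the absolute value of the returned statistic exceeds the critical value.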
Variation Test
Bartlett’s test divides the data into equal-length intervals and tests whether their variances are homogeneous.
| Statistic | Value |
|---|---|
| Test statistic | 0.9490 |
| Degrees of freedom | 3 |
| Critical value | 7.775 |
Conclusion: The test statistic of 0.949 is well below the critical value of 7.775, so we fail to reject — the variances are not significantly different across the four quarters. The constant-variation assumption is satisfied.
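A minimal sketch of the computation, using the textbook form of Bartlett's statistic, is given below. The helper names are hypothetical, and the way 101 observations are split into four nearly equal intervals (the last one taking the extra point) is an assumption, since the source does not state how the uneven split was handled.

```python
import math
from statistics import variance

def split_into_groups(y, k=4):
    """Split y into k consecutive, nearly equal-length intervals."""
    n = len(y)
    bounds = [i * n // k for i in range(k + 1)]
    return [y[bounds[i]:bounds[i + 1]] for i in range(k)]

def bartlett_statistic(groups):
    """Bartlett's test statistic for homogeneity of variances.

    Under the null hypothesis it is approximately chi-square
    distributed with k - 1 degrees of freedom.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    total = sum(sizes)
    variances = [variance(g) for g in groups]   # sample variances (n - 1)
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (total - k)
    numerator = ((total - k) * math.log(pooled)
                 - sum((n - 1) * math.log(v) for n, v in zip(sizes, variances)))
    correction = 1.0 + (sum(1.0 / (n - 1) for n in sizes)
                        - 1.0 / (total - k)) / (3.0 * (k - 1))
    return numerator / correction
```

Calling `bartlett_statistic(split_into_groups(data, 4))` on the run-ordered data gives the value to compare with the critical value in the table.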
Randomness Tests
Two complementary tests assess whether the observations are independent.
Runs test — tests whether the sequence of values above and below the median was produced randomly.
| Statistic | Value |
|---|---|
| Test statistic | -3.4995 |
| Critical value | 1.96 |
Conclusion: |Z| = 3.4995 exceeds 1.96, so the runs test rejects the null hypothesis of randomness. The negative Z indicates fewer runs than expected, suggesting some clustering of values above and below the median.
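The runs test statistic can be sketched with the standard large-sample normal approximation for the number of runs above and below the median. The function name is hypothetical, and dropping observations exactly equal to the median is an assumed convention; the source does not say how ties were treated.

```python
import math
from statistics import median

def runs_test_z(y):
    """Z statistic for the runs test about the median."""
    med = median(y)
    # True for values above the median; ties with the median are dropped.
    signs = [v > med for v in y if v != med]
    n1 = sum(signs)
    n2 = len(signs) - n1
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2.0 * n1 * n2 / (n1 + n2) + 1.0
    var = (2.0 * n1 * n2 * (2.0 * n1 * n2 - n1 - n2)
           / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    return (runs - expected) / math.sqrt(var)
```

A strongly negative value means fewer runs than expected (clustering), while a strongly positive value means more runs than expected (rapid alternation).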
Lag-1 autocorrelation — measures the linear dependence between consecutive observations.
| Statistic | Value |
|---|---|
| Lag-1 autocorrelation | 0.108 |
| Critical value | 0.199 |
Conclusion: The lag-1 autocorrelation of 0.108 is within the critical bounds of ±0.199, and the autocorrelation plot confirms all coefficients fall within the 95% confidence bands. The runs test rejection combined with a non-significant lag-1 autocorrelation suggests the clustering is mild and driven by the skewed distribution rather than genuine temporal dependence. The randomness assumption is approximately satisfied given the consistent graphical evidence.
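As a rough sketch, the lag-1 autocorrelation and its approximate bound can be computed as follows. The function names are hypothetical. The bound of 2/sqrt(n) is an assumption inferred from the numbers: it reproduces the tabulated critical value of 0.199 for n = 101, whereas 1.96/sqrt(n) would give about 0.195.

```python
import math

def lag1_autocorrelation(y):
    """Sample autocorrelation at lag 1."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    numer = sum((y[i] - mean) * (y[i + 1] - mean) for i in range(n - 1))
    return numer / denom

def autocorrelation_bound(n):
    """Approximate 95% bound for a single autocorrelation coefficient."""
    return 2.0 / math.sqrt(n)

print(round(autocorrelation_bound(101), 3))
```

The observations are treated as uncorrelated at lag 1 when the absolute autocorrelation is below the bound.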
Distribution Tests
The probability plot correlation coefficient (PPCC) and Anderson-Darling test assess whether the data follow a normal distribution.
| Test | Statistic | Critical Value | Result |
|---|---|---|---|
| PPCC (normal) | 0.997 | 0.987 | Fail to reject |
| Anderson-Darling | | 0.787 | Fail to reject |
Conclusion: Both tests fail to reject normality. The PPCC of 0.997 exceeds the critical value of 0.987, and the Anderson-Darling statistic is well below the critical value of 0.787. Despite the visual right-skewness in the histogram, the formal tests do not reject the Gaussian model at the 5% level. This result is consistent with the AIC/BIC model comparison below, which also favors the Gaussian.
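The normal PPCC is the correlation between the sorted data and the normal quantiles of a set of plotting positions. The sketch below assumes Filliben's approximation to the uniform order statistic medians, which is the customary choice for this statistic; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def filliben_medians(n):
    """Filliben's approximation to uniform order statistic medians."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def normal_ppcc(data):
    """Probability plot correlation coefficient against the normal."""
    standard_normal = NormalDist()
    quantiles = [standard_normal.inv_cdf(p) for p in filliben_medians(len(data))]
    return pearson_r(sorted(data), quantiles)
```

Values close to 1 indicate that the normal probability plot is nearly linear; normality is rejected when the PPCC falls below the tabulated critical value.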
Outlier Detection
Grubbs’ test tests whether the most extreme observation is a statistical outlier.
| Statistic | Value |
|---|---|
| Test statistic | 2.6553 |
| Critical value (α = 0.05, n = 101) | 3.388 |
Conclusion: The test statistic of 2.6553 is below the critical value of 3.388, so we fail to reject the null hypothesis of no outliers; no significant outliers are detected at the 5% level. The extreme value of 2,440 thousand cycles, flagged as a potential outlier in the box plot, does not reach statistical significance.
Test Summary
| Assumption | Test | Statistic | Critical Value | Result |
|---|---|---|---|---|
| Fixed location | Regression on run order | 2.5628 | 1.984 | Marginal reject |
| Fixed variation | Bartlett’s test | 0.9490 | 7.775 | Fail to reject |
| Randomness | Runs test | -3.4995 | 1.96 | Reject |
| Randomness | Autocorrelation lag-1 | 0.108 | 0.199 | Fail to reject |
| Normality | Anderson-Darling | | 0.787 | Fail to reject |
| Normality | PPCC | 0.997 | 0.987 | Fail to reject |
| Outliers | Grubbs’ test | 2.6553 | 3.388 | Fail to reject |
The variation, normality, and outlier assumptions are clearly satisfied. The location test marginally rejects and the runs test rejects, though the lag-1 autocorrelation does not. The graphical evidence from the run sequence and autocorrelation plots supports approximate stability and independence. As this case study focuses primarily on distribution identification rather than process control, the distributional assumption — supported by both the Anderson-Darling and PPCC tests — is the central quantitative finding and motivates the AIC/BIC model comparison below.
Model Selection
Model comparison using Akaike’s Information Criterion (AIC) and Bayesian Information Criterion (BIC) yields:
| Model | AIC | BIC |
|---|---|---|
| Gaussian | 1495 | 1501 |
| Weibull (3-param) | 1498 | 1505 |
| Gamma | 1499 | 1504 |
| Birnbaum-Saunders | 1507 | 1512 |
The Gaussian model has the lowest AIC and BIC. Converting to posterior probabilities (assuming equal prior probabilities), the Gaussian model has 76% probability, Gamma 16%, Weibull 7.4%, and Birnbaum-Saunders 0.27%.
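The conversion to posterior probabilities can be sketched with the standard approximation in which, under equal priors, each model's posterior probability is proportional to exp(-BIC/2). This is assumed to be the method behind the percentages quoted above; the function name is hypothetical. Because the table reports BIC values rounded to integers, the sketch gives roughly 73%, 16%, 10%, and 0.3% rather than reproducing the quoted figures exactly.

```python
import math

def information_criterion_weights(scores):
    """Convert BIC (or AIC) scores to normalized model weights."""
    best = min(scores.values())
    # Subtract the best score first so the exponentials stay well scaled.
    relative = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}

# Rounded BIC values from the table above.
BIC = {
    "Gaussian": 1501,
    "Weibull (3-param)": 1505,
    "Gamma": 1504,
    "Birnbaum-Saunders": 1512,
}

for model, weight in information_criterion_weights(BIC).items():
    print(f"{model:>18}: {100 * weight:.2f}%")
```

Only differences between scores matter, so a model that trails the best by 2 BIC units already has less than half of the best model's weight.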
Prediction Intervals
Using the selected Gaussian model, the 0.1st percentile of the fitted distribution (the value exceeded with probability 99.9%) is approximately 198 thousand cycles. Bootstrap analysis based on 5,000 resamples yields a 95% confidence interval for this percentile ranging from 40 to 366 thousand cycles, reflecting the substantial uncertainty inherent in estimating extreme percentiles from 101 observations.
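The point estimate follows directly from the fitted mean and standard deviation, and the bootstrap interval can be sketched as below. Several details are assumptions, since the source does not specify them: the bootstrap is taken to be nonparametric (resampling the observations with replacement), the interval is a simple percentile interval, and the function names are hypothetical. The usage example draws a synthetic stand-in sample because the full dataset is not reproduced here.

```python
import math
import random
from statistics import NormalDist

def gaussian_percentile(mean, sd, p=0.001):
    """p-th quantile of a normal distribution with the given mean and sd."""
    return NormalDist(mean, sd).inv_cdf(p)

def bootstrap_percentile_ci(data, p=0.001, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for the fitted Gaussian quantile."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=n)          # sample with replacement
        m = sum(resample) / n
        s = math.sqrt(sum((v - m) ** 2 for v in resample) / (n - 1))
        estimates.append(gaussian_percentile(m, s, p))
    estimates.sort()
    lo = estimates[int((1.0 - level) / 2.0 * n_resamples)]
    hi = estimates[int((1.0 + level) / 2.0 * n_resamples) - 1]
    return lo, hi

# Point estimate from the summary statistics quoted in this case study.
print(round(gaussian_percentile(1401, 389), 1))

# Synthetic stand-in sample (NOT the Birnbaum-Saunders data).
_rng = random.Random(1)
synthetic = [_rng.gauss(1401, 389) for _ in range(101)]
print(bootstrap_percentile_ci(synthetic, n_resamples=1000))
```

With the rounded mean of 1401 and standard deviation of 389, the point estimate comes out just under 199 thousand cycles, in line with the value quoted above. The interval from the synthetic sample only illustrates the procedure and should not be compared with the reported (40, 366).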
Interpretation
The run sequence plot shows 101 observations fluctuating around a stable mean with no visually apparent trend, the lag plot displays a structureless scatter cloud, and the autocorrelation plot confirms all coefficients fall within the 95% confidence bands. The quantitative tests present a mixed picture: Bartlett’s test confirms constant variation (test statistic 0.949, well below the critical value of 7.775), and the lag-1 autocorrelation of 0.108 is within the ±0.199 bounds. However, the location regression test marginally rejects (2.5628 vs. a critical value of 1.984), likely driven by right-tail extreme values influencing the regression slope rather than genuine systematic drift. The runs test also rejects randomness (|Z| = 3.4995, exceeding the 1.96 threshold), indicating fewer runs than expected, though the non-significant lag-1 autocorrelation suggests the clustering reflects the skewed distribution rather than temporal dependence. The fixed-variation assumption is clearly satisfied, while the fixed-location and randomness assumptions hold approximately given the graphical evidence. The distributional assumption is the central analytical challenge for this dataset.
The histogram displays a right-skewed shape with most observations between 800 and 1,600 thousand cycles and a long upper tail extending past 2,000, suggesting that the normal distribution may be inadequate. Three probability plots compare candidate models: the normal probability plot shows moderate tail deviations, the Weibull probability plot provides a reasonable fit with some upper-tail departure, and the gamma probability plot tracks the reference line closely across the full data range. Despite this visual evidence favoring skewed alternatives, the AIC/BIC model comparison favors the Gaussian model (AIC 1495, BIC 1501) over the Gamma (AIC 1499, BIC 1504), 3-parameter Weibull (AIC 1498, BIC 1505), and Birnbaum-Saunders (AIC 1507, BIC 1512). Converting to posterior probabilities with equal priors, the Gaussian receives 76%, Gamma 16%, Weibull 7.4%, and Birnbaum-Saunders 0.27%. The Anderson-Darling test (critical value 0.787) and the PPCC (0.997, critical value 0.987) both fail to reject normality, which is consistent with this ranking. The counterintuitive finding is that visual right-skewness does not translate to a better fit from skewed distributions when evaluated by formal information criteria.
The selected Gaussian model yields a 0.1st percentile (B0.1 life) estimate of 198 thousand cycles — the fatigue life exceeded with 99.9% probability under the fitted model. Bootstrap analysis with 5,000 resamples produces a 95% confidence interval of (40, 366) thousand cycles, and the width of this interval reflects the substantial uncertainty inherent in estimating extreme-tail percentiles from 101 observations. The pedagogical lesson of this case study is definitive: visual impressions of distributional shape — however compelling — must be confirmed by formal model selection criteria such as AIC, BIC, Anderson-Darling, and PPCC before choosing an alternative to the normal distribution.
Conclusions
The quantitative test battery confirms that process variation is constant (Bartlett’s test statistic 0.949, critical value 7.775), while the location test marginally rejects (2.5628 vs. a critical value of 1.984) and the runs test flags non-randomness (Z = -3.4995). Both violations are attributable to the influence of the right-skewed distribution on the test statistics rather than genuine process instability. The core finding is that AIC/BIC model comparison favors the normal distribution (posterior probability 76%) over the gamma (16%), Weibull (7.4%), and Birnbaum-Saunders (0.27%) alternatives, and the Anderson-Darling and PPCC tests independently fail to reject normality. The 0.1st percentile (B0.1 life) under the Gaussian model is 198 thousand cycles with a bootstrap 95% CI of (40, 366) thousand cycles; the wide interval underscores the uncertainty in extreme-tail estimation from moderate sample sizes. This case study demonstrates that visual impressions of non-normality must be verified by formal model selection criteria, and that comparing multiple candidate distributions is essential before assuming a particular distributional form based on application domain conventions.