Ceramic Strength Case Study

NIST/SEMATECH Section 1.4.2.10 Ceramic Strength

Background and Data

This case study applies exploratory data analysis to the NIST JAHANMI2.DAT dataset, which contains flexural strength measurements of bonded silicon nitride ceramic specimens. The data come from a designed experiment investigating the effects of three primary machining factors (table speed, down feed rate, and wheel grit size) and two nuisance factors (lab and batch) on ceramic strength. The data were collected by Said Jahanmir of the NIST Ceramics Division in 1996.

The full dataset contains 960 observations across 8 labs, 2 batches, 2 replications, and 16 treatment combinations. For this case study, only the longitudinal direction data are used, resulting in $n = 480$ observations with three primary factors (table speed, down feed rate, wheel grit size), each at 2 levels. The dataset originates from NIST/SEMATECH Section 1.4.2.10.

Study Design

This is a multi-factor designed experiment with the following structure:

  • Response variable $Y$: flexural strength of bonded silicon nitride (ceramic)
  • Primary factors: table speed ($X_1$), down feed rate ($X_2$), wheel grit size ($X_3$), each at 2 levels
  • Nuisance factors: lab (8 levels) and batch (2 levels)
  • Replications: 2 per treatment combination

The goals of the analysis are to:

  1. Determine which of the primary factors has the strongest effect on ceramic strength
  2. Estimate the magnitude of the effects
  3. Determine the optimal settings for the primary factors
  4. Determine if the nuisance factors (lab and batch) have an effect on ceramic strength

The general ANOVA model for this designed experiment includes main effects, interactions, and blocking factors:

$$Y_{ijklm} = \mu + B_i + L_j + \tau_k + (\tau\tau)_{kl} + \varepsilon_{ijklm}$$

where $\mu$ is the grand mean, $B_i$ is the batch effect, $L_j$ is the lab effect, $\tau_k$ represents the primary factor main effects, $(\tau\tau)_{kl}$ represents the interaction effects, and $\varepsilon_{ijklm}$ is the random error.

Dataset

JAHANMI2.DAT
Observations: 480
Variables: 8 (the columns shown in the preview; the NIST source file has 15)

Said Jahanmir, NIST Ceramics Division, ceramic flexural strength (1996)

NIST source description
Effect of Machining Factors on Strength of Ceramics (longitudinal data only). Response variable = ceramic strength (MPa). 15 variables per observation: Observation ID, Lab (8 levels), Bar ID, Set, Strength (Y), Table Speed (2 levels), Down Feed Rate (2 levels), Wheel Grit (2 levels), Direction, Treatment (16 levels), Set of 15, Rep (2 levels), Coded Lab, Bar Batch (2 levels), Distinct set of 15 reps. Number of observations = 480.
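Preview data

| ID | Lab | Strength (MPa) | Table Speed | Down Feed | Wheel Grit | Batch | Rep |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 608.781 | -1 | -1 | -1 | 1 | 1 |
| 2 | 1 | 569.67 | -1 | -1 | -1 | 2 | 1 |
| 3 | 1 | 689.556 | -1 | -1 | -1 | 1 | 1 |
| 4 | 1 | 747.541 | -1 | -1 | -1 | 2 | 1 |
| 5 | 1 | 618.134 | -1 | -1 | -1 | 1 | 1 |
| 6 | 1 | 612.182 | -1 | -1 | -1 | 2 | 1 |
| 7 | 1 | 680.203 | -1 | -1 | -1 | 1 | 1 |
| 8 | 1 | 607.766 | -1 | -1 | -1 | 2 | 1 |
| 9 | 1 | 726.232 | -1 | -1 | -1 | 1 | 1 |
| 10 | 1 | 605.38 | -1 | -1 | -1 | 2 | 1 |

... 470 more rows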

Response Variable Analysis

4-Plot Overview

The 4-plot of the response variable reveals important features of the pooled data:

  • Run sequence plot (upper left) — the location and scale are relatively constant, though about half a dozen points in the 300 to 450 range may require attention as potential outliers. Most points cluster between 500 and 750. The run sequence plot can reveal time effects, which are typically undesirable nuisance factors in designed experiments.
  • Lag plot (upper right) — does not show significant structure, indicating no strong temporal dependence in the run order
  • Histogram (lower left) — appears reasonably symmetric but with a bimodal distribution, showing two distinct peaks rather than a single centered peak
  • Normal probability plot (lower right) — shows some curvature indicating that distributions other than the normal may provide a better fit

The bimodal histogram is the most important finding from the 4-plot — it immediately suggests that a major grouping factor (batch, lab, or treatment) is creating two distinct sub-populations.

Four-plot diagnostic layout for pooled ceramic strength data (run sequence, lag, histogram, normal probability).

In summary, the four diagnostic plots address the standard assumptions (fixed location, fixed scale, randomness, and distributional form) as follows:

  1. The run sequence plot (upper left) shows location and scale are relatively constant, though about half a dozen points in the 300-450 range may require attention as potential outliers.
  2. The lag plot (upper right) does not show significant temporal structure, indicating no strong time-based dependence in the run order.
  3. The histogram (lower left) reveals a bimodal distribution — two distinct peaks rather than a single centered peak. This is the most important finding and immediately suggests a major grouping factor (batch, lab, or treatment) is creating two distinct sub-populations.
  4. The normal probability plot (lower right) shows curvature indicating that the pooled data are not well-described by a single normal distribution — consistent with the bimodal structure in the histogram.

The bimodal histogram is the dominant finding. Before proceeding with standard assumption tests, the source of the bimodality must be identified and addressed.
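As a rough sketch of what the four panels compute, the following uses only the Python standard library. The function name `four_plot_data`, the bin count, the plotting-position formula, and the synthetic two-group sample are illustrative assumptions, not part of the NIST analysis; drawing the plots is left out.

```python
import random
from statistics import NormalDist

def four_plot_data(y, bins=10):
    """Return the numeric ingredients of a 4-plot for the sequence y."""
    n = len(y)
    # Run sequence plot: observation index against value.
    run_sequence = list(enumerate(y, start=1))
    # Lag plot: each value paired with its predecessor, (Y[i-1], Y[i]).
    lag_pairs = list(zip(y[:-1], y[1:]))
    # Histogram: counts in equal-width bins spanning the data range.
    lo, hi = min(y), max(y)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in y:
        k = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[k] += 1
    # Normal probability plot: sorted data against standard normal quantiles
    # at plotting positions (i - 0.5) / n (one common convention, assumed here).
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    normal_prob = list(zip(z, sorted(y)))
    return run_sequence, lag_pairs, counts, normal_prob

# Synthetic illustration only: two groups whose means differ by about 78 units.
rng = random.Random(0)
y = [rng.gauss(689, 30) for _ in range(200)] + [rng.gauss(611, 30) for _ in range(200)]
rng.shuffle(y)
run_seq, lags, counts, npp = four_plot_data(y)
print(counts)  # bin counts for the histogram panel
```

With a mixture like this, the bin counts tend to show two humps, which is the kind of pattern that prompts a search for a grouping factor.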

Run Sequence Plot

The run sequence plot of the pooled response shows location and scale are relatively constant over time, though the spread of the data reflects both within-treatment variability and the substantial batch effect.

Run sequence plot of pooled flexural strength showing relatively constant location and scale across 480 observations.

Lag Plot

The lag plot does not show significant temporal structure, indicating no strong time-based dependence in the measurement order.

Lag-1 plot of pooled ceramic strength data showing no significant temporal dependence in measurement order.

Histogram

The histogram of the pooled response reveals the bimodal distribution — two distinct peaks corresponding to the two batches. This is the most important finding from the initial graphical analysis.

Histogram with KDE overlay of pooled ceramic strength data revealing a bimodal distribution from the batch effect.

Normal Probability Plot

The normal probability plot shows curvature — specifically an S-shape consistent with a bimodal mixture distribution from pooling two batch sub-populations with different means.

Normal probability plot of pooled ceramic strength data. The S-shaped curvature reflects the bimodal mixture of two batch sub-populations.

Batch Effect Analysis

The batch effect is the dominant finding in this dataset. This section examines the batch difference using multiple graphical and statistical approaches.

Bihistogram

The bihistogram compares the distributions of Batch 1 and Batch 2 on a shared x-axis. Batch 1 responses are centered at approximately 689, while Batch 2 responses are centered at approximately 611, a difference of approximately 78 units. This batch effect was completely unexpected by the scientific investigators. The variability is comparable for both batches, though Batch 1 exhibits lower-tail skewness while Batch 2 shows central skewness. Both batches contain low-lying outlier points.

Bihistogram comparing Batch 1 (top) and Batch 2 (bottom) flexural strength distributions on a shared x-axis. The clear separation confirms the approximately 78-unit batch effect.

Batch Box Plot

Box plots by batch confirm the location difference and show multiple outliers on the low side for both batches, with Batch 2 also showing high-side outliers.

Box plot comparing Batch 1 and Batch 2 flexural strength. Batch 1 is centered approximately 78 units higher, with both batches showing low-side outliers.

Batch Q-Q Plot

The quantile-quantile plot compares Batch 1 quantiles directly against Batch 2 quantiles. Except for a few points in the right tail, Batch 1 values have consistently higher quantiles than Batch 2, confirming the location difference. The Q-Q plot is not linear — this implies that the difference between the batches is not explained simply by a shift in location. The variation and skewness differ between batches as well, consistent with the shape differences visible in the bihistogram.

Quantile-quantile plot comparing Batch 1 and Batch 2 quantiles. Batch 1 quantiles consistently exceed Batch 2, and the non-linear pattern indicates the batch difference involves more than a simple location shift.
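For two samples of equal size, the points of a quantile-quantile plot are simply the sorted values of one sample paired with the sorted values of the other. The sketch below illustrates this with the ten rows shown in the data preview (five per batch), which is far too few to draw conclusions from; the helper name `qq_pairs` is hypothetical.

```python
def qq_pairs(sample_x, sample_y):
    """Pair the order statistics of two equal-size samples as (x, y) points."""
    if len(sample_x) != len(sample_y):
        raise ValueError("this sketch assumes equal sample sizes")
    return list(zip(sorted(sample_x), sorted(sample_y)))

# Strength values from the first ten preview rows, split by the Batch column.
batch1 = [608.781, 689.556, 618.134, 680.203, 726.232]
batch2 = [569.67, 747.541, 612.182, 607.766, 605.38]

points = qq_pairs(batch2, batch1)  # x = Batch 2 quantile, y = Batch 1 quantile
above = sum(1 for x, y in points if y > x)
print(f"{above} of {len(points)} points lie above the line y = x")
# 4 of 5 points lie above the line y = x
```

If the batches differed only by a constant shift, the points would fall along a line parallel to $y = x$; departures from that line indicate differences in spread or shape.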

Batch Block Plots

Block plots showing batch means by lab confirm that the batch effect is consistent across all 8 laboratories. In every lab, Batch 1 means exceed Batch 2 means. The parallel nature of the two lines indicates that the batch effect is additive — approximately the same magnitude regardless of which lab performed the measurements.

Block plot of batch means by lab showing Batch 1 consistently exceeds Batch 2 across all 8 labs, confirming the batch effect is not lab-specific.

The additional block plots below confirm that the batch effect holds across all combinations of labs and primary factor levels. In every case, Batch 1 exceeds Batch 2 — the batch effect is robust over table speed, down feed rate, and wheel grit size.

Block plot of batch means by lab and table speed level. Batch 1 exceeds Batch 2 in all 16 combinations, confirming the batch effect is robust over table speed.
Block plot of batch means by lab and down feed level. Batch 1 exceeds Batch 2 in all 16 combinations, confirming the batch effect is robust over down feed rate.
Block plot of batch means by lab and wheel grit level. Batch 1 exceeds Batch 2 in all 16 combinations, confirming the batch effect is robust over wheel grit size.

This consistency across all labs and all primary factor combinations strengthens the conclusion that the batch difference reflects a genuine material property difference rather than a lab-specific or factor-dependent artifact.
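One way to gauge how unlikely this consistency would be by chance is a sign-test style calculation: if there were no batch effect, each block would be equally likely to favor either batch. The sketch below assumes the blocks are independent and uses a hypothetical helper name; it is an illustration, not a calculation taken from the source.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

p_labs = prob_at_least(8, 8)      # Batch 1 higher in all 8 labs
p_blocks = prob_at_least(16, 16)  # Batch 1 higher in all 16 lab-by-level blocks
print(f"{p_labs:.4f}")    # 0.0039
print(f"{p_blocks:.6f}")  # 0.000015
```

Under these assumptions, seeing Batch 1 ahead in every block is very unlikely without a real batch difference.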

Batch Statistical Tests

The batch comparison is the central quantitative finding of this case study.

| Statistic | Batch 1 | Batch 2 |
|---|---|---|
| $n$ | 240 | 240 |
| Mean $\bar{Y}$ | 688.9987 | 611.1559 |
| Std Dev $s$ | 65.5491 | 61.8543 |
| Variance $s^2$ | 4296.6845 | 3825.9544 |

F-Test for Equal Variances

Before comparing means, we first test whether the two batches have equal variances using the F-test.

$$H_0\!: \sigma_1^2 = \sigma_2^2 \quad \text{vs.} \quad H_a\!: \sigma_1^2 \neq \sigma_2^2$$

The F-test statistic is:

$$F = \frac{s_1^2}{s_2^2} = \frac{4296.6845}{3825.9544} = 1.123$$

| Parameter | Value |
|---|---|
| Test statistic $F$ | 1.123 |
| Numerator df $\nu_1$ | 239 |
| Denominator df $\nu_2$ | 239 |
| Significance level $\alpha$ | 0.05 |
| Acceptance region | (0.776, 1.289) |

Conclusion: $F = 1.123$ falls within the acceptance region $(0.776,\; 1.289)$, so we fail to reject $H_0$; the batch variances are not significantly different at the 5% level. (With equal numerator and denominator degrees of freedom, the lower critical value is the reciprocal of the upper one, $1/1.289 \approx 0.776$.) This justifies using a pooled variance in the subsequent t-test.
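The arithmetic behind the F statistic can be checked in a few lines. This sketch takes the batch variances from the table above and the upper critical value as quoted in the text; because the ratio here is greater than 1, only the upper bound of the acceptance region matters for the decision.

```python
# Batch variances and sample sizes from the batch statistics table.
s1_sq, s2_sq = 4296.6845, 3825.9544
n1 = n2 = 240

F = s1_sq / s2_sq
df1, df2 = n1 - 1, n2 - 1
upper_critical = 1.289  # upper bound of the acceptance region quoted in the text

print(f"F = {F:.3f} on ({df1}, {df2}) df")  # F = 1.123 on (239, 239) df
print("reject H0" if F > upper_critical else "fail to reject H0")  # fail to reject H0
```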

Two-Sample t-Test for Equal Means

With equal variances established, the two-sample t-test tests whether the batch means are significantly different.

$$H_0\!: \mu_1 = \mu_2 \quad \text{vs.} \quad H_a\!: \mu_1 \neq \mu_2$$

The pooled standard deviation and test statistic are:

$$s_p = 63.7285, \qquad T = \frac{\bar{Y}_1 - \bar{Y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = 13.3806$$

| Parameter | Value |
|---|---|
| Test statistic $T$ | 13.3806 |
| Pooled std dev $s_p$ | 63.7285 |
| Degrees of freedom $\nu$ | 478 |
| Critical value $t_{0.975,\,478}$ | 1.965 |

Conclusion: $T = 13.38$ vastly exceeds the critical value of 1.965, so we reject $H_0$; the batch means are highly significantly different. Batch 1 is on average $689.00 - 611.16 = 77.84$ units stronger than Batch 2, and this difference is consistent across all labs and all primary factor levels.
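The pooled standard deviation and the test statistic follow directly from the batch summary statistics. The sketch below recomputes both from the tabulated means and variances; the critical value is the one quoted in the text, not computed here.

```python
from math import sqrt

# Batch summary statistics from the table above.
n1, n2 = 240, 240
mean1, mean2 = 688.9987, 611.1559
var1, var2 = 4296.6845, 3825.9544

# Pooled variance: degrees-of-freedom-weighted average of the two variances.
df = n1 + n2 - 2
sp = sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / df)
T = (mean1 - mean2) / (sp * sqrt(1 / n1 + 1 / n2))

t_critical = 1.965  # two-sided 5% critical value quoted in the text
print(f"s_p = {sp:.4f}, T = {T:.4f}, df = {df}")
print("reject H0" if abs(T) > t_critical else "fail to reject H0")
```

The printed values should agree with the table to within rounding of the inputs.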

Lab Effect Analysis

This section examines whether the 8 laboratories that performed the measurements have systematically different results.

Lab Box Plot

Box plots by lab show minor variation in medians across the 8 labs, with relatively constant scales. Two labs (3 and 5) show outliers on the low side. The overall pattern suggests that lab-to-lab differences are small compared to the batch effect.

Box plots of flexural strength by lab (pooled data). Medians vary slightly across labs with relatively constant scales. Labs 3 and 5 show low-side outliers.

Lab Box Plot by Batch

When analyzed separately by batch, the lab-to-lab variation becomes clearer:

  • Batch 1: median strength ranges from 650 to 700 across labs; variability is relatively constant; all labs contain at least one low-side outlier
  • Batch 2: median strength ranges from 550 to 600 across labs; there is somewhat more variability across labs compared to Batch 1; six labs show high-side outliers and three show low-side outliers

Box plots by lab for Batch 1. Median strength ranges from 650 to 700 across labs with relatively constant variability.
Box plots by lab for Batch 2. Median strength ranges from 550 to 600 with somewhat more cross-lab variability than Batch 1.

The batch effect of approximately 75 to 100 units on location dominates any lab effects within either batch.

Lab Statistical Tests

To formally test whether the labs differ, we perform a one-way ANOVA comparing the 8 lab means.

$$H_0\!: \mu_1 = \mu_2 = \cdots = \mu_8 \quad \text{vs.} \quad H_a\!: \text{at least one } \mu_i \text{ differs}$$

| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between labs | 70,754.64 | 7 | 10,107.81 | 1.837 |
| Within labs | 2,597,691.93 | 472 | 5,503.58 | |
| Total | 2,668,446.57 | 479 | | |

| Parameter | Value |
|---|---|
| Test statistic $F$ | 1.837 |
| Between-groups df $\nu_1$ | 7 |
| Within-groups df $\nu_2$ | 472 |
| Critical value $F_{0.95,\,7,\,472}$ | 2.082 |
| Significance level $\alpha$ | 0.05 |

Conclusion: $F = 1.837$ is less than the critical value $F_{0.95,\,7,\,472} = 2.082$, so we fail to reject $H_0$; the lab means are not significantly different at the 5% level. This is consistent with the 8 laboratories being homogeneous: the apparent lab-to-lab differences in the box plots are within the range of normal sampling variation. The labs can therefore be treated as equivalent for purposes of analyzing the primary machining factors.
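The entries of the ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and $F$ is the ratio of the two mean squares. A short check using the tabulated values (the critical value is the one quoted in the text, not computed here):

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_between, df_between = 70754.64, 7
ss_within, df_within = 2597691.93, 472

ms_between = ss_between / df_between
ms_within = ss_within / df_within
F = ms_between / ms_within

# The between and within rows should add up to the total row.
ss_total = ss_between + ss_within
df_total = df_between + df_within

f_critical = 2.082  # critical value as quoted in the text
print(f"MS_between = {ms_between:.2f}, MS_within = {ms_within:.2f}, F = {F:.3f}")
# MS_between = 10107.81, MS_within = 5503.58, F = 1.837
print("reject H0" if F > f_critical else "fail to reject H0")  # fail to reject H0
```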

Primary Factors Analysis

The designed experiment analysis examines the effects of three primary machining factors, analyzed separately by batch because of the dominant batch effect:

| Factor | Symbol | Level $-1$ | Level $+1$ |
|---|---|---|---|
| Table speed | $X_1$ | 0.025 | 0.125 |
| Down feed rate | $X_2$ | 0.050 | 0.125 |
| Wheel grit size | $X_3$ | 150 | 80 |

DOE mean plots, standard deviation plots, and interaction plots for each batch are presented in the following subsections.

DOE Mean Plot

DOE mean plots show the mean response at each factor level. The steeper the line connecting the two levels, the larger the effect of that factor on ceramic strength. The dashed horizontal reference line marks the batch grand mean.

Batch 1

DOE mean plot for Batch 1 showing factor level means. Table speed (X1) has the steepest slope, indicating the dominant effect.

For Batch 1, table speed ($X_1$) produces the steepest slope, confirming its dominant effect of $-30.77$ units. The negative sign indicates that lower table speed (level $-1$) yields higher strength. Wheel grit ($X_3$) shows a moderate effect of $-7.18$ units, while down feed rate ($X_2$) has essentially no effect (near-horizontal line).

Batch 2

DOE mean plot for Batch 2 showing factor level means. Down feed (X2) has the steepest slope, indicating the dominant effect.

For Batch 2, the pattern is markedly different. Down feed rate ($X_2$) now produces the steepest slope with an effect of $+18.22$ units; the positive sign indicates that higher down feed rate yields higher strength. Wheel grit ($X_3$) is the second most important factor at $-14.71$ units. Table speed ($X_1$) has essentially no effect (near-horizontal line), in stark contrast to Batch 1.
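In a two-level design, the effect read off a DOE mean plot is the mean response at level $+1$ minus the mean response at level $-1$, which is why a negative effect means the low level gives higher strength. The sketch below shows the calculation on a small set of hypothetical responses (the numbers and the helper name `main_effect` are made up for illustration and are not taken from the dataset).

```python
from statistics import mean

def main_effect(levels, y):
    """Mean response at coded level +1 minus mean response at coded level -1."""
    high = [v for x, v in zip(levels, y) if x == +1]
    low = [v for x, v in zip(levels, y) if x == -1]
    return mean(high) - mean(low)

# Hypothetical coded settings and responses, for illustration only.
x1 = [-1, -1, +1, +1, -1, -1, +1, +1]
y = [700.0, 710.0, 670.0, 680.0, 705.0, 715.0, 665.0, 675.0]

print(main_effect(x1, y))  # -35.0
```

Applied to each factor within a batch, this is the quantity that the slopes in the mean plots represent.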

DOE Standard Deviation Plot

DOE standard deviation plots show the within-level standard deviation at each factor level. Large differences between levels indicate that the factor affects not only the mean response but also its variability.

Batch 1

DOE standard deviation plot for Batch 1. Table speed shows the largest variability effect between factor levels.

For Batch 1, table speed (X1X_1) shows a substantial difference in variability between levels — the standard deviation at the high level is approximately 20 units larger than at the low level. This variability effect is important for process optimization because it means slower table speed not only produces stronger ceramics but also more consistent ones.

Batch 2

DOE standard deviation plot for Batch 2. Variability differences are roughly comparable across all three factors.

For Batch 2, the standard deviation differences across factor levels are roughly comparable for all three factors, with no single factor dominating the variability structure. This contrasts with Batch 1 where table speed had a pronounced variability effect.

Interaction Effects

Interaction plots show the mean response at each level of one factor, with separate lines for each level of a second factor. Non-parallel lines indicate an interaction — the effect of one factor depends on the level of the other.

The $X_1 \times X_3$ (table speed by wheel grit) interaction is the most important interaction in both batches, ranked second in the effect tables for both Batch 1 ($-20.25$) and Batch 2 ($-16.71$).
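For coded $\pm 1$ factors, a two-factor interaction effect can be estimated the same way as a main effect, using the elementwise product of the two factor columns as the "level". The sketch below uses four hypothetical cell means (illustrative numbers and helper names, not values from the dataset) and shows that the result equals half the change in one factor's effect as the other factor moves from $-1$ to $+1$.

```python
from statistics import mean

def effect(column, y):
    """Mean response where column == +1 minus mean response where column == -1."""
    high = [v for c, v in zip(column, y) if c == +1]
    low = [v for c, v in zip(column, y) if c == -1]
    return mean(high) - mean(low)

def interaction_effect(xa, xb, y):
    """Two-factor interaction: effect of the product column xa * xb."""
    return effect([a * b for a, b in zip(xa, xb)], y)

# Hypothetical 2 x 2 cell means, for illustration only.
x1 = [-1, -1, +1, +1]
x3 = [-1, +1, -1, +1]
y = [700.0, 710.0, 690.0, 650.0]

print(interaction_effect(x1, x3, y))  # -25.0

# Same number from the "non-parallel lines" view:
x3_effect_at_low_x1 = y[1] - y[0]    # +10.0
x3_effect_at_high_x1 = y[3] - y[2]   # -40.0
print((x3_effect_at_high_x1 - x3_effect_at_low_x1) / 2)  # -25.0
```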

Batch 1: Table Speed x Wheel Grit

Interaction plot for X1 (table speed) by X3 (wheel grit) in Batch 1. Non-parallel lines indicate a substantial interaction effect of -20.25 units.

The non-parallel lines confirm the $X_1 \times X_3$ interaction in Batch 1. At low table speed, the wheel grit effect is modest, but at high table speed, the wheel grit effect becomes much larger. The interaction effect of $-20.25$ units is second in magnitude only to the table speed main effect.

Batch 2: Table Speed x Wheel Grit

Interaction plot for X1 (table speed) by X3 (wheel grit) in Batch 2. Non-parallel lines indicate an interaction effect of -16.71 units.

The $X_1 \times X_3$ interaction in Batch 2 has a similar pattern but with different magnitudes. This interaction effect of $-16.71$ units is more important in Batch 2 (where table speed has no main effect) than in Batch 1, making it the second-ranked effect after down feed rate.

Batch 1 — Ranked Effects

DOE mean plots and interaction effects analysis for Batch 1 yield the following ranked effect estimates:

| Rank | Effect | Estimate |
|---|---|---|
| 1 | Table speed $X_1$ | $-30.77$ |
| 2 | $X_1 \times X_3$ interaction | $-20.25$ |
| 3 | $X_1 \times X_2$ interaction | $+9.70$ |
| 4 | Wheel grit $X_3$ | $-7.18$ |
| 5 | Down feed $X_2$ | $\approx 0$ |
| 6 | $X_2 \times X_3$ interaction | $\approx 0$ |

For Batch 1, table speed ($X_1$) is the dominant factor with an effect of $-30.77$ units, meaning slower table speed produces stronger ceramics. The $X_1 \times X_3$ interaction is also substantial at $-20.25$ units. The standard deviation plot shows table speed also has a significant variability effect of approximately 20 units between levels.

Batch 2 — Ranked Effects

| Rank | Effect | Estimate |
|---|---|---|
| 1 | Down feed $X_2$ | $+18.22$ |
| 2 | $X_1 \times X_3$ interaction | $-16.71$ |
| 3 | Wheel grit $X_3$ | $-14.71$ |
| 4 | Table speed $X_1$ | $\approx 0$ |
| 5 | $X_1 \times X_2$ interaction | $\approx 0$ |
| 6 | $X_2 \times X_3$ interaction | $\approx 0$ |

For Batch 2, the ranking is markedly different: down feed rate ($X_2$) is the dominant factor at $+18.22$ units, while table speed ($X_1$) has essentially no effect. The $X_1 \times X_3$ interaction remains important in both batches. The standard deviation differences are roughly comparable across all three factors for Batch 2.

Batch Comparison of Factor Effects

The most important finding from the factor effects analysis is that the factor rankings are not consistent across batches:

  • In Batch 1, table speed ($X_1$) dominates with a $-30.77$ effect; down feed ($X_2$) is negligible
  • In Batch 2, down feed ($X_2$) dominates with a $+18.22$ effect; table speed ($X_1$) is negligible

This batch-by-factor interaction makes it impossible to give a single set of optimal machining parameters that applies to both batches. The batch effect of approximately 75 units remains the dominant effect in the overall analysis, even though batch is a nuisance factor rather than a primary factor.

Quantitative Output and Interpretation

Summary Statistics

| Statistic | Value |
|---|---|
| Sample size $n$ | 480 |
| Mean $\bar{Y}$ | 650.0773 |
| Std Dev $s$ | 74.6383 |
| Median | 646.6275 |
| Minimum | 345.2940 |
| Maximum | 821.6540 |
| Range | 476.3600 |

The mean and median are reasonably close ($\bar{Y} = 650.08$ vs. median $= 646.63$), suggesting approximate symmetry in the pooled data. The large standard deviation of $s = 74.64$ reflects both within-treatment variability and the substantial batch effect. The minimum of 345.29 is far below the bulk of the data (most points fall between 500 and 750), flagging potential outlier specimens.
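How much of the pooled standard deviation comes from the batch effect can be worked out from the batch summary statistics alone, by splitting the total sum of squares into a within-batch part and a between-batch part. The sketch below is an illustrative calculation using the tabulated batch means and variances; the variable names are arbitrary, and "within-batch" here still includes lab and treatment variation.

```python
from math import sqrt

# Batch summary statistics from the batch comparison table.
n1 = n2 = 240
mean1, mean2 = 688.9987, 611.1559
var1, var2 = 4296.6845, 3825.9544

n = n1 + n2
grand_mean = (n1 * mean1 + n2 * mean2) / n

# Sum-of-squares decomposition: total = within batches + between batches.
ss_within = (n1 - 1) * var1 + (n2 - 1) * var2
ss_between = n1 * (mean1 - grand_mean) ** 2 + n2 * (mean2 - grand_mean) ** 2
s_overall = sqrt((ss_within + ss_between) / (n - 1))
share_between = ss_between / (ss_within + ss_between)

print(f"grand mean = {grand_mean:.4f}")  # grand mean = 650.0773
print(f"overall s  = {s_overall:.2f}")   # overall s  = 74.64
print(f"share of total sum of squares due to batch = {share_between:.0%}")
```

The reconstructed mean and standard deviation agree with the summary table, and roughly a quarter of the total sum of squares is attributable to the difference between the two batch means.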

Distribution Test

The Anderson-Darling test on the pooled data rejects normality, but this reflects the mixture of batch effects rather than true non-normality within treatment groups. The bimodal structure visible in the histogram is an artifact of pooling two populations with different means.

The normal probability plot of pooled data shows curvature — specifically an S-shape consistent with a bimodal mixture distribution. Within individual treatment groups (controlling for batch, lab, and factor levels), the data are more consistent with normality.

The skewness and kurtosis of the pooled data also reflect the bimodal structure: the distribution has lighter tails than a normal distribution and a flattened central region due to the two batch sub-populations.

Outlier Detection

Grubbs’ test on within-group residuals identifies a few potential outliers at the low end — specimens with strength values in the 300 to 450 range, well below the bulk of the data (500 to 750). Specifically:

  • Approximately half a dozen low-lying points appear across the full dataset
  • Both batches contain low-side outliers; Batch 2 also shows some high-side outliers
  • Labs 3 and 5 show the most pronounced low-side outliers

In ceramic strength testing, such low outliers often represent specimens with unusually large surface flaws or micro-cracks introduced during machining. These specimens may warrant separate investigation for quality control purposes, as they reflect real (if rare) failure modes rather than measurement errors.
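For reference, the Grubbs statistic is the largest absolute deviation from the sample mean, expressed in units of the sample standard deviation. The sketch below computes only that statistic on a small set of hypothetical residuals (the values and the helper name are made up for illustration); comparing it to a critical value requires a t-distribution quantile, which is omitted here.

```python
from statistics import mean, stdev

def grubbs_statistic(values):
    """Return (G, index) with G = max |y_i - mean| / s and the index of that point."""
    m, s = mean(values), stdev(values)
    idx = max(range(len(values)), key=lambda i: abs(values[i] - m))
    return abs(values[idx] - m) / s, idx

# Hypothetical within-group residuals with one very low value, illustration only.
residuals = [12.0, -8.5, 3.1, -150.2, 7.7, -4.4, 9.9, -1.3, 5.6, -6.0]
G, idx = grubbs_statistic(residuals)
print(f"most extreme residual: {residuals[idx]} (index {idx}), G = {G:.2f}")
```

Grubbs' test assumes approximately normal data apart from the suspected outlier, which is one reason to apply it to within-group residuals rather than to the bimodal pooled response.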

Interpretation

The 4-plot screening of the pooled 480 observations immediately reveals a bimodal distribution as the dominant graphical finding: two distinct peaks in the histogram rather than a single symmetric distribution. The run sequence plot and lag plot show no concerning temporal patterns, confirming that the bimodality is structural rather than time-dependent. The bihistogram identifies the source: Batch 1 is centered at approximately 689 and Batch 2 at approximately 611, a separation confirmed by the two-sample t-test ($T = 13.38$, $p \ll 0.001$). This batch effect of approximately 75 to 100 units is consistent across all 8 labs and all 16 treatment combinations, as confirmed by block plots showing parallel batch lines across labs and lab-specific box plots showing the batch separation within each laboratory.

The lab effect analysis confirms that lab-to-lab variation is minor ($F = 1.837 < F_{crit} = 2.082$), justifying treatment of the labs as homogeneous for primary factor analysis. With lab effects ruled out, the DOE mean plots reveal the critical finding of this case study: the factor rankings are not consistent across batches. In Batch 1, table speed ($X_1$) dominates with an effect of $-30.77$ units while down feed rate ($X_2$) is negligible. In Batch 2, the pattern reverses: down feed rate dominates at $+18.22$ units while table speed has essentially no effect. The $X_1 \times X_3$ (table speed by wheel grit) interaction is substantial in both batches ($-20.25$ in Batch 1, $-16.71$ in Batch 2), and the interaction plots confirm the non-parallel pattern indicating that table speed and wheel grit effects are interdependent.

The inconsistency in factor rankings across batches has direct engineering implications: no single set of optimal machining parameters applies to both batches. The batch effect was completely unexpected by the NIST investigators, demonstrating that unmeasured material differences between batches can dominate machining factor effects. This case study illustrates how graphical EDA techniques efficiently reveal multi-factor structure — the bihistogram exposes batch separation, block plots confirm consistency across labs, and DOE mean plots rank factor importance within each batch. A multi-factor ANOVA with batch as a blocking factor is the recommended follow-up analysis.

Conclusions

The ceramic strength data present a multi-faceted analysis challenge. The key findings, in order of importance, are:

  1. Dominant batch effect: The bihistogram and block plots show Batch 1 specimens are on average $77.84$ units stronger than Batch 2 ($T = 13.38$, $p \ll 0.001$), a difference that is consistent across all 8 labs and all 16 treatment combinations. This effect was completely unexpected by the scientific investigators.

  2. Equal batch variances: Despite the large location difference, the batch variances are not significantly different ($F = 1.123$, within the acceptance region), indicating the batch effect is primarily a shift in location.

  3. Inconsistent factor rankings: The DOE mean plots show that primary factor effects differ between batches: table speed ($X_1$) dominates in Batch 1 ($-30.77$), while down feed rate ($X_2$) dominates in Batch 2 ($+18.22$). The interaction plots confirm a substantial $X_1 \times X_3$ interaction in both batches. This batch-by-factor interaction complicates the determination of optimal machining parameters.

  4. Negligible lab effect: The lab effect is small relative to the batch effect, and the one-way ANOVA confirms the 8 labs are homogeneous ($F = 1.837 < F_{crit} = 2.082$).

  5. Low-side outliers: A few specimens in the 300 to 450 range represent unusually weak specimens, likely due to surface flaws, and warrant quality control investigation.

The recommended analysis approach is a multi-factor ANOVA that includes batch as a blocking factor, followed by examination of main effects and interactions among the primary machining factors — analyzed separately by batch given the inconsistent factor rankings. This case study demonstrates how graphical techniques (bihistograms, block plots, box plots by group) efficiently reveal batch and lab effects that would be difficult to detect from summary statistics alone, and how DOE mean plots, standard deviation plots, and interaction plots identify the key machining factors and their interdependencies within each batch.