Ceramic Strength Case Study
NIST/SEMATECH Section 1.4.2.10 Ceramic Strength
Background and Data
This case study applies exploratory data analysis to the NIST JAHANMI2.DAT dataset, which contains flexural strength measurements of bonded silicon nitride ceramic specimens. The data come from a designed experiment investigating the effects of three primary machining factors (table speed, down feed rate, and wheel grit size) and two nuisance factors (lab and batch) on ceramic strength. The data were collected by Said Jahanmir of the NIST Ceramics Division in 1996.
The full dataset contains 960 observations across 8 labs, 2 batches, 2 replications, and 16 treatment combinations. For this case study, only the longitudinal direction data are used, resulting in 480 observations with three primary factors (table speed, down feed rate, wheel grit size) each at 2 levels. The dataset originates from NIST/SEMATECH Section 1.4.2.10.
Study Design
This is a multi-factor designed experiment with the following structure:
- Response variable: flexural strength of bonded silicon nitride (ceramic)
- Primary factors: table speed ($X_1$), down feed rate ($X_2$), wheel grit size ($X_3$), each at 2 levels
- Nuisance factors: lab (8 levels) and batch (2 levels)
- Replications: 2 per treatment combination
The goals of the analysis are to:
- Determine which of the primary factors has the strongest effect on ceramic strength
- Estimate the magnitude of the effects
- Determine the optimal settings for the primary factors
- Determine if the nuisance factors (lab and batch) have an effect on ceramic strength
The general ANOVA model for this designed experiment includes main effects, interactions, and blocking factors:
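$$
Y = \mu + \beta + \lambda + \sum_{i=1}^{3} \alpha_i X_i + \sum_{i<j} \alpha_{ij} X_i X_j + \varepsilon
$$
where $\mu$ is the grand mean, $\beta$ is the batch effect, $\lambda$ is the lab effect, the $\alpha_i X_i$ terms represent the primary factor main effects, the $\alpha_{ij} X_i X_j$ terms represent the interaction effects, and $\varepsilon$ is the random error.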
Dataset
Said Jahanmir, NIST Ceramics Division, ceramic flexural strength (1996)
NIST source description
Effect of Machining Factors on Strength of Ceramics (longitudinal data only). Response variable = ceramic strength (MPa). 15 variables per observation: Observation ID, Lab (8 levels), Bar ID, Set, Strength (Y), Table Speed (2 levels), Down Feed Rate (2 levels), Wheel Grit (2 levels), Direction, Treatment (16 levels), Set of 15, Rep (2 levels), Coded Lab, Bar Batch (2 levels), Distinct set of 15 reps. Number of observations = 480.
Preview data
| ID | Lab | Strength (MPa) | Table Speed | Down Feed | Wheel Grit | Batch | Rep |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 608.781 | -1 | -1 | -1 | 1 | 1 |
| 2 | 1 | 569.67 | -1 | -1 | -1 | 2 | 1 |
| 3 | 1 | 689.556 | -1 | -1 | -1 | 1 | 1 |
| 4 | 1 | 747.541 | -1 | -1 | -1 | 2 | 1 |
| 5 | 1 | 618.134 | -1 | -1 | -1 | 1 | 1 |
| 6 | 1 | 612.182 | -1 | -1 | -1 | 2 | 1 |
| 7 | 1 | 680.203 | -1 | -1 | -1 | 1 | 1 |
| 8 | 1 | 607.766 | -1 | -1 | -1 | 2 | 1 |
| 9 | 1 | 726.232 | -1 | -1 | -1 | 1 | 1 |
| 10 | 1 | 605.38 | -1 | -1 | -1 | 2 | 1 |
| ... (470 more rows) | | | | | | | |
Response Variable Analysis
4-Plot Overview
The 4-plot of the response variable reveals important features of the pooled data:
- Run sequence plot (upper left): location and scale are relatively constant, though about half a dozen points in the 300 to 450 range may require attention as potential outliers. Most points cluster between 500 and 750. The run sequence plot can also reveal time effects, which are typically undesirable nuisance factors in designed experiments.
- Lag plot (upper right): shows no significant structure, indicating no strong temporal dependence in the run order.
- Histogram (lower left): appears reasonably symmetric but bimodal, with two distinct peaks rather than a single centered peak.
- Normal probability plot (lower right): shows curvature, indicating that the pooled data are not well described by a single normal distribution, consistent with the bimodal structure in the histogram.
The bimodal histogram is the most important finding from the 4-plot: it immediately suggests that a major grouping factor (batch, lab, or treatment) is creating two distinct sub-populations. Before proceeding with standard assumption tests, the source of the bimodality must be identified and addressed.
Run Sequence Plot
The run sequence plot of the pooled response shows location and scale are relatively constant over time, though the spread of the data reflects both within-treatment variability and the substantial batch effect.
Lag Plot
The lag plot does not show significant temporal structure, indicating no strong time-based dependence in the measurement order.
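One way to put a number on what a lag plot shows is the lag-1 autocorrelation of the run-ordered responses. The sketch below is illustrative only: the helper names `lag_pairs` and `lag1_autocorrelation` are assumptions introduced here, and the ten input values are simply the strengths shown in the data preview, not the full series.

```python
from statistics import fmean

def lag_pairs(y):
    """Return the (Y[i-1], Y[i]) pairs that a lag plot displays."""
    return list(zip(y[:-1], y[1:]))

def lag1_autocorrelation(y):
    """Lag-1 sample autocorrelation; values near 0 suggest no run-order dependence."""
    mean = fmean(y)
    denom = sum((v - mean) ** 2 for v in y)
    numer = sum((a - mean) * (b - mean) for a, b in lag_pairs(y))
    return numer / denom

# First ten strengths from the data preview, used purely as a toy input.
sample = [608.781, 569.67, 689.556, 747.541, 618.134,
          612.182, 680.203, 607.766, 726.232, 605.38]
print(lag_pairs(sample)[:2])
print(round(lag1_autocorrelation(sample), 3))
```

A structureless lag plot corresponds to a coefficient close to zero; a strongly positive or negative value would point to drift or alternation in the run order.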
Histogram
The histogram of the pooled response reveals the bimodal distribution — two distinct peaks corresponding to the two batches. This is the most important finding from the initial graphical analysis.
Normal Probability Plot
The normal probability plot shows curvature — specifically an S-shape consistent with a bimodal mixture distribution from pooling two batch sub-populations with different means.
Batch Effect Analysis
The batch effect is the dominant finding in this dataset. This section examines the batch difference using multiple graphical and statistical approaches.
Bihistogram
The bihistogram compares the distributions of Batch 1 and Batch 2 on a shared x-axis. Batch 1 responses are centered at approximately 689, while Batch 2 responses are centered at approximately 611, a difference of approximately 78 units. This batch effect was completely unexpected by the scientific investigators. The variability is comparable for both batches, though Batch 1 exhibits lower-tail skewness while Batch 2 shows central skewness. Both batches contain low-lying outlier points.
Batch Box Plot
Box plots by batch confirm the location difference and show multiple outliers on the low side for both batches, with Batch 2 also showing high-side outliers.
Batch Q-Q Plot
The quantile-quantile plot compares Batch 1 quantiles directly against Batch 2 quantiles. Except for a few points in the right tail, Batch 1 values have consistently higher quantiles than Batch 2, confirming the location difference. The Q-Q plot is not linear — this implies that the difference between the batches is not explained simply by a shift in location. The variation and skewness differ between batches as well, consistent with the shape differences visible in the bihistogram.
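Because both batches have the same number of observations, the points of the Q-Q plot are simply the sorted values of one batch paired with the sorted values of the other. The following sketch shows that pairing under the equal-sample-size assumption; the function names and the toy values are hypothetical and are not taken from JAHANMI2.DAT.

```python
def qq_pairs(batch1, batch2):
    """Pair the i-th smallest value of each sample (equal sample sizes assumed)."""
    if len(batch1) != len(batch2):
        raise ValueError("this sketch assumes equal sample sizes")
    return list(zip(sorted(batch1), sorted(batch2)))

def fraction_above_diagonal(pairs):
    """Share of quantile pairs in which the first sample exceeds the second."""
    return sum(q1 > q2 for q1, q2 in pairs) / len(pairs)

# Toy values chosen so that only the right tail crosses the diagonal.
b1 = [702.1, 655.4, 690.0, 731.8, 668.3]
b2 = [611.0, 598.7, 640.2, 575.9, 733.0]
pairs = qq_pairs(b1, b2)
print(pairs)
print(fraction_above_diagonal(pairs))
```

A fraction close to 1 corresponds to the pattern described above, where one batch has higher quantiles almost everywhere. Whether the paired points fall on a straight line still has to be judged from the plot itself.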
Batch Block Plots
Block plots showing batch means by lab confirm that the batch effect is consistent across all 8 laboratories. In every lab, Batch 1 means exceed Batch 2 means. The parallel nature of the two lines indicates that the batch effect is additive — approximately the same magnitude regardless of which lab performed the measurements.
Additional block plots confirm that the batch effect holds across all combinations of labs and primary factor levels. In every case, Batch 1 exceeds Batch 2, so the batch effect is robust over table speed, down feed rate, and wheel grit size.
This consistency across all labs and all primary factor combinations strengthens the conclusion that the batch difference reflects a genuine material property difference rather than a lab-specific or factor-dependent artifact.
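The comparison behind a block plot can be sketched as a table of batch means per lab, followed by a count of the labs in which Batch 1 comes out ahead. This is a minimal illustration with made-up rows; the `(lab, batch, strength)` tuple layout and the function names are assumptions for the example, not the structure of the NIST file.

```python
from collections import defaultdict
from statistics import fmean

def batch_means_by_lab(rows):
    """rows: iterable of (lab, batch, strength). Returns {lab: {batch: mean}}."""
    cells = defaultdict(lambda: defaultdict(list))
    for lab, batch, strength in rows:
        cells[lab][batch].append(strength)
    return {lab: {b: fmean(v) for b, v in by_batch.items()}
            for lab, by_batch in cells.items()}

def batch1_wins(means):
    """Count labs in which the Batch 1 mean exceeds the Batch 2 mean."""
    return sum(m[1] > m[2] for m in means.values())

# Toy rows (lab, batch, strength); values are illustrative, not NIST data.
rows = [(1, 1, 700.0), (1, 1, 680.0), (1, 2, 610.0), (1, 2, 620.0),
        (2, 1, 690.0), (2, 1, 670.0), (2, 2, 600.0), (2, 2, 630.0)]
means = batch_means_by_lab(rows)
print(means)
print(batch1_wins(means), "of", len(means), "labs")
```

If batch had no effect and labs were independent, each lab would favor Batch 1 with probability 1/2, so all 8 labs agreeing would occur with probability `0.5 ** 8`, about 0.004. That is the informal argument for treating a unanimous block plot as strong evidence.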
Batch Statistical Tests
The batch comparison is the central quantitative finding of this case study.
| Statistic | Batch 1 | Batch 2 |
|---|---|---|
| Sample size | 240 | 240 |
| Mean | 688.9987 | 611.1559 |
| Std Dev | 65.5491 | 61.8543 |
| Variance | 4296.6845 | 3825.9544 |
F-Test for Equal Variances
Before comparing means, we first test whether the two batches have equal variances using the F-test.
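The F-test statistic is the ratio of the two sample variances:
$$
F = \frac{s_1^2}{s_2^2} = \frac{4296.6845}{3825.9544} = 1.123
$$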
| Parameter | Value |
|---|---|
| Test statistic | 1.123 |
| Numerator df | 239 |
| Denominator df | 239 |
| Significance level | 0.05 |
| Acceptance region | (0.845, 1.289) |
Conclusion: $F = 1.123$ falls within the acceptance region $(0.845, 1.289)$, so we fail to reject the null hypothesis of equal variances. The batch variances are not significantly different at the 5% level, which justifies using a pooled variance in the subsequent t-test.
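The arithmetic of this test can be checked directly from the batch variances and the acceptance region reported in the tables. The sketch below only reproduces that check; the helper names are hypothetical, and the acceptance bounds are copied from the table instead of being derived from the F distribution.

```python
def variance_ratio(var1, var2):
    """F statistic for the two-sample equal-variance test."""
    return var1 / var2

def within(value, region):
    """True when value lies strictly inside the (low, high) region."""
    low, high = region
    return low < value < high

f_stat = variance_ratio(4296.6845, 3825.9544)  # batch variances from the table
acceptance = (0.845, 1.289)                    # acceptance region from the table
print(round(f_stat, 3), within(f_stat, acceptance))
```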
Two-Sample t-Test for Equal Means
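With no evidence against equal variances, the two-sample t-test with a pooled variance is used to test whether the batch means are significantly different.
The pooled standard deviation and test statistic are:
$$
s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = 63.7285, \qquad
T = \frac{\bar{Y}_1 - \bar{Y}_2}{s_p \sqrt{1/n_1 + 1/n_2}} = 13.3806
$$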
| Parameter | Value |
|---|---|
| Test statistic | 13.3806 |
| Pooled std dev | 63.7285 |
| Degrees of freedom | 478 |
| Critical value | 1.965 |
Conclusion: $T = 13.3806$ vastly exceeds the critical value of 1.965, so we reject the null hypothesis of equal means. The batch means are highly significantly different. Batch 1 is on average $688.9987 - 611.1559 = 77.84$ units stronger than Batch 2, and this difference is consistent across all labs and all primary factor levels.
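The pooled standard deviation and the t statistic can be recomputed from the batch summary statistics alone. The following is a minimal sketch of the standard pooled-variance formulas; the function name `pooled_t` is an assumption made for this example.

```python
from math import sqrt

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Return (pooled std dev, t statistic, degrees of freedom)."""
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / df)
    t = (mean1 - mean2) / (sp * sqrt(1 / n1 + 1 / n2))
    return sp, t, df

# Means, variances, and sample sizes from the batch summary table.
sp, t, df = pooled_t(688.9987, 4296.6845, 240, 611.1559, 3825.9544, 240)
print(round(sp, 4), round(t, 4), df)
```

Up to rounding, the results should agree with the pooled standard deviation, test statistic, and degrees of freedom listed in the table above.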
Lab Effect Analysis
This section examines whether the 8 laboratories that performed the measurements have systematically different results.
Lab Box Plot
Box plots by lab show minor variation in medians across the 8 labs, with relatively constant scales. Two labs (3 and 5) show outliers on the low side. The overall pattern suggests that lab-to-lab differences are small compared to the batch effect.
Lab Box Plot by Batch
When analyzed separately by batch, the lab-to-lab variation becomes clearer:
- Batch 1 — median strength ranges from 650 to 700 across labs; variability is relatively constant; all labs contain at least one low-side outlier
- Batch 2 — median strength ranges from 550 to 600 across labs; there is somewhat more variability across labs compared to Batch 1; six labs show high-side outliers and three show low-side outliers
The batch effect of approximately 75 to 100 units on location dominates any lab effects within either batch.
Lab Statistical Tests
To formally test whether the labs differ, we perform a one-way ANOVA comparing the 8 lab means.
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between labs | 70,754.64 | 7 | 10,107.81 | 1.837 |
| Within labs | 2,597,691.93 | 472 | 5,503.58 | |
| Total | 2,668,446.57 | 479 | | |
| Parameter | Value |
|---|---|
| Test statistic | 1.837 |
| Between-groups df | 7 |
| Within-groups df | 472 |
| Critical value | 2.082 |
| Significance level | 0.05 |
Conclusion: $F = 1.837$ is less than the critical value of 2.082, so we fail to reject the null hypothesis of equal lab means. The lab means are not significantly different at the 5% level. This is consistent with the 8 laboratories being homogeneous, with any apparent lab-to-lab differences in the box plots falling within the range of normal sampling variation. The labs can be treated as equivalent for purposes of analyzing the primary machining factors.
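The one-way ANOVA decomposition behind the table can be sketched in a few lines. The function below is a generic illustration run on toy groups, not on the NIST measurements; the name `one_way_anova` is hypothetical. The last two lines reapply the same mean-square ratio to the sums of squares reported in the table.

```python
from statistics import fmean

def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, F) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand = fmean(all_values)
    group_means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat

# Toy groups standing in for labs; not NIST data.
toy = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(toy)

# The same ratio applied to the sums of squares reported in the ANOVA table.
f_labs = (70754.64 / 7) / (2597691.93 / 472)
print(round(f_labs, 3))
```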
Primary Factors Analysis
The designed experiment analysis examines the effects of three primary machining factors, analyzed separately by batch because of the dominant batch effect:
| Factor | Symbol | Level $-1$ | Level $+1$ |
|---|---|---|---|
| Table speed | $X_1$ | 0.025 | 0.125 |
| Down feed rate | $X_2$ | 0.050 | 0.125 |
| Wheel grit size | $X_3$ | 150 | 80 |
DOE mean plots, standard deviation plots, and interaction plots for each batch are presented in the following subsections.
DOE Mean Plot
DOE mean plots show the mean response at each factor level. The steeper the line connecting the two levels, the larger the effect of that factor on ceramic strength. The dashed horizontal reference line marks the batch grand mean.
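The slope in a DOE mean plot corresponds to a main effect estimate. Assuming the usual two-level convention, in which the effect is the mean response at level $+1$ minus the mean response at level $-1$, it can be sketched as follows. The function name and the toy values are hypothetical.

```python
from statistics import fmean

def main_effect(levels, response):
    """Mean response at level +1 minus mean response at level -1."""
    high = [y for x, y in zip(levels, response) if x == +1]
    low = [y for x, y in zip(levels, response) if x == -1]
    return fmean(high) - fmean(low)

# Toy coded settings and strengths; illustrative values only.
x1 = [-1, -1, +1, +1]
y = [720.0, 700.0, 680.0, 690.0]
print(main_effect(x1, y))
```

In this toy example the effect is negative, which under the convention above means the $-1$ level gives the higher mean response.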
Batch 1
For Batch 1, table speed ($X_1$) produces the steepest slope, confirming that it has the dominant effect. The effect is negative, meaning that the lower table speed (level $-1$) yields higher strength. Wheel grit ($X_3$) shows a moderate effect, while down feed rate ($X_2$) has essentially no effect (near-horizontal line).
Batch 2
For Batch 2, the pattern is markedly different. Down feed rate ($X_2$) now produces the steepest slope, and its effect is positive, meaning that the higher down feed rate yields higher strength. Wheel grit ($X_3$) is the second most important main effect. Table speed ($X_1$) has essentially no effect (near-horizontal line), in stark contrast to Batch 1.
DOE Standard Deviation Plot
DOE standard deviation plots show the within-level standard deviation at each factor level. Large differences between levels indicate that the factor affects not only the mean response but also its variability.
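The quantity plotted in a DOE standard deviation plot is the sample standard deviation of the responses at each coded level of a factor. A minimal sketch, with a hypothetical function name and toy values, might look like this:

```python
from statistics import stdev

def sd_by_level(levels, response):
    """Sample standard deviation of the response at each coded level (-1, +1)."""
    return {
        level: stdev([y for x, y in zip(levels, response) if x == level])
        for level in (-1, +1)
    }

# Toy data in which the +1 level is visibly more variable than the -1 level.
x = [-1, -1, -1, +1, +1, +1]
y = [700.0, 710.0, 720.0, 640.0, 690.0, 740.0]
print(sd_by_level(x, y))
```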
Batch 1
For Batch 1, table speed ($X_1$) shows a substantial difference in variability between levels: the standard deviation at the high level is approximately 20 units larger than at the low level. This variability effect is important for process optimization because it means slower table speed produces ceramics that are not only stronger but also more consistent.
Batch 2
For Batch 2, the standard deviation differences across factor levels are roughly comparable for all three factors, with no single factor dominating the variability structure. This contrasts with Batch 1 where table speed had a pronounced variability effect.
Interaction Effects
Interaction plots show the mean response at each level of one factor, with separate lines for each level of a second factor. Non-parallel lines indicate an interaction — the effect of one factor depends on the level of the other.
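For two-level factors coded as $-1$ and $+1$, a two-factor interaction effect is commonly estimated by applying the main-effect contrast to the product of the two coded columns. The sketch below assumes that convention; the function name and the toy values are illustrative and are not taken from the NIST data.

```python
from statistics import fmean

def interaction_effect(a, b, response):
    """Mean response where a*b = +1 minus mean response where a*b = -1."""
    product = [ai * bi for ai, bi in zip(a, b)]
    plus = [y for p, y in zip(product, response) if p == +1]
    minus = [y for p, y in zip(product, response) if p == -1]
    return fmean(plus) - fmean(minus)

# Toy 2x2 layout: the second factor changes the response by +5 when the first
# is at -1, and by -35 when the first is at +1 (non-parallel lines).
x1 = [-1, +1, -1, +1]
x3 = [-1, -1, +1, +1]
y = [700.0, 690.0, 705.0, 655.0]
print(interaction_effect(x1, x3, y))
```

The result equals half the difference between the two conditional effects, $(-35 - 5)/2 = -20$, which is why parallel lines in an interaction plot correspond to an interaction estimate of zero.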
The $X_1 \times X_3$ (table speed by wheel grit) interaction is the most important interaction in both batches, ranked second in the effect tables for both Batch 1 and Batch 2.
Batch 1: Table Speed x Wheel Grit
The non-parallel lines confirm the $X_1 \times X_3$ interaction in Batch 1. At low table speed, the wheel grit effect is modest, but at high table speed, the wheel grit effect becomes much larger. In magnitude, the interaction is second only to the table speed main effect.
Batch 2: Table Speed x Wheel Grit
The $X_1 \times X_3$ interaction in Batch 2 has a similar pattern but with different magnitudes. It is more important in Batch 2 (where table speed has no main effect) than in Batch 1, and it is the second-ranked effect after down feed rate.
Batch 1 — Ranked Effects
DOE mean plots and interaction effects analysis for Batch 1 yield the following ranked effect estimates:
| Rank | Effect | Estimate |
|---|---|---|
| 1 | Table speed | |
| 2 | Table speed × wheel grit ($X_1 \times X_3$) interaction | |
| 3 | interaction | |
| 4 | Wheel grit | |
| 5 | Down feed | |
| 6 | interaction |
For Batch 1, table speed ($X_1$) is the dominant factor, with a negative effect: slower table speed produces stronger ceramics. The $X_1 \times X_3$ interaction is also substantial. The standard deviation plot shows that table speed also has a notable variability effect of approximately 20 units between levels.
Batch 2 — Ranked Effects
| Rank | Effect | Estimate |
|---|---|---|
| 1 | Down feed | |
| 2 | Table speed × wheel grit ($X_1 \times X_3$) interaction | |
| 3 | Wheel grit | |
| 4 | Table speed | |
| 5 | interaction | |
| 6 | interaction |
For Batch 2, the ranking is markedly different: down feed rate ($X_2$) is the dominant factor, while table speed ($X_1$) has essentially no effect. The $X_1 \times X_3$ interaction remains important in both batches. The standard deviation differences are roughly comparable across all three factors for Batch 2.
Batch Comparison of Factor Effects
The most important finding from the factor effects analysis is that the factor rankings are not consistent across batches:
- In Batch 1, table speed ($X_1$) dominates, while down feed ($X_2$) is negligible
- In Batch 2, down feed ($X_2$) dominates, while table speed ($X_1$) is negligible
This batch-by-factor interaction makes it impossible to give a single set of optimal machining parameters that applies to both batches. The batch effect of approximately 75 units, although batch is a nuisance factor and not a primary one, remains the dominant effect in the overall analysis.
Quantitative Output and Interpretation
Summary Statistics
| Statistic | Value |
|---|---|
| Sample size | 480 |
| Mean | 650.0773 |
| Std Dev | 74.6383 |
| Median | 646.6275 |
| Minimum | 345.2940 |
| Maximum | 821.6540 |
| Range | 476.3600 |
The mean and median are reasonably close (mean = 650.08 vs. median = 646.63), suggesting approximate symmetry in the pooled data. The large standard deviation of 74.64 reflects both within-treatment variability and the substantial batch effect. The minimum of 345.29 is far below the bulk of the data (most points fall between 500 and 750), flagging potential outlier specimens.
Distribution Test
The Anderson-Darling test on the pooled data rejects normality, but this reflects the mixture of batch effects rather than true non-normality within treatment groups. The bimodal structure visible in the histogram is an artifact of pooling two populations with different means.
The normal probability plot of pooled data shows curvature — specifically an S-shape consistent with a bimodal mixture distribution. Within individual treatment groups (controlling for batch, lab, and factor levels), the data are more consistent with normality.
The skewness and kurtosis of the pooled data also reflect the bimodal structure: the distribution has lighter tails than a normal distribution but with a flattened central region due to the two batch sub-populations.
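To illustrate why pooling two shifted sub-populations inflates a normality statistic, the sketch below computes the Anderson-Darling $A^2$ statistic against a normal distribution whose mean and standard deviation are estimated from the sample. It uses synthetic, deterministic toy samples and not the ceramic data, it does not compute critical values, and the function names are hypothetical.

```python
from math import log
from statistics import NormalDist, fmean, stdev

def anderson_darling_normal(values):
    """A-squared statistic against a normal with mean and sd estimated from the sample."""
    y = sorted(values)
    n = len(y)
    dist = NormalDist(fmean(y), stdev(y))
    cdf = [dist.cdf(v) for v in y]
    total = sum((2 * i + 1) * (log(cdf[i]) + log(1 - cdf[n - 1 - i]))
                for i in range(n))
    return -n - total / n

def normal_scores(n):
    """Evenly spaced standard normal quantiles: a deterministic, bell-shaped toy sample."""
    return [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

unimodal = normal_scores(40)
# Two copies of a bell-shaped sample shifted apart, mimicking two pooled batches.
bimodal = [v - 3 for v in normal_scores(20)] + [v + 3 for v in normal_scores(20)]

a2_unimodal = anderson_darling_normal(unimodal)
a2_bimodal = anderson_darling_normal(bimodal)
print(a2_unimodal < a2_bimodal)
```

The pooled two-mode sample yields a much larger statistic than the single bell-shaped sample, even though each mode is itself normal-like. This mirrors the interpretation above: the rejection reflects the mixture, not non-normality within groups.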
Outlier Detection
Grubbs’ test on within-group residuals identifies a few potential outliers at the low end — specimens with strength values in the 300 to 450 range, well below the bulk of the data (500 to 750). Specifically:
- Approximately half a dozen low-lying points appear across the full dataset
- Both batches contain low-side outliers; Batch 2 also shows some high-side outliers
- Labs 3 and 5 show the most pronounced low-side outliers
In ceramic strength testing, such low outliers often represent specimens with unusually large surface flaws or micro-cracks introduced during machining. These specimens may warrant separate investigation for quality control purposes, as they reflect real (if rare) failure modes rather than measurement errors.
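The statistic at the core of Grubbs' test is the largest absolute deviation from the sample mean, measured in units of the sample standard deviation. The sketch below computes only that statistic on toy values; the critical value, which depends on the t distribution and the sample size, is not computed here, and the function name is an assumption.

```python
from statistics import fmean, stdev

def grubbs_statistic(values):
    """Return (G, suspect): G = max |x - mean| / s and the value that attains it."""
    mean, s = fmean(values), stdev(values)
    suspect = max(values, key=lambda v: abs(v - mean))
    return abs(suspect - mean) / s, suspect

# Toy sample with one low value; illustrative, not a group from the NIST data.
g, suspect = grubbs_statistic([650.0, 660.0, 640.0, 655.0, 645.0, 345.0])
print(round(g, 3), suspect)
```

A suspect value is declared an outlier only if `g` exceeds the critical value for the chosen significance level and sample size.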
Interpretation
The 4-plot screening of the pooled 480 observations immediately reveals a bimodal distribution as the dominant graphical finding: two distinct peaks in the histogram instead of a single symmetric distribution. The run sequence plot and lag plot show no concerning temporal patterns, confirming that the bimodality is structural and not time-dependent. The bihistogram identifies the source: Batch 1 is centered at approximately 689 and Batch 2 at approximately 611, a separation confirmed by the two-sample t-test ($T = 13.38$ with 478 degrees of freedom, far above the critical value of 1.965). This batch effect of approximately 75 to 100 units is consistent across all 8 labs and all 16 treatment combinations, as confirmed by block plots showing parallel batch lines across labs and lab-specific box plots showing the batch separation within each laboratory.
The lab effect analysis confirms that lab-to-lab variation is minor ($F = 1.837$, below the critical value of 2.082), justifying treatment of the labs as homogeneous for primary factor analysis. With lab effects ruled out, the DOE mean plots reveal the critical finding of this case study: the factor rankings are not consistent across batches. In Batch 1, table speed ($X_1$) dominates while down feed rate ($X_2$) is negligible. In Batch 2, the pattern reverses: down feed rate dominates while table speed has essentially no effect. The $X_1 \times X_3$ (table speed by wheel grit) interaction is substantial in both batches, and the interaction plots confirm the non-parallel pattern indicating that table speed and wheel grit effects are interdependent.
The inconsistency in factor rankings across batches has direct engineering implications: no single set of optimal machining parameters applies to both batches. The batch effect was completely unexpected by the NIST investigators, demonstrating that unmeasured material differences between batches can dominate machining factor effects. This case study illustrates how graphical EDA techniques efficiently reveal multi-factor structure — the bihistogram exposes batch separation, block plots confirm consistency across labs, and DOE mean plots rank factor importance within each batch. A multi-factor ANOVA with batch as a blocking factor is the recommended follow-up analysis.
Conclusions
The ceramic strength data present a multi-faceted analysis challenge. The key findings, in order of importance, are:
- Dominant batch effect: The bihistogram and block plots show Batch 1 specimens are on average 77.84 units stronger than Batch 2 ($T = 13.38$, critical value 1.965), a difference that is consistent across all 8 labs and all 16 treatment combinations. This effect was completely unexpected by the scientific investigators.
- Equal batch variances: Despite the large location difference, the batch variances are not significantly different ($F = 1.123$, within the acceptance region $(0.845, 1.289)$), indicating the batch effect is primarily a shift in location.
- Inconsistent factor rankings: The DOE mean plots show that primary factor effects differ between batches: table speed ($X_1$) dominates in Batch 1, while down feed rate ($X_2$) dominates in Batch 2. The interaction plots confirm a substantial $X_1 \times X_3$ interaction in both batches. This batch-by-factor interaction complicates the determination of optimal machining parameters.
- Negligible lab effect: The lab effect is small relative to the batch effect, and the one-way ANOVA finds no significant difference among the 8 labs ($F = 1.837$, below the critical value of 2.082).
- Low-side outliers: A few specimens in the 300 to 450 range are unusually weak, likely due to surface flaws, and warrant quality control investigation.
The recommended analysis approach is a multi-factor ANOVA that includes batch as a blocking factor, followed by examination of main effects and interactions among the primary machining factors — analyzed separately by batch given the inconsistent factor rankings. This case study demonstrates how graphical techniques (bihistograms, block plots, box plots by group) efficiently reveal batch and lab effects that would be difficult to detect from summary statistics alone, and how DOE mean plots, standard deviation plots, and interaction plots identify the key machining factors and their interdependencies within each batch.