Ceramic Strength Case Study

NIST/SEMATECH Section 1.4.2.10 Ceramic Strength

Background and Data

This case study applies exploratory data analysis to the NIST JAHANMI2.DAT dataset, which contains flexural strength measurements of bonded silicon nitride ceramic specimens. The data come from a designed experiment investigating the effects of three primary machining factors (table speed, down feed rate, and wheel grit size) and two nuisance factors (lab and batch) on ceramic strength. The data were collected by Said Jahanmir of the NIST Ceramics Division in 1996.

The full dataset contains 960 observations across 8 labs, 2 batches, 2 replications, and 16 treatment combinations. For this case study, only the longitudinal direction data are used, resulting in $n = 480$ observations with three primary factors (table speed, down feed rate, wheel grit size) each at 2 levels. The dataset originates from NIST/SEMATECH Section 1.4.2.10.

Study Design

This is a multi-factor designed experiment with the following structure:

Response variable $Y$ — flexural strength of bonded silicon nitride (ceramic)
Primary factors — table speed ($X_1$), down feed rate ($X_2$), wheel grit size ($X_3$), each at 2 levels
Nuisance factors — lab (8 levels) and batch (2 levels)
Replications — 2 per treatment combination

The goals of the analysis are to:

Determine which of the primary factors has the strongest effect on ceramic strength
Estimate the magnitude of the effects
Determine the optimal settings for the primary factors
Determine if the nuisance factors (lab and batch) have an effect on ceramic strength

The general ANOVA model for this designed experiment includes main effects, interactions, and blocking factors:

Y_{ijklm} = \mu + B_i + L_j + \tau_k + (\tau\tau)_{kl} + \varepsilon_{ijklm}

where $\mu$ is the grand mean, $B_i$ is the batch effect, $L_j$ is the lab effect, $\tau_k$ represents the primary factor main effects, $(\tau\tau)_{kl}$ represents the interaction effects, and $\varepsilon_{ijklm}$ is the random error.
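One way to make this model concrete is to write it as a linear regression: an intercept for $\mu$, indicator columns for batch and lab, and $\pm 1$-coded columns for the three primary factors and their two-factor interactions. The Python sketch below (NumPy only) illustrates that idea under stated assumptions and is not the NIST analysis itself. The helper names `design_matrix` and `fit` and the input layout (integer batch and lab codes, an $n \times 3$ array of $\pm 1$ factor codes) are hypothetical, and the data are synthetic. In a balanced design with $\pm 1$ coding, each fitted factor coefficient is half of the corresponding effect (mean at $+1$ minus mean at $-1$).

```python
import itertools

import numpy as np


def design_matrix(batch, lab, x):
    """Columns: intercept, batch-2 indicator, lab indicators, X1..X3, X1X2, X1X3, X2X3.

    batch and lab are integer codes; x is an (n, 3) array of -1/+1 factor codes.
    Batch 1 and the lowest lab code are reference levels absorbed by the intercept.
    """
    batch = np.asarray(batch)
    lab = np.asarray(lab)
    x = np.asarray(x, dtype=float)
    cols = [np.ones(len(batch))]                      # grand mean mu
    cols.append((batch == 2).astype(float))           # batch effect B_i
    for level in np.unique(lab)[1:]:                  # lab effects L_j
        cols.append((lab == level).astype(float))
    cols.extend(x[:, k] for k in range(3))            # main effects X1, X2, X3
    for a, b in itertools.combinations(range(3), 2):  # two-factor interactions
        cols.append(x[:, a] * x[:, b])
    return np.column_stack(cols)


def fit(y, X):
    """Ordinary least-squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical noise-free balanced data: 2 batches x 3 labs x full 2^3 factorial.
rows = [(b, l, *f) for b in (1, 2) for l in (1, 2, 3)
        for f in itertools.product((-1, 1), repeat=3)]
arr = np.array(rows, dtype=float)
batch, lab, x = arr[:, 0], arr[:, 1], arr[:, 2:]
y = 690 - 78 * (batch == 2) + 4 * (lab == 2) - 15 * x[:, 0] - 10 * x[:, 0] * x[:, 2]
coef = fit(y, design_matrix(batch, lab, x))
# coef order: mu, batch2, lab2, lab3, X1, X2, X3, X1X2, X1X3, X2X3
```

On these noise-free synthetic data the fit recovers the generating coefficients; the X1 coefficient of -15 corresponds to an X1 effect of 2 × (-15) = -30 units.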

Dataset

JAHANMI2.DAT

Observations: 480

Variables: 8

Said Jahanmir, NIST Ceramics Division, ceramic flexural strength (1996)

NIST source description

Effect of Machining Factors on Strength of Ceramics (longitudinal data only). Response variable = ceramic strength (MPa). 15 variables per observation: Observation ID, Lab (8 levels), Bar ID, Set, Strength (Y), Table Speed (2 levels), Down Feed Rate (2 levels), Wheel Grit (2 levels), Direction, Treatment (16 levels), Set of 15, Rep (2 levels), Coded Lab, Bar Batch (2 levels), Distinct set of 15 reps. Number of observations = 480.

Preview data

ID	Lab	Strength (MPa)	Table Speed	Down Feed	Wheel Grit	Batch	Rep
1	1	608.781	-1	-1	-1	1	1
2	1	569.67	-1	-1	-1	2	1
3	1	689.556	-1	-1	-1	1	1
4	1	747.541	-1	-1	-1	2	1
5	1	618.134	-1	-1	-1	1	1
6	1	612.182	-1	-1	-1	2	1
7	1	680.203	-1	-1	-1	1	1
8	1	607.766	-1	-1	-1	2	1
9	1	726.232	-1	-1	-1	1	1
10	1	605.38	-1	-1	-1	2	1
... 470 more rows


Response Variable Analysis

4-Plot Overview

The 4-plot of the response variable reveals important features of the pooled data:

Run sequence plot (upper left) — the location and scale are relatively constant, though about half a dozen points in the 300 to 450 range may require attention as potential outliers. Most points cluster between 500 and 750. The run sequence plot can reveal time effects, which are typically undesirable nuisance factors in designed experiments.
Lag plot (upper right) — does not show significant structure, indicating no strong temporal dependence in the run order
Histogram (lower left) — roughly symmetric but bimodal, with two distinct peaks rather than a single central peak
Normal probability plot (lower right) — shows some curvature indicating that distributions other than the normal may provide a better fit

The bimodal histogram is the most important finding from the 4-plot — it immediately suggests that a major grouping factor (batch, lab, or treatment) is creating two distinct sub-populations.

Four-plot diagnostic layout for pooled ceramic strength data (run sequence, lag, histogram, normal probability).

Before proceeding with standard assumption tests, the source of this bimodality must be identified and addressed.

Run Sequence Plot

The run sequence plot of the pooled response shows location and scale are relatively constant over time, though the spread of the data reflects both within-treatment variability and the substantial batch effect.

Run sequence plot of pooled flexural strength showing relatively constant location and scale across 480 observations.

Lag Plot

The lag plot does not show significant temporal structure, indicating no strong time-based dependence in the measurement order.

Lag-1 plot of pooled ceramic strength data showing no significant temporal dependence in measurement order.
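A numeric companion to the lag plot is the lag-1 autocorrelation $r_1$ of the data in run order. The sketch below computes it, together with the $(Y_{i-1}, Y_i)$ pairs that a lag plot displays. The function names are illustrative, and the $\pm 2/\sqrt{n}$ band is a common rough rule of thumb, assumed here rather than taken from the source.

```python
import numpy as np


def lag_pairs(y):
    """The (Y[i-1], Y[i]) points shown in a lag-1 plot."""
    y = np.asarray(y, dtype=float)
    return y[:-1], y[1:]


def lag1_autocorrelation(y):
    """Sample lag-1 autocorrelation r1, with deviations taken about the overall mean."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    return np.sum(d[:-1] * d[1:]) / np.sum(d * d)


# Rough screen: |r1| well inside 2/sqrt(n) suggests no strong lag-1 dependence.
alternating = np.array([1.0, -1.0] * 5)   # toy series with strong negative dependence
r1 = lag1_autocorrelation(alternating)
band = 2 / np.sqrt(alternating.size)
```

For the 480 strength values in run order (a hypothetical array `strength`), a value of $|r_1|$ well below $2/\sqrt{480} \approx 0.091$ would agree with the featureless lag plot.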

Histogram

The histogram of the pooled response reveals the bimodal distribution — two distinct peaks corresponding to the two batches. This is the most important finding from the initial graphical analysis.

Histogram with KDE overlay of pooled ceramic strength data revealing a bimodal distribution from the batch effect.

Normal Probability Plot

The normal probability plot shows curvature — specifically an S-shape consistent with a bimodal mixture distribution from pooling two batch sub-populations with different means.

Normal probability plot of pooled ceramic strength data. The S-shaped curvature reflects the bimodal mixture of two batch sub-populations.
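The points of a normal probability plot, and the correlation $r$ of the fitted straight line, can be obtained with `scipy.stats.probplot`. The sketch below is illustrative. Comparing $r$ for the pooled data with $r$ for each batch separately is one way to quantify the curvature described above, not a procedure taken from the source.

```python
import numpy as np
from scipy import stats


def normal_probability_points(y):
    """Theoretical normal quantiles, ordered data, and the straight-line fit correlation r."""
    (theoretical, ordered), (slope, intercept, r) = stats.probplot(y, dist="norm")
    return theoretical, ordered, r


# Data equal to probplot's own theoretical quantiles fall on a straight line (r = 1);
# evenly spaced, uniform-like data bend away from it (r < 1).
q, _, _ = normal_probability_points(np.arange(20.0))
_, _, r_straight = normal_probability_points(q)
_, _, r_uniform = normal_probability_points(np.arange(20.0))
```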

Batch Effect Analysis

The batch effect is the dominant finding in this dataset. This section examines the batch difference using multiple graphical and statistical approaches.

Bihistogram

The bihistogram compares the distributions of Batch 1 and Batch 2 on a shared x-axis. Batch 1 responses are centered at approximately 689, while Batch 2 responses are centered at approximately 611, a difference of approximately 78 units. This batch effect was completely unexpected by the scientific investigators. The variability is comparable for both batches, though Batch 1 exhibits lower-tail skewness while Batch 2 shows central skewness. Both batches contain low-lying outlier points.

Bihistogram comparing Batch 1 (top) and Batch 2 (bottom) flexural strength distributions on a shared x-axis. The clear separation confirms the approximately 78-unit batch effect.
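The essential computation behind a bihistogram is a pair of histograms on one shared set of bin edges, with the second set of counts drawn downward. The NumPy sketch below returns those counts. The function name and the default of 20 bins are assumptions, and the drawing step (bars above and below a common axis) is left to any plotting library.

```python
import numpy as np


def bihistogram_counts(sample1, sample2, bins=20):
    """Counts for two samples on identical bin edges; the second is negated to plot downward."""
    a = np.asarray(sample1, dtype=float)
    b = np.asarray(sample2, dtype=float)
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)  # shared x-axis
    top, _ = np.histogram(a, bins=edges)
    bottom, _ = np.histogram(b, bins=edges)
    return edges, top, -bottom


edges, top, bottom = bihistogram_counts([1, 2, 3], [2, 3, 4, 5], bins=4)
```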

Batch Box Plot

Box plots by batch confirm the location difference and show multiple outliers on the low side for both batches, with Batch 2 also showing high-side outliers.

Box plot comparing Batch 1 and Batch 2 flexural strength. Batch 1 is centered approximately 78 units higher, with both batches showing low-side outliers.
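Box-plot outliers are conventionally the points beyond Tukey's fences, $Q_1 - 1.5\,\mathrm{IQR}$ and $Q_3 + 1.5\,\mathrm{IQR}$. The sketch below flags them. Quartile conventions differ between software packages (NumPy's default linear interpolation is used here), so the exact points flagged may differ slightly from the plots in the source.

```python
import numpy as np


def box_plot_outliers(y, k=1.5):
    """Return (low outliers, high outliers) beyond Q1 - k*IQR and Q3 + k*IQR."""
    y = np.asarray(y, dtype=float)
    q1, q3 = np.percentile(y, [25, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return y[y < low_fence], y[y > high_fence]


low, high = box_plot_outliers([600, 610, 620, 630, 640, 650, 660, 350])
```

In this toy sample the fences are 555 and 695, so only the value 350 is flagged, on the low side.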

Batch Q-Q Plot

The quantile-quantile plot compares Batch 1 quantiles directly against Batch 2 quantiles. Except for a few points in the right tail, Batch 1 values have consistently higher quantiles than Batch 2, confirming the location difference. The Q-Q plot is not linear — this implies that the difference between the batches is not explained simply by a shift in location. The variation and skewness differ between batches as well, consistent with the shape differences visible in the bihistogram.

Quantile-quantile plot comparing Batch 1 and Batch 2 quantiles. Batch 1 quantiles consistently exceed Batch 2, and the non-linear pattern indicates the batch difference involves more than a simple location shift.
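Because both batches contain 240 observations, the empirical Q-Q plot simply pairs the sorted Batch 1 values with the sorted Batch 2 values. The sketch below does that and, for unequal sample sizes, interpolates the larger sample at the plotting positions $(i - 0.5)/n$ of the smaller one; that fallback is one common convention, assumed here. A pure location shift would put the points on a line parallel to $y = x$.

```python
import numpy as np


def empirical_qq(sample1, sample2):
    """Matched quantile pairs (q1, q2) for a two-sample Q-Q plot."""
    a = np.sort(np.asarray(sample1, dtype=float))
    b = np.sort(np.asarray(sample2, dtype=float))
    if len(a) == len(b):
        return a, b  # equal sizes: pair order statistics directly
    small, large, swapped = (a, b, False) if len(a) < len(b) else (b, a, True)
    p = (np.arange(1, len(small) + 1) - 0.5) / len(small)
    matched = np.quantile(large, p)  # interpolate the larger sample
    return (matched, small) if swapped else (small, matched)


q1, q2 = empirical_qq([3, 1, 2], [6, 4, 5])   # pure shift of +3
u1, u2 = empirical_qq([1, 2], [0, 1, 2, 3])   # unequal sizes
```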

Batch Block Plots

Block plots showing batch means by lab confirm that the batch effect is consistent across all 8 laboratories. In every lab, Batch 1 means exceed Batch 2 means. The parallel nature of the two lines indicates that the batch effect is additive — approximately the same magnitude regardless of which lab performed the measurements.

Block plot of batch means by lab showing Batch 1 consistently exceeds Batch 2 across all 8 labs, confirming the batch effect is not lab-specific.

The additional block plots below confirm that the batch effect holds across all combinations of labs and primary factor levels. In every case, Batch 1 exceeds Batch 2 — the batch effect is robust over table speed, down feed rate, and wheel grit size.

Block plot of batch means by lab and table speed level. Batch 1 exceeds Batch 2 in all 16 combinations, confirming the batch effect is robust over table speed.

Block plot of batch means by lab and down feed level. Batch 1 exceeds Batch 2 in all 16 combinations, confirming the batch effect is robust over down feed rate.

Block plot of batch means by lab and wheel grit level. Batch 1 exceeds Batch 2 in all 16 combinations, confirming the batch effect is robust over wheel grit size.

This consistency across all labs and all primary factor combinations strengthens the conclusion that the batch difference reflects a genuine material property difference rather than a lab-specific or factor-dependent artifact.
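The quantities in the batch-by-lab block plot are eight pairs of batch means, one pair per lab. A minimal NumPy sketch, assuming arrays of strength values with integer lab and batch codes (hypothetical input names), computes them and checks the ordering reported above. Roughly equal differences across labs correspond to the parallel, additive pattern.

```python
import numpy as np


def batch_means_by_lab(strength, lab, batch):
    """Return {lab: (mean of batch 1, mean of batch 2)}, the points of a block plot."""
    strength = np.asarray(strength, dtype=float)
    lab = np.asarray(lab)
    batch = np.asarray(batch)
    out = {}
    for lv in np.unique(lab):
        in_lab = lab == lv
        out[lv.item()] = tuple(strength[in_lab & (batch == b)].mean() for b in (1, 2))
    return out


means = batch_means_by_lab([10, 8, 12, 9], lab=[1, 1, 2, 2], batch=[1, 2, 1, 2])
batch1_higher_everywhere = all(m1 > m2 for m1, m2 in means.values())
differences = {k: m1 - m2 for k, (m1, m2) in means.items()}
```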

Batch Statistical Tests

The batch comparison is the central quantitative finding of this case study.

Statistic	Batch 1	Batch 2
$n$	240	240
Mean $\bar{Y}$	688.9987	611.1559
Std Dev $s$	65.5491	61.8543
Variance $s^2$	4296.6845	3825.9544

F-Test for Equal Variances

Before comparing means, we first test whether the two batches have equal variances using the F-test.

H_0\!: \sigma_1^2 = \sigma_2^2 \quad \text{vs.} \quad H_a\!: \sigma_1^2 \neq \sigma_2^2

The F-test statistic is:

F = \frac{s_1^2}{s_2^2} = \frac{4296.6845}{3825.9544} = 1.123

Parameter	Value
Test statistic $F$	1.123
Numerator df $\nu_1$	239
Denominator df $\nu_2$	239
Significance level $\alpha$	0.05
Acceptance region	(0.776, 1.289)

Conclusion: $F = 1.123$ falls within the acceptance region $(0.776,\; 1.289)$, so we fail to reject $H_0$: the batch variances are not significantly different at the 5% level. Because both degrees of freedom equal 239, the two limits are reciprocals of each other ($1/1.289 \approx 0.776$). This supports using a pooled variance in the subsequent t-test.
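The F statistic and its two-sided acceptance region can be reproduced from the batch summary statistics with SciPy's F distribution. This is a sketch, not the original software's output, and the function name is illustrative. When both degrees of freedom are equal, the lower limit is exactly the reciprocal of the upper limit.

```python
from scipy import stats


def f_test_equal_variances(s1, n1, s2, n2, alpha=0.05):
    """Two-sided F-test of H0: sigma1^2 = sigma2^2 from sample SDs and sizes."""
    F = s1**2 / s2**2
    df1, df2 = n1 - 1, n2 - 1
    lower = stats.f.ppf(alpha / 2, df1, df2)       # lower acceptance limit
    upper = stats.f.ppf(1 - alpha / 2, df1, df2)   # upper acceptance limit
    return F, (lower, upper), not (lower <= F <= upper)


F, region, reject = f_test_equal_variances(65.5491, 240, 61.8543, 240)
```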

Two-Sample t-Test for Equal Means

Since the F-test gives no evidence of unequal variances, the pooled two-sample t-test is used to test whether the batch means differ significantly.

H_0\!: \mu_1 = \mu_2 \quad \text{vs.} \quad H_a\!: \mu_1 \neq \mu_2

The pooled standard deviation and test statistic are:

s_p = 63.7285, \qquad T = \frac{\bar{Y}_1 - \bar{Y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = 13.3806

Parameter	Value
Test statistic $T$	13.3806
Pooled std dev $s_p$	63.7285
Degrees of freedom $\nu$	478
Critical value $t_{0.975,\,478}$	1.965

Conclusion: $T = 13.38$ vastly exceeds the critical value of 1.965, so we reject $H_0$ — the batch means are highly significantly different. Batch 1 is on average $689.00 - 611.16 = 77.84$ units stronger than Batch 2, and this difference is consistent across all labs and all primary factor levels.
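The pooled standard deviation, $T$ statistic, and two-sided p-value follow directly from the summary table. The sketch below computes them by hand and cross-checks against `scipy.stats.ttest_ind_from_stats`, which takes sample (ddof = 1) standard deviations. The helper name `pooled_t_test` is illustrative.

```python
import math

from scipy import stats


def pooled_t_test(mean1, s1, n1, mean2, s2, n2):
    """Equal-variance two-sample t-test from summary statistics."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)   # pooled SD
    t = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    p = 2 * stats.t.sf(abs(t), df)                                # two-sided p-value
    return sp, t, df, p


sp, t, df, p = pooled_t_test(688.9987, 65.5491, 240, 611.1559, 61.8543, 240)
t_crit = stats.t.ppf(0.975, df)
check = stats.ttest_ind_from_stats(688.9987, 65.5491, 240, 611.1559, 61.8543, 240,
                                   equal_var=True)
```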

Lab Effect Analysis

This section examines whether the 8 laboratories that performed the measurements have systematically different results.

Lab Box Plot

Box plots by lab show minor variation in medians across the 8 labs, with relatively constant scales. Two labs (3 and 5) show outliers on the low side. The overall pattern suggests that lab-to-lab differences are small compared to the batch effect.

Box plots of flexural strength by lab (pooled data). Medians vary slightly across labs with relatively constant scales. Labs 3 and 5 show low-side outliers.

Lab Box Plot by Batch

When analyzed separately by batch, the lab-to-lab variation becomes clearer:

Batch 1 — median strength ranges from 650 to 700 across labs; variability is relatively constant; all labs contain at least one low-side outlier
Batch 2 — median strength ranges from 550 to 600 across labs; there is somewhat more variability across labs compared to Batch 1; six labs show high-side outliers and three show low-side outliers

Box plots by lab for Batch 1. Median strength ranges from 650 to 700 across labs with relatively constant variability.

Box plots by lab for Batch 2. Median strength ranges from 550 to 600 with somewhat more cross-lab variability than Batch 1.

The batch effect of approximately 75 to 100 units on location dominates any lab effects within either batch.

Lab Statistical Tests

To formally test whether the labs differ, we perform a one-way ANOVA comparing the 8 lab means.

H_0\!: \mu_1 = \mu_2 = \cdots = \mu_8 \quad \text{vs.} \quad H_a\!: \text{at least one } \mu_i \text{ differs}

Source	SS	df	MS	F
Between labs	70,754.64	7	10,107.81	1.837
Within labs	2,597,691.93	472	5,503.58
Total	2,668,446.57	479

Parameter	Value
Test statistic $F$	1.837
Between-groups df $\nu_1$	7
Within-groups df $\nu_2$	472
Critical value $F_{0.95,\,7,\,472}$	2.082
Significance level $\alpha$	0.05

Conclusion: $F = 1.837$ is less than the critical value $F_{0.95,\,7,\,472} = 2.082$, so we fail to reject $H_0$: the lab means are not significantly different at the 5% level. This is consistent with the 8 laboratories being homogeneous; the apparent lab-to-lab differences in the box plots are within the range expected from sampling variation, so the labs can be treated as equivalent for purposes of analyzing the primary machining factors.
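The F statistic and its p-value can be recomputed from the sums of squares in the ANOVA table, as sketched below; a p-value above 0.05 corresponds to the fail-to-reject conclusion. With the raw data, `scipy.stats.f_oneway` computes the same one-way ANOVA from the per-lab samples; the commented line assumes hypothetical NumPy arrays `strength` and `lab`.

```python
from scipy import stats


def anova_f_from_ss(ss_between, df_between, ss_within, df_within):
    """F statistic and upper-tail p-value for a one-way ANOVA table."""
    ms_between = ss_between / df_between   # mean square between labs
    ms_within = ss_within / df_within      # mean square within labs
    F = ms_between / ms_within
    return F, stats.f.sf(F, df_between, df_within)


F, p = anova_f_from_ss(70754.64, 7, 2597691.93, 472)

# With raw data (hypothetical arrays `strength` and `lab`):
# F_raw, p_raw = stats.f_oneway(*[strength[lab == j] for j in range(1, 9)])
toy = stats.f_oneway([1, 2, 3], [4, 5, 6])   # small check: F = 13.5
```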

Primary Factors Analysis

The designed experiment analysis examines the effects of three primary machining factors, analyzed separately by batch because of the dominant batch effect:

Factor	Symbol	Level $-1$	Level $+1$
Table speed	$X_1$	0.025	0.125
Down feed rate	$X_2$	0.050	0.125
Wheel grit size	$X_3$	150	80

DOE mean plots, standard deviation plots, and interaction plots for each batch are presented in the following subsections.

DOE Mean Plot

DOE mean plots show the mean response at each factor level. The steeper the line connecting the two levels, the larger the effect of that factor on ceramic strength. The dashed horizontal reference line marks the batch grand mean.
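Each line in a DOE mean plot joins the mean response at the $-1$ and $+1$ levels of one factor, and its rise is the main effect, $\bar{Y}_{+1} - \bar{Y}_{-1}$; a negative value means the $-1$ level gives the higher mean. The sketch below computes these for one batch, assuming the responses are in an array `y` and the factor codes in an $n \times 3$ array `x` (hypothetical names). The synthetic data stand in for one batch of the real measurements.

```python
import itertools

import numpy as np


def main_effects(y, x, names=("X1", "X2", "X3")):
    """For each +/-1 factor: (mean at -1, mean at +1, effect = mean(+1) - mean(-1))."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x)
    out = {}
    for j, name in enumerate(names):
        low = y[x[:, j] == -1].mean()
        high = y[x[:, j] == 1].mean()
        out[name] = (low, high, high - low)
    return out


# Synthetic 2^3 design with true effects X1 = -30, X2 = +18, X3 = 0.
x = np.array(list(itertools.product((-1, 1), repeat=3)))
y = 650 - 15 * x[:, 0] + 9 * x[:, 1]
effects = main_effects(y, x)
```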

Batch 1

DOE mean plot for Batch 1 showing factor level means. Table speed (X1) has the steepest slope, indicating the dominant effect.

For Batch 1, table speed ($X_1$) produces the steepest slope, confirming its dominant effect of $-30.77$ units. The negative sign indicates that lower table speed (level $-1$) yields higher strength. Wheel grit ($X_3$) shows a moderate effect of $-7.18$ units, while down feed rate ($X_2$) has essentially no effect (near-horizontal line).

Batch 2

DOE mean plot for Batch 2 showing factor level means. Down feed (X2) has the steepest slope, indicating the dominant effect.

For Batch 2, the pattern is markedly different. Down feed rate ($X_2$) now produces the steepest slope with an effect of $+18.22$ units; the positive sign indicates that higher down feed rate yields higher strength. Wheel grit ($X_3$) has the second-largest main effect at $-14.71$ units. Table speed ($X_1$) has essentially no effect (near-horizontal line), in stark contrast to Batch 1.

DOE Standard Deviation Plot

DOE standard deviation plots show the within-level standard deviation at each factor level. Large differences between levels indicate that the factor affects not only the mean response but also its variability.
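The DOE standard deviation plot uses the same grouping as the mean plot but summarizes each level by its sample standard deviation. A sketch under the same assumed inputs (`y`, `x`, hypothetical names):

```python
import numpy as np


def level_std_devs(y, x, names=("X1", "X2", "X3")):
    """For each +/-1 factor: (SD at -1, SD at +1), using the sample SD (ddof=1)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x)
    return {
        name: (y[x[:, j] == -1].std(ddof=1), y[x[:, j] == 1].std(ddof=1))
        for j, name in enumerate(names)
    }


# Toy single-factor example: the spread is larger at the +1 level.
sds = level_std_devs([1.0, 3.0, 0.0, 4.0], [[-1], [-1], [1], [1]], names=("X1",))
```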

Batch 1

DOE standard deviation plot for Batch 1. Table speed shows the largest variability effect between factor levels.

For Batch 1, table speed ($X_1$) shows a substantial difference in variability between levels: the standard deviation at the high level is approximately 20 units larger than at the low level. This variability effect is important for process optimization because it means slower table speed not only produces stronger ceramics but also more consistent ones.

Batch 2

DOE standard deviation plot for Batch 2. Variability differences are roughly comparable across all three factors.

For Batch 2, the standard deviation differences across factor levels are roughly comparable for all three factors, with no single factor dominating the variability structure. This contrasts with Batch 1 where table speed had a pronounced variability effect.

Interaction Effects

Interaction plots show the mean response at each level of one factor, with separate lines for each level of a second factor. Non-parallel lines indicate an interaction — the effect of one factor depends on the level of the other.
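For $\pm 1$-coded factors in a balanced design, a two-factor interaction effect is conventionally estimated as the mean response where the product $X_a X_b = +1$ minus the mean where it equals $-1$. This equals half the change in the effect of one factor between the two levels of the other. The sketch below computes that estimate and the four cell means an interaction plot draws; it assumes the conventional definition, since the source does not state which estimator produced its figures.

```python
import itertools

import numpy as np


def interaction_effect(y, xa, xb):
    """mean(y | xa*xb = +1) - mean(y | xa*xb = -1) for +/-1 codes."""
    y = np.asarray(y, dtype=float)
    prod = np.asarray(xa) * np.asarray(xb)
    return y[prod == 1].mean() - y[prod == -1].mean()


def cell_means(y, xa, xb):
    """Mean response in each (A level, B level) cell: the points of an interaction plot."""
    y = np.asarray(y, dtype=float)
    xa, xb = np.asarray(xa), np.asarray(xb)
    return {(a, b): y[(xa == a) & (xb == b)].mean() for a in (-1, 1) for b in (-1, 1)}


# Synthetic 2^3 design with a pure X1 x X3 interaction of -20.
x = np.array(list(itertools.product((-1, 1), repeat=3)))
y = 650 - 10 * x[:, 0] * x[:, 2]
ie = interaction_effect(y, x[:, 0], x[:, 2])
cells = cell_means(y, x[:, 0], x[:, 2])
```

In this example the X3 effect is $-20$ at $X_1 = +1$ and $+20$ at $X_1 = -1$, so half their difference, $(-20 - 20)/2 = -20$, matches the product-contrast estimate.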

The $X_1 \times X_3$ (table speed by wheel grit) interaction is the most important interaction in both batches, ranked second in the effect tables for both Batch 1 ($-20.25$) and Batch 2 ($-16.71$).

Batch 1: Table Speed x Wheel Grit

Interaction plot for X1 (table speed) by X3 (wheel grit) in Batch 1. Non-parallel lines indicate a substantial interaction effect of -20.25 units.

The non-parallel lines confirm the $X_1 \times X_3$ interaction in Batch 1. At low table speed, the wheel grit effect is modest, but at high table speed, the wheel grit effect becomes much larger. The interaction magnitude of $-20.25$ units is second only to the table speed main effect.

Batch 2: Table Speed x Wheel Grit

Interaction plot for X1 (table speed) by X3 (wheel grit) in Batch 2. Non-parallel lines indicate an interaction effect of -16.71 units.

The $X_1 \times X_3$ interaction in Batch 2 follows a similar pattern with a somewhat smaller magnitude ($-16.71$ versus $-20.25$ units in Batch 1). It is notable because table speed has essentially no main effect in Batch 2, so table speed influences strength there only through its interaction with wheel grit; the interaction ranks second, after down feed rate.

Batch 1 — Ranked Effects

DOE mean plots and interaction effects analysis for Batch 1 yield the following ranked effect estimates:

Rank	Effect	Estimate
1	Table speed $X_1$	$-30.77$
2	$X_1 \times X_3$ interaction	$-20.25$
3	$X_1 \times X_2$ interaction	$+9.70$
4	Wheel grit $X_3$	$-7.18$
5	Down feed $X_2$	$\approx 0$
6	$X_2 \times X_3$ interaction	$\approx 0$

For Batch 1, table speed ($X_1$) is the dominant factor with an effect of $-30.77$ units, meaning slower table speed produces stronger ceramics. The $X_1 \times X_3$ interaction is also substantial at $-20.25$ units. The standard deviation plot shows that table speed also has a sizable variability effect, with a difference of approximately 20 units between levels.

Batch 2 — Ranked Effects

Rank	Effect	Estimate
1	Down feed $X_2$	$+18.22$
2	$X_1 \times X_3$ interaction	$-16.71$
3	Wheel grit $X_3$	$-14.71$
4	Table speed $X_1$	$\approx 0$
5	$X_1 \times X_2$ interaction	$\approx 0$
6	$X_2 \times X_3$ interaction	$\approx 0$

For Batch 2, the ranking is markedly different: down feed rate ($X_2$) is the dominant factor at $+18.22$ units, while table speed ($X_1$) has essentially no effect. The $X_1 \times X_3$ interaction remains important in both batches. The standard deviation differences are roughly comparable across all three factors for Batch 2.
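Both ranked tables order the estimated effects by absolute size. The self-contained sketch below estimates every main effect and two-factor interaction with the same $\pm 1$ contrast definition and sorts them. The synthetic coefficients are chosen only to mimic the Batch 2 ordering and are not taken from the data.

```python
import itertools

import numpy as np


def ranked_effects(y, x, names=("X1", "X2", "X3")):
    """All main effects and two-factor interactions, sorted by absolute size.

    Each effect is mean(y | contrast = +1) - mean(y | contrast = -1), where the
    contrast is a factor column or the product of two factor columns.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x)
    contrasts = {name: x[:, j] for j, name in enumerate(names)}
    for (i, a), (j, b) in itertools.combinations(enumerate(names), 2):
        contrasts[f"{a}*{b}"] = x[:, i] * x[:, j]
    effects = {k: y[c == 1].mean() - y[c == -1].mean() for k, c in contrasts.items()}
    return sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Synthetic 2^3 data: X2 = +18, X1*X3 = -17, X3 = -14, all other effects 0.
x = np.array(list(itertools.product((-1, 1), repeat=3)))
y = 611 + 9 * x[:, 1] - 7 * x[:, 2] - 8.5 * x[:, 0] * x[:, 2]
ranked = ranked_effects(y, x)
```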

Batch Comparison of Factor Effects

The most important finding from the factor effects analysis is that the factor rankings are not consistent across batches:

In Batch 1, table speed ($X_1$) dominates with a $-30.77$ effect; down feed ($X_2$) is negligible
In Batch 2, down feed ($X_2$) dominates with a $+18.22$ effect; table speed ($X_1$) is negligible

This batch-by-factor interaction means that no single set of optimal machining parameters applies to both batches. The batch effect of approximately 78 units remains the largest effect in the overall analysis, exceeding every primary factor effect.

Quantitative Output and Interpretation

Summary Statistics

Statistic	Value
Sample size $n$	480
Mean $\bar{Y}$	650.0773
Std Dev $s$	74.6383
Median	646.6275
Minimum	345.2940
Maximum	821.6540
Range	476.3600

The mean and median are reasonably close ($\bar{Y} = 650.08$ vs. median = 646.63), suggesting approximate symmetry in the pooled data. The large standard deviation of $s = 74.64$ reflects both within-treatment variability and the substantial batch effect. The minimum of 345.29 is far below the bulk of the data (most points fall between 500 and 750), flagging potential outlier specimens.
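The summary table can be reproduced with NumPy from the raw strengths. The sketch also shows why the pooled standard deviation is so large: with two groups, the total sum of squares splits into a within-batch part and a between-batch part $\frac{n_1 n_2}{n_1 + n_2}(\bar{Y}_1 - \bar{Y}_2)^2$, and plugging in the batch summary statistics recovers $s \approx 74.64$. Function names are illustrative.

```python
import math

import numpy as np


def summary_statistics(y):
    """n, mean, sample SD (ddof=1), median, minimum, maximum, and range."""
    y = np.asarray(y, dtype=float)
    return {
        "n": y.size,
        "mean": y.mean(),
        "std": y.std(ddof=1),
        "median": float(np.median(y)),
        "min": y.min(),
        "max": y.max(),
        "range": y.max() - y.min(),
    }


def combined_sd(mean1, s1, n1, mean2, s2, n2):
    """Overall sample SD of two combined groups, from their summary statistics."""
    ss_within = (n1 - 1) * s1**2 + (n2 - 1) * s2**2
    ss_between = n1 * n2 / (n1 + n2) * (mean1 - mean2) ** 2   # batch separation term
    return math.sqrt((ss_within + ss_between) / (n1 + n2 - 1))


toy = summary_statistics([1.0, 2.0, 3.0, 4.0])
overall_sd = combined_sd(688.9987, 65.5491, 240, 611.1559, 61.8543, 240)
```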

Distribution Test

The Anderson-Darling test on the pooled data rejects normality, but this reflects the mixture of batch effects rather than true non-normality within treatment groups. The bimodal structure visible in the histogram is an artifact of pooling two populations with different means.

The normal probability plot of pooled data shows curvature — specifically an S-shape consistent with a bimodal mixture distribution. Within individual treatment groups (controlling for batch, lab, and factor levels), the data are more consistent with normality.

The skewness and kurtosis of the pooled data also reflect the bimodal structure: relative to a normal distribution, the pooled distribution has a flattened central region and lighter tails, as expected from mixing two batch sub-populations.
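The source does not report the Anderson-Darling statistic itself. The sketch below shows how the test could be run with `scipy.stats.anderson`, for example on the pooled data and then on each batch separately; the helper name and the choice of the 5% level are assumptions. The deterministic toy comparison illustrates the point made above: normally shaped data pass, while a well-separated two-group mixture is rejected.

```python
import numpy as np
from scipy import stats


def anderson_normal(y, level=5.0):
    """Anderson-Darling normality test: (A^2, critical value at `level` percent, reject?)."""
    res = stats.anderson(np.asarray(y, dtype=float), dist="norm")
    levels = np.asarray(res.significance_level, dtype=float)
    idx = int(np.argmin(np.abs(levels - level)))   # tabulated level closest to `level`
    crit = res.critical_values[idx]
    return res.statistic, crit, bool(res.statistic > crit)


# Evenly spaced normal quantiles, and two well-separated shifted copies of them.
normal_like = stats.norm.ppf((np.arange(1, 51) - 0.5) / 50)
mixture = np.concatenate([normal_like - 3, normal_like + 3])
```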

Outlier Detection

Grubbs’ test on within-group residuals identifies a few potential outliers at the low end — specimens with strength values in the 300 to 450 range, well below the bulk of the data (500 to 750). Specifically:

Approximately half a dozen low-lying points appear across the full dataset
Both batches contain low-side outliers; Batch 2 also shows some high-side outliers
Labs 3 and 5 show the most pronounced low-side outliers
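Grubbs' test for a single low outlier compares $G = (\bar{Y} - Y_{\min})/s$ with a critical value based on the t distribution. The sketch below implements the one-sided version. It assumes approximate normality within the group, and the grouping used to form within-group residuals (for example, subtracting each batch-by-treatment cell mean) is an assumption, since the source does not specify it. Repeated application after removing a flagged point is not shown.

```python
import numpy as np
from scipy import stats


def grubbs_low(y, alpha=0.05):
    """One-sided Grubbs test for a single low outlier: (G, G_crit, outlier?)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    s = y.std(ddof=1)
    g = (y.mean() - y.min()) / s
    t = stats.t.ppf(1 - alpha / n, n - 2)                     # upper alpha/n point, n-2 df
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
    return g, g_crit, bool(g > g_crit)


g1, c1, flagged = grubbs_low([10, 11, 9, 10, 10.5, 9.5, 10, 3])
g2, c2, clean = grubbs_low([10, 11, 9, 10, 10.5, 9.5, 10, 9.8])
```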

In ceramic strength testing, such low outliers often represent specimens with unusually large surface flaws or micro-cracks introduced during machining. These specimens may warrant separate investigation for quality control purposes, as they reflect real (if rare) failure modes rather than measurement errors.

Interpretation

The 4-plot screening of the pooled 480 observations immediately reveals a bimodal distribution as the dominant graphical finding: two distinct peaks in the histogram rather than a single symmetric distribution. The run sequence plot and lag plot show no concerning temporal patterns, indicating that the bimodality is structural rather than time-dependent. The bihistogram identifies the source: Batch 1 is centered at approximately 689 and Batch 2 at approximately 611, a separation confirmed by the two-sample t-test ($T = 13.38$, $p \ll 0.001$). This batch effect of approximately 75 to 100 units is consistent across all 8 labs and across every lab-by-factor-level combination examined: the block plots show Batch 1 above Batch 2 in every case, and the lab-specific box plots show the batch separation within each laboratory.

The lab effect analysis indicates that lab-to-lab variation is minor ($F = 1.837 < F_{crit} = 2.082$), justifying treatment of the labs as homogeneous for primary factor analysis. With lab effects set aside, the DOE mean plots reveal the critical finding of this case study: the factor rankings are not consistent across batches. In Batch 1, table speed ($X_1$) dominates with an effect of $-30.77$ units while down feed rate ($X_2$) is negligible. In Batch 2, the pattern reverses: down feed rate dominates at $+18.22$ units while table speed has essentially no effect. The $X_1 \times X_3$ (table speed by wheel grit) interaction is substantial in both batches ($-20.25$ in Batch 1, $-16.71$ in Batch 2), and the interaction plots confirm the non-parallel pattern indicating that table speed and wheel grit effects are interdependent.

The inconsistency in factor rankings across batches has direct engineering implications: no single set of optimal machining parameters applies to both batches. The batch effect was completely unexpected by the NIST investigators, demonstrating that unmeasured material differences between batches can dominate machining factor effects. This case study illustrates how graphical EDA techniques efficiently reveal multi-factor structure — the bihistogram exposes batch separation, block plots confirm consistency across labs, and DOE mean plots rank factor importance within each batch. A multi-factor ANOVA with batch as a blocking factor is the recommended follow-up analysis.

Conclusions

The ceramic strength data present a multi-faceted analysis challenge. The key findings, in order of importance, are:

Dominant batch effect — The bihistogram and block plots show Batch 1 specimens are on average $77.84$ units stronger than Batch 2 ($T = 13.38$, $p \ll 0.001$), a difference that is consistent across all 8 labs and all lab-by-factor-level combinations examined. This effect was completely unexpected by the scientific investigators.
Equal batch variances — Despite the large location difference, the batch variances are not significantly different ($F = 1.123$, within the acceptance region), indicating that the batch effect is primarily a shift in location, although the non-linear Q-Q plot shows some difference in distributional shape as well.
Inconsistent factor rankings — The DOE mean plots show that primary factor effects differ between batches: table speed ($X_1$) dominates in Batch 1 ($-30.77$), while down feed rate ($X_2$) dominates in Batch 2 ($+18.22$). The interaction plots confirm a substantial $X_1 \times X_3$ interaction in both batches. This batch-by-factor interaction complicates the determination of optimal machining parameters.
Negligible lab effect — The lab effect is small relative to the batch effect, and the one-way ANOVA finds no significant difference among the 8 labs ($F = 1.837 < F_{crit} = 2.082$).
Low-side outliers — A few specimens in the 300 to 450 range represent unusually weak specimens, likely due to surface flaws, and warrant quality control investigation.

The recommended analysis approach is a multi-factor ANOVA that includes batch as a blocking factor, followed by examination of main effects and interactions among the primary machining factors — analyzed separately by batch given the inconsistent factor rankings. This case study demonstrates how graphical techniques (bihistograms, block plots, box plots by group) efficiently reveal batch and lab effects that would be difficult to detect from summary statistics alone, and how DOE mean plots, standard deviation plots, and interaction plots identify the key machining factors and their interdependencies within each batch.