Heat Flow Meter 1 Case Study

NIST/SEMATECH Section 1.4.2.8 Heat Flow Meter 1

Background and Data

This case study applies exploratory data analysis to the NIST ZARR13.DAT dataset, which contains 195 calibration factor measurements from a heat flow meter calibration and stability analysis. The data were collected by Bob Zarr of NIST in January 1990. The response variable is a calibration factor, and the motivation for studying this dataset is to illustrate a well-behaved process where the underlying assumptions hold and the process is in statistical control.

The dataset originates from NIST/SEMATECH Section 1.4.2.8. With $n = 195$ observations, this study demonstrates how the standard EDA methodology confirms that a univariate measurement process meets the assumptions required for valid statistical inference.

Dataset

ZARR13.DAT
Observations: 195
Variable: Calibration factor

Bob Zarr, NIST, heat flow meter calibration factor (Jan 1990)

NIST source description
Heat Flow Meter (HFM) Calibration & Stability Analysis (ASTM C-16). Bob Zarr, NIST. Coded file ID: set F29, scan rate = 1, date of test = 1/24/90. Response variable = computed calibration factor. Number of observations = 195.
Preview data

| # | Value |
|---|---|
| 1 | 9.206343 |
| 2 | 9.299992 |
| 3 | 9.277895 |
| 4 | 9.305795 |
| 5 | 9.275351 |
| 6 | 9.288729 |
| 7 | 9.287239 |
| 8 | 9.260973 |
| 9 | 9.303111 |
| 10 | 9.275674 |

... 185 more rows

Test Underlying Assumptions

Goals

The analysis has three primary objectives:

  1. Model validation — assess whether the univariate model is an appropriate fit for the heat flow meter calibration data:

$$Y_i = C + E_i$$

  2. Assumption testing — evaluate whether the data satisfy the four standard assumptions for a measurement process in statistical control:

    • Random sampling — the data are uncorrelated
    • Fixed distribution — the data come from a fixed distribution
    • Fixed location — the distribution location (mean) is constant
    • Fixed variation — the distribution scale (standard deviation) is constant

  3. Confidence interval validity — determine whether the standard confidence interval formula is appropriate:

$$\bar{Y} \pm \frac{2s}{\sqrt{N}}$$

where $s$ is the sample standard deviation. This formula relies on all four assumptions holding; if they are violated, the confidence interval has no statistical meaning.

Graphical Output and Interpretation

4-Plot Overview

The 4-plot is the primary graphical tool for testing all four assumptions simultaneously. For the heat flow meter data, it reveals a well-behaved dataset. The run sequence plot indicates no significant shifts in location or scale over time. The lag plot does not indicate any non-random pattern. The histogram shows the data are reasonably symmetric and consistent with a normal distribution. The normal probability plot verifies that the normality assumption is reasonable.

Four-plot diagnostic layout for the heat flow meter calibration data (run sequence, lag, histogram, normal probability).

The assumptions are addressed by the four diagnostic plots:

  1. The run sequence plot (upper left) shows 195 measurements fluctuating around a stable central value of approximately 9.261 with no systematic drift — the fixed-location and fixed-variation assumptions appear satisfied.
  2. The lag plot (upper right) displays a roughly circular scatter cloud, consistent with approximately independent observations — no strong evidence of non-randomness.
  3. The histogram (lower left) is approximately symmetric and bell-shaped, centered near 9.261 — consistent with a normal distribution.
  4. The normal probability plot (lower right) shows data points closely following the theoretical straight line — confirming that the normality assumption is reasonable.

From the above plots, we conclude that the underlying assumptions are valid and the data follow approximately a normal distribution. The standard confidence interval is appropriate for quantifying the uncertainty of the calibration factor.

Run Sequence Plot

The run sequence plot shows 195 measurements fluctuating around a stable central value of approximately 9.261. No systematic drift or trend is visible. The data are consistent with a stable measurement process.

Run sequence plot of 195 heat flow meter calibration measurements fluctuating stably around 9.261 — a well-behaved measurement process.

Lag Plot

The lag plot at lag 1 displays a roughly circular scatter cloud, consistent with approximately independent observations. There is no strong linear or curvilinear pattern that would indicate severe autocorrelation.

Lag-1 plot showing a roughly circular scatter cloud, consistent with approximately independent observations.

Histogram

The histogram is approximately symmetric and bell-shaped, centered near 9.261. The shape is consistent with a normal distribution. An overlaid normal PDF with mean 9.261 and standard deviation 0.023 fits the data well.

Histogram with KDE overlay of the calibration factor data. The approximately bell-shaped distribution is consistent with normality.

Normal Probability Plot

The normal probability plot shows data points closely following the theoretical straight line (fitted intercept 9.261, slope 0.023). The linearity confirms that the normality assumption is reasonable for these data.

Normal probability plot of the heat flow meter data. The close adherence to the reference line confirms the normality assumption.

Autocorrelation Plot

The autocorrelation plot quantifies the serial dependence in the data. With 95% confidence bands at $\pm 2/\sqrt{195} = \pm 0.143$, autocorrelation coefficients exceeding these bounds indicate significant non-randomness.

Autocorrelation plot showing mild autocorrelation — the lag-1 coefficient of 0.281 exceeds the 95% significance bounds, but autocorrelation decays quickly.

The lag-1 autocorrelation of 0.281 exceeds the significance bounds, confirming the mild non-randomness detected by the runs test. However, the autocorrelation decays quickly, and the departure from independence is not severe enough to warrant a more complex model.

Spectral Plot

The spectral plot shows the frequency-domain structure of the calibration data.

Spectral plot of the calibration data showing modest low-frequency content consistent with mild autocorrelation, but no sharp periodic peaks.

The spectrum shows modest low-frequency content consistent with the mild autocorrelation, but no sharp peaks that would indicate periodic structure. The process is essentially well-behaved with a minor departure from strict independence.
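A spectral plot can be approximated by a simple periodogram of the mean-centered series. This is a sketch using `scipy.signal.periodogram`; the synthetic `y` is again a placeholder for the real data.

```python
import numpy as np
from scipy.signal import periodogram

# Placeholder for the 195 ZARR13.DAT values.
rng = np.random.default_rng(0)
y = 9.2615 + 0.0228 * rng.standard_normal(195)

# One-sided periodogram; sampling interval = 1 observation.
freq, power = periodogram(y - y.mean(), fs=1.0)

# A true periodic component would appear as a single dominant spike;
# mild autocorrelation shows up only as modest low-frequency content.
dominant_freq = freq[np.argmax(power)]
```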

Quantitative Output and Interpretation

Summary Statistics

| Statistic | Value |
|---|---|
| Sample size $n$ | 195 |
| Mean $\bar{Y}$ | 9.261460 |
| Std dev $s$ | 0.022789 |
| Median | 9.261952 |
| Min | 9.196848 |
| Max | 9.327973 |
| Range | 0.131126 |

The mean and median are nearly identical (9.261460 vs. 9.261952), confirming symmetry. The standard deviation of 0.023 represents approximately 0.25% relative precision, typical for heat flow meter calibrations.
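The summary table can be reproduced with a small helper. A sketch (the function name `summarize` is ours, not NIST's); note `ddof=1` for the sample standard deviation.

```python
import numpy as np

def summarize(y):
    """Location/spread summary in the style of the NIST table."""
    y = np.asarray(y, dtype=float)
    return {
        "n": int(y.size),
        "mean": float(y.mean()),
        "std": float(y.std(ddof=1)),  # sample standard deviation
        "median": float(np.median(y)),
        "min": float(y.min()),
        "max": float(y.max()),
        "range": float(y.max() - y.min()),
    }

# Run on the 195 ZARR13.DAT values to reproduce the table above.
```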

Location Test

The location test fits a linear regression of the response $Y$ against the run-order index $X = 1, 2, \ldots, N$ and tests whether the slope is significantly different from zero.

$$H_0\!: B_1 = 0 \quad \text{vs.} \quad H_a\!: B_1 \neq 0$$

| Parameter | Estimate | Std Error | t-Value |
|---|---|---|---|
| $B_0$ (intercept) | 9.26699 | 0.003253 | 2849 |
| $B_1$ (slope) | −0.000056412 | 0.00002878 | −1.960 |

Residual standard deviation: 0.022624 with 193 degrees of freedom.

Conclusion: The slope t-value of −1.960 sits exactly at the critical value $t_{0.975,\,193} = 1.96$. While technically borderline, the slope estimate of −0.000056 is so small that it can essentially be treated as zero: the estimated drift over the entire 195-observation run is only $195 \times 0.000056 \approx 0.011$, less than half a standard deviation. The fixed-location assumption is satisfied.
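The drift test is an ordinary least-squares fit of the response against run order; a sketch using `scipy.stats.linregress` (the helper name is ours).

```python
import numpy as np
from scipy import stats

def location_drift_test(y):
    """Regress y on run order 1..N; return (slope, stderr, t) for H0: slope = 0."""
    y = np.asarray(y, dtype=float)
    x = np.arange(1, y.size + 1)
    fit = stats.linregress(x, y)
    return fit.slope, fit.stderr, fit.slope / fit.stderr

# On ZARR13.DAT this should reproduce slope ~ -0.0000564 and t ~ -1.960.
```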

Variation Test

Bartlett’s test divides the data into $k = 4$ equal-length intervals and tests whether their variances are homogeneous.

$$H_0\!: \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 \quad \text{vs.} \quad H_a\!: \text{at least one } \sigma_i^2 \text{ differs}$$

| Statistic | Value |
|---|---|
| Test statistic $T$ | 3.147 |
| Degrees of freedom | $k - 1 = 3$ |
| Critical value $\chi^2_{0.05,\,3}$ | 7.815 |
| Significance level $\alpha$ | 0.05 |

Conclusion: The test statistic of 3.147 is well below the critical value of 7.815, so we do not reject $H_0$: the variances are not significantly different across the four quarters of the dataset. The constant-variation assumption is satisfied.
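Bartlett's test on equal-length quarters is directly available in SciPy; a sketch, with `np.array_split` doing the interval split (the helper name is ours).

```python
import numpy as np
from scipy import stats

def bartlett_quarters(y, k=4):
    """Split y into k consecutive equal-length intervals and test variance homogeneity."""
    groups = np.array_split(np.asarray(y, dtype=float), k)
    T, p = stats.bartlett(*groups)
    return T, p

# On ZARR13.DAT with k = 4 this should reproduce T ~ 3.147.
```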

Randomness Tests

Two complementary tests assess whether the observations are independent.

Runs test — tests whether the sequence of values above and below the median was produced randomly.

$$H_0\!: \text{the sequence is random} \quad \text{vs.} \quad H_a\!: \text{the sequence is not random}$$

| Statistic | Value |
|---|---|
| Test statistic $Z$ | −3.2306 |
| Critical value $Z_{1-\alpha/2}$ | 1.96 |

Conclusion: $|Z| = 3.23$ exceeds 1.96, so we reject $H_0$: the data show statistically significant non-randomness. The negative $Z$ indicates fewer runs than expected, meaning the data tend to cluster in sequences above or below the median.
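The runs test about the median is easy to implement from its normal approximation. A sketch; ties at the median are dropped here, which is common practice, though NIST's exact tie handling may differ.

```python
import numpy as np

def runs_test_z(y):
    """Z statistic for the runs test about the median; Z < 0 means too few runs."""
    y = np.asarray(y, dtype=float)
    med = np.median(y)
    above = y[y != med] > med              # drop ties at the median
    n1 = int(above.sum())
    n2 = int(above.size - n1)
    n = n1 + n2
    runs = 1 + int(np.count_nonzero(above[1:] != above[:-1]))
    mu = 2.0 * n1 * n2 / n + 1.0           # expected number of runs
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1.0))
    return (runs - mu) / np.sqrt(var)
```

An alternating sequence gives too many runs (Z > 0); a clustered one gives too few (Z < 0), which is the pattern seen in the calibration data.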

Lag-1 autocorrelation — measures the linear dependence between consecutive observations.

| Statistic | Value |
|---|---|
| $r_1$ | 0.281 |
| Critical value $1.96/\sqrt{N}$ | 0.140 |

Conclusion: The lag-1 autocorrelation of 0.281 exceeds the critical value of 0.140, confirming the non-randomness detected by the runs test. The randomness assumption is violated, but the violation is mild: the autocorrelation is modest compared to cases like a random walk, where $r_1 \approx 0.99$.

However, as the NIST handbook notes, the violation of the randomness assumption is mild and “not serious enough to warrant developing a more sophisticated model.” In practice, mild non-randomness is common in calibration data, and whether it justifies a more complex model is a judgment call.
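The lag-1 autocorrelation and its significance bound take only a few lines; a sketch (function names ours).

```python
import numpy as np

def lag1_autocorr(y):
    """Sample lag-1 autocorrelation r1."""
    d = np.asarray(y, dtype=float)
    d = d - d.mean()
    return float(np.sum(d[1:] * d[:-1]) / np.sum(d * d))

def r1_bound(n, z=1.96):
    """Approximate 95% significance bound for r1 under independence."""
    return float(z / np.sqrt(n))

# On ZARR13.DAT: r1 ~ 0.281 against a bound of 1.96/sqrt(195) ~ 0.140.
```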

Distribution Test

The probability plot correlation coefficient is 0.999, exceeding the critical value of 0.987. The Anderson-Darling test provides a more sensitive formal test.

| Statistic | Value |
|---|---|
| PPCC | 0.999 |
| PPCC critical value | 0.987 |
| Anderson-Darling $A^2$ | 0.129 |
| Critical value ($\alpha = 0.05$) | 0.787 |

Conclusion: Both the PPCC (0.999 > 0.987) and the Anderson-Darling statistic ($A^2 = 0.129$, well below 0.787) support the normality assumption. The normal distribution is a good model for these data.
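Both checks are available in SciPy: `scipy.stats.anderson` returns $A^2$ with its tabled critical values, and `scipy.stats.probplot` returns the probability-plot correlation as the third element of its fit tuple. A sketch on placeholder normal data (substitute the real ZARR13.DAT values).

```python
import numpy as np
from scipy import stats

# Placeholder for the 195 ZARR13.DAT values.
rng = np.random.default_rng(0)
y = 9.2615 + 0.0228 * rng.standard_normal(195)

ad = stats.anderson(y, dist="norm")
a2 = ad.statistic
# Pick the critical value paired with the 5% significance level.
crit_5pct = ad.critical_values[list(ad.significance_level).index(5.0)]

(osm, osr), (slope, intercept, ppcc) = stats.probplot(y, dist="norm")
```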

Outlier Detection

Grubbs’ test assesses whether the most extreme observation is a statistical outlier.

| Statistic | Value |
|---|---|
| $G$ | 2.918673 |
| Critical value (upper one-tailed, $\alpha = 0.05$) | 3.597898 |

Conclusion: $G = 2.919$ is less than the critical value of 3.598, so we do not reject the null hypothesis: no outliers are detected.
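Grubbs' statistic and its t-based critical value are straightforward to compute; a sketch (function names ours), using the standard two-sided critical-value formula built from Student's t quantiles.

```python
import numpy as np
from scipy import stats

def grubbs_statistic(y):
    """G = max_i |y_i - ybar| / s."""
    y = np.asarray(y, dtype=float)
    return float(np.max(np.abs(y - y.mean())) / y.std(ddof=1))

def grubbs_critical(n, alpha=0.05):
    """Upper critical value for the two-sided Grubbs test at level alpha."""
    t2 = stats.t.ppf(1.0 - alpha / (2.0 * n), n - 2) ** 2
    return float((n - 1) / np.sqrt(n) * np.sqrt(t2 / (n - 2 + t2)))

# For n = 195, alpha = 0.05 this reproduces a critical value of about 3.598.
```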

Test Summary

| Assumption | Test | Statistic | Critical Value | Result |
|---|---|---|---|---|
| Fixed location | Regression on run order | $t = -1.960$ | 1.96 | Borderline; do not reject |
| Fixed variation | Bartlett’s test | $T = 3.147$ | 7.815 | Do not reject |
| Randomness | Runs test | $Z = -3.231$ | 1.96 | Reject |
| Randomness | Autocorrelation lag-1 | $r_1 = 0.281$ | 0.140 | Reject |
| Normality | Anderson-Darling | $A^2 = 0.129$ | 0.787 | Do not reject |
| Outliers | Grubbs’ test | $G = 2.919$ | 3.598 | Do not reject |

The assumptions of fixed location, fixed variation, and normality are satisfied. The randomness assumption shows a mild violation (lag-1 autocorrelation of 0.281 is statistically significant), but the departure is not severe enough to require a more complex model.

Interpretation

The graphical and quantitative analyses are largely consistent: three of the four underlying assumptions are clearly satisfied, and the sole violation is mild. The run sequence plot shows a stable process with no visible trend, confirmed by the borderline location test ($t = -1.960$, with an estimated drift of only 0.011 over the full run, less than half a standard deviation). Bartlett’s test confirms constant variation ($T = 3.147$, well below the critical value of 7.815). The normal probability plot is closely linear, and both the Anderson-Darling test ($A^2 = 0.129$, well below 0.787) and the PPCC (0.999, well above 0.987) strongly support normality. Grubbs’ test ($G = 2.919$, below 3.598) confirms no outliers.

The only departure from the ideal is a mild violation of the randomness assumption. The runs test ($Z = -3.231$) and lag-1 autocorrelation ($r_1 = 0.281$) both reject independence at the 5% level. However, the autocorrelation plot shows that the dependence decays quickly beyond lag 1, and the spectral plot reveals no periodic structure; the modest low-frequency content is consistent with mild short-range correlation rather than a systematic pattern. As the NIST handbook notes, the departure “is not serious enough to warrant developing a more sophisticated model.”

The univariate model $Y_i = 9.26146 + E_i$ is appropriate for this dataset. The standard 95% confidence interval is approximately valid, though the true uncertainty is somewhat larger than $\bar{Y} \pm 2s/\sqrt{N}$ suggests because of the mild autocorrelation. The process is considered to be in statistical control: the calibration factor is stable, the variation is consistent, and the data are well described by a normal distribution.

Conclusions

The heat flow meter calibration data satisfy three of the four assumptions. The location is stable, the variation is constant across quarters of the dataset, and the data follow a normal distribution with no outliers. The randomness assumption shows a mild violation — the lag-1 autocorrelation of 0.281 and runs test statistic of −3.23 both reject the null hypothesis of independence at the 5% level. However, this violation is not severe enough to invalidate the univariate model.

The recommended model is:

$$Y_i = 9.26146 + E_i$$

The 95% confidence interval for the calibration factor is:

$$\bar{Y} \pm \frac{2s}{\sqrt{N}} = 9.26146 \pm 2 \times \frac{0.02279}{\sqrt{195}} = (9.2582,\; 9.2647)$$

The standard deviation of the mean is $s/\sqrt{N} = 0.001632$, and the 95% confidence interval for the standard deviation is (0.02073, 0.02531).
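The interval arithmetic can be verified in a few lines from the reported summary values; a sketch. The chi-square interval for the standard deviation is the standard one and may differ from NIST's in the last digit.

```python
import numpy as np
from scipy import stats

n, ybar, s = 195, 9.26146, 0.022789   # reported summary values

half = 2.0 * s / np.sqrt(n)           # half-width of Ybar +/- 2s/sqrt(N)
ci_mean = (ybar - half, ybar + half)
sd_of_mean = s / np.sqrt(n)

# Standard chi-square 95% interval for the population standard deviation.
lo = s * np.sqrt((n - 1) / stats.chi2.ppf(0.975, n - 1))
hi = s * np.sqrt((n - 1) / stats.chi2.ppf(0.025, n - 1))
```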

Because the randomness assumption has a mild violation, the true uncertainty is somewhat larger than the standard confidence interval suggests. Nevertheless, the NIST handbook considers this process to be in statistical control — the departure from randomness is not serious enough to warrant developing a more sophisticated model. This case study illustrates a well-behaved measurement process where the standard EDA methodology confirms that simple statistical inference based on the sample mean is appropriate.