Standard Resistor Case Study

NIST/SEMATECH Section 1.4.2.7 Standard Resistor

Background and Data

This case study applies exploratory data analysis (EDA) to 1000 standard resistor measurements collected by Ron Dziuba at NIST over a five-year period (1980-1985). The response variable is resistance in ohms. The primary purpose is to demonstrate how EDA techniques detect drift in location, non-constant variation, and non-randomness, which the study attributes to physical aging of the resistor and seasonal humidity effects on the measurement equipment.

The dataset originates from NIST/SEMATECH Section 1.4.2.7. With $n = 1000$ observations ranging from approximately 27.828 to 28.119, this study illustrates a case where three of the four standard assumptions are violated simultaneously, making it one of the most severely non-conforming datasets in the case study collection.

Dataset

DZIUBA1.DAT
Observations: 1,000
Variable: Resistance (ohms)

Ron Dziuba, NIST, standard resistor measurements (1980-1985)

NIST source description
Standard Resistor case study. Ron Dziuba, NIST. Response variable = resistance (ohms). Measurements taken over a 5-year period (1980-1985) to study long-term drift and seasonal effects on a standard resistor. Number of observations = 1000.
Preview data
# Value
1 27.868
2 27.8929
3 27.8773
4 27.853
5 27.8876
6 27.8725
7 27.8743
8 27.8879
9 27.8728
10 27.8746
... 990 more rows

Test Underlying Assumptions

Goals

The analysis has three primary objectives:

  1. Model validation: assess whether the univariate model is an appropriate fit for the standard resistor data:

$$Y_i = C + E_i$$

  2. Assumption testing: evaluate whether the data satisfy the four standard assumptions for a measurement process in statistical control:

    • Random sampling: the data are uncorrelated
    • Fixed distribution: the data come from a fixed distribution
    • Fixed location: the distribution location (mean) is constant
    • Fixed variation: the distribution scale (standard deviation) is constant
  3. Confidence interval validity: determine whether the standard confidence interval formula is appropriate:

$$\bar{Y} \pm \frac{2s}{\sqrt{N}}$$

where $s$ is the sample standard deviation and $N$ is the number of observations. This formula relies on all four assumptions holding; if they are violated, the confidence interval has no statistical meaning.

If the assumptions are violated, identify the nature and severity of the violations and recommend appropriate remedial actions.

Graphical Output and Interpretation

4-Plot Overview

The 4-plot is the primary graphical tool for testing all four assumptions simultaneously.

Four-plot diagnostic layout for the standard resistor dataset. Run sequence shows upward drift over 1000 observations, lag plot shows extreme linear correlation.

The assumptions are addressed by the four diagnostic plots:

  1. The run sequence plot (upper left) shows a persistent upward drift over the 1000-observation period, with superimposed seasonal fluctuations. The fixed-location assumption is clearly violated.
  2. The lag plot (upper right) displays an extremely tight linear cluster along the diagonal, indicating that consecutive observations are almost identical. The randomness assumption is severely violated ($r_1 = 0.97$).
  3. Since the lag plot indicates the most extreme autocorrelation in the case study collection, the histogram (lower left) and normal probability plot (lower right) are not meaningful for distributional interpretation.

The severe violation of the randomness assumption means that the univariate model $Y_i = C + E_i$ is not valid. The long-term drift and seasonal periodicity indicate that time-series or seasonal adjustment models would be needed if continued monitoring were required.

Run Sequence Plot

The run sequence plot shows 1000 observations with a persistent upward drift over the five-year measurement period (1980-1985). Seasonal variation is visible as periodic fluctuations superimposed on the trend. The drift is consistent with physical aging of the resistor, while the periodic fluctuations correspond to seasonal humidity effects on the measurement equipment.

Conclusion: The persistent upward drift and seasonal fluctuations demonstrate that the fixed-location assumption is violated.

Run sequence plot of 1000 resistor measurements showing persistent upward drift over the 5-year measurement period (1980-1985), with seasonal variation visible as periodic fluctuations.

Lag Plot

The lag plot at lag 1 displays an extremely tight linear cluster along the diagonal, indicating that consecutive observations are nearly identical. The lag-1 autocorrelation is $r_1 = 0.97$, more extreme than the Filter Transmittance case study ($r_1 = 0.94$) and the most extreme autocorrelation in the entire case study collection.

Conclusion: The tight linear cluster along $y = x$ indicates each measurement is almost completely determined by the previous measurement. The randomness assumption is severely violated.
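The lag-1 relationship described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the handbook's own code: the helper names `lag_pairs` and `lag_autocorrelation` are hypothetical, the estimator normalizes by the full-series sum of squares (one common convention), and the two demonstration series are synthetic stand-ins rather than the DZIUBA1.DAT measurements.

```python
def lag_pairs(y, lag=1):
    """Return the (Y[i], Y[i+lag]) points that a lag plot would draw."""
    return list(zip(y[:-lag], y[lag:]))


def lag_autocorrelation(y, lag=1):
    """Sample autocorrelation at the given lag (full-series normalization)."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    num = sum((y[i] - mean) * (y[i + lag] - mean) for i in range(n - lag))
    return num / denom


# Synthetic stand-ins (assumption): a slowly drifting ramp and an alternating series.
ramp = [27.8 + 0.0003 * i for i in range(1000)]
alternating = [1.0 if i % 2 == 0 else -1.0 for i in range(1000)]

r1_ramp = lag_autocorrelation(ramp)
r1_alt = lag_autocorrelation(alternating)
print(f"ramp r1 = {r1_ramp:.3f}, alternating r1 = {r1_alt:.3f}")
```

A slowly drifting series produces a lag-1 value close to +1, which is the pattern the tight diagonal cluster reflects; an alternating series produces a value close to -1.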

Lag-1 plot showing a tight linear cluster along the diagonal (r₁ = 0.97), indicating extreme autocorrelation from slowly drifting measurements.

Histogram

The histogram shows the distribution of the 1000 resistance values. However, because the randomness assumption is severely violated (lag-1 autocorrelation $r_1 = 0.97$), the histogram does not represent the distribution of independent errors. Instead, it reflects the combined effects of the long-term drift and seasonal fluctuation visible in the run sequence plot.

Histogram with KDE overlay. Because the data are severely autocorrelated, the shape reflects drift structure rather than a meaningful distribution.

The histogram shape is determined by the drift structure and seasonal pattern rather than the underlying error distribution alone. Any distributional assessment is deferred until the randomness issue is resolved.

Normal Probability Plot

The normal probability plot assesses whether the data follow a normal distribution. As with the histogram, interpretation is limited because the severe autocorrelation ($r_1 = 0.97$) violates the independence assumption required for distributional tests.

Normal probability plot. Interpretation is not meaningful because the randomness assumption is violated.

The probability plot cannot be reliably interpreted as evidence for or against normality when the observations are not independent. The distributional assessment is deferred until the randomness issue is resolved.

Autocorrelation Plot

The autocorrelation plot quantifies the serial dependence detected by the lag plot. With $n = 1000$, the 95% confidence bands are at $\pm 2/\sqrt{1000} = \pm 0.063$.

Autocorrelation plot showing extreme positive autocorrelation persisting to high lags, consistent with long-term drift and seasonal effects in the measurement process.

The autocorrelation plot shows extreme positive autocorrelation persisting to very high lags. This is consistent with the long-term drift and seasonal patterns visible in the run sequence plot. The slow decay of the autocorrelation function reflects the gradual upward trend, while any periodic structure corresponds to the seasonal humidity effects.
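As a rough sketch of how the autocorrelation values and the confidence band could be computed, the following uses only the Python standard library. The function names are hypothetical, the full-series normalization is an assumed convention, and the input is a synthetic trending series with a small periodic term, not the resistor measurements.

```python
import math


def autocorrelation_function(y, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag with full-series normalization."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[i] * dev[i + k] for i in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def confidence_band(n):
    """Approximate 95% band for uncorrelated data: 2 / sqrt(n)."""
    return 2.0 / math.sqrt(n)


# Synthetic drifting series (assumption), not the resistor measurements.
series = [0.0003 * i + 0.01 * math.sin(2 * math.pi * i / 200) for i in range(1000)]
acf = autocorrelation_function(series, max_lag=50)
band = confidence_band(len(series))
lags_outside = sum(1 for r in acf if abs(r) > band)
print(f"band = ±{band:.3f}; {lags_outside} of {len(acf)} lags fall outside it")
```

For a trending series, every one of the first 50 lags stays far outside the band, which is the slow-decay signature described above.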

Spectral Plot

The spectral plot shows the frequency-domain structure of the data.

Spectral plot showing dominant low-frequency content, consistent with the long-term drift and seasonal variation visible in the run sequence plot.

The spectral plot shows dominant low-frequency content from the long-term drift. Periodic components from the seasonal humidity effects may also be visible as peaks at frequencies corresponding to the yearly cycle. The overwhelming low-frequency dominance is characteristic of a process with a strong trend component.
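To illustrate why a trend shows up as low-frequency power while a cycle shows up as a peak at its own frequency, here is a minimal periodogram sketch using a direct discrete Fourier transform. It is an illustration only: the helper names are hypothetical, the direct DFT is quadratic in the series length (adequate for short demonstrations), and both inputs are synthetic.

```python
import cmath
import math


def periodogram(y):
    """Return [(frequency, power)] for k = 1 .. n//2 via a direct DFT (O(n^2))."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    result = []
    for k in range(1, n // 2 + 1):
        coeff = sum(d * cmath.exp(-2j * math.pi * k * t / n) for t, d in enumerate(dev))
        result.append((k / n, abs(coeff) ** 2 / n))
    return result


def dominant_frequency(y):
    """Frequency (cycles per observation) carrying the largest power."""
    return max(periodogram(y), key=lambda fp: fp[1])[0]


# Synthetic examples (assumption): a pure trend and a pure 16-observation cycle.
trend = [0.001 * t for t in range(128)]
cycle = [math.sin(2 * math.pi * t / 16) for t in range(128)]
print(dominant_frequency(trend), dominant_frequency(cycle))
```

The trend concentrates its power at the lowest available frequency, while the cycle peaks at one cycle per 16 observations.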

Quantitative Output and Interpretation

Summary Statistics

| Statistic | Value |
| --- | --- |
| Sample size $n$ | 1000 |
| Mean $\bar{Y}$ | 28.01634 |
| Median | 28.02910 |
| Min | 27.82800 |
| Max | 28.11850 |
| Range | 0.29050 |
| Std Dev $s$ | 0.06349 |
| Autocorrelation $r_1$ | 0.97 |

The mean and median differ by about 0.013, with the median slightly higher, reflecting the asymmetry introduced by the upward drift (later observations are higher). The standard deviation of 0.06349 reflects the combined effects of measurement variation, drift, and seasonal fluctuation. The standard confidence interval is suspect:

$$\bar{Y} \pm \frac{2s}{\sqrt{N}} = 28.01634 \pm 0.00401$$

This interval dramatically understates the true uncertainty because it assumes independent observations. With autocorrelation of 0.97, the effective sample size is far smaller than 1000, and any confidence interval computed from the standard formula has no statistical justification.
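The arithmetic behind the interval, and a rough sense of how much it understates the uncertainty, can be sketched as follows. The effective-sample-size formula $n(1 - r_1)/(1 + r_1)$ is an assumption introduced here for illustration: it is the usual approximation for a first-order autoregressive (AR(1)) process and is not part of the original analysis. Because these data also contain a trend, the adjusted figure is only indicative and is not a valid interval either. The function names are hypothetical.

```python
import math


def ci_half_width(s, n):
    """Half-width of the approximate 95% interval, 2 * s / sqrt(n)."""
    return 2.0 * s / math.sqrt(n)


def ar1_effective_n(n, r1):
    """Rough effective sample size if the series behaved like an AR(1) process."""
    return n * (1.0 - r1) / (1.0 + r1)


# Values taken from the summary statistics table.
n, s, r1 = 1000, 0.06349, 0.97

naive = ci_half_width(s, n)
n_eff = ar1_effective_n(n, r1)
adjusted = ci_half_width(s, n_eff)

print(f"naive half-width:    {naive:.5f}")
print(f"effective n (AR(1)): {n_eff:.1f}")
print(f"adjusted half-width: {adjusted:.5f}")
```

Under the AR(1) assumption the 1000 observations carry roughly the information of 15 independent ones, so the half-width would be about eight times wider than the naive value.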

Location Test

The location test fits a linear regression of the response $Y$ against the run-order index $X = 1, 2, \ldots, N$ and tests whether the slope is significantly different from zero.

$$H_0\!: B_1 = 0 \quad \text{vs.} \quad H_a\!: B_1 \neq 0$$

The slope estimate is numerically tiny (approximately 0.00021 ohms per observation), but the t-value is enormous because the residual variation relative to the trend is very small.

Conclusion: The slope t-value of $t = 100.2$ far exceeds the critical value $t_{0.975,\,998} \approx 1.962$, so we reject $H_0$: the location is not constant. The data exhibit a significant upward drift over the five-year measurement period, consistent with physical aging of the standard resistor.
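The slope test described above can be sketched with ordinary least squares in plain Python. This is a hedged illustration: `slope_t_test` is a hypothetical helper, and the two series are synthetic (a drifting series and a flat one, each with alternating deterministic noise), not the resistor data.

```python
import math


def slope_t_test(y):
    """OLS fit of y on run order x = 1..N; returns (slope, t_statistic)."""
    n = len(y)
    x = range(1, n + 1)
    x_mean = (n + 1) / 2
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / std_err


# Synthetic examples (assumption): drift plus alternating noise, and noise alone.
drifting = [0.5 * i + (-1) ** i for i in range(1, 101)]
flat = [float((-1) ** i) for i in range(1, 101)]

slope_d, t_d = slope_t_test(drifting)
slope_f, t_f = slope_t_test(flat)
print(f"drifting: slope = {slope_d:.4f}, t = {t_d:.1f}")
print(f"flat:     slope = {slope_f:.4f}, t = {t_f:.2f}")
```

The t-statistic is compared with the two-sided critical value for $N - 2$ degrees of freedom; a small slope can still give a very large t when the scatter around the fitted line is small.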

Variation Test

The Levene test (median-based variant) divides the data into $k = 4$ equal-length intervals and tests whether their variances are homogeneous.

$$H_0\!: \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 \quad \text{vs.} \quad H_a\!: \text{at least one } \sigma_i^2 \text{ differs}$$

| Statistic | Value |
| --- | --- |
| Test statistic $W$ | 140.85 |
| Degrees of freedom | $k - 1 = 3$ and $N - k = 996$ |
| Critical value $F_{0.05,\,3,\,996}$ | 2.614 |

Conclusion: The test statistic of $W = 140.85$ vastly exceeds the critical value of 2.614, so we reject $H_0$: the variation is not constant. The seasonal humidity pattern causes different measurement variability in different portions of the five-year period, with some seasons producing tighter measurements than others.
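A minimal sketch of the median-based Levene statistic over four consecutive blocks is shown below. The function name `levene_median` is hypothetical, the blocks are formed by simple integer slicing (an assumption about how the intervals are cut), and the inputs are synthetic series built so that the spread is either equal in all four blocks or five times larger in the last one.

```python
from statistics import mean, median


def levene_median(y, k=4):
    """Levene W statistic using absolute deviations from each block's median."""
    n = len(y)
    groups = [y[i * n // k:(i + 1) * n // k] for i in range(k)]
    z = []
    for g in groups:
        m = median(g)
        z.append([abs(v - m) for v in g])
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n - k) / (k - 1) * between / within


def block(scale):
    """Twenty synthetic values with a fixed shape and the given spread."""
    return [scale * p for p in [-2.0, -1.0, 0.0, 1.0, 2.0] * 4]


equal_spread = block(1.0) * 4
unequal_spread = block(1.0) * 3 + block(5.0)

w_equal = levene_median(equal_spread)
w_unequal = levene_median(unequal_spread)
print(f"equal spread W = {w_equal:.2f}, unequal spread W = {w_unequal:.2f}")
```

The statistic is compared with the upper critical value of the $F$ distribution with $k - 1$ and $N - k$ degrees of freedom; identical block spreads give a value near zero.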

Randomness Tests

Two complementary tests assess whether the observations are independent.

Runs test: tests whether the sequence of values above and below the median was produced randomly.

$$H_0\!: \text{sequence is random} \quad \text{vs.} \quad H_a\!: \text{sequence is not random}$$

| Statistic | Value |
| --- | --- |
| Test statistic $Z$ | $-30.5629$ |
| Critical value $Z_{1-\alpha/2}$ | 1.96 |

Conclusion: $|Z| = 30.5629$ far exceeds 1.96, so we reject $H_0$: the data are not random. The negative $Z$ indicates far fewer runs than expected, meaning the data cluster in extremely long sequences above or below the median. This is the most extreme runs test result in the case study collection.
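The runs test above can be sketched with the standard large-sample normal approximation. This is an illustration under assumptions: `runs_test_z` is a hypothetical helper, values exactly equal to the median are dropped (conventions differ between packages), and the two inputs are synthetic.

```python
import math
from statistics import median


def runs_test_z(y):
    """Z statistic for the number of runs above and below the median."""
    med = median(y)
    # Assumption: observations equal to the median are excluded.
    signs = [v > med for v in y if v != med]
    n1 = sum(signs)
    n2 = len(signs) - n1
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    return (runs - expected) / math.sqrt(variance)


# Synthetic examples (assumption): a steady drift and a strictly alternating series.
drift = list(range(100))
alternating = [0, 1] * 50

z_drift = runs_test_z(drift)
z_alt = runs_test_z(alternating)
print(f"drift Z = {z_drift:.2f}, alternating Z = {z_alt:.2f}")
```

A drifting series yields only two runs and a strongly negative $Z$, matching the sign seen in the table; an alternating series yields too many runs and a strongly positive $Z$.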

Lag-1 autocorrelation: measures the linear dependence between consecutive observations.

| Statistic | Value |
| --- | --- |
| $r_1$ | 0.97 |
| Critical value $2/\sqrt{N}$ | 0.063 |

Conclusion: The lag-1 autocorrelation of 0.97 vastly exceeds the critical value of 0.063. Each measurement explains approximately 94% of the variance of the next ($r_1^2 \approx 0.94$). This is the most extreme autocorrelation in the case study collection, indicating that consecutive measurements are almost identical due to the slowly varying drift and seasonal effects.

Distribution and Outlier Tests

Since the randomness assumption is rejected ($r_1 = 0.97$), the distributional tests are not meaningful and are omitted. When data are severely autocorrelated, the histogram and normal probability plot reflect the dependence structure rather than the underlying distribution of independent errors. Grubbs' test for outliers is also omitted because it assumes approximately normally distributed, independent data.

Test Summary

| Assumption | Test | Statistic | Critical Value | Result |
| --- | --- | --- | --- | --- |
| Fixed location | Regression on run order | $t = 100.2$ | 1.962 | Reject |
| Fixed variation | Levene test | $W = 140.85$ | 2.614 | Reject |
| Randomness | Runs test | $Z = -30.5629$ | 1.96 | Reject |
| Randomness | Autocorrelation lag-1 | $r_1 = 0.97$ | 0.063 | Reject |
| Distribution | n/a | n/a | n/a | Not meaningful |
| Outliers | n/a | n/a | n/a | Not meaningful |

Three of the four assumptions are violated: the location drifts upward ($t = 100.2$), the variation changes across the measurement period ($W = 140.85$), and the data are severely non-random ($r_1 = 0.97$, $Z = -30.5629$). The fourth assumption, a fixed distribution, cannot be tested directly because distributional tests require the randomness assumption to hold first. The univariate model $Y_i = C + E_i$ is not appropriate for these data.

Interpretation

Three of the four assumptions fail simultaneously, making the standard resistor dataset one of the most severely non-conforming in the case study collection. The location drift is massive ($t = 100.2$, far beyond the critical value of 1.962), the variation changes significantly across the measurement period (Levene test $W = 140.85$, critical value 2.614), and the randomness violation is the most extreme encountered ($r_1 = 0.97$, runs test $Z = -30.5629$). This contrasts with the Filter Transmittance case study, where only two assumptions failed (location and randomness, with constant variation).

The graphical evidence tells a cohesive story. The run sequence plot reveals the dual structure: a persistent upward trend from physical aging of the resistor, overlaid with periodic fluctuations from seasonal humidity effects. The lag plot immediately exposes the extreme autocorrelation --- a tighter linear cluster than any other case study. The autocorrelation plot confirms that the serial dependence persists to very high lags, decaying slowly as expected for a trending process. The spectral plot shows dominant low-frequency content consistent with the long-term drift.

The severity of these violations has practical consequences. With $r_1 = 0.97$, each measurement explains approximately 94% of the variance of the next ($r_1^2 \approx 0.94$). The effective sample size is drastically smaller than 1000, rendering the standard confidence interval of $28.01634 \pm 0.00401$ completely invalid. No standard statistical inference (confidence intervals, hypothesis tests, or prediction intervals) can be trusted when the independence assumption is violated this severely.

The root cause is physical and environmental rather than statistical. The five-year measurement period (1980-1985) is long enough for both the physical aging of the resistor (upward drift in resistance) and the annual humidity cycle (seasonal fluctuations in measurement conditions) to manifest clearly. The role of the graphical and statistical analysis is to detect these problems; resolving them requires understanding of the measurement process itself.

Conclusions

The standard resistor data fail three of the four assumptions: location is not constant (upward drift with $t = 100.2$), variation is not constant (seasonal effects with $W = 140.85$), and the data are not random (extreme autocorrelation with $r_1 = 0.97$ and runs test $Z = -30.5629$). The fourth assumption, a fixed distribution, cannot be tested directly because distributional tests require independence.

The standard confidence interval:

$$\bar{Y} \pm \frac{2s}{\sqrt{N}} = 28.01634 \pm 0.00401$$

is not valid because the independence assumption is catastrophically violated.

The root cause is twofold: seasonal humidity effects on the measurement equipment caused non-constant variation and contributed to the extreme autocorrelation, while physical aging of the resistor caused the persistent upward drift in location. Measurements close in time were taken under similar environmental conditions, leading to the most extreme serial dependence in the case study collection.

Recommended actions include accounting for environmental factors (temperature, humidity) in the measurement process, applying time-series or seasonal adjustment models if continued monitoring is needed, and calibrating more frequently to track the physical drift. Simple graphical techniques --- particularly the lag plot, run sequence plot, and spectral plot --- efficiently detected all three violations. This case study demonstrates that severe assumption violations are not merely academic concerns; they invalidate the entire inferential framework and demand process-level investigation.