Standard Resistor Case Study
NIST/SEMATECH Section 1.4.2.7 Standard Resistor
Background and Data
This case study applies exploratory data analysis to 1000 standard resistor measurements collected by Ron Dziuba at NIST over a five-year period (1980-1985). The response variable is resistance in ohms. The primary purpose is to demonstrate how EDA techniques detect drift in location, non-constant variation, and non-randomness, the latter two linked to seasonal humidity effects on the measurement equipment.
The dataset originates from NIST/SEMATECH Section 1.4.2.7. With observations ranging from approximately 27.828 to 28.119 ohms, this study illustrates a case where three of the four standard assumptions are violated simultaneously, making it one of the most severely non-conforming datasets in the case study collection.
Dataset
Ron Dziuba, NIST, standard resistor measurements (1980-1985)
NIST source description
Standard Resistor case study. Ron Dziuba, NIST. Response variable = resistance (ohms). Measurements taken over a 5-year period (1980-1985) to study long-term drift and seasonal effects on a standard resistor. Number of observations = 1000.
Preview data
| # | Value |
|---|---|
| 1 | 27.868 |
| 2 | 27.8929 |
| 3 | 27.8773 |
| 4 | 27.853 |
| 5 | 27.8876 |
| 6 | 27.8725 |
| 7 | 27.8743 |
| 8 | 27.8879 |
| 9 | 27.8728 |
| 10 | 27.8746 |
| ... 990 more rows | |
Test Underlying Assumptions
Goals
The analysis has three primary objectives:
- Model validation: assess whether the univariate model $Y_i = C + E_i$, where $C$ is a constant and $E_i$ is a random error, is an appropriate fit for the standard resistor data.
- Assumption testing: evaluate whether the data satisfy the four standard assumptions for a measurement process in statistical control:
  - Random sampling: the data are uncorrelated
  - Fixed distribution: the data come from a fixed distribution
  - Fixed location: the distribution location (mean) is constant
  - Fixed variation: the distribution scale (standard deviation) is constant
- Confidence interval validity: determine whether the standard confidence interval $\bar{Y} \pm 2s/\sqrt{N}$, where $s$ is the standard deviation, is appropriate. This formula relies on all four assumptions holding; if they are violated, the confidence interval has no statistical meaning.
If the assumptions are violated, identify the nature and severity of the violations and recommend appropriate remedial actions.
Graphical Output and Interpretation
4-Plot Overview
The 4-plot is the primary graphical tool for testing all four assumptions simultaneously.
The assumptions are addressed by the four diagnostic plots:
- The run sequence plot (upper left) shows a persistent upward drift over the 1000-observation period, with superimposed seasonal fluctuations. The fixed-location assumption is clearly violated.
- The lag plot (upper right) displays an extremely tight linear cluster along the diagonal, indicating that consecutive observations are almost identical; the randomness assumption is severely violated ($r_1 = 0.97$).
- Because the lag plot shows such extreme autocorrelation (the most extreme in the case study collection), the histogram (lower left) and normal probability plot (lower right) are not meaningful for distributional interpretation.
The severe violation of the randomness assumption means that the univariate model is not valid. The long-term drift and seasonal periodicity indicate that time-series or seasonal adjustment models would be needed if continued monitoring were required.
Run Sequence Plot
The run sequence plot shows 1000 observations with a persistent upward drift over the five-year measurement period (1980-1985). Seasonal variation is visible as periodic fluctuations superimposed on the trend. The drift is consistent with physical aging of the resistor, while the periodic fluctuations correspond to seasonal humidity effects on the measurement equipment.
Conclusion: The persistent upward drift and seasonal fluctuations demonstrate that the fixed-location assumption is violated.
Lag Plot
The lag plot at lag 1 displays an extremely tight linear cluster along the diagonal, indicating that consecutive observations are nearly identical. The lag-1 autocorrelation is $r_1 = 0.97$, more extreme than in the Filter Transmittance case study and the most extreme in the entire case study collection.
Conclusion: The tight linear cluster along the diagonal $Y_i = Y_{i-1}$ indicates that each measurement is almost completely determined by the previous measurement. The randomness assumption is severely violated.
Histogram
The histogram shows the distribution of the 1000 resistance values. However, because the randomness assumption is severely violated (lag-1 autocorrelation $r_1 = 0.97$), the histogram does not represent the distribution of independent errors. Instead, it reflects the combined effects of the long-term drift and seasonal fluctuation visible in the run sequence plot.
The histogram shape is determined by the drift structure and seasonal pattern rather than the underlying error distribution alone. Any distributional assessment is deferred until the randomness issue is resolved.
Normal Probability Plot
The normal probability plot assesses whether the data follow a normal distribution. As with the histogram, interpretation is limited because the severe autocorrelation ($r_1 = 0.97$) violates the independence assumption required for distributional tests.
The probability plot cannot be reliably interpreted as evidence for or against normality when the observations are not independent. The distributional assessment is deferred until the randomness issue is resolved.
Autocorrelation Plot
The autocorrelation plot quantifies the serial dependence detected by the lag plot. With $N = 1000$, the 95% confidence bands are at $\pm 2/\sqrt{N} \approx \pm 0.063$.
The autocorrelation plot shows extreme positive autocorrelation persisting to very high lags. This is consistent with the long-term drift and seasonal patterns visible in the run sequence plot. The slow decay of the autocorrelation function reflects the gradual upward trend, while any periodic structure corresponds to the seasonal humidity effects.
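The autocorrelation function and its bands can be computed directly. The sketch below is a minimal pure-Python version. It assumes the common estimator $r_h = c_h / c_0$, with autocovariances divided by $N$, and the approximate band $\pm 2/\sqrt{N}$. The series is hypothetical synthetic data (a linear drift plus a sinusoid plus noise), not the Dziuba measurements.

```python
import math
import random


def autocorrelation(y, max_lag):
    """Sample autocorrelations r_0..r_max_lag, using r_h = c_h / c_0 where
    c_h is the lag-h autocovariance divided by N (the usual biased estimator)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev) / n
    return [
        sum(dev[i] * dev[i + h] for i in range(n - h)) / n / c0
        for h in range(max_lag + 1)
    ]


def white_noise_band(n):
    """Approximate 95% band for the autocorrelations of independent data."""
    return 2.0 / math.sqrt(n)


# Hypothetical synthetic series, NOT the NIST resistor data:
# linear drift + periodic component + small independent noise.
random.seed(1)
N = 1000
series = [
    27.9
    + 0.00021 * i
    + 0.01 * math.sin(2 * math.pi * i / 200)
    + random.gauss(0, 0.005)
    for i in range(N)
]

acf = autocorrelation(series, max_lag=50)
band = white_noise_band(N)
outside = sum(1 for r in acf[1:] if abs(r) > band)
print(f"band = +/-{band:.3f}; lags 1-50 outside band: {outside}")
print(f"r_1 = {acf[1]:.3f}, r_50 = {acf[50]:.3f}")
```

For a trending series the estimates decay slowly rather than falling inside the band after a few lags, which is the pattern described above.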
Spectral Plot
The spectral plot shows the frequency-domain structure of the data.
The spectral plot shows dominant low-frequency content from the long-term drift. Periodic components from the seasonal humidity effects may also be visible as peaks at frequencies corresponding to the yearly cycle. The overwhelming low-frequency dominance is characteristic of a process with a strong trend component.
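A raw periodogram shows the same low-frequency dominance numerically. This is a minimal sketch built on a direct $O(N^2)$ discrete Fourier transform of the mean-centred series, evaluated at the Fourier frequencies $k/N$. The spectral plot in the text may use a smoothed estimator instead. The 400-point series is hypothetical synthetic data with a linear trend and a cycle of period 40 observations.

```python
import math
import random


def periodogram(y):
    """Raw periodogram |X_k|^2 / N at Fourier frequencies k/N, k = 1..N//2,
    from a direct DFT of the mean-centred series."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        w = 2 * math.pi * k / n
        re = sum(d * math.cos(w * t) for t, d in enumerate(dev))
        im = -sum(d * math.sin(w * t) for t, d in enumerate(dev))
        freqs.append(k / n)
        power.append((re * re + im * im) / n)
    return freqs, power


# Hypothetical synthetic series, NOT the NIST data: trend + period-40 cycle + noise.
random.seed(3)
N = 400
series = [
    0.001 * t + 0.05 * math.sin(2 * math.pi * t / 40) + random.gauss(0, 0.01)
    for t in range(N)
]

freqs, power = periodogram(series)
total = sum(power)
low_share = sum(power[:10]) / total  # frequencies up to 10/N = 0.025 cycles/obs
peak = max(range(len(power)), key=power.__getitem__)
print(f"largest ordinate at f = {freqs[peak]:.4f}; share of power at f <= 0.025: {low_share:.2f}")
```

The trend places its largest ordinate at the lowest frequency. The cycle appears as a separate local peak at $f = 10/400 = 0.025$.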
Quantitative Output and Interpretation
Summary Statistics
| Statistic | Value |
|---|---|
| Sample size | 1000 |
| Mean | 28.01634 |
| Median | 28.02910 |
| Min | 27.82800 |
| Max | 28.11850 |
| Range | 0.29050 |
| Std Dev | 0.06349 |
| Lag-1 autocorrelation | 0.97 |
The mean and median differ by about 0.013, with the median slightly higher, reflecting the asymmetry introduced by the upward drift (later observations are higher). The standard deviation of 0.06349 reflects the combined effects of measurement variation, drift, and seasonal fluctuation. The standard confidence interval is suspect:
$$\bar{Y} \pm 2\frac{s}{\sqrt{N}} = 28.01634 \pm 0.00402 \;\Rightarrow\; (28.01232,\ 28.02036)$$
This interval dramatically understates the true uncertainty because it assumes independent observations. With autocorrelation of 0.97, the effective sample size is far smaller than 1000, and any confidence interval computed from the standard formula has no statistical justification.
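The interval arithmetic can be sketched from the summary statistics alone, along with a rough sense of how much the dependence shrinks the usable sample size. The effective-sample-size formula $N_{\text{eff}} \approx N(1-r_1)/(1+r_1)$ is an assumption here. It holds for a stationary AR(1) process, which this drifting, seasonal series is not. The widened interval is therefore only an order-of-magnitude illustration, not a valid replacement interval.

```python
import math

# Summary statistics reported in the table above (ohms).
N = 1000
MEAN = 28.01634
STD_DEV = 0.06349
R1 = 0.97  # lag-1 autocorrelation


def standard_ci(mean, s, n):
    """Ybar +/- 2 s / sqrt(n): meaningful only for independent observations."""
    half_width = 2 * s / math.sqrt(n)
    return mean - half_width, mean + half_width


def ar1_effective_n(n, r1):
    """Effective sample size under an ASSUMED AR(1) model:
    n_eff ~= n (1 - r1) / (1 + r1)."""
    return n * (1 - r1) / (1 + r1)


lo, hi = standard_ci(MEAN, STD_DEV, N)
n_eff = ar1_effective_n(N, R1)
lo_adj, hi_adj = standard_ci(MEAN, STD_DEV, n_eff)
print(f"naive interval: ({lo:.5f}, {hi:.5f})")  # (28.01232, 28.02036)
print(f"AR(1) effective N: {n_eff:.1f}")          # 15.2
print(f"interval using effective N: ({lo_adj:.4f}, {hi_adj:.4f})")
```

Even under this optimistic model, about 1000 correlated readings carry roughly the information of 15 independent ones. The resulting interval is about eight times wider.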
Location Test
The location test fits a linear regression of the response against the run-order index and tests whether the slope is significantly different from zero.
The slope estimate is numerically tiny (approximately 0.00021 ohms per observation), but the t-value is enormous because the residual variation relative to the trend is very small.
Conclusion: The slope t-value far exceeds the critical value $t_{0.975,\,998} \approx 1.962$, so we reject $H_0$: the location is not constant. The data exhibit a significant upward drift over the five-year measurement period, consistent with physical aging of the standard resistor.
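The regression behind the location test can be sketched in a few lines. This minimal version uses ordinary least squares on the run-order index $i = 1, \dots, N$ with $N - 2$ residual degrees of freedom. It runs on hypothetical synthetic data with a drift of 0.00021 per observation. Like the other tests in this section, its standard error assumes independent residuals.

```python
import math
import random


def slope_t_test(y):
    """OLS fit y_i = a + b*i for run order i = 1..N.
    Returns (b, t) with t = b / SE(b) on N - 2 degrees of freedom."""
    n = len(y)
    x = range(1, n + 1)
    x_bar = (n + 1) / 2
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b, b / se_b


# Hypothetical synthetic drift series, NOT the NIST data.
random.seed(7)
drifting = [27.9 + 0.00021 * i + random.gauss(0, 0.02) for i in range(1, 1001)]

T_CRIT = 1.962  # two-sided 5% critical value for 998 df, as quoted in the text
slope, t_value = slope_t_test(drifting)
print(f"slope = {slope:.6f} ohm/obs, t = {t_value:.1f}, reject H0: {abs(t_value) > T_CRIT}")
print(f"implied drift over the run: {slope * (len(drifting) - 1):.3f} ohm")
```

At the slope quoted in the text, 0.00021 ohms per observation over 999 steps amounts to about 0.21 ohms. That is a large share of the observed range of 0.29 ohms, which is why a "tiny" slope still dominates the data.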
Variation Test
The Levene test (median-based variant) divides the data into $k = 4$ equal-length intervals and tests whether their variances are homogeneous.
| Statistic | Value |
|---|---|
| Test statistic | 140.85 |
| Degrees of freedom | 3 and 996 |
| Critical value | 2.614 |
Conclusion: The test statistic of 140.85 vastly exceeds the critical value of 2.614, so we reject $H_0$: the variation is not constant. The seasonal humidity pattern causes different measurement variability in different portions of the five-year period, with some seasons producing tighter measurements than others.
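The median-based Levene (Brown-Forsythe) statistic can be sketched as follows. It assumes $k = 4$ consecutive groups of equal length, which is consistent with the quoted critical value of 2.614 for $F(3, 996)$. It runs on hypothetical synthetic data whose noise level alternates between groups.

```python
import random
import statistics


def levene_median(y, k=4):
    """Median-based Levene (Brown-Forsythe) statistic W for k consecutive,
    equal-length groups; compare W with F(k - 1, N - k).
    Any remainder after (N // k) * k observations is dropped."""
    size = len(y) // k
    groups = [y[j * size:(j + 1) * size] for j in range(k)]
    # Absolute deviations from each group's own median.
    z = []
    for g in groups:
        med = statistics.median(g)
        z.append([abs(v - med) for v in g])
    n = size * k
    z_means = [sum(zi) / size for zi in z]
    z_grand = sum(z_means) / k  # equal group sizes, so this is the grand mean
    between = sum(size * (m - z_grand) ** 2 for m in z_means)
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n - k) / (k - 1) * between / within


# Hypothetical synthetic data, NOT the NIST data: 4 blocks of 250 points
# whose noise standard deviation alternates between 0.005 and 0.02 ohm.
random.seed(11)
sds = [0.005, 0.02, 0.005, 0.02]
y = [28.0 + random.gauss(0, sd) for sd in sds for _ in range(250)]

W = levene_median(y, k=4)
F_CRIT = 2.614  # F(0.95; 3, 996), as quoted in the text
print(f"W = {W:.1f}, reject constant variation: {W > F_CRIT}")
```

Using deviations from group medians rather than means makes the statistic less sensitive to skewed or heavy-tailed errors. That robustness is the usual reason for preferring this variant.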
Randomness Tests
Two complementary tests assess whether the observations are independent.
Runs test --- tests whether the sequence of values above and below the median was produced randomly.
| Statistic | Value |
|---|---|
| Test statistic | |
| Critical value | 1.96 |
Conclusion: $|Z|$ far exceeds 1.96, so we reject $H_0$: the data are not random. The negative $Z$ indicates far fewer runs than expected, meaning the data cluster in extremely long sequences above or below the median. This is the most extreme runs test result in the case study collection.
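The runs test about the median can be sketched as follows. This is a minimal version of the standard Wald-Wolfowitz runs test using the usual normal approximation. It assumes that values equal to the median are discarded. The drifting series is hypothetical synthetic data.

```python
import math
import random
import statistics


def runs_test_median(y):
    """Runs above/below the median with a normal approximation.
    Values equal to the median are dropped. Returns (runs, expected, z)."""
    med = statistics.median(y)
    above = [v > med for v in y if v != med]
    n1 = sum(above)            # observations above the median
    n2 = len(above) - n1       # observations below the median
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(above, above[1:]) if a != b)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return runs, expected, (runs - expected) / math.sqrt(variance)


# Hypothetical synthetic drift series, NOT the NIST data.
random.seed(5)
drifting = [27.9 + 0.00021 * i + random.gauss(0, 0.002) for i in range(1000)]

runs, expected, z = runs_test_median(drifting)
print(f"runs = {runs}, expected = {expected:.1f}, Z = {z:.1f}, reject: {abs(z) > 1.96}")
```

A drifting series sits below the median early and above it late, so it produces only a handful of runs against the roughly 501 expected for 1000 independent values. The result is a large negative $Z$.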
Lag-1 autocorrelation --- measures the linear dependence between consecutive observations.
| Statistic | Value |
|---|---|
| Lag-1 autocorrelation $r_1$ | 0.97 |
| Critical value | 0.063 |
Conclusion: The lag-1 autocorrelation of 0.97 vastly exceeds the critical value of 0.063. Each measurement explains approximately 94% of the variance of the next ($r_1^2 = 0.97^2 \approx 0.94$). This is the most extreme autocorrelation in the case study collection, indicating that consecutive measurements are almost identical due to the slowly varying drift and seasonal effects.
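The decision rule and the variance-explained figure reduce to simple arithmetic. The sketch below assumes the critical value is $2/\sqrt{N}$, which reproduces the 0.063 quoted for $N = 1000$. It also includes the usual lag-1 estimator so the same check can be applied to raw data. The five-point input at the end is a toy example, not resistor data.

```python
import math


def lag1_autocorrelation(y):
    """r_1 = sum (y_i - ybar)(y_{i+1} - ybar) / sum (y_i - ybar)^2."""
    mean = sum(y) / len(y)
    dev = [v - mean for v in y]
    return sum(a * b for a, b in zip(dev, dev[1:])) / sum(d * d for d in dev)


N = 1000
R1 = 0.97  # lag-1 autocorrelation reported in the text

critical = 2 / math.sqrt(N)
print(f"critical value = {critical:.3f}")          # 0.063
print(f"reject randomness: {abs(R1) > critical}")  # True
print(f"r1^2 = {R1 ** 2:.4f}")                     # 0.9409

# Toy check of the estimator on a short straight line.
print(lag1_autocorrelation([1, 2, 3, 4, 5]))       # 0.4
```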
Distribution and Outlier Tests
Since the randomness assumption is rejected ($r_1 = 0.97$), the distributional tests are not meaningful and are omitted. When data are severely autocorrelated, the histogram and normal probability plot reflect the dependence structure rather than the underlying distribution of independent errors. The Grubbs’ test for outliers is also omitted because it assumes approximately normally distributed, independent data.
Test Summary
| Assumption | Test | Statistic | Critical Value | Result |
|---|---|---|---|---|
| Fixed location | Regression on run order | | 1.962 | Reject |
| Fixed variation | Levene test | 140.85 | 2.614 | Reject |
| Randomness | Runs test | | 1.96 | Reject |
| Randomness | Autocorrelation lag-1 | 0.97 | 0.063 | Reject |
| Distribution | --- | --- | --- | Not meaningful |
| Outliers | --- | --- | --- | Not meaningful |
Three of the four assumptions are violated: the location drifts upward (a slope t-value far beyond the critical value of 1.962), the variation changes across the measurement period (Levene statistic 140.85 against a critical value of 2.614), and the data are severely non-random ($r_1 = 0.97$, with a runs-test $|Z|$ far beyond 1.96). Only the distribution assumption is not directly testable because the randomness assumption must hold first. The univariate model is not appropriate for this data.
Interpretation
Three of the four assumptions fail simultaneously, making the standard resistor dataset one of the most severely non-conforming in the case study collection. The location drift is massive (a slope t-value far beyond the critical value of 1.962), the variation changes significantly across the measurement period (Levene statistic 140.85, critical value 2.614), and the randomness violation is the most extreme encountered ($r_1 = 0.97$, with the most extreme runs-test $Z$ in the collection). This contrasts with the Filter Transmittance case study, where only two assumptions failed (location and randomness, with constant variation).
The graphical evidence tells a cohesive story. The run sequence plot reveals the dual structure: a persistent upward trend from physical aging of the resistor, overlaid with periodic fluctuations from seasonal humidity effects. The lag plot immediately exposes the extreme autocorrelation, with a tighter linear cluster than in any other case study. The autocorrelation plot confirms that the serial dependence persists to very high lags, decaying slowly as expected for a trending process. The spectral plot shows dominant low-frequency content consistent with the long-term drift.
The severity of these violations has practical consequences. With $r_1 = 0.97$, each measurement explains approximately 94% of the variance of the next ($r_1^2 \approx 0.94$). The effective sample size is drastically smaller than 1000, rendering the standard confidence interval of $28.01634 \pm 0.00402$ completely invalid. No standard statistical inference (confidence intervals, hypothesis tests, or prediction intervals) can be trusted when the independence assumption is violated this severely.
The root cause is physical and environmental rather than statistical. The five-year measurement period (1980-1985) is long enough for both the physical aging of the resistor (upward drift in resistance) and the annual humidity cycle (seasonal fluctuations in measurement conditions) to manifest clearly. The role of the graphical and statistical analysis is to detect these problems; resolving them requires an understanding of the measurement process itself.
Conclusions
The standard resistor data fail three of the four assumptions: location is not constant (upward drift, with a slope t-value far above 1.962), variation is not constant (seasonal effects, with a Levene statistic of 140.85), and the data are not random (extreme autocorrelation with $r_1 = 0.97$ and a runs-test $|Z|$ far above 1.96). Only the distribution assumption is not directly testable because it requires independence.
The standard confidence interval
$$\bar{Y} \pm 2\frac{s}{\sqrt{N}}$$
is not valid because the independence assumption is catastrophically violated.
The root cause is twofold: seasonal humidity effects on the measurement equipment caused non-constant variation and contributed to the extreme autocorrelation, while physical aging of the resistor caused the persistent upward drift in location. Measurements close in time were taken under similar environmental conditions, leading to the most extreme serial dependence in the case study collection.
Recommended actions include accounting for environmental factors (temperature, humidity) in the measurement process, applying time-series or seasonal adjustment models if continued monitoring is needed, and calibrating more frequently to track the physical drift. Simple graphical techniques, particularly the lag plot, run sequence plot, and spectral plot, efficiently detected all three violations. This case study demonstrates that severe assumption violations are not merely academic concerns; they invalidate the entire inferential framework and demand process-level investigation.