Beam Deflections Case Study
NIST/SEMATECH Section 1.4.2.5 Beam Deflections
Background and Data
This data set was collected by H. S. Lew of NIST in 1969 to measure steel-concrete beam deflections. The response variable is the deflection of a beam from the center point. This case study applies exploratory data analysis to the NIST LEW.DAT dataset, which contains 200 univariate measurements of beam deflection. The primary purpose is to demonstrate how EDA techniques detect periodic (sinusoidal) structure in what might initially appear to be a random process.
With 200 observations, this study illustrates the critical importance of testing the randomness assumption: the data exhibit cyclic behavior that is invisible to simple location or spread summaries but readily detected by autocorrelation and spectral analysis.
Dataset
H. S. Lew, NBS, steel-concrete beam deflection (1969)
NIST source description
Steel-concrete beam deflection. H. S. Lew (NBS Center for Building Technology), 1969. Response variable = deflection of beam from center point. Number of observations = 200.
Preview data
| # | Value |
|---|---|
| 1 | -213 |
| 2 | -564 |
| 3 | -35 |
| 4 | -15 |
| 5 | 141 |
| 6 | 115 |
| 7 | -420 |
| 8 | -360 |
| 9 | 203 |
| 10 | -338 |
| ... 190 more rows | |
Test Underlying Assumptions
Goals
The analysis has three primary objectives:
- Model validation: assess whether the univariate model $Y_i = C + E_i$ (a constant plus random error) is an appropriate fit for the beam deflection data.
- Assumption testing: evaluate whether the data satisfy the four standard assumptions for a measurement process in statistical control:
  - Random sampling: the data are uncorrelated
  - Fixed distribution: the data come from a fixed distribution
  - Fixed location: the distribution location (mean) is constant
  - Fixed variation: the distribution scale (standard deviation) is constant
- Confidence interval validity: determine whether the standard confidence interval formula $\bar{Y} \pm 2s/\sqrt{N}$ is appropriate, where $s$ is the standard deviation. This formula relies on all four assumptions holding; if they are violated, the confidence interval has no statistical meaning.
If the assumptions are violated, identify the nature and severity of the violations and recommend appropriate remedial actions.
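As a rough illustration of the interval under discussion, the sketch below evaluates the mean plus or minus `2 * s / sqrt(N)` from the summary statistics reported in this case study. The function name and the multiplier of 2 are assumptions made for illustration, and the result is only meaningful when all four assumptions hold, which the analysis goes on to show is not the case for the raw data.

```python
import math

def mean_confidence_interval(mean, std_dev, n, multiplier=2.0):
    """Return (lower, upper) for mean +/- multiplier * std_dev / sqrt(n)."""
    half_width = multiplier * std_dev / math.sqrt(n)
    return mean - half_width, mean + half_width

# Summary statistics quoted in this case study for the raw deflection data.
lower, upper = mean_confidence_interval(-177.435, 277.332, 200)
print(f"{lower:.3f} to {upper:.3f}")
```

The half-width depends only on the standard deviation and the sample size, so it cannot reveal any dependence between successive observations.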
Graphical Output and Interpretation
4-Plot Overview
The 4-plot provides a mixed picture. The run sequence plot shows an oscillatory pattern around a stable mean, suggesting periodic structure. The lag plot shows an elliptical pattern indicating strong autocorrelation. The histogram is roughly symmetric and bell-shaped. The normal probability plot is approximately linear. The dominant finding is the failure of the randomness assumption due to cyclic behavior.
The assumptions are addressed by the four diagnostic plots:
- The run sequence plot (upper left) shows an oscillatory pattern around a stable mean of approximately −177, suggesting periodic structure rather than random scatter — the location is stable on average but the data are clearly not random.
- The lag plot (upper right) shows a pronounced elliptical structure, indicating strong autocorrelation; the randomness assumption is seriously violated.
- When the randomness assumption is thus seriously violated, the histogram (lower left) and normal probability plot (lower right) are less meaningful since determining the distribution is only valid when the data are random. Nevertheless, both appear roughly consistent with normality.
From the above plots we conclude that the underlying randomness assumption is not valid. The model is not appropriate — a model that accounts for the periodic structure is needed.
Run Sequence Plot
The run sequence plot shows 200 observations fluctuating around a stable mean of approximately −177. Rather than random scatter, the data exhibit a clear oscillatory (sinusoidal) pattern. The location is stable on average, and the variation appears constant, but the sequential ordering reveals periodic structure that would be missed by looking at summary statistics alone.
Lag Plot
The lag plot at lag 1 displays a tight elliptical structure rather than a structureless cloud, indicating strong autocorrelation. Consecutive observations are dependent: knowing one value constrains the next. This elliptical pattern is a signature of a sinusoidal process. With a dominant frequency near 0.3 cycles per observation, values one step apart are about 108° out of phase, so the lag-1 correlation implied by the sinusoid is negative rather than positive. The pattern contrasts sharply with the circular scatter expected from independent data.
Histogram
The histogram is roughly symmetric and bell-shaped, centered near −177. The shape is consistent with an approximately normal distribution. The histogram alone gives no indication of the underlying periodic structure because it discards the sequential ordering of observations.
Normal Probability Plot
The normal probability plot is approximately linear, with data points following the theoretical straight line reasonably well through the central portion of the distribution. Minor deviations in the tails are within expected sampling variability. The marginal distribution is approximately normal, even though the data are not random.
Autocorrelation Plot
The autocorrelation plot reveals the periodic structure in the data. With 95% confidence bands at approximately $\pm 2/\sqrt{N} \approx \pm 0.14$ for $N = 200$, any lag exceeding these bounds indicates significant autocorrelation.
The autocorrelation function shows significant autocorrelation at multiple lags with a clear periodic pattern. The slow, oscillatory decay confirms the presence of a cyclic component in the data rather than simple drift.
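The computation behind an autocorrelation plot can be sketched as follows. This is a minimal illustration, not the procedure used to produce the plots here: the function names are hypothetical, the bound of `2 / sqrt(N)` is the usual large-sample approximation, and the input is a synthetic sinusoid at 0.3 cycles per observation standing in for the beam deflection series, which is not reproduced in full.

```python
import math

def autocorrelation(values, max_lag):
    """Sample autocorrelations r_1..r_max_lag (common-denominator estimator)."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    denom = sum(d * d for d in deviations)
    result = []
    for lag in range(1, max_lag + 1):
        num = sum(deviations[i] * deviations[i + lag] for i in range(n - lag))
        result.append(num / denom)
    return result

def significance_bound(n):
    """Approximate 95% bound for the autocorrelation of white noise."""
    return 2.0 / math.sqrt(n)

# Synthetic stand-in (assumption): pure sinusoid, 0.3 cycles per observation.
series = [math.sin(2 * math.pi * 0.3 * t) for t in range(200)]
acf = autocorrelation(series, 10)
bound = significance_bound(len(series))
flagged = [lag for lag, r in enumerate(acf, start=1) if abs(r) > bound]
print(f"bound = {bound:.3f}, lags outside the bound: {flagged}")
```

For a sinusoid the autocorrelations themselves oscillate with the same period as the series, which is why the plot shows a periodic pattern rather than a monotone decay.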
Spectral Plot
The spectral plot complements the autocorrelation plot by showing the frequency-domain structure. For the beam deflection data, it reveals the dominant frequency of the sinusoidal component.
The spectral plot shows a dominant peak near frequency 0.3 cycles per observation, identifying the period of the sinusoidal component. This frequency information is essential for developing a better model.
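One simple way to locate such a peak is a raw periodogram evaluated at the Fourier frequencies `k / N`. The sketch below is an assumption-laden stand-in for the spectral plot (which may use a smoothed estimator); the function names are hypothetical and the input is a synthetic sinusoid, not the measured series.

```python
import cmath
import math

def periodogram(values):
    """Return (frequency, power) pairs at frequencies k/n for k = 1..n//2."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    spectrum = []
    for k in range(1, n // 2 + 1):
        # Discrete Fourier coefficient at frequency k/n (cycles per observation).
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum.append((k / n, abs(coeff) ** 2 / n))
    return spectrum

def dominant_frequency(values):
    """Frequency with the largest periodogram ordinate."""
    return max(periodogram(values), key=lambda pair: pair[1])[0]

# Synthetic stand-in (assumption): offset sinusoid at 0.3 cycles per observation.
series = [-177.0 + 360.0 * math.sin(2 * math.pi * 0.3 * t + 1.0)
          for t in range(200)]
peak = dominant_frequency(series)
print(f"dominant frequency: {peak:.3f} cycles per observation")
```

The periodogram only resolves frequencies on a grid of spacing `1 / N`, so the peak location is best treated as a starting value to be refined by a model fit.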
Quantitative Output and Interpretation
Summary Statistics
| Statistic | Value |
|---|---|
| Sample size | 200 |
| Mean | −177.435 |
| Std Dev | 277.332 |
| Median | −162.0 |
| Min | −579.0 |
| Max | 300.0 |
| Range | 879.0 |
The mean and median are reasonably close, consistent with the approximate symmetry observed in the histogram. The large standard deviation relative to the mean reflects the wide oscillation of the deflection measurements around the central value.
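The sketch below computes the same kinds of summaries for the ten preview values shown earlier (not the full 200-point series) and checks that shuffling the order of the observations leaves every summary unchanged. The function name is hypothetical; the point is only that these statistics cannot detect sequential structure.

```python
from statistics import mean, median, stdev

def summarize(values):
    """Location and spread summaries that ignore the order of the data."""
    return {
        "n": len(values),
        "mean": mean(values),
        "std_dev": stdev(values),  # sample standard deviation (n - 1 divisor)
        "median": median(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }

# First ten deflections from the data preview.
preview = [-213, -564, -35, -15, 141, 115, -420, -360, 203, -338]
stats = summarize(preview)

# Sorting destroys the run order but changes none of the summaries.
order_invariant = summarize(sorted(preview)) == stats
print(stats, order_invariant)
```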
Location Test
The location test fits a linear regression of the response against the run-order index and tests whether the slope is significantly different from zero.
| Parameter | Estimate | Std Error | t-Value |
|---|---|---|---|
| Intercept | −178.175 | 39.47 | −4.514 |
| Slope | 0.007366 | 0.34 | 0.022 |
Residual standard deviation: 278.0313 with 198 degrees of freedom.
Conclusion: The slope t-value of 0.022 is far below the critical value of 1.96, so we fail to reject $H_0$ (zero slope): the location is constant. The overall mean is stable at approximately −177 across the full dataset. The fixed-location assumption is satisfied.
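A minimal sketch of this location test, assuming ordinary least squares against the run-order index 1..N, is shown below. The function name is hypothetical, and the short demonstration series are made up to show the two possible outcomes.

```python
import math

def slope_t_test(values):
    """Regress values on run order 1..n; return (intercept, slope, slope t-value)."""
    n = len(values)
    xs = range(1, n + 1)
    x_mean = (n + 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residual_ss = sum((y - intercept - slope * x) ** 2
                      for x, y in zip(xs, values))
    residual_sd = math.sqrt(residual_ss / (n - 2))  # n - 2 degrees of freedom
    slope_se = residual_sd / math.sqrt(sxx)
    return intercept, slope, slope / slope_se

# Made-up series: one with no drift, one with a clear upward trend.
no_drift = [1.0, -1.0, -1.0, 1.0]
trending = [i + (0.1 if i % 2 else -0.1) for i in range(1, 21)]
print(slope_t_test(no_drift)[2], slope_t_test(trending)[2])
```

The t-value is simply the slope divided by its standard error; for the table above that is 0.007366 / 0.34, or about 0.022.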
Variation Test
The Levene test (median-based variant) divides the data into four equal-length intervals and tests whether their variances are homogeneous.
| Statistic | Value |
|---|---|
| Test statistic | 0.09378 |
| Degrees of freedom | $k - 1 = 3$ and $N - k = 196$ |
| Significance level | 0.05 |
| Critical value | 2.651 |
Conclusion: The test statistic of 0.09378 is far below the critical value of 2.651, so we fail to reject $H_0$ (equal variances): the variances are consistent across all four intervals. The fixed-variation assumption is satisfied.
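The median-based Levene statistic can be sketched as below, assuming the intervals are consecutive blocks of equal length (the exact grouping used for the table is not stated here). The function name is hypothetical and the tiny input is made up so the arithmetic can be followed by hand.

```python
from statistics import median

def levene_median(values, groups=4):
    """Median-based Levene statistic over consecutive equal-length blocks."""
    size = len(values) // groups
    # Any trailing values beyond groups * size are ignored in this sketch.
    chunks = [values[i * size:(i + 1) * size] for i in range(groups)]
    z = []
    for chunk in chunks:
        center = median(chunk)
        z.append([abs(v - center) for v in chunk])  # absolute deviations
    total = sum(len(zi) for zi in z)
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / total
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (total - groups) / (groups - 1) * between / within

# Made-up example: two blocks of three values each.
w = levene_median([1, 2, 3, 2, 4, 6], groups=2)
print(round(w, 3))
```

The statistic is compared with an F critical value on `groups - 1` and `N - groups` degrees of freedom, which for four blocks of the 200 observations gives 3 and 196.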
Randomness Tests
Two complementary tests assess whether the observations are independent.
Runs test — tests whether the sequence of values above and below the median was produced randomly.
| Statistic | Value |
|---|---|
| Test statistic | 2.6938 |
| Significance level | 0.05 |
| Critical value | 1.96 |
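Conclusion: $|Z| = 2.6938$ exceeds 1.96, so we reject $H_0$: the data are not random. With $Z$ defined as the observed number of runs minus the expected number, divided by its standard deviation, a positive value indicates more runs than expected. The series crosses the median more often than a random sequence would, consistent with the short-period oscillation (roughly 3.3 observations per cycle at frequency 0.3) visible in the run sequence plot.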
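A runs test of this kind can be sketched as follows, assuming the common large-sample formulas for the expected number of runs and its variance and the sign convention Z = (observed runs − expected runs) / standard deviation. The function name is hypothetical, values equal to the median are dropped, and the two short series are made up.

```python
import math
from statistics import median

def runs_test_z(values):
    """Z statistic for runs above and below the median."""
    med = median(values)
    above = [v > med for v in values if v != med]  # drop ties with the median
    n1 = sum(above)
    n2 = len(above) - n1
    runs = 1 + sum(1 for a, b in zip(above, above[1:]) if a != b)
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    return (runs - expected) / math.sqrt(variance)

alternating = [1, 2] * 5          # many short runs: Z > 0 under this convention
clustered = [1] * 5 + [2] * 5     # two long runs: Z < 0 under this convention
print(runs_test_z(alternating), runs_test_z(clustered))
```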
Lag-1 autocorrelation — measures the linear dependence between consecutive observations.
| Statistic | Value |
|---|---|
| Critical value | $\pm 2/\sqrt{N} \approx \pm 0.14$ |
The autocorrelation plot reveals significant autocorrelation at multiple lags, with values well exceeding the significance bounds of approximately $\pm 0.14$. The autocorrelation function decays slowly and shows periodic structure, confirming the presence of a cyclic component. A spectral plot shows a dominant peak at frequency 0.3, identifying the frequency of the sinusoidal component. The randomness assumption fails.
Distribution and Outlier Tests
Since the randomness assumption is violated, distributional tests are not meaningful. The NIST handbook explicitly omits the Anderson-Darling test and skewness and kurtosis analysis for this case study, noting that these quantitative tests are not meaningful when the data are not independent.
Outliers are identified graphically from the lag plot rather than by formal testing. The lag plot reveals a few observations that fall well outside the main elliptical cloud, corresponding to potential outliers that warrant investigation.
Test Summary
| Assumption | Test | Statistic | Critical Value | Result |
|---|---|---|---|---|
| Fixed location | Regression on run order | $t = 0.022$ | 1.96 | Fail to reject |
| Fixed variation | Levene test | $W = 0.09378$ | 2.651 | Fail to reject |
| Randomness | Runs test | $Z = 2.6938$ | 1.96 | Reject |
| Randomness | Autocorrelation | Exceeds bounds | $\pm 2/\sqrt{N}$ | Reject |
| Distribution | — | — | — | Not meaningful |
| Outliers | — | — | — | Not meaningful |
Two of four assumptions hold: fixed location and fixed variation. The randomness assumption is rejected, which invalidates distributional testing. The univariate model is not appropriate for this data due to the periodic structure.
Develop a Better Model
The autocorrelation and spectral analysis reveal that the non-randomness in the beam deflection data is periodic (sinusoidal) rather than drift-based. The spectral plot identifies a dominant frequency near 0.3 cycles per observation. This motivates a sinusoidal model:
$$Y_i = C + \alpha \sin(2\pi\omega t_i + \phi) + E_i$$
where $\omega$ is the dominant frequency identified by spectral analysis, $C$ is a constant, $\alpha$ is the amplitude, $\phi$ is the phase, and $t_i$ is the run-order index. Fitting this model by least squares yields residuals that should satisfy all four assumptions if the sinusoidal component has been successfully removed.
The dominant frequency was determined from the spectral plot, which shows a single prominent peak. NIST used complex demodulation to refine the starting frequency estimate before applying nonlinear least squares regression to fit the four-parameter sinusoidal model simultaneously.
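The fit can be approximated without a nonlinear optimizer. The sketch below is not the complex demodulation and nonlinear least squares route described above; it swaps in a simpler technique. For a fixed frequency the model is linear in a constant, a sine term, and a cosine term, so it scans a grid of frequencies around a starting value and keeps the linear fit with the smallest residual sum of squares. All function names, the grid width, and the synthetic test series are assumptions made for illustration.

```python
import math

def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - tail) / m[r][r]
    return x

def fit_at_frequency(values, freq):
    """Least-squares fit of C + a*sin(2*pi*freq*t) + b*cos(2*pi*freq*t), t = 1..n."""
    rows = [(1.0,
             math.sin(2 * math.pi * freq * t),
             math.cos(2 * math.pi * freq * t))
            for t in range(1, len(values) + 1)]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    coef = solve_3x3(ata, aty)
    rss = sum((y - sum(c * x for c, x in zip(coef, r))) ** 2
              for r, y in zip(rows, values))
    return coef, rss

def fit_sinusoid(values, start_freq, half_width=0.01, steps=201):
    """Scan frequencies near start_freq; return (C, amplitude, freq, phase, rss)."""
    best = None
    for k in range(steps):
        freq = start_freq - half_width + 2 * half_width * k / (steps - 1)
        coef, rss = fit_at_frequency(values, freq)
        if best is None or rss < best[2]:
            best = (freq, coef, rss)
    freq, (c, a, b), rss = best
    # a*sin(x) + b*cos(x) equals amplitude*sin(x + phase) with:
    amplitude = math.hypot(a, b)
    phase = math.atan2(b, a)
    return c, amplitude, freq, phase, rss

# Synthetic, noise-free stand-in series (assumption), not the measured data.
series = [-178.0 + 360.0 * math.sin(2 * math.pi * 0.3026 * t + 1.4)
          for t in range(1, 201)]
print(fit_sinusoid(series, start_freq=0.3))
```

This sketch reports a non-negative amplitude; a negative amplitude with the phase shifted by π describes the same curve. The grid spacing limits how precisely the frequency is located, so in practice the result would serve as a starting value for a proper nonlinear fit.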
Model Parameters
The nonlinear regression produces the following parameter estimates:
| Parameter | Estimate | Std Error | t-Value |
|---|---|---|---|
| $C$ (constant) | −178.786 | 11.02 | −16.22 |
| $\alpha$ (amplitude) | −361.766 | 26.19 | −13.81 |
| $\omega$ (frequency) | 0.302596 | 0.0001510 | 2005.00 |
| $\phi$ (phase) | 1.46536 | 0.04909 | 29.85 |
Residual standard deviation: 155.8484 with 196 degrees of freedom.
All four parameters are highly significant. The frequency t-value of about 2005 reflects how precisely the sinusoidal period is determined by the 200 observations. The residual standard deviation of 155.8484 represents a 44% reduction from the original data’s standard deviation of 277.332, confirming that the sinusoidal model captures a substantial portion of the data’s variability.
NIST notes that three potential outliers are present in the residuals and that removing them would reduce the residual standard deviation by approximately 5% (from 155.8 to 148.3). However, they conclude that retaining or removing the outliers is “a judgment call” — the full-data fit is presented here as the primary result.
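Both percentage reductions can be checked directly from the standard deviations, taking 277.332 from the summary statistics and 155.8 and 148.3 from the NIST note above:

```latex
\begin{aligned}
1 - \frac{155.8}{277.332} &\approx 0.438 \quad (\text{about } 44\%) \\
1 - \frac{148.3}{155.8} &\approx 0.048 \quad (\text{about } 5\%)
\end{aligned}
```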
Validate New Model
4-Plot of Residuals
The 4-plot applied to the residuals from the sinusoidal model tests whether the error term satisfies all four assumptions.
The transformation compared to the original data’s 4-plot is dramatic:
- Run sequence plot (upper left) — the oscillatory pattern is gone; residuals fluctuate randomly around zero with no systematic structure
- Lag plot (upper right) — the elliptical dependence structure is replaced by a random, structureless cloud
- Histogram (lower left) — approximately symmetric and bell-shaped, consistent with normality
- Normal probability plot (lower right) — approximately linear with minor curvature in the left tail, suggesting the residuals are close to but not perfectly normal
Residual Run Sequence Plot
The residuals fluctuate randomly around zero with no oscillatory pattern, no trend, and no visible shifts in variability — the fixed-location and fixed-variation assumptions are satisfied.
Residual Lag Plot
The residual lag plot shows a random, structureless cloud replacing the tight elliptical pattern of the original data — the randomness assumption is satisfied. The sinusoidal model has successfully removed the periodic dependence.
Residual Histogram
The residual histogram is approximately symmetric and bell-shaped, centered near zero. The shape is consistent with a normal distribution, though a slight leftward tail asymmetry is visible.
Residual Normal Probability Plot
The normal probability plot of residuals is approximately linear through the central portion. A bend in the left tail suggests some departure from normality — NIST notes this as “some cause for concern” but concludes the fit is “reasonably good.” The distribution is approximately normal.
Residual Autocorrelation Plot
The residual autocorrelation is dramatically reduced compared to the original data. Most lags fall within the 95% confidence bands of approximately $\pm 2/\sqrt{N} \approx \pm 0.14$, confirming that the sinusoidal model has successfully removed the periodic dependence structure. The randomness assumption is satisfied.
Residual Spectral Plot
The residual spectral plot shows a relatively flat spectrum with no dominant peaks. The prominent peak near frequency 0.3 in the original data’s spectral plot has been eliminated, confirming that the sinusoidal model has captured the periodic component.
Validation Summary
| Assumption | Original Data | Residuals | Improvement |
|---|---|---|---|
| Fixed location | Satisfied ($t = 0.022$) | Satisfied: no shifts in residual location | Maintained |
| Fixed variation | Satisfied ($W = 0.09378$) | Satisfied: stable residual spread | Maintained |
| Randomness | Rejected ($Z = 2.6938$) | Satisfied: no significant autocorrelation | Restored |
| Distribution | Not tested (randomness violated) | Approximately normal (minor left-tail departure) | Now testable |
| Outliers | Not tested (randomness violated) | 3 potential outliers (NIST) | Now assessable |
The sinusoidal model restores the randomness assumption while maintaining the satisfied location and variation assumptions. Distribution testing, which was not meaningful for the original data due to the randomness violation, now shows the residuals are approximately normally distributed. The overall fit is “reasonably good” per NIST’s assessment, with the caveat that a slight left-tail departure from normality and three potential outliers warrant consideration.
Interpretation
The graphical and quantitative analyses of the original beam deflection data reveal a clear pattern: while the run sequence plot shows stable location and variation (confirmed by the location test with $t = 0.022$ and the Levene test with $W = 0.09378$), the randomness assumption is severely violated. The lag plot displays an elliptical structure indicating strong autocorrelation, the autocorrelation plot confirms significant dependence at multiple lags with a periodic decay pattern, and the spectral plot identifies the dominant frequency at approximately 0.3 cycles per observation. Because randomness is violated, the distribution and outlier tests are not meaningful, and the simple univariate model is rejected.
The sinusoidal model captures the periodic structure with highly significant parameter estimates: constant $C = -178.786$, amplitude $\alpha = -361.766$, frequency $\omega = 0.302596$, and phase $\phi = 1.46536$. The model achieves a 44% reduction in residual standard deviation (from 277.332 to 155.8484), and the residual diagnostics confirm that all four assumptions are now satisfied: the residual run sequence plot shows no systematic pattern, the residual lag plot shows a structureless cloud, and the residual autocorrelation plot falls within the 95% confidence bands. The normal probability plot of residuals is approximately linear with a minor left-tail departure.
The practical implication is significant: the standard confidence interval computed from the original data is invalid because its formula assumes independent observations, and with autocorrelated data the nominal sample size $N$ misstates the amount of independent information. After fitting the sinusoidal model, the residuals satisfy the independence assumption, and valid confidence intervals can be constructed using the residual standard deviation of 155.8 rather than the inflated 277.3. This case study demonstrates that the autocorrelation plot and spectral plot are indispensable for detecting periodic structure that is invisible to location and variation diagnostics.
Conclusions
Two of four assumptions hold for the original data: fixed location and fixed variation. The randomness assumption is severely violated: the autocorrelation plot and spectral plot reveal significant periodic (sinusoidal) structure with a dominant frequency near 0.3 cycles per observation. Because randomness fails, distributional testing is not meaningful, and the univariate model is not appropriate.
The sinusoidal model with parameters $C = -178.786$, $\alpha = -361.766$, $\omega = 0.302596$, and $\phi = 1.46536$ captures the periodic structure, reducing the residual standard deviation by 44%. Residual diagnostics confirm that all four assumptions are satisfied for the model residuals, validating the sinusoidal fit. This case study demonstrates that summary statistics and distributional plots can appear perfectly acceptable while masking serious departures from the independence assumption, making the autocorrelation plot and spectral analysis indispensable tools in any thorough exploratory data analysis.