Beam Deflections Case Study

NIST/SEMATECH Section 1.4.2.5 Beam Deflections

Background and Data

This data set was collected by H. S. Lew of NIST in 1969 to measure steel-concrete beam deflections. The response variable is the deflection of a beam from the center point. This case study applies exploratory data analysis to the NIST LEW.DAT dataset, which contains 200 univariate measurements of beam deflection. The primary purpose is to demonstrate how EDA techniques detect periodic (sinusoidal) structure in what might initially appear to be a random process.

The dataset originates from NIST/SEMATECH Section 1.4.2.5. With $n = 200$ observations, this study illustrates the critical importance of testing the randomness assumption, as the data exhibit cyclic behavior that is invisible to simple location or spread summaries but readily detected by autocorrelation and spectral analysis.

Dataset

LEW.DAT

Observations: 200

Variable: Beam deflection

H. S. Lew, NBS, steel-concrete beam deflection (1969)

NIST source description

Steel-concrete beam deflection. H. S. Lew (NBS Center for Building Technology), 1969. Response variable = deflection of beam from center point. Number of observations = 200.

Preview data

#	Value
1	-213
2	-564
3	-35
4	-15
5	141
6	115
7	-420
8	-360
9	203
10	-338
... 190 more rows


Test Underlying Assumptions

Goals

The analysis has three primary objectives:

Model validation: assess whether the beam deflection data are adequately described by the univariate model

Y_i = C + E_i

Assumption testing: evaluate whether the data satisfy the four standard assumptions for a measurement process in statistical control.
- Random sampling: the data are uncorrelated
- Fixed distribution: the data come from a fixed distribution
- Fixed location: the distribution location (mean) is constant
- Fixed variation: the distribution scale (standard deviation) is constant

Confidence interval validity: determine whether the standard confidence interval formula is appropriate:

\bar{Y} \pm \frac{2s}{\sqrt{N}}

where $\bar{Y}$ is the sample mean, $s$ is the sample standard deviation, and $N$ is the number of observations. This formula relies on all four assumptions holding; if they are violated, the interval has no statistical meaning.
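As a purely arithmetic illustration, the formula can be evaluated from the summary statistics reported later in this case study ($\bar{Y} = -177.435$, $s = 277.332$, $N = 200$). The function name `naive_ci` is hypothetical, and the resulting interval is exactly the one whose validity the analysis goes on to question.

```python
import math


def naive_ci(mean: float, sd: float, n: int) -> tuple[float, float]:
    """Return the interval mean +/- 2*sd/sqrt(n).

    Only meaningful when all four assumptions hold (random, fixed
    distribution, fixed location, fixed variation).
    """
    half_width = 2.0 * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Summary statistics for LEW.DAT as reported in this case study.
low, high = naive_ci(-177.435, 277.332, 200)
print(f"({low:.2f}, {high:.2f})")  # prints (-216.66, -138.21)
```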

If the assumptions are violated, identify the nature and severity of the violations and recommend appropriate remedial actions.

Graphical Output and Interpretation

4-Plot Overview

The 4-plot provides a mixed picture. The run sequence plot shows an oscillatory pattern around a stable mean, suggesting periodic structure. The lag plot shows a tight elliptical clustering of points, the typical signature of a sinusoidal model. The histogram is roughly symmetric and bell-shaped, and the normal probability plot is approximately linear. The dominant finding is the failure of the randomness assumption due to cyclic behavior.

Four-plot diagnostic layout for the beam deflection dataset (run sequence, lag, histogram, normal probability).

The assumptions are addressed by the four diagnostic plots:

- The run sequence plot (upper left) shows an oscillatory pattern around a stable mean of approximately −177, suggesting periodic structure rather than random scatter: the location is stable on average, but the data are clearly not random.
- The lag plot (upper right) shows a tight elliptical clustering of points rather than a structureless cloud. This shape is the lag-plot signature of a sinusoidal model, so the randomness assumption is seriously violated.
- When the randomness assumption is seriously violated, the histogram (lower left) and normal probability plot (lower right) are less meaningful, since determining the distribution is only valid when the data are random. Nevertheless, both appear roughly consistent with normality.

From the above plots we conclude that the underlying randomness assumption is not valid. The model $Y_i = C + E_i$ is not appropriate — a model that accounts for the periodic structure is needed.

Run Sequence Plot

The run sequence plot shows 200 observations fluctuating around a stable mean of approximately −177. Rather than random scatter, the data exhibit a clear oscillatory (sinusoidal) pattern. The location is stable on average, and the variation appears constant, but the sequential ordering reveals periodic structure that would be missed by looking at summary statistics alone.

Run sequence plot of beam deflection data showing an oscillatory pattern around a mean of approximately -177 over 200 observations.

Lag Plot

The lag plot at lag 1 displays a tight elliptical clustering of points. Lag plots with this shape typically indicate that the data come from a single-cycle sinusoidal model. With a dominant frequency near 0.3 cycles per observation, each step advances the cycle by roughly 109°, so values above the mean tend to be followed by values below it: the ellipse tilts toward the anti-diagonal and the lag-1 correlation is negative. This structured pattern contrasts sharply with the circular, structureless scatter expected from independent data.

Lag-1 plot showing a tight elliptical clustering of points, the signature of a sinusoidal (periodic) component.
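The sign of the lag-1 dependence follows directly from the dominant frequency. The sketch below builds a noise-free sinusoid using the frequency reported in the model-fitting section ($\omega = 0.302596$ cycles per observation) and computes the correlation of the lag-1 pairs $(Y_i, Y_{i+1})$, which are exactly the points drawn on a lag plot. It is a synthetic illustration, not the LEW.DAT data; for a pure sinusoid the lag-1 correlation approaches $\cos(2\pi\omega) \approx -0.32$.

```python
import numpy as np

# Synthetic, noise-free sinusoid (NOT the LEW.DAT data); the frequency is the
# value fitted later in this case study, in cycles per observation.
omega = 0.302596
t = np.arange(1, 201)
y = np.sin(2 * np.pi * omega * t)

# Lag-1 pairs (Y_i, Y_{i+1}) are the points drawn on a lag plot.
x_prev, x_next = y[:-1], y[1:]
lag1_corr = np.corrcoef(x_prev, x_next)[0, 1]

# For a pure sinusoid the lag-1 correlation approaches cos(2*pi*omega).
theory = np.cos(2 * np.pi * omega)
print(f"sample lag-1 correlation = {lag1_corr:.3f}, cos(2*pi*omega) = {theory:.3f}")
```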

Histogram

The histogram is roughly symmetric and bell-shaped, centered near −177. The shape is consistent with an approximately normal distribution. The histogram alone gives no indication of the underlying periodic structure because it discards the sequential ordering of observations.

Histogram with KDE overlay of beam deflection data. The roughly symmetric, bell-shaped distribution masks the underlying periodic structure.

Normal Probability Plot

The normal probability plot is approximately linear, with data points following the theoretical straight line reasonably well through the central portion of the distribution. Minor deviations in the tails are within expected sampling variability. The marginal distribution is approximately normal, even though the data are not random.

Normal probability plot of beam deflection data. The approximately linear pattern indicates the marginal distribution is close to normal despite the violated randomness assumption.

Autocorrelation Plot

The autocorrelation plot reveals the periodic structure in the data. With 95% confidence bands at $\pm 2/\sqrt{N} = \pm 0.1414$, any lag exceeding these bounds indicates significant autocorrelation.

Autocorrelation plot showing an alternating sequence of significant positive and negative spikes that do not decay to zero, the signature of a sinusoidal component.

The autocorrelation function shows an alternating sequence of positive and negative spikes, many well outside the confidence bands, and the spikes do not decay to zero. Such a pattern is the autocorrelation-plot signature of a sinusoidal model: the series contains a persistent cyclic component rather than simple drift.
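The sample autocorrelation at lag $k$ is $r_k = \sum_{i=1}^{N-k}(Y_i-\bar{Y})(Y_{i+k}-\bar{Y}) / \sum_{i=1}^{N}(Y_i-\bar{Y})^2$, compared against $\pm 2/\sqrt{N}$. The sketch below applies it to synthetic data generated from the fitted sinusoidal model reported later (constant, amplitude, frequency, and phase from the parameter table, plus Gaussian noise with the residual standard deviation). This is an assumption-laden stand-in for the real 200 observations, intended only to show the alternating sign pattern; `sample_acf` is a hypothetical helper.

```python
import numpy as np


def sample_acf(y: np.ndarray, max_lag: int) -> np.ndarray:
    """Sample autocorrelations r_1..r_max_lag (denominator uses all N points)."""
    d = y - y.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[:-k] * d[k:]) / denom for k in range(1, max_lag + 1)])


# Synthetic series from the fitted model parameters (NOT the real data).
rng = np.random.default_rng(0)
t = np.arange(1, 201)
y = -178.786 - 361.766 * np.sin(2 * np.pi * 0.302596 * t + 1.46536)
y = y + rng.normal(0.0, 155.8484, size=t.size)

r = sample_acf(y, max_lag=10)
bound = 2 / np.sqrt(y.size)  # about 0.1414 for N = 200
for k, rk in enumerate(r, start=1):
    flag = "significant" if abs(rk) > bound else ""
    print(f"lag {k:2d}: {rk:+.3f} {flag}")
```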

Spectral Plot

The spectral plot complements the autocorrelation plot by showing the frequency-domain structure. For the beam deflection data, it reveals the dominant frequency of the sinusoidal component.

Spectral plot showing a dominant peak near frequency 0.3 cycles per observation, identifying the period of the sinusoidal component in the beam deflection data.

The spectral plot shows a dominant peak near frequency 0.3 cycles per observation, which corresponds to a period of roughly $1/0.3 \approx 3.3$ observations. This frequency information is essential for developing a better model.
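A raw periodogram is one simple way to locate the dominant frequency; it is swapped in here for illustration and is not necessarily the spectral estimator behind the NIST plot. The sketch again uses synthetic data built from the fitted model parameters (an assumption, since the full data set is not reproduced here) and reports the Fourier frequency, in cycles per observation, with the largest power.

```python
import numpy as np

# Synthetic series from the fitted model parameters (NOT the real data).
rng = np.random.default_rng(1)
t = np.arange(1, 201)
y = -178.786 - 361.766 * np.sin(2 * np.pi * 0.302596 * t + 1.46536)
y = y + rng.normal(0.0, 155.8484, size=t.size)

# Raw periodogram: squared magnitude of the FFT of the mean-centred series.
power = np.abs(np.fft.rfft(y - y.mean())) ** 2
freqs = np.fft.rfftfreq(y.size, d=1.0)  # cycles per observation, 0 to 0.5

# Skip the zero-frequency term, which is ~0 after centring.
peak = freqs[1:][np.argmax(power[1:])]
print(f"dominant frequency ~ {peak:.3f} cycles/observation, period ~ {1 / peak:.2f}")
```

With $N = 200$ the Fourier frequencies are spaced $1/200 = 0.005$ apart, so the raw peak can only land on a grid value near 0.30; a nonlinear fit is what refines it further.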

Quantitative Output and Interpretation

Summary Statistics

Statistic	Value
Sample size $n$	200
Mean $\bar{Y}$	−177.435
Std Dev $s$	277.332
Median	−162.0
Min	−579.0
Max	300.0
Range	879.0

The mean and median are reasonably close, consistent with the approximate symmetry observed in the histogram. The large standard deviation relative to the mean reflects the wide oscillation of the deflection measurements around the central value.

Location Test

The location test fits a linear regression of the response $Y$ against the run-order index $X = 1, 2, \ldots, N$ and tests whether the slope is significantly different from zero.

H_0\!: B_1 = 0 \quad \text{vs.} \quad H_a\!: B_1 \neq 0

Parameter	Estimate	Std Error	t-Value
$B_0$ (intercept)	−178.175	39.47	−4.514
$B_1$ (slope)	0.7366E-02	0.34	0.022

Residual standard deviation: 278.0313 with 198 degrees of freedom.

Conclusion: The slope t-value of 0.022 is far below the critical value $t_{0.975,\,198} \approx 1.972$, so we fail to reject $H_0$: there is no evidence of a change in location. The overall mean is stable at approximately −177 across the full dataset, so the fixed-location assumption is satisfied.
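A minimal sketch of this location test, assuming SciPy is available: `scipy.stats.linregress` fits $Y$ against the run-order index and returns the slope and its standard error, from which $t = \hat{B}_1 / \mathrm{SE}(\hat{B}_1)$ is compared with $t_{0.975,\,N-2}$. The helper name `location_test` is hypothetical, and the demonstration series is synthetic.

```python
import numpy as np
from scipy import stats


def location_test(y: np.ndarray, alpha: float = 0.05) -> tuple[float, float]:
    """Regress y on run order 1..N; return (slope t-value, two-sided critical t)."""
    x = np.arange(1, y.size + 1)
    fit = stats.linregress(x, y)
    t_value = fit.slope / fit.stderr
    t_crit = stats.t.ppf(1 - alpha / 2, df=y.size - 2)
    return t_value, t_crit


# Illustrative series with a deliberate trend of 2 units per observation
# plus an alternating +/-1 pattern (synthetic, not LEW.DAT).
x = np.arange(1, 201)
y = 10 + 2.0 * x + np.where(x % 2 == 1, 1.0, -1.0)
t_value, t_crit = location_test(y)
print(f"t = {t_value:.1f}, critical value = {t_crit:.3f}")  # critical ~1.972 for N = 200
```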

Variation Test

The Levene test (median-based variant) divides the data into $k = 4$ equal-length intervals and tests whether their variances are homogeneous.

H_0\!: \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 \quad \text{vs.} \quad H_a\!: \text{at least one } \sigma_i^2 \text{ differs}

Statistic	Value
Test statistic $W$	0.09378
Degrees of freedom	$k - 1 = 3$ and $N - k = 196$
Significance level $\alpha$	0.05
Critical value $F_{0.05,\,3,\,196}$	2.651

Conclusion: The test statistic of 0.09378 is far below the critical value of 2.651, so we fail to reject $H_0$ — the variances are consistent across all four intervals. The fixed-variation assumption is satisfied.
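SciPy's `scipy.stats.levene` with `center="median"` implements this median-based (Brown-Forsythe) variant. A minimal sketch, assuming the series is split into four consecutive, nearly equal segments with `numpy.array_split`; the helper name `levene_by_segments` is hypothetical, and the check uses synthetic data with identical segments.

```python
import numpy as np
from scipy import stats


def levene_by_segments(y: np.ndarray, k: int = 4, alpha: float = 0.05) -> tuple[float, float]:
    """Median-based Levene test across k consecutive segments of y.

    Returns (W statistic, upper-tail F critical value with k-1 and N-k df).
    """
    segments = np.array_split(y, k)
    w, _p = stats.levene(*segments, center="median")
    f_crit = stats.f.ppf(1 - alpha, k - 1, y.size - k)
    return w, f_crit


# Synthetic check: four identical segments of 50 values give W = 0.
y = np.tile(np.array([1.0, 2.0, 3.0, 4.0, 5.0]), 40)
w, f_crit = levene_by_segments(y)
print(f"W = {w:.4f}, critical value = {f_crit:.3f}")  # critical ~2.651 for N = 200
```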

Randomness Tests

Two complementary tests assess whether the observations are independent.

Runs test — tests whether the sequence of values above and below the median was produced randomly.

H_0\!: \text{sequence is random} \quad \text{vs.} \quad H_a\!: \text{sequence is not random}

Statistic	Value
Test statistic $Z$	2.6938
Significance level $\alpha$	0.05
Critical value $Z_{1-\alpha/2}$	1.96

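Conclusion: $|Z| = 2.6938$ exceeds 1.96, so we reject $H_0$: the data are not random. Since $Z = (R - \bar{R})/s_R$, where $R$ is the observed number of runs and $\bar{R}$ its expected value under randomness, a positive $Z$ indicates more runs than expected. The data cross the median more often than a random sequence would, which is consistent with the short-period oscillation (roughly 3.3 observations per cycle) identified by the spectral plot.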
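The runs statistic can be sketched directly from the standard formulas $\bar{R} = 2n_1n_2/(n_1+n_2) + 1$ and $s_R^2 = 2n_1n_2(2n_1n_2 - n_1 - n_2)/[(n_1+n_2)^2(n_1+n_2-1)]$, with values classified as above or below the median. Dropping values exactly equal to the median is an assumption of this sketch, and it is not guaranteed to reproduce NIST's $Z = 2.6938$ digit for digit. The two toy sequences show the sign convention: alternation gives a positive $Z$, long clusters give a negative $Z$.

```python
import math
from statistics import median


def runs_test_z(values: list[float]) -> float:
    """Runs test about the median; returns Z = (R - E[R]) / s_R."""
    m = median(values)
    # Classify each value as above (True) or below (False) the median;
    # values equal to the median are dropped (an assumption of this sketch).
    signs = [v > m for v in values if v != m]
    n1 = sum(signs)
    n2 = len(signs) - n1
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    return (runs - expected) / math.sqrt(variance)


alternating = [0.0, 1.0] * 10               # crosses the median at every step
clustered = [float(i) for i in range(20)]   # all low values, then all high values
print(f"alternating: Z = {runs_test_z(alternating):+.2f}")  # positive: too many runs
print(f"clustered:   Z = {runs_test_z(clustered):+.2f}")    # negative: too few runs
```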

Lag-1 autocorrelation — measures the linear dependence between consecutive observations.

Statistic	Value
Critical value $2/\sqrt{N}$	$2/\sqrt{200} = 0.1414$

The autocorrelation plot shows significant autocorrelations at many lags, with values well outside the significance bounds of $\pm 0.1414$. The spikes alternate in sign and do not decay to zero, confirming the presence of a cyclic component. The spectral plot shows a dominant peak near frequency 0.3, identifying the frequency of the sinusoidal component. The randomness assumption fails.

Distribution and Outlier Tests

Since the randomness assumption is violated, distributional tests are not meaningful. The NIST handbook explicitly omits the Anderson-Darling test and skewness and kurtosis analysis for this case study, noting that these quantitative tests are not meaningful when the data are not independent.

Outliers are identified graphically from the lag plot rather than by formal testing. The lag plot reveals a few observations that fall well outside the main elliptical cloud, corresponding to potential outliers that warrant investigation.

Test Summary

Assumption	Test	Statistic	Critical Value	Result
Fixed location	Regression on run order	$t = 0.022$	1.972	Fail to reject
Fixed variation	Levene test	$W = 0.09378$	2.651	Fail to reject
Randomness	Runs test	$Z = 2.6938$	1.96	Reject
Randomness	Autocorrelation	Exceeds bounds	$\pm 0.1414$	Reject
Distribution	—	—	—	Not meaningful
Outliers	—	—	—	Not meaningful

Two of four assumptions hold: fixed location and fixed variation. The randomness assumption is rejected, which invalidates distributional testing. The univariate model $Y_i = C + E_i$ is not appropriate for this data due to the periodic structure.

Develop a Better Model

The autocorrelation and spectral analysis reveal that the non-randomness in the beam deflection data is periodic (sinusoidal) rather than drift-based. The spectral plot identifies a dominant frequency near 0.3 cycles per observation. This motivates a sinusoidal model:

Y_i = C + \alpha \sin(2\pi \omega t_i + \phi) + E_i

where $C$ is the constant, $\alpha$ the amplitude, $\omega$ the frequency in cycles per observation, $\phi$ the phase, and $t_i$ the run-order (time) index of observation $i$. Fitting this model by least squares yields residuals that should satisfy all four assumptions if the sinusoidal component has been successfully removed.

The spectral plot, which shows a single prominent peak, gives the dominant frequency $\omega \approx 0.3025$. NIST used complex demodulation to refine this starting frequency estimate before applying nonlinear least squares regression to fit all four parameters of the sinusoidal model simultaneously.
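A minimal sketch of the fitting step, assuming SciPy's `scipy.optimize.curve_fit`. In place of NIST's complex demodulation, it derives starting values for $C$, $\alpha$, and $\phi$ from an ordinary linear least-squares fit at the fixed starting frequency (a swapped-in technique), then refines all four parameters by nonlinear least squares. It runs on synthetic data generated from the estimates in the parameter table below, because the full data set is not reproduced here. The fit may return a positive amplitude with the phase shifted by $\pi$, which describes the same curve as a negative amplitude.

```python
import numpy as np
from scipy.optimize import curve_fit


def sinusoid(t, c, amp, omega, phase):
    """Y = C + amp * sin(2*pi*omega*t + phase)."""
    return c + amp * np.sin(2 * np.pi * omega * t + phase)


def fit_sinusoid(t: np.ndarray, y: np.ndarray, omega0: float) -> np.ndarray:
    """Fit the four-parameter sinusoid, starting from frequency omega0."""
    # Starting values: at fixed omega0 the model is linear in (C, a, b) with
    # a*sin(theta) + b*cos(theta) = amp*sin(theta + phase).
    theta = 2 * np.pi * omega0 * t
    design = np.column_stack([np.ones_like(t, dtype=float), np.sin(theta), np.cos(theta)])
    (c0, a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    p0 = [c0, np.hypot(a, b), omega0, np.arctan2(b, a)]
    params, _cov = curve_fit(sinusoid, t, y, p0=p0)
    return params


# Synthetic data from the published estimates (NOT the real LEW.DAT values).
rng = np.random.default_rng(2)
t = np.arange(1, 201, dtype=float)
y = sinusoid(t, -178.786, -361.766, 0.302596, 1.46536) + rng.normal(0.0, 155.8484, t.size)

params = fit_sinusoid(t, y, omega0=0.3025)
resid = y - sinusoid(t, *params)
resid_sd = np.sqrt(np.sum(resid ** 2) / (t.size - 4))  # 4 fitted parameters
print("C, amplitude, omega, phase =", np.round(params, 4))
print(f"residual SD = {resid_sd:.1f} (original SD = {y.std(ddof=1):.1f})")
```

Good starting values matter here: the sum-of-squares surface in $\omega$ has many local minima, so the nonlinear step is only reliable once the starting frequency is already close to the spectral peak.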

Model Parameters

The nonlinear regression produces the following parameter estimates:

Parameter	Estimate	Std Error	t-Value
$C$ (constant)	$-178.786$	11.02	$-16.22$
$\alpha$ (amplitude)	$-361.766$	26.19	$-13.81$
$\omega$ (frequency)	$0.302596$	0.0001510	$2005$
$\phi$ (phase)	$1.46536$	0.04909	$29.85$

Residual standard deviation: $s_e = 155.8484$ with 196 degrees of freedom.

All four parameters are highly significant. The frequency t-value of $2005$ reflects the fact that the sinusoidal period is very precisely determined by the 200 observations. The residual standard deviation of $155.8$ represents a 44% reduction from the original data’s standard deviation of $277.3$, confirming that the sinusoidal model captures a substantial portion of the data’s variability.

NIST notes that three potential outliers are present in the residuals and that removing them would reduce the residual standard deviation by approximately 5% (from 155.8 to 148.3). However, they conclude that retaining or removing the outliers is “a judgment call” — the full-data fit is presented here as the primary result.
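The percentage reductions quoted in the two paragraphs above follow directly from the reported standard deviations; the helper `pct_reduction` is a hypothetical name used only for this check.

```python
# Standard deviations reported in this case study.
sd_original = 277.332      # raw data
sd_full_fit = 155.8484     # sinusoidal model, all 200 points
sd_no_outliers = 148.3     # NIST's figure with the three potential outliers removed


def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before


print(f"model vs raw data: {pct_reduction(sd_original, sd_full_fit):.1f}%")    # ~43.8%
print(f"outliers removed:  {pct_reduction(sd_full_fit, sd_no_outliers):.1f}%")  # ~4.8%
```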

Validate New Model

4-Plot of Residuals

The 4-plot applied to the residuals from the sinusoidal model tests whether the error term satisfies all four assumptions.

Four-plot diagnostic layout for sinusoidal model residuals — all four assumptions should be satisfied after removing the periodic structure.

The transformation compared to the original data’s 4-plot is dramatic:

- Run sequence plot (upper left): the oscillatory pattern is gone; residuals fluctuate randomly around zero with no systematic structure.
- Lag plot (upper right): the elliptical dependence structure is replaced by a random, structureless cloud.
- Histogram (lower left): approximately symmetric and bell-shaped, consistent with normality.
- Normal probability plot (lower right): approximately linear with minor curvature in the left tail, suggesting the residuals are close to but not perfectly normal.

Residual Run Sequence Plot

Run sequence plot of sinusoidal model residuals showing random scatter around zero with no oscillatory pattern.

The residuals fluctuate randomly around zero with no oscillatory pattern, no trend, and no visible shifts in variability — the fixed-location and fixed-variation assumptions are satisfied.

Residual Lag Plot

Lag-1 plot of sinusoidal model residuals showing a random, structureless cloud — the periodic dependence has been removed.

The residual lag plot shows a random, structureless cloud replacing the tight elliptical pattern of the original data — the randomness assumption is satisfied. The sinusoidal model has successfully removed the periodic dependence.

Residual Histogram

Histogram of sinusoidal model residuals showing an approximately symmetric, bell-shaped distribution.

The residual histogram is approximately symmetric and bell-shaped, centered near zero. The shape is consistent with a normal distribution, though a slight leftward tail asymmetry is visible.

Residual Normal Probability Plot

Normal probability plot of sinusoidal model residuals. Approximate linearity indicates the residuals are consistent with normality.

The normal probability plot of residuals is approximately linear through the central portion. A bend in the left tail suggests some departure from normality — NIST notes this as “some cause for concern” but concludes the fit is “reasonably good.” The distribution is approximately normal.
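The linearity of a normal probability plot can be summarized by the correlation coefficient between the ordered data and the theoretical normal quantiles; `scipy.stats.probplot` returns it as the third element of its least-squares fit. The sketch uses simulated Gaussian residuals with the reported residual standard deviation, so it shows the mechanics only and says nothing about the actual LEW.DAT residuals.

```python
import numpy as np
from scipy import stats

# Simulated stand-in for model residuals (NOT the real LEW.DAT residuals).
rng = np.random.default_rng(3)
residuals = rng.normal(0.0, 155.8484, size=200)

# probplot returns ((theoretical quantiles, ordered data), (slope, intercept, r)).
(osm, osr), (slope, intercept, r) = stats.probplot(residuals, dist="norm")
print(f"probability-plot correlation r = {r:.4f}")
print(f"slope ~ {slope:.1f} (estimates the residual SD), intercept ~ {intercept:.1f}")
```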

Residual Autocorrelation Plot

Autocorrelation plot of sinusoidal model residuals — dramatically reduced autocorrelation compared to the original data, with most lags within the 95% confidence bands.

The residual autocorrelation is dramatically reduced compared to the original data. Most lags fall within the 95% confidence bands of $\pm 0.1414$, confirming that the sinusoidal model has successfully removed the periodic dependence structure. The randomness assumption is satisfied.
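A quick numerical counterpart to this plot is the fraction of sample autocorrelations that fall inside $\pm 2/\sqrt{N}$. For independent errors roughly 95% should, so a few excursions are expected by chance. The sketch below uses simulated white noise as a stand-in for the residuals and a pure sinusoid for contrast; `fraction_within_bounds` is a hypothetical helper.

```python
import numpy as np


def fraction_within_bounds(y: np.ndarray, max_lag: int = 20) -> float:
    """Share of lags 1..max_lag whose sample autocorrelation lies within +/-2/sqrt(N)."""
    d = y - y.mean()
    denom = np.sum(d * d)
    r = np.array([np.sum(d[:-k] * d[k:]) / denom for k in range(1, max_lag + 1)])
    bound = 2 / np.sqrt(y.size)
    return float(np.mean(np.abs(r) <= bound))


# Simulated white-noise stand-in for the residuals (NOT the real residuals).
rng = np.random.default_rng(4)
resid = rng.normal(0.0, 155.8484, size=200)
print(f"{fraction_within_bounds(resid):.0%} of lags 1-20 inside the bounds")

# For contrast, a strongly periodic series fails badly.
t = np.arange(1, 201)
periodic = np.sin(2 * np.pi * 0.302596 * t)
print(f"{fraction_within_bounds(periodic):.0%} for a pure sinusoid")
```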

Residual Spectral Plot

Spectral plot of sinusoidal model residuals: a flat spectrum with no dominant peaks confirms the periodic structure has been successfully removed.

The residual spectral plot shows a relatively flat spectrum with no dominant peaks. The prominent peak near frequency 0.3 in the original data’s spectral plot has been eliminated, confirming that the sinusoidal model has captured the periodic component.

Validation Summary

Assumption	Original Data	Residuals	Improvement
Fixed location	Satisfied ($t = 0.022$)	Satisfied: no shifts in residual location	Maintained
Fixed variation	Satisfied ($W = 0.09378$)	Satisfied: stable residual spread	Maintained
Randomness	Rejected ($Z = 2.6938$)	Satisfied: no significant autocorrelation	Restored
Distribution	Not tested (randomness violated)	Approximately normal (minor left-tail departure)	Now testable
Outliers	Not tested (randomness violated)	3 potential outliers (NIST)	Now assessable

The sinusoidal model restores the randomness assumption while maintaining the satisfied location and variation assumptions. Distribution testing, which was not meaningful for the original data due to the randomness violation, now shows the residuals are approximately normally distributed. The overall fit is “reasonably good” per NIST’s assessment, with the caveat that a slight left-tail departure from normality and three potential outliers warrant consideration.

Interpretation

The graphical and quantitative analyses of the original beam deflection data reveal a clear pattern: while the run sequence plot shows stable location and variation (confirmed by the location test with $t = 0.022$ and the Levene test with $W = 0.09378$), the randomness assumption is severely violated. The lag plot displays the tight elliptical clustering characteristic of a sinusoidal model, the autocorrelation plot shows significant spikes of alternating sign that do not decay to zero, and the spectral plot identifies the dominant frequency at approximately $0.3$ cycles per observation. Because randomness is violated, the distribution and outlier tests are not meaningful, and the simple univariate model $Y_i = C + E_i$ is rejected.

The sinusoidal model $Y_i = C + \alpha\sin(2\pi\omega t_i + \phi) + E_i$ captures the periodic structure with highly significant parameter estimates: constant $C = -178.786$, amplitude $\alpha = -361.766$, frequency $\omega = 0.302596$, and phase $\phi = 1.46536$. The model achieves a 44% reduction in residual standard deviation (from $277.3$ to $155.8$), and the residual diagnostics confirm that all four assumptions are now satisfied: the residual run sequence plot shows no systematic pattern, the residual lag plot shows a structureless cloud, and the residual autocorrelations fall largely within the 95% confidence bands. The normal probability plot of residuals is approximately linear with a minor left-tail departure.

The practical implication is significant: the standard confidence interval $\bar{Y} \pm 2s/\sqrt{N}$ computed from the original data is invalid, both because it assumes $N$ independent observations, which these data are not, and because $s = 277.3$ mixes the systematic sinusoidal swing with the random error. After fitting the sinusoidal model, the residuals satisfy the independence assumption, and valid confidence intervals can be constructed using the residual standard deviation of $155.8$ rather than the inflated $277.3$. This case study demonstrates that the autocorrelation plot and spectral plot are indispensable for detecting periodic structure that is invisible to location and variation diagnostics.

Conclusions

Two of four assumptions hold for the original data: fixed location and fixed variation. The randomness assumption is severely violated — the autocorrelation plot and spectral plot reveal significant periodic (sinusoidal) structure with a dominant frequency near $0.3$ cycles per observation. Because randomness fails, distributional testing is not meaningful, and the univariate model $Y_i = C + E_i$ is not appropriate.

The sinusoidal model with parameters $C = -178.786$, $\alpha = -361.766$, $\omega = 0.302596$, and $\phi = 1.46536$ captures the periodic structure, reducing the residual standard deviation by 44%. Residual diagnostics confirm that all four assumptions are satisfied for the model residuals, validating the sinusoidal fit. This case study demonstrates that summary statistics and distributional plots can appear perfectly acceptable while masking serious departures from the independence assumption, making the autocorrelation plot and spectral analysis indispensable tools in any thorough exploratory analysis.