
Beam Deflections Case Study

NIST/SEMATECH Section 1.4.2.5 Beam Deflections

Background and Data

This data set was collected by H. S. Lew of NIST in 1969 to measure steel-concrete beam deflections. The response variable is the deflection of a beam from the center point. This case study applies exploratory data analysis to the NIST LEW.DAT dataset, which contains 200 univariate measurements of beam deflection. The primary purpose is to demonstrate how EDA techniques detect periodic (sinusoidal) structure in what might initially appear to be a random process.

The dataset originates from NIST/SEMATECH Section 1.4.2.5. With $n = 200$ observations, this study illustrates the critical importance of testing the randomness assumption, as the data exhibit cyclic behavior that is invisible to simple location or spread summaries but readily detected by autocorrelation and spectral analysis.

Dataset

LEW.DAT
Observations: 200
Variable: Beam deflection

H. S. Lew, NBS, steel-concrete beam deflection (1969)

NIST source description
Steel-concrete beam deflection. H. S. Lew (NBS Center for Building Technology), 1969. Response variable = deflection of beam from center point. Number of observations = 200.
Preview data
# Value
1 -213
2 -564
3 -35
4 -15
5 141
6 115
7 -420
8 -360
9 203
10 -338
... 190 more rows

Test Underlying Assumptions

Goals

The analysis has three primary objectives:

  1. Model validation: assess whether the univariate model below is an appropriate fit for the beam deflection data.

$$Y_i = C + E_i$$

  2. Assumption testing: evaluate whether the data satisfy the four standard assumptions for a measurement process in statistical control:

    • Random sampling: the data are uncorrelated
    • Fixed distribution: the data come from a fixed distribution
    • Fixed location: the distribution location (mean) is constant
    • Fixed variation: the distribution scale (standard deviation) is constant
  3. Confidence interval validity: determine whether the standard confidence interval formula below is appropriate.

$$\bar{Y} \pm \frac{2s}{\sqrt{N}}$$

where $s$ is the standard deviation. This formula relies on all four assumptions holding; if they are violated, the confidence interval has no statistical meaning.

If the assumptions are violated, identify the nature and severity of the violations and recommend appropriate remedial actions.
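As a rough numerical illustration, the interval formula can be evaluated with the summary statistics this case study reports for LEW.DAT (mean −177.435, standard deviation 277.332, N = 200). The helper name `two_sigma_interval` is hypothetical, and the resulting interval is only meaningful if all four assumptions hold.

```python
import math

def two_sigma_interval(mean, s, n):
    """Return (lower, upper) for mean +/- 2*s/sqrt(n)."""
    half_width = 2 * s / math.sqrt(n)
    return mean - half_width, mean + half_width

# Summary statistics quoted in this case study for LEW.DAT.
lower, upper = two_sigma_interval(-177.435, 277.332, 200)
print(f"{lower:.1f} to {upper:.1f}")
```

The half-width depends only on `s` and `n`, which is why a dependence structure that the formula ignores can make the interval misleading without changing how it is computed.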

Graphical Output and Interpretation

4-Plot Overview

The 4-plot provides a mixed picture. The run sequence plot shows an oscillatory pattern around a stable mean, suggesting periodic structure. The lag plot shows an elliptical pattern, indicating that consecutive observations are not independent. The histogram is roughly symmetric and bell-shaped. The normal probability plot is approximately linear. The dominant finding is the failure of the randomness assumption due to cyclic behavior.

Four-plot diagnostic layout for the beam deflection dataset (run sequence, lag, histogram, normal probability).

The assumptions are addressed by the four diagnostic plots:

  1. The run sequence plot (upper left) shows an oscillatory pattern around a stable mean of approximately −177, suggesting periodic structure rather than random scatter — the location is stable on average but the data are clearly not random.
  2. The lag plot (upper right) shows an elliptical structure, the pattern a sinusoidal series produces when each value is plotted against its predecessor; the randomness assumption is seriously violated.
  3. When the randomness assumption is thus seriously violated, the histogram (lower left) and normal probability plot (lower right) are less meaningful since determining the distribution is only valid when the data are random. Nevertheless, both appear roughly consistent with normality.

From the above plots we conclude that the underlying randomness assumption is not valid. The model $Y_i = C + E_i$ is not appropriate; a model that accounts for the periodic structure is needed.

Run Sequence Plot

The run sequence plot shows 200 observations fluctuating around a stable mean of approximately −177. Rather than random scatter, the data exhibit a clear oscillatory (sinusoidal) pattern. The location is stable on average, and the variation appears constant, but the sequential ordering reveals periodic structure that would be missed by looking at summary statistics alone.

Run sequence plot of beam deflection data showing an oscillatory pattern around a mean of approximately -177 over 200 observations.

Lag Plot

The lag plot at lag 1 displays an elliptical structure. Consecutive observations are strongly dependent: a single sinusoid sampled at a fixed interval traces an ellipse in the $(Y_{i-1}, Y_i)$ plane. With a dominant frequency near 0.3 cycles per observation, adjacent points are about 108° apart in phase, so the implied lag-1 correlation is negative, near $\cos(2\pi \cdot 0.3) \approx -0.31$, rather than positive. This elliptical pattern contrasts sharply with the structureless scatter expected from independent data.

Lag-1 plot showing an elliptical structure, the signature of periodic dependence between consecutive observations.

Histogram

The histogram is roughly symmetric and bell-shaped, centered near −177. The shape is consistent with an approximately normal distribution. The histogram alone gives no indication of the underlying periodic structure because it discards the sequential ordering of observations.

Histogram with KDE overlay of beam deflection data. The roughly symmetric, bell-shaped distribution masks the underlying periodic structure.

Normal Probability Plot

The normal probability plot is approximately linear, with data points following the theoretical straight line reasonably well through the central portion of the distribution. Minor deviations in the tails are within expected sampling variability. The marginal distribution is approximately normal, even though the data are not random.

Normal probability plot of beam deflection data. The approximately linear pattern indicates the marginal distribution is close to normal despite the violated randomness assumption.

Autocorrelation Plot

The autocorrelation plot reveals the periodic structure in the data. With 95% confidence bands at $\pm 2/\sqrt{N} = \pm 0.1414$, any autocorrelation outside these bounds is statistically significant.

Autocorrelation plot showing significant autocorrelation at multiple lags with clear periodic structure; the slowly decaying oscillation confirms a cyclic component.

The autocorrelation function shows significant autocorrelation of both signs at multiple lags, with a clear periodic pattern. The slowly decaying oscillation confirms the presence of a cyclic component in the data rather than simple drift.
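A minimal sketch of the computation behind such a plot is given below. Because LEW.DAT itself is not reproduced here, the series is a synthetic stand-in (an assumption, not the real data) built from the fitted sinusoid and residual standard deviation reported later in this case study; the function name `autocorrelation` is hypothetical.

```python
import math
import random

def autocorrelation(y, max_lag):
    """Sample autocorrelations r_0..r_max_lag (lagged products over c_0)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev)
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / c0
            for k in range(max_lag + 1)]

# Synthetic stand-in for LEW.DAT (NOT the real data): fitted sinusoid
# from this case study plus Gaussian noise with the reported residual sd.
rng = random.Random(0)
n = 200
y = [-178.786 - 361.766 * math.sin(2 * math.pi * 0.302596 * t + 1.46536)
     + rng.gauss(0, 155.8484) for t in range(1, n + 1)]

r = autocorrelation(y, 20)
bound = 2 / math.sqrt(n)  # about 0.1414 for n = 200
significant = [k for k in range(1, 21) if abs(r[k]) > bound]
print("lags outside the bounds:", significant)
```

Replacing `y` with the 200 values from LEW.DAT would give the autocorrelations for the actual data; the synthetic series only illustrates how a sinusoid near 0.3 cycles per observation produces autocorrelations that swing between negative and positive values.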

Spectral Plot

The spectral plot complements the autocorrelation plot by showing the frequency-domain structure. For the beam deflection data, it reveals the dominant frequency of the sinusoidal component.

Spectral plot showing a dominant peak near frequency 0.3 cycles per observation, identifying the period of the sinusoidal component in the beam deflection data.

The spectral plot shows a dominant peak near frequency 0.3 cycles per observation, identifying the period of the sinusoidal component. This frequency information is essential for developing a better model.
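The peak location can be illustrated with a raw periodogram computed by a direct discrete Fourier transform. This is a sketch under stated assumptions: the series is a synthetic stand-in for LEW.DAT built from the fitted parameters reported in this case study, the name `periodogram` is hypothetical, and the estimator behind the plot above may use different smoothing or scaling.

```python
import cmath
import math
import random

def periodogram(y):
    """Power at the Fourier frequencies k/n, k = 1..n//2, via a direct DFT."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        f = k / n
        dft = sum(d * cmath.exp(-2j * math.pi * f * t)
                  for t, d in enumerate(dev))
        freqs.append(f)
        power.append(abs(dft) ** 2 / n)
    return freqs, power

# Synthetic stand-in for LEW.DAT (NOT the real data).
rng = random.Random(0)
y = [-178.786 - 361.766 * math.sin(2 * math.pi * 0.302596 * t + 1.46536)
     + rng.gauss(0, 155.8484) for t in range(1, 201)]

freqs, power = periodogram(y)
peak_freq = freqs[power.index(max(power))]
print("peak frequency (cycles per observation):", peak_freq)
```

With 200 observations the Fourier frequencies are spaced 0.005 apart, so the periodogram can only locate the peak to that resolution; a refinement step is needed to reach a value such as 0.3025.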

Quantitative Output and Interpretation

Summary Statistics

| Statistic | Value |
| --- | --- |
| Sample size $n$ | 200 |
| Mean $\bar{Y}$ | −177.435 |
| Std Dev $s$ | 277.332 |
| Median | −162.0 |
| Min | −579.0 |
| Max | 300.0 |
| Range | 879.0 |

The mean and median are reasonably close, consistent with the approximate symmetry observed in the histogram. The large standard deviation relative to the mean reflects the wide oscillation of the deflection measurements around the central value.

Location Test

The location test fits a linear regression of the response $Y$ against the run-order index $X = 1, 2, \ldots, N$ and tests whether the slope is significantly different from zero.

$$H_0\!: B_1 = 0 \quad \text{vs.} \quad H_a\!: B_1 \neq 0$$

| Parameter | Estimate | Std Error | t-Value |
| --- | --- | --- | --- |
| $B_0$ (intercept) | −178.175 | 39.47 | −4.514 |
| $B_1$ (slope) | 0.7366E-02 | 0.34 | 0.022 |

Residual standard deviation: 278.0313 with 198 degrees of freedom.

Conclusion: The slope t-value of 0.022 is far below the critical value $t_{0.975,\,198} \approx 1.97$, so we fail to reject $H_0$: the location is constant. The overall mean is stable at approximately −177 across the full dataset. The fixed-location assumption is satisfied.
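The slope test can be sketched with ordinary least squares written out by hand. The function name `slope_t_value` is hypothetical and the demonstration series is a synthetic stand-in for LEW.DAT (not the real data), so the printed numbers will differ from the table above.

```python
import math
import random

def slope_t_value(y):
    """Regress y on run order 1..n; return (b0, b1, se_b1, t) for the slope."""
    n = len(y)
    x = list(range(1, n + 1))
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s_res = math.sqrt(sse / (n - 2))   # residual sd, n - 2 degrees of freedom
    se_b1 = s_res / math.sqrt(sxx)
    return b0, b1, se_b1, b1 / se_b1

# Synthetic stand-in for LEW.DAT (NOT the real data).
rng = random.Random(0)
y = [-178.786 - 361.766 * math.sin(2 * math.pi * 0.302596 * t + 1.46536)
     + rng.gauss(0, 155.8484) for t in range(1, 201)]

b0, b1, se_b1, t_stat = slope_t_value(y)
print(f"slope = {b1:.4f}, se = {se_b1:.4f}, t = {t_stat:.3f}")
```

For 200 equally spaced run-order values the slope standard error is the residual standard deviation divided by about 816.5, which is consistent with the reported 278.0313 giving a standard error near 0.34.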

Variation Test

The Levene test (median-based variant) divides the data into $k = 4$ equal-length intervals and tests whether their variances are homogeneous.

$$H_0\!: \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 \quad \text{vs.} \quad H_a\!: \text{at least one } \sigma_i^2 \text{ differs}$$

| Statistic | Value |
| --- | --- |
| Test statistic $W$ | 0.09378 |
| Degrees of freedom | $k - 1 = 3$ and $N - k = 196$ |
| Significance level $\alpha$ | 0.05 |
| Critical value $F_{0.05,\,3,\,196}$ | 2.651 |

Conclusion: The test statistic of 0.09378 is far below the critical value of 2.651, so we fail to reject $H_0$: the variances are consistent across all four intervals. The fixed-variation assumption is satisfied.
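A minimal sketch of the median-based Levene statistic follows, assuming the series is split into `k` consecutive blocks of equal length. The function name `levene_median` is hypothetical, and the demonstration series is a synthetic stand-in for LEW.DAT rather than the real data.

```python
import math
import random
import statistics

def levene_median(y, k=4):
    """Median-based Levene W for k consecutive equal-length blocks of y."""
    size = len(y) // k   # any leftover observations at the end are ignored
    groups = [y[i * size:(i + 1) * size] for i in range(k)]
    # Absolute deviations from each block's own median.
    z = [[abs(v - statistics.median(g)) for v in g] for g in groups]
    total = k * size
    z_means = [sum(zi) / size for zi in z]
    z_grand = sum(sum(zi) for zi in z) / total
    between = sum(size * (m - z_grand) ** 2 for m in z_means)
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (total - k) / (k - 1) * between / within

# Synthetic stand-in for LEW.DAT (NOT the real data).
rng = random.Random(0)
y = [-178.786 - 361.766 * math.sin(2 * math.pi * 0.302596 * t + 1.46536)
     + rng.gauss(0, 155.8484) for t in range(1, 201)]

w_stat = levene_median(y, k=4)
print(f"W = {w_stat:.4f} (compare with the critical value 2.651)")
```

The statistic is a one-way ANOVA F ratio applied to the absolute deviations, which is why it is compared against an F critical value with $k - 1$ and $N - k$ degrees of freedom.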

Randomness Tests

Two complementary tests assess whether the observations are independent.

Runs test — tests whether the sequence of values above and below the median was produced randomly.

$$H_0\!: \text{sequence is random} \quad \text{vs.} \quad H_a\!: \text{sequence is not random}$$

| Statistic | Value |
| --- | --- |
| Test statistic $Z$ | 2.6938 |
| Significance level $\alpha$ | 0.05 |
| Critical value $Z_{1-\alpha/2}$ | 1.96 |

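Conclusion: $|Z| = 2.6938$ exceeds 1.96, so we reject $H_0$: the data are not random. Under the usual convention $Z = (R - \bar{R})/s_R$, where $R$ is the observed number of runs and $\bar{R}$ its expected value, a positive $Z$ indicates more runs than expected. The series crosses the median more often than a random sequence would, consistent with a short-period oscillation of roughly 3.3 observations per cycle (dominant frequency near 0.3).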
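The sign convention of the runs statistic is easiest to see in code. The sketch below assumes the common large-sample form of the test, with runs counted above and below the median and values equal to the median dropped; the function name `runs_test_z` is hypothetical, and the two demonstration sequences are artificial.

```python
import math
import statistics

def runs_test_z(y):
    """Z = (runs - expected runs) / sd for runs above/below the median."""
    med = statistics.median(y)
    above = [v > med for v in y if v != med]   # drop ties with the median
    n1 = sum(above)
    n2 = len(above) - n1
    runs = 1 + sum(a != b for a, b in zip(above, above[1:]))
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    return (runs - expected) / math.sqrt(variance)

# Artificial sequences that bracket the two kinds of non-randomness.
z_alternating = runs_test_z([1, -1] * 50)        # many short runs
z_blocked = runs_test_z([1] * 50 + [-1] * 50)    # two long runs
print(f"alternating: {z_alternating:.2f}, blocked: {z_blocked:.2f}")
```

A rapidly alternating sequence gives a positive statistic and a sequence with long stretches on one side of the median gives a negative one, so the sign distinguishes fast oscillation from slow drift or clustering.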

Lag-1 autocorrelation — measures the linear dependence between consecutive observations.

| Statistic | Value |
| --- | --- |
| Critical value $2/\sqrt{N}$ | $2/\sqrt{200} = 0.1414$ |

The autocorrelation plot reveals significant autocorrelation at multiple lags, with values well outside the significance bounds of $\pm 0.1414$. The autocorrelation function decays slowly and shows periodic structure, confirming the presence of a cyclic component. A spectral plot shows a dominant peak at frequency 0.3, identifying the frequency of the sinusoidal component. The randomness assumption fails.

Distribution and Outlier Tests

Since the randomness assumption is violated, distributional tests are not meaningful. The NIST handbook explicitly omits the Anderson-Darling test and skewness and kurtosis analysis for this case study, noting that these quantitative tests are not meaningful when the data are not independent.

Outliers are identified graphically from the lag plot rather than by formal testing. The lag plot reveals a few observations that fall well outside the main elliptical cloud, corresponding to potential outliers that warrant investigation.

Test Summary

| Assumption | Test | Statistic | Critical Value | Result |
| --- | --- | --- | --- | --- |
| Fixed location | Regression on run order | $t = 0.022$ | 1.97 | Fail to reject |
| Fixed variation | Levene test | $W = 0.09378$ | 2.651 | Fail to reject |
| Randomness | Runs test | $Z = 2.6938$ | 1.96 | Reject |
| Randomness | Autocorrelation | Exceeds bounds | $\pm 0.1414$ | Reject |
| Distribution | Not meaningful | | | |
| Outliers | Not meaningful | | | |

Two of four assumptions hold: fixed location and fixed variation. The randomness assumption is rejected, which invalidates distributional testing. The univariate model $Y_i = C + E_i$ is not appropriate for these data because of the periodic structure.

Develop a Better Model

The autocorrelation and spectral analysis reveal that the non-randomness in the beam deflection data is periodic (sinusoidal) rather than drift-based. The spectral plot identifies a dominant frequency near 0.3 cycles per observation. This motivates a sinusoidal model:

$$Y_i = C + \alpha \sin(2\pi\omega t_i + \phi) + E_i$$

where $C$ is the constant, $\alpha$ the amplitude, $\phi$ the phase, and $\omega \approx 0.3025$ the dominant frequency identified by spectral analysis. Fitting this model by least squares yields residuals that should satisfy all four assumptions if the sinusoidal component has been successfully removed.

The spectral plot shows a single prominent peak, which supplies the starting value for $\omega$. NIST used complex demodulation to refine the starting frequency estimate before applying nonlinear least squares regression to fit the four-parameter sinusoidal model simultaneously.
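The fitting idea can be sketched without a nonlinear optimizer. The code below swaps in a simpler technique than the one NIST describes: it scans a grid of candidate frequencies near the spectral peak and, at each one, solves a linear least-squares problem for a constant plus sine and cosine terms, keeping the frequency with the smallest residual sum of squares. All function names are hypothetical, the grid limits are assumptions, and the data are a synthetic stand-in for LEW.DAT.

```python
import math
import random

def solve3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def fit_at_frequency(y, f):
    """Least squares for y_t = c + a*sin(2*pi*f*t) + b*cos(2*pi*f*t)."""
    rows = [(1.0, math.sin(2 * math.pi * f * t), math.cos(2 * math.pi * f * t))
            for t in range(1, len(y) + 1)]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(3)]
    c, a, b = solve3(ata, aty)
    sse = sum((v - (c + a * r[1] + b * r[2])) ** 2 for r, v in zip(rows, y))
    return c, a, b, sse

def fit_sinusoid(y, f_lo=0.29, f_hi=0.31, steps=200):
    """Grid search over frequency; returns (f, c, amplitude, phase, s_e)."""
    best = None
    for k in range(steps + 1):
        f = f_lo + (f_hi - f_lo) * k / steps
        c, a, b, sse = fit_at_frequency(y, f)
        if best is None or sse < best[4]:
            best = (f, c, a, b, sse)
    f, c, a, b, sse = best
    # a = amplitude*cos(phase), b = amplitude*sin(phase)
    s_e = math.sqrt(sse / (len(y) - 4))   # four fitted parameters
    return f, c, math.hypot(a, b), math.atan2(b, a), s_e

# Synthetic stand-in for LEW.DAT (NOT the real data).
rng = random.Random(0)
y = [-178.786 - 361.766 * math.sin(2 * math.pi * 0.302596 * t + 1.46536)
     + rng.gauss(0, 155.8484) for t in range(1, 201)]

f_hat, c_hat, amp_hat, phase_hat, s_e = fit_sinusoid(y)
print(f"f = {f_hat:.4f}, C = {c_hat:.1f}, amplitude = {amp_hat:.1f}, s_e = {s_e:.1f}")
```

This sketch always returns a non-negative amplitude, so a fit with a negative amplitude and phase $\phi$ is reported as the equivalent positive amplitude with the phase shifted by $\pi$. It also yields no standard errors; those require the nonlinear least-squares machinery.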

Model Parameters

The nonlinear regression produces the following parameter estimates:

| Parameter | Estimate | Std Error | t-Value |
| --- | --- | --- | --- |
| $C$ (constant) | −178.786 | 11.02 | −16.22 |
| $\alpha$ (amplitude) | −361.766 | 26.19 | −13.81 |
| $\omega$ (frequency) | 0.302596 | 0.0001510 | 2005 |
| $\phi$ (phase) | 1.46536 | 0.04909 | 29.85 |

Residual standard deviation: $s_e = 155.8484$ with 196 degrees of freedom.

All four parameters are highly significant. The frequency t-value of 2005 reflects the fact that the sinusoidal period is very precisely determined by the 200 observations. The residual standard deviation of 155.8 represents a 44% reduction from the original data’s standard deviation of 277.3, confirming that the sinusoidal model captures a substantial portion of the data’s variability.
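The quoted 44% reduction follows directly from the two standard deviations reported in this case study; as a quick check:

$$1 - \frac{s_e}{s} = 1 - \frac{155.8484}{277.332} \approx 0.438$$

The same kind of arithmetic gives the period implied by the fitted frequency, $1/\omega = 1/0.302596 \approx 3.3$ observations per cycle.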

NIST notes that three potential outliers are present in the residuals and that removing them would reduce the residual standard deviation by approximately 5% (from 155.8 to 148.3). However, they conclude that retaining or removing the outliers is “a judgment call” — the full-data fit is presented here as the primary result.

Validate New Model

4-Plot of Residuals

The 4-plot applied to the residuals from the sinusoidal model tests whether the error term satisfies all four assumptions.

Four-plot diagnostic layout for sinusoidal model residuals — all four assumptions should be satisfied after removing the periodic structure.

The transformation compared to the original data’s 4-plot is dramatic:

  • Run sequence plot (upper left) — the oscillatory pattern is gone; residuals fluctuate randomly around zero with no systematic structure
  • Lag plot (upper right) — the elliptical dependence structure is replaced by a random, structureless cloud
  • Histogram (lower left) — approximately symmetric and bell-shaped, consistent with normality
  • Normal probability plot (lower right) — approximately linear with minor curvature in the left tail, suggesting the residuals are close to but not perfectly normal

Residual Run Sequence Plot

Run sequence plot of sinusoidal model residuals showing random scatter around zero with no oscillatory pattern.

The residuals fluctuate randomly around zero with no oscillatory pattern, no trend, and no visible shifts in variability — the fixed-location and fixed-variation assumptions are satisfied.

Residual Lag Plot

Lag-1 plot of sinusoidal model residuals showing a random, structureless cloud — the periodic dependence has been removed.

The residual lag plot shows a random, structureless cloud replacing the tight elliptical pattern of the original data — the randomness assumption is satisfied. The sinusoidal model has successfully removed the periodic dependence.

Residual Histogram

Histogram of sinusoidal model residuals showing an approximately symmetric, bell-shaped distribution.

The residual histogram is approximately symmetric and bell-shaped, centered near zero. The shape is consistent with a normal distribution, though a slight leftward tail asymmetry is visible.

Residual Normal Probability Plot

Normal probability plot of sinusoidal model residuals. Approximate linearity indicates the residuals are consistent with normality.

The normal probability plot of residuals is approximately linear through the central portion. A bend in the left tail suggests some departure from normality — NIST notes this as “some cause for concern” but concludes the fit is “reasonably good.” The distribution is approximately normal.

Residual Autocorrelation Plot

Autocorrelation plot of sinusoidal model residuals — dramatically reduced autocorrelation compared to the original data, with most lags within the 95% confidence bands.

The residual autocorrelation is dramatically reduced compared to the original data. Most lags fall within the 95% confidence bands of $\pm 0.1414$, confirming that the sinusoidal model has successfully removed the periodic dependence structure. The randomness assumption is satisfied.

Residual Spectral Plot

Spectral plot of sinusoidal model residuals -- a flat spectrum with no dominant peaks confirms the periodic structure has been successfully removed.

The residual spectral plot shows a relatively flat spectrum with no dominant peaks. The prominent peak near frequency 0.3 in the original data’s spectral plot has been eliminated, confirming that the sinusoidal model has captured the periodic component.

Validation Summary

| Assumption | Original Data | Residuals | Improvement |
| --- | --- | --- | --- |
| Fixed location | Satisfied ($t = 0.022$) | Satisfied: no shifts in residual location | Maintained |
| Fixed variation | Satisfied ($W = 0.09378$) | Satisfied: stable residual spread | Maintained |
| Randomness | Rejected ($Z = 2.6938$) | Satisfied: no significant autocorrelation | Restored |
| Distribution | Not tested (randomness violated) | Approximately normal (minor left-tail departure) | Now testable |
| Outliers | Not tested (randomness violated) | 3 potential outliers (NIST) | Now assessable |

The sinusoidal model restores the randomness assumption while maintaining the satisfied location and variation assumptions. Distribution testing, which was not meaningful for the original data due to the randomness violation, now shows the residuals are approximately normally distributed. The overall fit is “reasonably good” per NIST’s assessment, with the caveat that a slight left-tail departure from normality and three potential outliers warrant consideration.

Interpretation

The graphical and quantitative analyses of the original beam deflection data reveal a clear pattern: while the run sequence plot shows stable location and variation (confirmed by the location test with $t = 0.022$ and Levene test with $W = 0.09378$), the randomness assumption is severely violated. The lag plot displays an elliptical structure indicating strong dependence between consecutive observations, the autocorrelation plot confirms significant dependence at multiple lags with a periodic decay pattern, and the spectral plot identifies the dominant frequency at approximately 0.3 cycles per observation. Because randomness is violated, the distribution and outlier tests are not meaningful, and the simple univariate model $Y_i = C + E_i$ is rejected.

The sinusoidal model $Y_i = C + \alpha\sin(2\pi\omega t_i + \phi) + E_i$ captures the periodic structure with highly significant parameter estimates: constant $C = -178.786$, amplitude $\alpha = -361.766$, frequency $\omega = 0.302596$, and phase $\phi = 1.46536$. The model achieves a 44% reduction in residual standard deviation (from 277.3 to 155.8), and the residual diagnostics confirm that all four assumptions are now satisfied: the residual run sequence plot shows no systematic pattern, the residual lag plot shows a structureless cloud, and the residual autocorrelation plot falls within the 95% confidence bands. The normal probability plot of residuals is approximately linear with a minor left-tail departure.

The practical implication is significant: the standard confidence interval $\bar{Y} \pm 2s/\sqrt{N}$ computed from the original data is invalid because the formula assumes $N$ independent observations, and the serial dependence in these data violates that assumption. After fitting the sinusoidal model, the residuals satisfy the independence assumption, and valid confidence intervals can be constructed using the residual standard deviation of 155.8 rather than the inflated 277.3. This case study demonstrates that the autocorrelation plot and spectral plot are indispensable for detecting periodic structure that is invisible to location and variation diagnostics.

Conclusions

Two of four assumptions hold for the original data: fixed location and fixed variation. The randomness assumption is severely violated: the autocorrelation plot and spectral plot reveal significant periodic (sinusoidal) structure with a dominant frequency near 0.3 cycles per observation. Because randomness fails, distributional testing is not meaningful, and the univariate model $Y_i = C + E_i$ is not appropriate.

The sinusoidal model with parameters $C = -178.786$, $\alpha = -361.766$, $\omega = 0.302596$, and $\phi = 1.46536$ captures the periodic structure, reducing the residual standard deviation by 44%. Residual diagnostics confirm that all four assumptions are satisfied for the model residuals, validating the sinusoidal fit. This case study demonstrates that summary statistics and distributional plots can appear perfectly acceptable while masking serious departures from the independence assumption, making the autocorrelation plot and spectral analysis indispensable tools in any thorough exploratory data analysis.