Uniform Random Numbers Case Study
NIST/SEMATECH Section 1.4.2.2 Uniform Random Numbers
Background and Data
This case study applies exploratory data analysis to 500 observations drawn from a uniform distribution U(0,1), taken from a Rand Corporation publication. The purpose is to demonstrate how EDA graphical and quantitative techniques detect a non-normal underlying distribution while the remaining three assumptions — fixed location, fixed variation, and randomness — are all satisfied. This is the complementary case to the normal random numbers study.
The dataset originates from NIST/SEMATECH Section 1.4.2.2. With n = 500 uniformly distributed observations, the distributional assumption is the only one of the four that fails. The study is instructive because the true distribution is known in advance, so it provides a direct check that EDA techniques correctly identify a departure from normality.
Dataset
Rand Corporation, 500 uniform U(0,1) random numbers
NIST source description
Rand uniform numbers (based on combining 6 consecutive Rand uniform digits). Sample size N = 500. Reference program: MAIN.RANDU.
Preview data
| # | Value |
|---|---|
| 1 | 0.100973 |
| 2 | 0.253376 |
| 3 | 0.520135 |
| 4 | 0.863467 |
| 5 | 0.354876 |
| 6 | 0.80959 |
| 7 | 0.911739 |
| 8 | 0.292749 |
| 9 | 0.37542 |
| 10 | 0.480564 |
| ... 490 more rows | |
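Since the full 500-value Rand dataset is not reproduced here, the summary statistics can be sketched with a seeded NumPy sample standing in for the data (an assumption; the seed 12345 is arbitrary):

```python
# Sketch: summarizing a U(0,1) sample. A seeded NumPy sample stands in
# for the Rand uniform numbers, which are not listed in full here.
import numpy as np

rng = np.random.default_rng(12345)
y = rng.uniform(0.0, 1.0, size=500)  # stand-in for the Rand data

print(f"n    = {y.size}")
print(f"mean = {y.mean():.4f}")       # theory: 0.5
print(f"std  = {y.std(ddof=1):.4f}")  # theory: 1/sqrt(12) ~ 0.2887
print(f"min  = {y.min():.4f}, max = {y.max():.4f}")
```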
Test Underlying Assumptions
Goals
The analysis has three primary objectives:
- Model validation — assess whether the univariate model Y_i = C + E_i (a constant plus random error) is an appropriate fit for the uniform random number data.
- Assumption testing — evaluate whether the data satisfy the four standard assumptions for a measurement process in statistical control:
  - Random sampling — the data are uncorrelated
  - Fixed distribution — the data come from a fixed distribution
  - Fixed location — the distribution location (mean) is constant
  - Fixed variation — the distribution scale (standard deviation) is constant
- Confidence interval validity — determine whether the standard confidence interval formula Ȳ ± t(α/2; N−1) · s/√N, where s is the sample standard deviation, is appropriate. This formula relies on all four assumptions holding; if the distributional assumption is violated, the confidence interval may not have the correct coverage probability.
If any assumptions are violated, identify the nature of the violations and recommend appropriate remedial actions.
Graphical Output and Interpretation
4-Plot Overview
The 4-plot immediately reveals one anomaly: while the run sequence plot and lag plot appear healthy, the histogram is rectangular rather than bell-shaped, and the normal probability plot shows strong S-shaped curvature. Three of four assumptions hold; the distributional assumption does not.
The assumptions are addressed by the four diagnostic plots:
- The run sequence plot (upper left) shows data fluctuating randomly in a stable band between 0 and 1 with no trend or shifts — the fixed-location and fixed-variation assumptions appear satisfied.
- The lag plot (upper right) displays a structureless square cloud (the square shape, rather than circular, is characteristic of uniform data with bounded support) — the randomness assumption appears satisfied.
- The histogram (lower left) is approximately flat (rectangular) across [0, 1] rather than bell-shaped — this is the clearest signal that the data are not normally distributed. The flat shape is the signature of a uniform distribution.
- The normal probability plot (lower right) shows a pronounced S-shaped curve — the classic indicator of a light-tailed (platykurtic) distribution compared to the normal.
Three of four assumptions hold; the distributional assumption does not. The data are random, from a fixed location, with fixed variation, but the underlying distribution is uniform rather than normal.
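The 4-plot described above can be sketched in a few lines of matplotlib, again using a seeded U(0,1) sample as a stand-in for the data (an assumption):

```python
# Sketch of a 4-plot: run sequence, lag plot, histogram, and normal
# probability plot, on a seeded stand-in U(0,1) sample.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(12345)
y = rng.uniform(size=500)

fig, ax = plt.subplots(2, 2, figsize=(8, 6))
ax[0, 0].plot(y, lw=0.5)
ax[0, 0].set_title("Run sequence")
ax[0, 1].scatter(y[:-1], y[1:], s=4)
ax[0, 1].set_title("Lag plot")
ax[1, 0].hist(y, bins=20)
ax[1, 0].set_title("Histogram")
stats.probplot(y, dist="norm", plot=ax[1, 1])
ax[1, 1].set_title("Normal probability plot")
fig.tight_layout()
```

For uniform data the histogram panel comes out flat and the probability-plot panel S-shaped, mirroring the interpretation above.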
Run Sequence Plot
The run sequence plot shows data fluctuating randomly in a stable band between 0 and 1. There is no trend, shift, or systematic pattern. Both location and variation remain constant across the run.
Lag Plot
The lag plot at lag 1 displays a structureless square cloud, consistent with independent, identically distributed observations. The square shape (rather than circular) is characteristic of uniform data with bounded support.
Histogram with Normal PDF Overlay
The histogram with an overlaid normal probability density function shows that the flat, rectangular shape of the data is a poor fit to the bell-shaped normal curve. The frequencies are relatively flat across the range of the data while the normal PDF peaks sharply in the center. This mismatch is the clearest graphical signal that the normal distribution is not appropriate for these data.
Normal Probability Plot
The normal probability plot shows a pronounced S-shaped curve rather than the expected straight line. The tails curve away from the theoretical line in opposite directions — the lower tail curves upward and the upper tail curves downward. This S-shape is the classic indicator of a light-tailed (platykurtic) distribution compared to the normal.
Histogram with Uniform PDF Overlay
The histogram with a uniform PDF overlay provides a much better fit. The flat, rectangular histogram shape matches the expected constant frequency under the U(0,1) distribution, in stark contrast to the poor normal fit above.
Uniform Probability Plot
The uniform probability plot provides the definitive distributional test for these data. When the data truly follow a uniform distribution, the plot should be nearly linear.
The nearly linear pattern confirms that the uniform distribution is an excellent model for these data, contrasting sharply with the S-shaped curvature in the normal probability plot above.
Bootstrap Plot
The bootstrap plot compares the sampling variability of three location estimators — the mean, median, and midrange — by drawing 100 subsamples with replacement from the data and computing each statistic for every subsample. The top row shows the computed statistic versus subsample number; the bottom row shows the corresponding histograms.
The midrange has a far smaller variance than either the mean or the median, making it the superior location estimator for uniform data. The bootstrap confidence interval for the midrange is substantially tighter than for the mean, demonstrating why the standard mean-based confidence interval overestimates uncertainty for this distribution.
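A minimal version of this comparison resamples the data, computes each location estimator per resample, and compares the spreads. The seeded U(0,1) sample below is a stand-in for the actual data (an assumption):

```python
# Sketch of the bootstrap comparison of location estimators for uniform data:
# the midrange should show far less resampling variability than the mean.
import numpy as np

rng = np.random.default_rng(12345)
y = rng.uniform(size=500)  # stand-in sample

n_boot = 100
means, medians, midranges = [], [], []
for _ in range(n_boot):
    b = rng.choice(y, size=y.size, replace=True)  # resample with replacement
    means.append(b.mean())
    medians.append(np.median(b))
    midranges.append((b.min() + b.max()) / 2)

for name, vals in [("mean", means), ("median", medians), ("midrange", midranges)]:
    print(f"{name:8s} bootstrap SD = {np.std(vals):.5f}")
```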
Autocorrelation Plot
The autocorrelation plot confirms the randomness assumption. All autocorrelation coefficients at lags 1 through 50 fall within the 95% confidence bands at ±2/√N, consistent with independent observations.
Spectral Plot
The spectral plot shows a flat power spectrum with no dominant peaks, consistent with white noise. This confirms that the data have no periodic structure or frequency-domain dependence.
Quantitative Output and Interpretation
Summary Statistics
| Statistic | Value |
|---|---|
| Sample size | 500 |
| Mean | 0.5078 |
| Std Dev | 0.2943 |
| Median | 0.5184 |
| Min | 0.0025 |
| Max | 0.9971 |
| Range | 0.9946 |
The sample mean of 0.5078 is close to the theoretical value of 0.5, and the sample standard deviation of 0.2943 approximates the theoretical value of 1/√12 ≈ 0.2887.
Location Test
The location test fits a linear regression of the response against the run-order index and tests whether the slope is significantly different from zero.
| Parameter | Estimate | Std Error | t-Value |
|---|---|---|---|
| (intercept) | 0.522923 | 0.02638 | 19.82 |
| (slope) | — | — | −0.66 |
Residual standard deviation: 0.2945 with 498 degrees of freedom.
Conclusion: The slope t-value of −0.66 does not exceed the critical value t(0.975; 498) ≈ 1.96 in magnitude, so we fail to reject the null hypothesis of zero slope — the slope is not significantly different from zero at the 5% significance level. The process mean is stable at approximately 0.508. The fixed-location assumption is satisfied.
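The location test amounts to a simple linear regression of the response on run order. A sketch using `scipy.stats.linregress`, with a seeded stand-in U(0,1) sample (an assumption):

```python
# Sketch of the fixed-location check: regress the response on the run-order
# index and examine the slope. A large p-value means the slope is not
# significantly different from zero, i.e. the location is stable.
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)
y = rng.uniform(size=500)          # stand-in sample
x = np.arange(1, y.size + 1)       # run-order index 1..500

res = stats.linregress(x, y)
print(f"slope = {res.slope:.3e}, p-value = {res.pvalue:.3f}")
```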
Variation Test
The Levene test is used instead of Bartlett’s test because the data are non-normal (uniform). The Levene test (median-based variant) divides the data into four equal-length intervals and tests whether their variances are homogeneous.
| Statistic | Value |
|---|---|
| Test statistic | 0.07983 |
| Degrees of freedom | 3 and 496 |
| Critical value | 2.623 |
Conclusion: The test statistic of 0.07983 is well below the critical value of 2.623, so we fail to reject — the variances are not significantly different across the four intervals. The fixed-variation assumption is satisfied.
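The same check can be sketched with `scipy.stats.levene`, whose `center="median"` option gives the median-based (Brown-Forsythe) variant; the seeded sample is a stand-in for the data (an assumption):

```python
# Sketch of the Levene test for constant variation: split the series into
# four equal-length intervals and test homogeneity of variances.
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)
y = rng.uniform(size=500)               # stand-in sample

quarters = np.array_split(y, 4)         # four intervals of 125 observations
stat, p = stats.levene(*quarters, center="median")
print(f"Levene W = {stat:.4f}, p = {p:.3f}")
```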
Randomness Tests
Two complementary tests assess whether the observations are independent.
Runs test — tests whether the sequence of values above and below the median was produced randomly.
| Statistic | Value |
|---|---|
| Test statistic | 0.2686 |
| Critical value | 1.96 |
Conclusion: The test statistic of 0.2686 does not exceed 1.96, so we fail to reject the null hypothesis of randomness — the sequence is consistent with randomness.
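A minimal runs-test sketch follows; this is an assumption-level implementation of the standard above/below-median runs statistic, not the exact Dataplot procedure:

```python
# Runs test (above/below the median): count runs of consecutive values on the
# same side of the median and standardize against the expected count.
import numpy as np

def runs_test_z(y):
    m = np.median(y)
    above = y > m
    above = above[y != m]  # drop values tied with the median
    r = 1 + np.count_nonzero(above[1:] != above[:-1])  # number of runs
    n1 = np.count_nonzero(above)
    n2 = above.size - n1
    mean_r = 2.0 * n1 * n2 / (n1 + n2) + 1.0
    var_r = (2.0 * n1 * n2 * (2.0 * n1 * n2 - n1 - n2)
             / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    return (r - mean_r) / np.sqrt(var_r)

# A strictly alternating sequence has far too many runs to be random,
# so its Z statistic lands well above 1.96:
z_alt = runs_test_z(np.array([0.1, 0.9] * 50))
print(f"alternating sequence: Z = {z_alt:.2f}")
```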
Lag-1 autocorrelation — measures the linear dependence between consecutive observations.
| Statistic | Value |
|---|---|
| Lag-1 autocorrelation | 0.03 |
| Critical value | 0.087 |
Conclusion: The lag-1 autocorrelation of 0.03 is well within the critical bounds of ±0.087, and values at all lags through lag 50 fall within the significance bounds. The data are independent. The randomness assumption is satisfied.
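The lag-1 autocorrelation and its ±2/√N significance bounds are straightforward to compute; the seeded sample below is a stand-in for the data (an assumption):

```python
# Sketch: lag-1 autocorrelation with +/- 2/sqrt(N) significance bounds.
import numpy as np

rng = np.random.default_rng(12345)
y = rng.uniform(size=500)   # stand-in sample

yc = y - y.mean()
r1 = np.dot(yc[:-1], yc[1:]) / np.dot(yc, yc)  # lag-1 autocorrelation
bound = 2.0 / np.sqrt(y.size)                  # approximate 95% bound
print(f"r1 = {r1:.3f}, bounds = +/-{bound:.3f}")
```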
Distribution Test
The Anderson-Darling test and PPCC test are applied to test whether the data follow a normal distribution.
Normal distribution tests:
| Test | Statistic | Critical Value | Result |
|---|---|---|---|
| Anderson-Darling | 5.765 | 0.787 | Reject normality |
| Normal PPCC | 0.9772 | 0.987 | Reject normality |
The Anderson-Darling statistic of 5.765 far exceeds the 5% critical value of 0.787, and the normal PPCC of 0.9772 is less than its critical value of 0.987. Both tests decisively reject the null hypothesis of normality.
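The normality rejection can be reproduced in outline with `scipy.stats.anderson`, using a seeded U(0,1) sample in place of the actual data (an assumption):

```python
# Sketch: Anderson-Darling normality test on uniform data. For a uniform
# sample of this size the statistic far exceeds the 5% critical value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)
y = rng.uniform(size=500)   # stand-in sample

res = stats.anderson(y, dist="norm")
i5 = list(res.significance_level).index(5.0)  # position of the 5% level
print(f"A^2 = {res.statistic:.3f}, "
      f"5% critical value = {res.critical_values[i5]:.3f}")
```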
Uniform distribution tests:
| Test | Statistic | Result |
|---|---|---|
| Uniform PPCC | 0.9996 | Excellent fit |
| Uniform Anderson-Darling | 0.9082 | Good fit |
The uniform PPCC of 0.9996 indicates an excellent fit to the uniform distribution. The skewness and kurtosis values are approximately 0.00 and approximately −1.20, respectively — close to the theoretical uniform values of 0.0 and −1.2. The negative excess kurtosis confirms a platykurtic (flat-topped) distribution relative to the normal.
Conclusion: The data are not normally distributed but are an excellent fit to the uniform distribution.
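The PPCC comparison can be sketched from the correlation coefficient that `scipy.stats.probplot` returns for each candidate distribution; the seeded sample is a stand-in for the data (an assumption):

```python
# Sketch: probability-plot correlation coefficients (PPCC) for the normal
# and uniform candidates. The better-fitting distribution gives r closer to 1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)
y = rng.uniform(size=500)   # stand-in sample

(_, (_, _, r_norm)) = stats.probplot(y, dist="norm")
(_, (_, _, r_unif)) = stats.probplot(y, dist="uniform")
print(f"normal PPCC  = {r_norm:.4f}")   # noticeably below 1
print(f"uniform PPCC = {r_unif:.4f}")   # very close to 1
```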
Outlier Detection
Grubbs’ test does not identify outliers, which is expected since the uniform distribution has finite support and the observations are well-contained within [0, 1].
Test Summary
| Assumption | Test | Statistic | Critical Value | Result |
|---|---|---|---|---|
| Fixed location | Regression on run order | −0.66 | 1.96 | Fail to reject |
| Fixed variation | Levene test | 0.07983 | 2.623 | Fail to reject |
| Randomness | Runs test | 0.2686 | 1.96 | Fail to reject |
| Randomness | Autocorrelation lag-1 | 0.03 | 0.087 | Fail to reject |
| Distribution | Anderson-Darling (normal) | 5.765 | 0.787 | Reject |
| Distribution | PPCC (normal) | 0.9772 | 0.987 | Reject |
| Outliers | Grubbs’ test | — | — | No outliers |
Three of four assumptions hold. The distributional assumption fails — the data follow a uniform, not normal, distribution. The univariate model is appropriate, but the error term follows a uniform rather than normal distribution.
Interpretation
The graphical and quantitative analyses converge on a clear finding: the uniform random numbers dataset satisfies three of the four underlying assumptions of a univariate measurement process, with the distributional assumption being the sole departure. The run sequence plot shows a stable band between 0 and 1 with no trend or shift, confirmed by the location test (t = −0.66, well within the critical bounds of ±1.96). The Levene test confirms constant variation (W = 0.07983, far below the critical value of 2.623). Independence is supported by the structureless lag plot, the runs test (Z = 0.2686, non-significant), and the lag-1 autocorrelation (0.03, well within ±0.087). The autocorrelation plot confirms that all coefficients through lag 50 fall within the 95% confidence bands, and the spectral plot shows a flat spectrum consistent with white noise.
The distinctive feature of this case study is the distributional finding. The Anderson-Darling statistic of 5.765 far exceeds the 5% critical value of 0.787, and the normal PPCC of 0.9772 falls well below the critical value of 0.987 — both decisively reject normality. Graphically, the histogram displays a flat, rectangular shape rather than a bell curve, and the normal probability plot shows the characteristic S-shaped curvature of a platykurtic distribution. In contrast, the uniform probability plot is nearly linear, and the uniform PPCC of 0.9996 indicates an excellent fit to the uniform distribution. The skewness and kurtosis values (approximately 0.00 and −1.20) match the theoretical uniform moments, confirming that the departure from normality is not due to outliers or contamination but to the fundamental shape of the distribution.
Because the distributional assumption is violated, the standard confidence interval formula Ȳ ± t(α/2; N−1) · s/√N is not strictly valid, although at N = 500 the central limit theorem provides reasonable coverage. The sample mean of 0.5078 is a consistent estimator, but for uniform data the midrange is the optimal location estimator, with a faster convergence rate. The standard normal-based interval of (0.482, 0.534) overestimates uncertainty compared to a distribution-specific approach — the bootstrap confidence interval for the midrange, approximately (0.497, 0.503), is substantially tighter. When the normality assumption is known to be violated, a distribution-free or bootstrap interval should be preferred.
Conclusions
Three of four assumptions hold: fixed location, fixed variation, and randomness are confirmed. The distributional assumption fails — the data follow a uniform, not normal, distribution.
This has practical implications for inference. The sample mean remains a valid location estimate, but confidence intervals based on the normal distribution will be incorrect. The standard confidence interval
Ȳ ± t(α/2; N−1) · s/√N
is approximately valid by the central limit theorem (with N = 500 the CLT provides reasonable coverage), yielding a 95% interval of (0.482, 0.534). However, for uniform data the midrange is the optimal location estimator, with a tighter 95% bootstrap confidence interval of approximately (0.497, 0.503).
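The two intervals can be contrasted in code — a normal-theory t-interval for the mean versus a percentile-bootstrap interval for the midrange — using a seeded U(0,1) sample as a stand-in for the data (an assumption):

```python
# Sketch: normal-theory 95% CI for the mean vs. percentile bootstrap
# 95% CI for the midrange. For uniform data the latter is much tighter.
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)
y = rng.uniform(size=500)   # stand-in sample

# Normal-theory interval for the mean: ybar +/- t * s / sqrt(N)
se = y.std(ddof=1) / np.sqrt(y.size)
t = stats.t.ppf(0.975, df=y.size - 1)
mean_ci = (y.mean() - t * se, y.mean() + t * se)

# Percentile bootstrap interval for the midrange
mids = [(b.min() + b.max()) / 2
        for b in (rng.choice(y, y.size, replace=True) for _ in range(2000))]
mid_ci = tuple(np.percentile(mids, [2.5, 97.5]))

print(f"mean 95% CI:     ({mean_ci[0]:.4f}, {mean_ci[1]:.4f})")
print(f"midrange 95% CI: ({mid_ci[0]:.4f}, {mid_ci[1]:.4f})")
```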
A distribution-free confidence interval or bootstrap approach should be preferred when the normality assumption is known to be violated. This case study demonstrates that EDA techniques — particularly the histogram, normal probability plot, Anderson-Darling test, and PPCC — reliably detect non-normal distributions, even when all other assumptions are satisfied.