EDA Analysis Questions

NIST/SEMATECH Section 1.3.2 Analysis Questions

Every exploratory data analysis addresses a core set of questions. These questions drive the selection of graphical and quantitative techniques and guide the analyst toward a valid statistical model.

The Seven Standard EDA Questions

#	Question	Key Techniques
1	What is a typical value (location)?	Histogram, Box Plot, Measures of Location
2	How spread out is the data?	Measures of Scale, Box Plot, Histogram
3	What is a good distributional model?	Probability Plot, PPCC Plot, Anderson-Darling Test
4	Are there outliers?	Box Plot, Grubbs’ Test, Normal Probability Plot
5	Is the data random?	Run-Sequence Plot, Lag Plot, Runs Test
6	Does the assumed model fit?	Chi-Square GOF, K-S Test, Probability Plot
7	Is the process in control?	Mean Plot, Std Deviation Plot, ANOVA

Location Estimates

Estimating a “typical value” is the most fundamental EDA question. Common estimators include the mean (sensitive to outliers), median (robust), and mode (distributional peak). See Measures of Location for formulas and Confidence Limits for interval estimation.
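As a rough illustration of how the three estimators react to an outlier, the sketch below uses Python's standard `statistics` module. The sample values are hypothetical and chosen only to make the contrast visible.

```python
import statistics

# Hypothetical sample; the value 40.0 is a deliberate outlier.
data = [9.0, 10.0, 10.0, 11.0, 12.0, 40.0]

mean = statistics.mean(data)      # pulled upward by the outlier
median = statistics.median(data)  # middle of the ordered values, robust
mode = statistics.mode(data)      # most frequent value

print(f"mean={mean:.3f} median={median} mode={mode}")
```

With this sample the mean lands well above every observation except the outlier, while the median and mode stay near the bulk of the data.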

Spread Estimates

Variability is quantified with the standard deviation, range, and interquartile range (IQR). Of these, the IQR is robust: a single extreme value can inflate the standard deviation and the range but barely moves the IQR. Compare variances across groups with Bartlett’s Test or the Levene Test.
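The sketch below computes the three spread measures with and without one extreme value, to show which of them resist outlier influence. The data and the helper name `spread_summary` are illustrative assumptions, and the quartiles use the "inclusive" interpolation rule of `statistics.quantiles`, which is only one of several common conventions.

```python
import statistics

# Hypothetical sample; the last value is a deliberate outlier.
data = [9.0, 10.0, 10.0, 11.0, 12.0, 40.0]

def spread_summary(values):
    """Return standard deviation, range, and IQR for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "stdev": statistics.stdev(values),    # sample standard deviation
        "range": max(values) - min(values),   # max minus min
        "iqr": q3 - q1,                       # interquartile range
    }

with_outlier = spread_summary(data)
without_outlier = spread_summary(data[:-1])

print(with_outlier)
print(without_outlier)
```

In this example the standard deviation and range change by roughly an order of magnitude when the outlier is dropped, while the IQR changes far less.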

Percentile Estimation

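Percentiles partition ordered data into hundredths: the p-th percentile is the value below which p percent of the observations fall. The Box Plot visualizes the 25th, 50th, and 75th percentiles (Q1, median, and Q3). For distributional percentiles, evaluate the inverse of the fitted CDF; see Probability Distribution Tables.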
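The two kinds of percentile can be sketched side by side: empirical percentiles read from the ordered data, and a distributional percentile read from a fitted model. The data are hypothetical, and fitting a normal distribution here is purely an assumption for illustration; in practice the model should come from the distribution-selection step.

```python
import statistics
from statistics import NormalDist

# Hypothetical measurements.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]

# Empirical percentiles: 99 cut points, index k-1 holds the k-th percentile.
cuts = statistics.quantiles(data, n=100, method="inclusive")
p25, p50, p75 = cuts[24], cuts[49], cuts[74]

# Distributional percentile: fit a normal model (an assumption) and
# invert its CDF at the desired probability.
fitted = NormalDist.from_samples(data)
p95_model = fitted.inv_cdf(0.95)

print(f"Q1={p25} median={p50} Q3={p75} model P95={p95_model:.3f}")
```

Note that the model-based 95th percentile can fall outside the observed range, which is expected when extrapolating from a fitted distribution.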

Randomness Assessment

Non-random patterns (trend, oscillation, clustering) violate the assumptions of most statistical models. The 4-Plot provides a rapid visual check combining run sequence, lag, histogram, and normal probability plots. Quantitatively, the autocorrelation coefficient and the Runs Test detect departures from randomness.
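A minimal sketch of the two quantitative checks follows, written with the standard library only. The function names are hypothetical, the runs test shown is the above/below-median variant with its large-sample normal approximation, and the two input series are synthetic examples of a trend and an oscillation.

```python
import math
import statistics

def lag1_autocorrelation(y):
    """Lag-1 autocorrelation coefficient; near 0 for random data."""
    mean = statistics.fmean(y)
    num = sum((y[i] - mean) * (y[i + 1] - mean) for i in range(len(y) - 1))
    den = sum((v - mean) ** 2 for v in y)
    return num / den

def runs_test_z(y):
    """Z statistic for the runs test above/below the median.

    |Z| > 1.96 suggests non-randomness at roughly the 5% level
    (large-sample approximation).
    """
    med = statistics.median(y)
    signs = [v > med for v in y if v != med]  # drop ties with the median
    n1 = sum(signs)
    n2 = len(signs) - n1
    runs = 1 + sum(signs[i] != signs[i - 1] for i in range(1, len(signs)))
    expected = 2.0 * n1 * n2 / (n1 + n2) + 1.0
    variance = (2.0 * n1 * n2 * (2.0 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    return (runs - expected) / math.sqrt(variance)

trend = [float(i) for i in range(20)]   # steadily increasing series
oscillation = [0.0, 1.0] * 10           # strictly alternating series

print(lag1_autocorrelation(trend), runs_test_z(trend))
print(lag1_autocorrelation(oscillation), runs_test_z(oscillation))
```

A trend produces too few runs (strongly negative Z) and positive autocorrelation, while an oscillation produces too many runs (strongly positive Z) and negative autocorrelation.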

Model Validation

After selecting a candidate distribution from the distribution families, validate the fit using both graphical evidence (Probability Plot linearity) and formal tests (Anderson-Darling, K-S Test). The 6-Plot combines multiple diagnostics in a single display.
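Both kinds of evidence can be reduced to numbers, as in the sketch below: the probability plot correlation coefficient measures how linear a normal probability plot is, and the K-S statistic measures the largest gap between the empirical and model CDFs. The function names and data are hypothetical, and the normal distribution is assumed as the candidate model. The plotting positions use Filliben's approximation to the uniform order statistic medians. Standard K-S critical values assume the model parameters were specified in advance rather than estimated from the same data.

```python
import math
from statistics import NormalDist, fmean

def normal_ppcc(values):
    """Correlation between ordered data and normal order statistic medians."""
    x = sorted(values)
    n = len(x)
    # Filliben's approximation to the uniform order statistic medians.
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    z = [NormalDist().inv_cdf(p) for p in m]
    mx, mz = fmean(x), fmean(z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    return sxz / math.sqrt(sxx * szz)

def ks_statistic(values, cdf):
    """Largest distance between the empirical CDF and a model CDF."""
    x = sorted(values)
    n = len(x)
    return max(max((i + 1) / n - cdf(v), cdf(v) - i / n)
               for i, v in enumerate(x))

# Synthetic examples: evenly spaced normal quantiles, and a skewed
# (lognormal-shaped) set built by exponentiating normal quantiles.
model = NormalDist(10.0, 2.0)
symmetric = [model.inv_cdf((i + 0.5) / 30) for i in range(30)]
skewed = [math.exp(NormalDist(0.0, 1.5).inv_cdf((i + 0.5) / 30))
          for i in range(30)]

print(normal_ppcc(symmetric), normal_ppcc(skewed))
print(ks_statistic(symmetric, model.cdf))
```

A PPCC close to 1 corresponds to a nearly straight probability plot; the skewed sample yields a visibly lower value under the normal model.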

Recommended Workflow

Screen — 4-Plot or 6-Plot for initial overview
Identify — Histogram + Box Plot for shape and outliers
Test — Probability Plot + PPCC Plot for distribution selection
Validate — Formal goodness-of-fit test + residual analysis
Estimate — Confidence Limits for key parameters
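For the final step, confidence limits for the mean can be sketched as follows. This version uses a normal approximation, which is an assumption that is reasonable only for fairly large samples; small samples call for the t distribution, which the Python standard library does not provide. The function name and data are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev

def mean_confidence_limits(values, confidence=0.95):
    """Approximate two-sided confidence limits for the mean.

    Uses the normal quantile in place of the t quantile, so the
    interval is slightly too narrow for small samples.
    """
    n = len(values)
    center = fmean(values)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * stdev(values) / math.sqrt(n)
    return center - half_width, center + half_width

# Hypothetical sample of 50 measurements scattered around 10.0.
sample = [10.0 + 0.1 * (i % 5 - 2) for i in range(50)]
lower, upper = mean_confidence_limits(sample)

print(f"95% limits for the mean: ({lower:.4f}, {upper:.4f})")
```

These limits are meaningful only if the earlier screening steps support the underlying assumptions, in particular that the data are random and come from a fixed distribution.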