EDA Analysis Questions
NIST/SEMATECH Section 1.3.2 Analysis Questions
Every exploratory data analysis addresses a core set of questions. These questions drive the selection of graphical and quantitative techniques and guide the analyst toward a valid statistical model.
The Seven Standard EDA Questions
| # | Question | Key Techniques |
|---|---|---|
| 1 | What is a typical value (location)? | Histogram, Box Plot, Measures of Location |
| 2 | How spread out is the data? | Measures of Scale, Box Plot, Histogram |
| 3 | What is a good distributional model? | Probability Plot, PPCC Plot, Anderson-Darling Test |
| 4 | Are there outliers? | Box Plot, Grubbs’ Test, Normal Probability Plot |
| 5 | Is the data random? | Run-Sequence Plot, Lag Plot, Runs Test |
| 6 | Does the assumed model fit? | Chi-Square GOF, K-S Test, Probability Plot |
| 7 | Is the process in control? | Mean Plot, Std Deviation Plot, ANOVA |
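Question 4 is often screened first with the box plot's fence rule. The following is a minimal sketch, assuming the conventional 1.5 × IQR inner fences and Python's standard `statistics` module. The function name `fence_outliers` and the sample readings are illustrative and do not come from the handbook.

```python
from statistics import quantiles

def fence_outliers(data, k=1.5):
    """Return points outside the inner fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(data, n=4)  # default 'exclusive' quartile convention
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]

readings = [9.8, 10.1, 10.0, 10.2, 9.9, 10.0, 25.0]  # made-up sample
print(fence_outliers(readings))  # prints [25.0]
```

Quartile conventions differ between tools, so borderline points may be classified differently elsewhere. A flagged point is a candidate for investigation, not for automatic deletion.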
Location Estimates
Estimating a “typical value” is the most fundamental EDA question. Common estimators include the mean (sensitive to outliers), median (robust), and mode (distributional peak). See Measures of Location for formulas and Confidence Limits for interval estimation.
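The difference in outlier sensitivity can be seen in a small sketch using Python's standard `statistics` module. The two samples are made up for illustration; the second adds a single gross outlier to the first.

```python
from statistics import mean, median, mode

clean = [9.8, 10.1, 10.0, 10.2, 9.9, 10.0]
contaminated = clean + [25.0]  # one gross outlier, illustrative only

for label, sample in (("clean", clean), ("contaminated", contaminated)):
    print(f"{label:>12}: mean={mean(sample):.2f} "
          f"median={median(sample):.2f} mode={mode(sample):.2f}")
```

The outlier pulls the mean from about 10.0 to about 12.1, while the median and mode stay at 10.0.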
Spread Estimates
Quantifying variability uses the standard deviation, range, and interquartile range (IQR). The standard deviation and range are sensitive to outliers, whereas robust measures such as the IQR resist their influence. Compare variances across groups with Bartlett’s Test or Levene’s Test.
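As a rough illustration of how these measures react to a single outlier, the sketch below computes all three with Python's standard `statistics` module. The helper `spread_summary` and the data are hypothetical, and the quartiles follow the module's default convention, which may differ slightly from other software.

```python
from statistics import quantiles, stdev

def spread_summary(data):
    """Return (standard deviation, range, IQR) for a sample."""
    q1, _, q3 = quantiles(data, n=4)
    return stdev(data), max(data) - min(data), q3 - q1

clean = [9.8, 10.1, 10.0, 10.2, 9.9, 10.0]
contaminated = clean + [25.0]  # one gross outlier, illustrative only

for label, sample in (("clean", clean), ("contaminated", contaminated)):
    sd, rng, iqr = spread_summary(sample)
    print(f"{label:>12}: sd={sd:.3f} range={rng:.2f} IQR={iqr:.3f}")
```

On this data the standard deviation and range grow by more than an order of magnitude, while the IQR changes only slightly.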
Percentile Estimation
Percentiles partition ordered data into hundredths. The Box Plot visualizes Q1, median, and Q3 (the 25th, 50th, and 75th percentiles). For percentiles of a fitted distribution, invert the fitted CDF; see Probability Distribution Tables.
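The two routes to a percentile can be compared in a short sketch. It assumes a normal model purely as a stand-in for whatever distribution has been chosen, and it uses `statistics.quantiles` and `statistics.NormalDist` from the Python standard library. The function names and the sample are illustrative.

```python
from statistics import NormalDist, quantiles

def empirical_percentile(data, p):
    """p-th percentile (1 <= p <= 99) taken from the ordered sample."""
    return quantiles(data, n=100)[p - 1]

def fitted_normal_percentile(data, p):
    """p-th percentile from a normal distribution fitted to the sample."""
    return NormalDist.from_samples(data).inv_cdf(p / 100)

sample = [float(x) for x in range(1, 100)]  # 1.0 .. 99.0, illustrative only
for p in (5, 50, 95):
    print(p, empirical_percentile(sample, p),
          round(fitted_normal_percentile(sample, p), 2))
```

The two estimates agree at the median for this symmetric sample and diverge in the tails, where the choice of distributional model matters most.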
Randomness Assessment
Non-random patterns (trend, oscillation, clustering) violate the assumptions of most statistical models. The 4-Plot provides a rapid visual check by combining a run-sequence plot, lag plot, histogram, and normal probability plot. Quantitatively, the Autocorrelation coefficient and the Runs Test detect departures from randomness.
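Both quantitative checks are simple enough to sketch with the standard library. The version below assumes the runs test is taken above and below the median with the usual large-sample normal approximation, and it drops values equal to the median. The function names and the trending series are illustrative.

```python
from math import sqrt
from statistics import mean, median

def lag1_autocorrelation(data):
    """Lag-1 sample autocorrelation coefficient."""
    m = mean(data)
    dev = [x - m for x in data]
    return sum(a * b for a, b in zip(dev, dev[1:])) / sum(d * d for d in dev)

def runs_test_z(data):
    """Z statistic for the number of runs above/below the median."""
    med = median(data)
    signs = [x > med for x in data if x != med]  # drop ties with the median
    n1 = sum(signs)            # observations above the median
    n2 = len(signs) - n1       # observations below the median
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    return (runs - expected) / sqrt(variance)

trend = list(range(1, 21))  # a pure upward trend, clearly non-random
print(round(lag1_autocorrelation(trend), 2))
print(round(runs_test_z(trend), 2))
```

A trending series gives a lag-1 autocorrelation near 1 and a strongly negative z (too few runs), while rapid oscillation gives a strongly positive z. At the 5% level, |z| > 1.96 is the usual rejection threshold.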
Model Validation
After selecting a candidate distribution from the distribution families, validate the fit using both graphical evidence (Probability Plot linearity) and formal tests (Anderson-Darling, K-S Test). The 6-Plot combines multiple diagnostics in a single display.
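To make the formal side concrete, the sketch below computes the Kolmogorov-Smirnov D statistic against a fitted normal distribution using only the standard library. The normal model, the function name `ks_statistic`, and the two synthetic samples are assumptions made for illustration.

```python
from math import log
from statistics import NormalDist

def ks_statistic(data, dist):
    """Largest gap between the empirical CDF and dist.cdf (the K-S D)."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

n = 50
# Synthetic samples built from quantiles, not random draws.
normal_like = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
skewed = [-log(1 - (i + 0.5) / n) for i in range(n)]  # exponential-shaped

for name, sample in (("normal-like", normal_like), ("skewed", skewed)):
    fitted = NormalDist.from_samples(sample)
    print(f"{name}: D = {ks_statistic(sample, fitted):.3f}")
```

A larger D indicates a poorer fit. Note that when the distribution's parameters are estimated from the same data, as here, the standard K-S critical values no longer strictly apply, which is one reason to pair the statistic with a probability plot.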
Recommended Workflow
- Screen — 4-Plot or 6-Plot for initial overview
- Identify — Histogram + Box Plot for shape and outliers
- Test — Probability Plot + PPCC Plot for distribution selection
- Validate — Formal goodness-of-fit test + residual analysis
- Estimate — Confidence Limits for key parameters
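
The final Estimate step can be sketched as a confidence interval for the mean. The usual small-sample interval uses a Student's t critical value; because the Python standard library has no t distribution, this sketch substitutes the normal critical value, which is a reasonable approximation only for larger samples. The function name and data are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(data, confidence=0.95):
    """Approximate CI for the mean using a normal critical value (large n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * stdev(data) / sqrt(len(data))
    centre = mean(data)
    return centre - half_width, centre + half_width

sample = list(range(1, 101))  # 100 illustrative observations
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

For small samples, replacing `z` with the corresponding t critical value gives a wider and more appropriate interval.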