
EDA Analysis Questions

NIST/SEMATECH Section 1.3.2 Analysis Questions

Every exploratory data analysis addresses a core set of questions. These questions drive the selection of graphical and quantitative techniques and guide the analyst toward a valid statistical model.

The Seven Standard EDA Questions

| # | Question | Key Techniques |
|---|----------|----------------|
| 1 | What is a typical value (location)? | Histogram, Box Plot, Measures of Location |
| 2 | How spread out is the data? | Measures of Scale, Box Plot, Histogram |
| 3 | What is a good distributional model? | Probability Plot, PPCC Plot, Anderson-Darling Test |
| 4 | Are there outliers? | Box Plot, Grubbs’ Test, Normal Probability Plot |
| 5 | Is the data random? | Run-Sequence Plot, Lag Plot, Runs Test |
| 6 | Does the assumed model fit? | Chi-Square GOF, K-S Test, Probability Plot |
| 7 | Is the process in control? | Mean Plot, Std Deviation Plot, ANOVA |

Location Estimates

Estimating a “typical value” is the most fundamental EDA question. Common estimators include the mean (sensitive to outliers), median (robust), and mode (distributional peak). See Measures of Location for formulas and Confidence Limits for interval estimation.
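As a minimal sketch of how the three estimators react to an outlier, the following uses Python's standard `statistics` module. The sample values are hypothetical and chosen only so that one observation (95.0) is a gross outlier.

```python
import statistics

# Hypothetical measurements; the last value is a deliberate gross outlier.
data = [9.8, 10.1, 10.0, 10.2, 9.9, 10.0, 95.0]

mean = statistics.mean(data)      # pulled strongly toward the outlier
median = statistics.median(data)  # middle ordered value, unaffected here
mode = statistics.mode(data)      # most frequent value

print(f"mean={mean:.2f}  median={median:.2f}  mode={mode:.2f}")
```

The mean lands far from the bulk of the data while the median and mode stay near 10, which is the sensitivity difference described above.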

Spread Estimates

Variability is quantified with the standard deviation, range, and interquartile range (IQR). Robust measures such as the IQR resist outlier influence, whereas the standard deviation and range do not. Compare variances across groups with Bartlett’s Test or the Levene Test.
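The sketch below compares the three spread measures with and without a single outlier, using only the standard library. The data and the helper name `spread_summary` are hypothetical, and the quartiles use the `inclusive` interpolation method, which is one of several common conventions.

```python
import statistics

# Hypothetical measurements; the last value is a deliberate gross outlier.
data = [9.8, 10.1, 10.0, 10.2, 9.9, 10.0, 95.0]

def spread_summary(values):
    """Return standard deviation, range, and IQR for a sample."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "stdev": statistics.stdev(values),    # sample standard deviation
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }

with_outlier = spread_summary(data)
without_outlier = spread_summary(data[:-1])  # drop the outlier

for name in ("stdev", "range", "iqr"):
    print(f"{name}: {with_outlier[name]:.3f} vs {without_outlier[name]:.3f}")
```

In this example the standard deviation and range change by orders of magnitude when the outlier is removed, while the IQR barely moves.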

Percentile Estimation

Percentiles partition ordered data into hundredths. The Box Plot visualizes the quartiles Q1 (25th percentile), median (50th percentile), and Q3 (75th percentile). For distributional percentiles, invert the fitted CDF (see Probability Distribution Tables).
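The two kinds of percentile can be sketched as follows. The sample is hypothetical, and fitting a normal distribution is an assumption made purely for illustration; in practice the distribution would come from the model-selection step.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample with no gross outliers.
data = [9.8, 10.1, 10.0, 10.2, 9.9, 10.0, 10.3, 9.7, 10.1, 9.9]

# Empirical percentiles: 99 cut points dividing the ordered data into hundredths.
cuts = statistics.quantiles(data, n=100, method="inclusive")
p25, p50, p75 = cuts[24], cuts[49], cuts[74]

# Distributional percentile: fit a normal model (an assumption), then
# invert its CDF to get the value below which 95% of the model's mass lies.
fitted = NormalDist.from_samples(data)
p95_model = fitted.inv_cdf(0.95)

print(f"Q1={p25:.3f}  median={p50:.3f}  Q3={p75:.3f}")
print(f"model 95th percentile={p95_model:.3f}")
```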

Randomness Assessment

Non-random patterns (trend, oscillation, clustering) violate the assumptions of most statistical models. The 4-Plot provides a rapid visual check by combining run-sequence, lag, histogram, and normal probability plots. Quantitatively, the Autocorrelation Plot and the Runs Test detect departures from randomness.
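A rough sketch of the two quantitative checks follows. It assumes the runs test is the Wald-Wolfowitz form based on runs above and below the median, with a normal approximation for the test statistic. The function names and the two test series are hypothetical.

```python
import math
import statistics

def lag1_autocorrelation(values):
    """Sample autocorrelation at lag 1."""
    mean = statistics.fmean(values)
    num = sum((a - mean) * (b - mean) for a, b in zip(values, values[1:]))
    den = sum((v - mean) ** 2 for v in values)
    return num / den

def runs_test_z(values):
    """Z-score for the number of runs above/below the median.

    Values equal to the median are dropped. |z| > 1.96 suggests
    non-randomness at roughly the 5% level (normal approximation).
    """
    med = statistics.median(values)
    signs = [v > med for v in values if v != med]
    n1 = sum(signs)            # observations above the median
    n2 = len(signs) - n1       # observations below the median
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / (
        (n1 + n2) ** 2 * (n1 + n2 - 1)
    )
    return (runs - expected) / math.sqrt(variance)

trend = [float(i) for i in range(20)]           # steady drift: too few runs
alternating = [(-1.0) ** i for i in range(20)]  # oscillation: too many runs

print(lag1_autocorrelation(trend), runs_test_z(trend))
print(lag1_autocorrelation(alternating), runs_test_z(alternating))
```

A trend yields a strongly positive lag-1 autocorrelation and a large negative z (too few runs); an oscillation yields the opposite signs.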

Model Validation

After selecting a candidate distribution from the distribution families, validate the fit using both graphical evidence (Probability Plot linearity) and formal tests (Anderson-Darling, K-S Test). The 6-Plot combines multiple diagnostics in a single display.
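Probability Plot linearity can be summarized numerically by the probability plot correlation coefficient (PPCC). The sketch below assumes a normal candidate distribution and uses Filliben's approximation for the uniform order statistic medians. The helper names and both samples are hypothetical, and this is an illustration rather than a replacement for a formal test with tabulated critical values.

```python
import math
from statistics import NormalDist

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def normal_ppcc(values):
    """Correlation between ordered data and normal order statistic medians."""
    n = len(values)
    ordered = sorted(values)
    # Filliben's approximation to the uniform order statistic medians.
    top = 0.5 ** (1.0 / n)
    medians = (
        [1.0 - top]
        + [(i - 0.3175) / (n + 0.365) for i in range(2, n)]
        + [top]
    )
    theoretical = [NormalDist().inv_cdf(p) for p in medians]
    return pearson(theoretical, ordered)

# Hypothetical samples: one built from normal quantiles, one strongly skewed.
normal_like = [NormalDist(10.0, 2.0).inv_cdf((i + 0.5) / 50) for i in range(50)]
skewed = [math.exp(x - 10.0) for x in normal_like]

print(f"normal-like PPCC: {normal_ppcc(normal_like):.4f}")
print(f"skewed PPCC:      {normal_ppcc(skewed):.4f}")
```

A PPCC close to 1 corresponds to a nearly straight probability plot; the skewed sample scores visibly lower against the normal model.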

  1. Screen: 4-Plot or 6-Plot for initial overview
  2. Identify: Histogram + Box Plot for shape and outliers
  3. Test: Probability Plot + PPCC Plot for distribution selection
  4. Validate: Formal goodness-of-fit test + residual analysis
  5. Estimate: Confidence Limits for key parameters