Quantitative Techniques
NIST/SEMATECH Section 1.3.5
Classical confirmatory methods that complement the graphical EDA techniques. These 18 methods cover interval estimation, hypothesis testing, measures of location and scale, distributional fit, outlier detection, and designed experiment analysis.
Confirmatory Statistics
The techniques in this section are classical statistical methods as opposed to EDA techniques. EDA and classical techniques are not mutually exclusive and can be used in a complementary fashion. For example, the analysis can start with graphical techniques such as the 4-plot followed by the classical confirmatory methods discussed here to provide more rigorous statements about the conclusions. If the classical methods yield different conclusions than the graphical analysis, some effort should be invested to explain why — often this indicates that assumptions of the classical techniques are violated.
Many of the quantitative techniques fall into two broad categories:
- Interval estimation
- Hypothesis tests
Interval Estimates
It is common in statistics to estimate a parameter from a sample of data. The value of the parameter using all possible data, not just the sample data, is called the population parameter or true value of the parameter. An estimate of the true parameter value made using sample data is called a point estimate or sample estimate.
For example, the most commonly used measure of location is the mean. The population mean is the sum of all members of the population divided by the number of members. As it is typically impractical to measure every member, a random sample is drawn and the sample mean is used as a point estimate of the population mean.
Interval estimates expand on point estimates by incorporating the uncertainty of the point estimate. Different samples from the same population generate different values for the sample mean. An interval estimate quantifies this uncertainty by computing lower and upper values of an interval which will, with a given level of confidence (i.e., probability), contain the population parameter.
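The idea can be sketched in a few lines of Python. This is a minimal illustration using hypothetical measurement data and the two-sided 95% t critical value for 19 degrees of freedom taken from a standard t-table; it is not the handbook's own example.

```python
import statistics

# Hypothetical sample of 20 thickness measurements (illustrative data only).
sample = [9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4, 10.0, 9.9,
          10.1, 10.2, 9.8, 10.0, 10.3, 9.9, 10.1, 10.0, 9.8, 10.2]

n = len(sample)
mean = statistics.mean(sample)   # point estimate of the population mean
s = statistics.stdev(sample)     # sample standard deviation

# Two-sided 95% t critical value for n - 1 = 19 degrees of freedom
# (from a standard t-table).
t_crit = 2.093

half_width = t_crit * s / n ** 0.5
lower, upper = mean - half_width, mean + half_width
print(f"95% interval estimate for the mean: ({lower:.3f}, {upper:.3f})")
```

Repeating this with a different random sample would give a different interval; the confidence statement is about the long-run proportion of such intervals that contain the true mean.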
Hypothesis Tests
Hypothesis tests also address the uncertainty of the sample estimate. However, instead of providing an interval, a hypothesis test attempts to refute a specific claim about a population parameter based on the sample data. For example, the hypothesis might be that two population means are equal, or that a population standard deviation equals a target value.
To reject a hypothesis is to conclude that it is false. However, to accept a hypothesis does not mean it is true — only that we do not have sufficient evidence to believe otherwise. Thus hypothesis tests are stated in terms of both a condition that is doubted (null hypothesis, H0) and a condition that is believed (alternative hypothesis, Ha).
A common format for a hypothesis test:
- H0: A statement of the null hypothesis, e.g., the two population means are equal.
- Ha: A statement of the alternative hypothesis, e.g., the two population means are not equal.
- Test Statistic: A statistic computed from the sample, based on the specific hypothesis test being performed.
- Significance Level: The significance level, α, defines the sensitivity of the test. A value of α = 0.05 means that we inadvertently reject the null hypothesis 5% of the time when it is in fact true (a Type I error). Values of 0.1, 0.05, and 0.01 are commonly used. The probability of rejecting H0 when it is false is called the power of the test and is denoted 1 − β; β itself is the probability of a Type II error, that is, of accepting H0 when Ha is true.
- Critical Region: The values of the test statistic that lead to rejection of H0. Based on the distribution of the test statistic and the significance level, a cut-off value is computed. Depending on the direction of the test, values above the cut-off, below it, or on both sides of it define the critical region.
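The format above can be walked through end to end with a small sketch. The example below is hypothetical: a two-sided test of H0: μ = 50 against Ha: μ ≠ 50, assuming (purely for illustration) that the population standard deviation σ is known, so the test statistic is a z statistic and the α = 0.05 critical region is |z| > 1.96.

```python
import math

# H0: mu = 50, Ha: mu != 50 (two-sided), sigma assumed known (illustrative).
mu0 = 50.0
sigma = 2.0
sample = [51.2, 49.8, 52.1, 50.6, 48.9, 51.5, 50.2, 49.4, 51.8, 50.9]

n = len(sample)
xbar = sum(sample) / n

# Test statistic: z = (xbar - mu0) / (sigma / sqrt(n))
z = (xbar - mu0) / (sigma / math.sqrt(n))

# Significance level alpha = 0.05; two-sided critical region is |z| > 1.96.
z_crit = 1.96
reject = abs(z) > z_crit
print(f"z = {z:.3f}, reject H0: {reject}")
```

Here the statistic falls inside the acceptance region, so H0 is not rejected; as noted above, that is a statement about insufficient evidence, not proof that the means are equal.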
Practical Versus Statistical Significance
It is important to distinguish between statistical significance and practical significance. Statistical significance simply means that we reject the null hypothesis. The ability of the test to detect differences depends on the sample size. For a particularly large sample, the test may reject the null hypothesis that two process means are equivalent, yet the actual difference may be too small to have real engineering significance. Similarly, if the sample is small, a difference that is large in engineering terms may not lead to rejection of H0. The analyst should combine engineering judgement with statistical analysis.
Bootstrap Uncertainty Estimates
In some cases, it is possible to mathematically derive appropriate uncertainty intervals — particularly for intervals based on the assumption of a normal distribution. However, there are many cases where this is not possible. In these cases, the bootstrap provides a method for empirically determining an appropriate interval.
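A minimal sketch of the bootstrap percentile interval, using hypothetical skewed data for which a normal-theory interval would be doubtful: resample the data with replacement many times, compute the statistic of interest (here the median) for each resample, and take percentiles of the resulting distribution.

```python
import random
import statistics

random.seed(12345)  # fix the seed so the resampling is reproducible

# Hypothetical skewed sample (illustrative data only).
data = [1.2, 0.8, 3.5, 2.1, 0.4, 5.9, 1.7, 2.8, 0.9, 4.2, 1.1, 2.5]

# Draw B bootstrap resamples (with replacement) and record the median of each.
B = 2000
boot_medians = sorted(
    statistics.median(random.choices(data, k=len(data))) for _ in range(B)
)

# Percentile interval: the 2.5th and 97.5th percentiles of the bootstrap
# distribution give a nominal 95% interval for the population median.
lower = boot_medians[int(0.025 * B)]
upper = boot_medians[int(0.975 * B)]
print(f"95% bootstrap interval for the median: ({lower}, {upper})")
```

The same recipe applies to statistics whose sampling distributions have no convenient closed form, which is where the bootstrap earns its keep.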
All 18 Quantitative Techniques
Location
Measures of Location
Section 1.3.5.1
Measures of location summarize the central tendency of a dataset using statistics such as the mean, median, and mode. They are used to characterize where the center of a distribution lies.
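The three statistics named above are all in the Python standard library; a quick illustration on a small hypothetical dataset:

```python
import statistics

# Hypothetical dataset with a repeated value so the mode is well defined.
data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 9]

mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequently occurring value
print(mean, median, mode)         # 4.9 5.0 5
```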
Confidence Limits for the Mean
Section 1.3.5.2
Confidence limits define an interval that contains the true population mean with a specified level of confidence. They are used to quantify the uncertainty in a sample mean estimate.
Two-Sample t-Test
Section 1.3.5.3
The two-sample t-test determines whether the means of two independent groups differ significantly. It is used to compare location parameters when the data are approximately normally distributed.
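A sketch of the pooled two-sample t statistic on hypothetical data from two processes, assuming equal population variances; the critical value is the two-sided α = 0.05 entry from a standard t-table for 14 degrees of freedom.

```python
import math
import statistics

# Hypothetical measurements from two processes (assumed roughly normal).
a = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3]
b = [9.6, 9.4, 9.9, 9.5, 9.7, 9.3, 9.8, 9.6]

na, nb = len(a), len(b)
xa, xb = statistics.mean(a), statistics.mean(b)
va, vb = statistics.variance(a), statistics.variance(b)

# Pooled variance (assumes equal population variances).
sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
t = (xa - xb) / math.sqrt(sp2 * (1 / na + 1 / nb))

# Two-sided critical value for alpha = 0.05, df = na + nb - 2 = 14
# (from a standard t-table).
t_crit = 2.145
print(f"t = {t:.3f}, reject H0 (equal means): {abs(t) > t_crit}")
```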
One-Factor ANOVA
Section 1.3.5.4
One-factor analysis of variance tests whether the means of three or more groups differ significantly. It is used when comparing location parameters across multiple levels of a single factor.
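The F statistic behind one-factor ANOVA decomposes total variation into between-group and within-group pieces; a minimal sketch on hypothetical data at three factor levels:

```python
import statistics

# Hypothetical response measurements at three levels of a single factor.
groups = [
    [6.9, 5.4, 5.8, 6.3, 6.1],
    [8.3, 6.8, 7.8, 9.2, 6.5],
    [8.0, 10.5, 8.1, 6.9, 9.3],
]

k = len(groups)
n = sum(len(g) for g in groups)
grand_mean = sum(x for g in groups for x in g) / n

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                 for g in groups)
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

ms_between = ss_between / (k - 1)  # df = k - 1
ms_within = ss_within / (n - k)    # df = n - k
F = ms_between / ms_within
print(f"F = {F:.3f}")
```

The statistic is then compared with an F distribution with (k − 1, n − k) degrees of freedom to decide whether the group means differ.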
Multi-Factor ANOVA
Section 1.3.5.5
Multi-factor analysis of variance tests for main effects and interactions among two or more factors simultaneously. It is used in designed experiments to identify which factors and factor combinations significantly affect the response.
Scale
Measures of Scale
Section 1.3.5.6
Measures of scale quantify the spread or variability of a dataset using statistics such as the standard deviation, variance, and range. They are used to characterize how dispersed the data are around the center.
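As with the location measures, these are one-liners in Python's standard library; an illustration on hypothetical data:

```python
import statistics

# Hypothetical measurements (illustrative data only).
data = [4.1, 5.2, 3.8, 6.0, 4.7, 5.5, 4.3, 5.0]

s = statistics.stdev(data)       # sample standard deviation
var = statistics.variance(data)  # sample variance (s squared)
rng = max(data) - min(data)      # range: largest minus smallest value
print(f"s = {s:.4f}, variance = {var:.4f}, range = {rng:.1f}")
```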
Bartlett's Test
Section 1.3.5.7
Bartlett's test assesses whether several groups have equal variances, assuming the data are normally distributed. It is used to verify the homogeneity of variance assumption before applying ANOVA or t-tests.
Chi-Square Test for Standard Deviation
Section 1.3.5.8
The chi-square test for the standard deviation tests whether a population standard deviation equals a specified value. It is used to assess whether the variability of a process meets a target specification.
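The test statistic is (n − 1)s²/σ₀², referred to a chi-square distribution with n − 1 degrees of freedom. A sketch on hypothetical data, with the two-sided α = 0.05 critical values for 9 degrees of freedom taken from a standard chi-square table:

```python
import statistics

# Hypothetical test of H0: sigma = 0.5 against Ha: sigma != 0.5.
sigma0 = 0.5
sample = [10.3, 9.8, 10.6, 10.1, 9.5, 10.4, 9.9, 10.2, 10.0, 10.7]

n = len(sample)
s2 = statistics.variance(sample)

# Test statistic: chi2 = (n - 1) * s^2 / sigma0^2.
chi2 = (n - 1) * s2 / sigma0 ** 2

# Two-sided critical values for alpha = 0.05, df = 9 (from a table).
lower_crit, upper_crit = 2.700, 19.023
reject = chi2 < lower_crit or chi2 > upper_crit
print(f"chi-square = {chi2:.3f}, reject H0: {reject}")
```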
F-Test for Equality of Two Variances
Section 1.3.5.9
The F-test compares the variances of two independent groups to determine if they are significantly different. It is used to check the equal variance assumption before performing a two-sample t-test.
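The F statistic is simply the ratio of the two sample variances, referred to an F distribution with (n₁ − 1, n₂ − 1) degrees of freedom. A sketch on hypothetical data; the two-sided α = 0.05 critical values for (7, 7) degrees of freedom, roughly 0.200 and 4.995, are taken from a standard F-table:

```python
import statistics

# Hypothetical measurements from two processes (illustrative data only).
a = [4.2, 3.9, 4.5, 4.1, 4.4, 3.8, 4.3, 4.0]
b = [4.1, 4.9, 3.6, 4.4, 5.1, 3.4, 4.7, 3.8]

# Test statistic: ratio of the two sample variances.
F = statistics.variance(a) / statistics.variance(b)

# Two-sided critical values for alpha = 0.05, df = (7, 7), from a table.
lower_crit, upper_crit = 0.200, 4.995
reject = F < lower_crit or F > upper_crit
print(f"F = {F:.3f}, reject H0 (equal variances): {reject}")
```

Here the second sample is visibly more spread out, and the ratio falls below the lower critical value, so equal variances would be rejected; a pooled two-sample t-test would then be inadvisable for these data.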
Levene Test for Equality of Variances
Section 1.3.5.10
The Levene test assesses whether multiple groups have equal variances without requiring normality. It is used as a robust alternative to Bartlett's test when the data may not be normally distributed.
Randomness
Autocorrelation
Section 1.3.5.12
The autocorrelation coefficient quantifies the linear dependence between observations at different time lags. It is used to test whether successive measurements are statistically independent or exhibit serial correlation.
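The lag-k autocorrelation coefficient is the sum of cross-products of deviations at lag k, normalized by the sum of squared deviations. A minimal implementation with two contrived sequences showing the two extremes of serial dependence:

```python
def autocorrelation(x, lag=1):
    """Lag-k autocorrelation coefficient of a sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

# A trending series is positively autocorrelated; a strictly alternating
# series is negatively autocorrelated (illustrative sequences).
trend = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
alternating = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]
print(autocorrelation(trend))        # 0.7: positive serial correlation
print(autocorrelation(alternating))  # -0.9: oscillating pattern
```

For independent data, the lag-1 coefficient should be close to zero relative to its sampling variability.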
Runs Test for Randomness
Section 1.3.5.13
The runs test determines whether the order of observations above and below the median is random. It is a non-parametric test used to detect trends, oscillations, or other departures from randomness.
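A sketch of the large-sample normal approximation to the runs test: classify each observation as above or below the median, count runs of like symbols, and standardize against the expected run count. Too few runs (large negative z) suggests a trend; too many (large positive z) suggests oscillation.

```python
import math

def runs_test_z(x):
    """Approximate z statistic for the runs test about the median."""
    xs = sorted(x)
    n = len(xs)
    med = xs[n // 2] if n % 2 else (xs[n // 2 - 1] + xs[n // 2]) / 2
    signs = [v > med for v in x if v != med]  # drop ties with the median
    n1 = sum(signs)
    n2 = len(signs) - n1
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / (
        (n1 + n2) ** 2 * (n1 + n2 - 1))
    return (runs - expected) / math.sqrt(variance)

# Illustrative sequences: a trend gives few runs, alternation gives many.
print(runs_test_z([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))    # strongly negative
print(runs_test_z([1, 10, 2, 9, 3, 8, 4, 7, 5, 6]))    # strongly positive
```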
Distributional Measures
Anderson-Darling Test
Section 1.3.5.14
The Anderson-Darling test assesses whether a dataset follows a specified probability distribution, with particular sensitivity in the tails. It is used as a formal goodness-of-fit test complementing graphical methods.
Chi-Square Goodness-of-Fit Test
Section 1.3.5.15
The chi-square goodness-of-fit test determines whether observed frequency counts match expected counts under a hypothesized distribution. It is used for both continuous and discrete distribution testing with binned data.
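The statistic is the familiar sum of (observed − expected)²/expected over the bins. A sketch with hypothetical die-roll counts tested against the fair-die hypothesis; the α = 0.05 critical value for 5 degrees of freedom is taken from a standard chi-square table:

```python
# Hypothetical roll counts for the six faces of a die (illustrative data).
observed = [18, 22, 16, 25, 20, 19]
n = sum(observed)
expected = [n / 6] * 6  # equal expected counts under the fair-die hypothesis

# Test statistic: sum of (O - E)^2 / E over the bins, referred to a
# chi-square distribution with k - 1 = 5 degrees of freedom.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

crit = 11.070  # alpha = 0.05, df = 5 (from a table)
reject = chi2 > crit
print(f"chi-square = {chi2:.3f}, reject fair-die hypothesis: {reject}")
```

Note that when distribution parameters are estimated from the data, the degrees of freedom are reduced by the number of estimated parameters.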
Kolmogorov-Smirnov Goodness-of-Fit Test
Section 1.3.5.16
The Kolmogorov-Smirnov test compares the empirical cumulative distribution function with a theoretical one or with another sample. It is used as a distribution-free goodness-of-fit test based on the maximum distance between CDFs.
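The D statistic is the largest vertical distance between the empirical CDF and the hypothesized CDF; since the empirical CDF jumps at each data point, both sides of each jump must be checked. A minimal sketch testing hypothetical data against the standard normal distribution, with the normal CDF built from math.erf:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of the normal distribution via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def ks_statistic(sample, cdf):
    """Maximum distance between the empirical CDF and a theoretical CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d

# Hypothetical data tested against the standard normal (illustrative only).
sample = [-0.4, 1.2, 0.3, -1.1, 0.8, -0.2, 0.5, -0.7, 1.5, 0.1]
D = ks_statistic(sample, normal_cdf)
print(f"D = {D:.3f}")
```

The observed D is then compared with tabulated Kolmogorov-Smirnov critical values for the given sample size.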