One-Factor ANOVA

NIST/SEMATECH Section 1.3.5.4 One-Factor ANOVA

What It Is

One-factor analysis of variance (ANOVA) is a hypothesis test that determines whether the means of two or more groups differ significantly (with exactly two groups it is equivalent to the two-sample t-test). It decomposes the total variability into between-group and within-group components, comparing them via an F-statistic.

When to Use It

Use one-factor ANOVA when comparing the means of three or more independent groups defined by a single categorical factor. It extends the two-sample t-test to multiple groups while avoiding the inflation of the family-wise Type I error rate that would result from running many pairwise t-tests. ANOVA is fundamental in designed experiments, quality improvement studies, and any setting where the effect of a factor with multiple levels must be assessed.

How to Interpret

If the F-statistic exceeds the critical value from the F-distribution with (k-1, N-k) degrees of freedom, reject the null hypothesis that all group means are equal. A significant result indicates that at least one group mean differs, but does not identify which pairs differ; follow-up multiple comparison tests (e.g., Tukey HSD) are needed for that. The ratio SS_B / SS_T gives the proportion of total variability explained by the factor (often called eta-squared), analogous to R-squared. When F is close to 1, the between-group variation is comparable to within-group variation, suggesting no factor effect.
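The decision rule and the follow-up comparisons above can be sketched in SciPy. This is a minimal illustration using hypothetical data (the same three groups as the example at the end of this page): `scipy.stats.f.ppf` gives the critical F value, and `scipy.stats.tukey_hsd` (available in SciPy 1.8+) performs the pairwise follow-up tests.

```python
import numpy as np
from scipy import stats

# Hypothetical measurement groups (same data as the Python example below)
group1 = np.array([23.1, 24.3, 22.8, 23.9, 24.0])
group2 = np.array([26.4, 25.9, 27.1, 26.3, 26.8])
group3 = np.array([28.2, 27.5, 29.1, 28.0, 28.7])

k = 3                                        # number of groups
N = group1.size + group2.size + group3.size  # total observations

# Critical value from the F-distribution with (k-1, N-k) df at alpha = 0.05
f_crit = stats.f.ppf(0.95, k - 1, N - k)
f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.4f}, critical F = {f_crit:.4f}")

# If significant, Tukey's HSD identifies which pairs of means differ
res = stats.tukey_hsd(group1, group2, group3)
print(res)
```

The Tukey HSD result reports a confidence interval and adjusted p-value for each pair of groups, so the overall error rate of the follow-up comparisons stays controlled.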

Assumptions and Limitations

ANOVA assumes that observations are independent, each group is drawn from a normally distributed population, and all groups have equal variances (homoscedasticity). Violations of normality are less serious for large, balanced designs, but unequal variances can inflate the Type I error rate. Use Levene or Bartlett tests to check the equal-variance assumption.
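The equal-variance checks mentioned above are available directly in SciPy; a brief sketch with hypothetical data (the same groups as the example below):

```python
import numpy as np
from scipy import stats

group1 = np.array([23.1, 24.3, 22.8, 23.9, 24.0])
group2 = np.array([26.4, 25.9, 27.1, 26.3, 26.8])
group3 = np.array([28.2, 27.5, 29.1, 28.0, 28.7])

# Levene's test is robust to departures from normality;
# Bartlett's test is more powerful when the data are truly normal.
lev_stat, lev_p = stats.levene(group1, group2, group3)
bart_stat, bart_p = stats.bartlett(group1, group2, group3)
print(f"Levene p = {lev_p:.4f}, Bartlett p = {bart_p:.4f}")
```

A small p-value from either test suggests the equal-variance assumption is violated, in which case a Welch-type ANOVA is a common alternative.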

Reference: NIST/SEMATECH e-Handbook, Section 1.3.5.4

Formulas

F-Statistic

F = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}} = \frac{SS_B / (k-1)}{SS_W / (N-k)}

The ratio of between-group mean square to within-group mean square. A large F indicates that group means differ more than expected by chance.

Between-Group Sum of Squares

SS_B = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2

Measures the variability of the group means around the grand mean, weighted by group sizes.

Within-Group Sum of Squares

SS_W = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2

Measures the variability of individual observations around their respective group means.

Total Sum of Squares

SS_T = SS_B + SS_W = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2

The total variability in the data, partitioned into between-group and within-group components.
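The partition SS_T = SS_B + SS_W can be verified numerically. A minimal sketch, using the same hypothetical groups as the Python example below, computes each sum of squares from its definition and forms the F-statistic from the mean squares:

```python
import numpy as np

group1 = np.array([23.1, 24.3, 22.8, 23.9, 24.0])
group2 = np.array([26.4, 25.9, 27.1, 26.3, 26.8])
group3 = np.array([28.2, 27.5, 29.1, 28.0, 28.7])
groups = [group1, group2, group3]

all_data = np.concatenate(groups)
grand_mean = all_data.mean()
k, N = len(groups), all_data.size

# Between-group SS: group means around the grand mean, weighted by n_j
ss_b = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group SS: observations around their own group means
ss_w = sum(((g - g.mean()) ** 2).sum() for g in groups)
# Total SS: observations around the grand mean
ss_t = ((all_data - grand_mean) ** 2).sum()

F = (ss_b / (k - 1)) / (ss_w / (N - k))
print(f"SS_B + SS_W = {ss_b + ss_w:.4f}, SS_T = {ss_t:.4f}, F = {F:.4f}")
```

The printed sums agree to floating-point precision, and the resulting F matches what `scipy.stats.f_oneway` reports for the same data.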

Python Example

import numpy as np
from scipy import stats
# Three groups of measurements
group1 = np.array([23.1, 24.3, 22.8, 23.9, 24.0])
group2 = np.array([26.4, 25.9, 27.1, 26.3, 26.8])
group3 = np.array([28.2, 27.5, 29.1, 28.0, 28.7])
# One-factor ANOVA
f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F-statistic: {f_stat:.4f}")
print(f"p-value: {p_value:.6f}")
print(f"Reject H0 at alpha=0.05: {p_value < 0.05}")
# Effect size (eta-squared)
grand_mean = np.mean(np.concatenate([group1, group2, group3]))
ss_between = sum(len(g) * (np.mean(g) - grand_mean)**2
                 for g in [group1, group2, group3])
ss_total = sum(np.sum((g - grand_mean)**2)
               for g in [group1, group2, group3])
eta_sq = ss_between / ss_total
print(f"Eta-squared: {eta_sq:.4f}")