One-Factor ANOVA
NIST/SEMATECH Section 1.3.5.4 One-Factor ANOVA
What It Is
One-factor analysis of variance (ANOVA) is a hypothesis test that determines whether three or more group means differ significantly. It decomposes the total variability into between-group and within-group components, comparing them via an F-statistic.
When to Use It
Use one-factor ANOVA when comparing the means of three or more independent groups defined by a single categorical factor. It extends the two-sample t-test to multiple groups while controlling the family-wise error rate. ANOVA is fundamental in designed experiments, quality improvement studies, and any setting where the effect of a factor with multiple levels must be assessed.
How to Interpret
If the F-statistic exceeds the critical value from the F-distribution with (k-1, N-k) degrees of freedom, reject the null hypothesis that all group means are equal. A significant result indicates that at least one group mean differs, but it does not identify which pairs differ; follow-up multiple comparison tests (e.g., Tukey's HSD) are needed for that. The ratio SS_B / SS_T gives the proportion of total variability explained by the factor, analogous to R-squared. When F is close to 1, the between-group variation is comparable to the within-group variation, suggesting no factor effect.
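This decision step can be sketched in Python. The data below are illustrative, and the Tukey HSD step assumes SciPy 1.11 or later, which provides scipy.stats.tukey_hsd:

```python
import numpy as np
from scipy import stats

# Illustrative data: three groups of five measurements each
group1 = np.array([23.1, 24.3, 22.8, 23.9, 24.0])
group2 = np.array([26.4, 25.9, 27.1, 26.3, 26.8])
group3 = np.array([28.2, 27.5, 29.1, 28.0, 28.7])

# Critical value from the F-distribution with (k-1, N-k) degrees of freedom
k, N = 3, 15
f_crit = stats.f.ppf(0.95, k - 1, N - k)
print(f"F critical (alpha=0.05): {f_crit:.3f}")

# Tukey HSD identifies which pairs of means differ (SciPy >= 1.11)
res = stats.tukey_hsd(group1, group2, group3)
print(res)  # table of pairwise statistics and p-values
```

Comparing the observed F-statistic against f_crit reproduces the accept/reject decision; the Tukey result then localizes any differences to specific pairs.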
Assumptions and Limitations
ANOVA assumes that observations are independent, that each group is drawn from a normally distributed population, and that all groups have equal variances (homoscedasticity). Violations of normality are less serious for large, balanced designs, but unequal variances can inflate the Type I error rate. Use Levene's or Bartlett's test to check the equal-variance assumption.
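The equal-variance check can be sketched with SciPy's implementations of both tests (the data are illustrative):

```python
import numpy as np
from scipy import stats

# Illustrative data: three groups of measurements
group1 = np.array([23.1, 24.3, 22.8, 23.9, 24.0])
group2 = np.array([26.4, 25.9, 27.1, 26.3, 26.8])
group3 = np.array([28.2, 27.5, 29.1, 28.0, 28.7])

# Levene's test is robust to non-normality; Bartlett's is
# more powerful when the normality assumption holds
lev_stat, lev_p = stats.levene(group1, group2, group3)
bart_stat, bart_p = stats.bartlett(group1, group2, group3)
print(f"Levene p-value:   {lev_p:.4f}")
print(f"Bartlett p-value: {bart_p:.4f}")
# Large p-values: no evidence against the equal-variance assumption
```

A small p-value from either test is a warning that the standard F-test may not be reliable; a Welch-type ANOVA is a common fallback in that case.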
Reference: NIST/SEMATECH e-Handbook, Section 1.3.5.4
Formulas
F-Statistic
The ratio of between-group mean square to within-group mean square. A large F indicates that group means differ more than expected by chance.
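In the usual notation, with k groups and N total observations:

```latex
F = \frac{MS_B}{MS_W} = \frac{SS_B / (k-1)}{SS_W / (N-k)}
```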
Between-Group Sum of Squares
Measures the variability of the group means around the grand mean, weighted by group sizes.
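Writing n_i for the size of group i, \bar{x}_i for its mean, and \bar{x} for the grand mean:

```latex
SS_B = \sum_{i=1}^{k} n_i \, (\bar{x}_i - \bar{x})^2
```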
Within-Group Sum of Squares
Measures the variability of individual observations around their respective group means.
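With x_{ij} denoting observation j in group i:

```latex
SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2
```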
Total Sum of Squares
The total variability in the data, partitioned into between-group and within-group components.
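In the same notation, the partition is:

```latex
SS_T = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 = SS_B + SS_W
```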
Python Example
import numpy as np
from scipy import stats

# Three groups of measurements
group1 = np.array([23.1, 24.3, 22.8, 23.9, 24.0])
group2 = np.array([26.4, 25.9, 27.1, 26.3, 26.8])
group3 = np.array([28.2, 27.5, 29.1, 28.0, 28.7])

# One-factor ANOVA
f_stat, p_value = stats.f_oneway(group1, group2, group3)

print(f"F-statistic: {f_stat:.4f}")
print(f"p-value: {p_value:.6f}")
print(f"Reject H0 at alpha=0.05: {p_value < 0.05}")

# Effect size (eta-squared)
grand_mean = np.mean(np.concatenate([group1, group2, group3]))
ss_between = sum(len(g) * (np.mean(g) - grand_mean)**2 for g in [group1, group2, group3])
ss_total = sum(np.sum((g - grand_mean)**2) for g in [group1, group2, group3])
eta_sq = ss_between / ss_total
print(f"Eta-squared: {eta_sq:.4f}")