Measures of Scale

NIST/SEMATECH Section 1.3.5.6 Measures of Scale

What It Is

Measures of scale quantify the spread or variability of a dataset, indicating how tightly or loosely the observations cluster around the center. The most common measures are the variance, standard deviation, range, average absolute deviation (AAD), median absolute deviation (MAD), and interquartile range (IQR).

When to Use It

Use measures of scale to characterize data dispersion and assess process variability. They are essential companions to measures of location: two datasets can have identical means but very different spreads. In quality control, the standard deviation enters directly into process capability indices and control chart limits. When outliers are present or the distribution is skewed, a robust measure such as the IQR is the better choice.

How to Interpret

A small standard deviation relative to the mean indicates that observations cluster tightly around the center, while a large value indicates wide dispersion. The coefficient of variation (CV = s / x-bar) is unitless, so it allows variability to be compared across datasets measured in different units or on different scales; it is meaningful only for ratio-scale data whose mean is well away from zero. The range is the simplest but least robust measure, because a single outlier can inflate it dramatically. The IQR depends only on the central 50% of the data, which is why it holds up under skewness and outliers. When reporting variability, always pair the measure of scale with the corresponding measure of location: the standard deviation with the mean, the MAD or IQR with the median.
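The unit-independence of the CV can be checked with a short sketch. The measurements below are hypothetical, and the helper name `coefficient_of_variation` is an illustrative choice, not something taken from the handbook.

```python
import statistics

def coefficient_of_variation(data):
    """Return s / x-bar using the n-1 sample standard deviation."""
    mean = statistics.mean(data)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(data) / mean

# Hypothetical lengths of the same five parts, recorded in two units.
lengths_mm = [100.0, 102.0, 98.0, 101.0, 99.0]
lengths_cm = [x / 10 for x in lengths_mm]

cv_mm = coefficient_of_variation(lengths_mm)
cv_cm = coefficient_of_variation(lengths_cm)

# The standard deviations differ by a factor of 10; the CVs agree.
print(f"s (mm) = {statistics.stdev(lengths_mm):.4f}, CV = {cv_mm:.4f}")
print(f"s (cm) = {statistics.stdev(lengths_cm):.4f}, CV = {cv_cm:.4f}")
```

Changing the unit rescales s and x-bar by the same factor, so the ratio is unchanged.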

Assumptions and Limitations

The sample variance is an unbiased estimator of the population variance when observations are independent and identically distributed with finite variance. The sample standard deviation is consistent but biased slightly low in small samples, because the square root is a concave function. For non-normal or heavy-tailed distributions, robust alternatives like the median absolute deviation (MAD) or interquartile range are more appropriate because they provide robustness of validity: confidence intervals based on them keep approximately correct coverage across a wide range of underlying distributions, whereas intervals based on the standard deviation can be unreliable when the data are not normal. The trade-off is lower efficiency when the data really are normal.
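The unbiasedness of the variance and the small-sample bias of the standard deviation can be seen exactly on a toy case, with no simulation. This is a minimal sketch that assumes a hypothetical two-point population {0, 1} with equal probabilities and enumerates every i.i.d. sample of size 2.

```python
from itertools import product
from math import sqrt
import statistics

# Hypothetical population: a fair draw from {0, 1}, so sigma^2 = 0.25.
population = [0.0, 1.0]
sigma2 = statistics.pvariance(population)

# All four equally likely i.i.d. samples of size n = 2.
samples = list(product(population, repeat=2))

# Average each estimator over every possible sample (its exact expectation).
mean_s2 = statistics.mean(statistics.variance(s) for s in samples)     # n-1 divisor
mean_s2_n = statistics.mean(statistics.pvariance(s) for s in samples)  # n divisor
mean_s = statistics.mean(statistics.stdev(s) for s in samples)

print("population variance:  ", sigma2)
print("E[s^2], n-1 divisor:  ", mean_s2)
print("E[s^2], n divisor:    ", mean_s2_n)
print("population std dev:   ", sqrt(sigma2))
print("E[s]:                 ", mean_s)
```

The n-1 divisor recovers the population variance on average, the n divisor underestimates it, and the average of s falls below the population standard deviation even though s^2 is unbiased.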

Reference: NIST/SEMATECH e-Handbook, Section 1.3.5.6

Formulas

Sample Variance

s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2

The sum of squared deviations from the sample mean, divided by n-1 rather than n so that the result is an unbiased estimate of the population variance.

Sample Standard Deviation

s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}

The square root of the variance, expressed in the same units as the original data for easier interpretation.
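As a sketch of how the variance and standard deviation formulas translate into code, the functions below follow them term by term and are cross-checked against Python's standard `statistics` module. The function names and the data are illustrative assumptions.

```python
import math
import statistics

def sample_variance(data):
    """s^2: sum of squared deviations from the mean, divided by n-1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_std(data):
    """s: square root of the sample variance, in the units of the data."""
    return math.sqrt(sample_variance(data))

# Hypothetical data with mean 5 and squared deviations summing to 32.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print("s^2 =", sample_variance(data))   # 32 / 7
print("s   =", sample_std(data))

# The standard library uses the same n-1 convention.
print(math.isclose(sample_variance(data), statistics.variance(data)))
print(math.isclose(sample_std(data), statistics.stdev(data)))
```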

Average Absolute Deviation

\text{AAD} = \frac{1}{n}\sum_{i=1}^{n}|x_i - \bar{x}|

The mean of the absolute deviations from the sample mean. Less affected by extreme observations than the variance because it does not square the distances.
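A minimal sketch of the AAD, assuming the same kind of small hypothetical dataset. It also replaces one value with a gross outlier to compare how much the AAD and the standard deviation each inflate.

```python
import statistics

def average_absolute_deviation(data):
    """AAD: mean of |x_i - x-bar|, with n in the denominator."""
    n = len(data)
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n

# Hypothetical data; the contaminated copy replaces 9 with 90.
clean = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
contaminated = clean[:-1] + [90.0]

aad_ratio = average_absolute_deviation(contaminated) / average_absolute_deviation(clean)
std_ratio = statistics.stdev(contaminated) / statistics.stdev(clean)

print("AAD (clean):", average_absolute_deviation(clean))
print("AAD inflation factor:    ", round(aad_ratio, 2))
print("std dev inflation factor:", round(std_ratio, 2))
```

Both measures are pulled upward by the outlier because both are centered on the mean; the AAD is simply pulled somewhat less, which is why it is not considered a fully robust measure.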

Median Absolute Deviation

\text{MAD} = \text{median}(|x_i - \tilde{x}|)

The median of the absolute deviations from the sample median. Highly robust to outliers because both the center and the spread measure use the median.
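The MAD formula can be sketched as follows. The optional `scale` argument is an assumption added for illustration: multiplying by roughly 1.4826 is a common convention that makes the MAD comparable to the standard deviation for normal data, but it is not part of the definition given here.

```python
import statistics

def median_absolute_deviation(data, scale=1.0):
    """MAD: median of |x_i - median(x)|, optionally rescaled."""
    med = statistics.median(data)
    return scale * statistics.median(abs(x - med) for x in data)

# Hypothetical data; the contaminated copy replaces 9 with 90.
clean = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
contaminated = clean[:-1] + [90.0]

print("MAD (clean):       ", median_absolute_deviation(clean))
print("MAD (contaminated):", median_absolute_deviation(contaminated))

# Rescaled version, assuming approximately normal data.
print("scaled MAD (clean):", median_absolute_deviation(clean, scale=1.4826))
```

In this example the outlier changes neither the median nor the median of the absolute deviations, so the MAD is identical for both datasets.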

Range

R = x_{(n)} - x_{(1)}

The difference between the largest and smallest observations. Simple to compute but sensitive to outliers.

Interquartile Range

\text{IQR} = Q_3 - Q_1

The difference between the 75th and 25th percentiles. It captures the spread of the central 50% of the data and is robust to outliers.
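The contrast between the range and the IQR can be illustrated with a short sketch. Quartile conventions differ between software packages; this example assumes the default "exclusive" method of Python's `statistics.quantiles`, so other tools may report slightly different quartiles for the same data.

```python
import statistics

def value_range(data):
    """R: largest observation minus smallest observation."""
    return max(data) - min(data)

def interquartile_range(data):
    """IQR: Q3 - Q1, using statistics.quantiles with its default method."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

# Hypothetical data; the contaminated copy replaces 9 with 90.
clean = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
contaminated = clean[:-1] + [90.0]

for label, data in (("clean", clean), ("contaminated", contaminated)):
    print(label, "range =", value_range(data), "IQR =", interquartile_range(data))
```

The single outlier multiplies the range by more than ten while leaving the IQR unchanged, because the quartiles are determined by the middle of the sorted data.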