Measures of Location

NIST/SEMATECH Section 1.3.5.1 Measures of Location

What It Is

Measures of location summarize the central tendency of a dataset, indicating where the "center" of the data lies. The most common measures are the mean, median, and mode, each offering a different perspective on the typical value in a distribution.

When to Use It

Use measures of location to characterize the center of a dataset before conducting further analysis. They are foundational descriptive statistics that provide a single representative value for a distribution. The choice between mean, median, and mode depends on the data: the mean is preferred for roughly symmetric data without outliers, the median is more robust to outliers and skew, and the mode is most useful for categorical or discrete data, where it identifies the most frequent value.

How to Interpret

When the mean and median are approximately equal, the data are consistent with a roughly symmetric distribution. A mean substantially larger than the median suggests right skew, typically because a long upper tail or a few high outliers pull the mean upward. Conversely, a mean smaller than the median suggests left skew. The trimmed mean provides a useful compromise: it retains much of the efficiency of the mean for symmetric distributions while offering some protection against outlier influence. Comparing the mean, median, and trimmed mean gives a quick diagnostic of distributional shape.
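The mean-versus-median comparison described above can be sketched in a few lines of Python. This is only an illustration: the function name `skew_hint`, the tolerance `tol`, and the choice to scale the difference by the sample standard deviation are assumptions made for this example, not part of the handbook, and the datasets are made up.

```python
import statistics
from typing import Sequence


def skew_hint(xs: Sequence[float], tol: float = 0.1) -> str:
    """Rough shape hint from comparing the mean and the median.

    The gap between mean and median is judged relative to the sample
    standard deviation. The threshold `tol` is an arbitrary assumption.
    """
    mean = statistics.fmean(xs)
    median = statistics.median(xs)
    spread = statistics.stdev(xs)  # requires at least two observations
    if spread == 0 or abs(mean - median) <= tol * spread:
        return "roughly symmetric"
    return "right-skewed" if mean > median else "left-skewed"


# Illustrative datasets (hypothetical values).
symmetric = [1, 2, 3, 4, 5]
high_outlier = [1, 2, 2, 3, 3, 4, 50]
low_outlier = [-50, 1, 2, 2, 3, 3, 4]

hints = [skew_hint(d) for d in (symmetric, high_outlier, low_outlier)]
```

A hint of this kind is a screening aid only; a histogram or a formal skewness statistic gives a fuller picture.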

Assumptions and Limitations

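The sample mean is a meaningful estimate of location only when the underlying distribution has a finite first moment; the Cauchy distribution, for example, does not. The mean is also sensitive to outliers, which can make it unrepresentative for heavily skewed or contaminated data. The median makes no distributional assumptions but is less statistically efficient than the mean for normally distributed data: in large samples its efficiency relative to the mean is 2/pi, or about 64%.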
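The efficiency claim for normal data can be checked with a small simulation. The sketch below is an illustration under stated assumptions: the sample size, number of replications, and seed are arbitrary choices, and the function name `compare_efficiency` is hypothetical. It draws repeated standard-normal samples and compares the sampling variance of the mean with that of the median.

```python
import random
import statistics
from typing import Tuple


def compare_efficiency(n: int = 101, reps: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Return (variance of sample means, variance of sample medians)
    over `reps` standard-normal samples of size `n`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)


var_mean, var_median = compare_efficiency()

# For large n, theory puts this ratio near 2/pi (about 0.64);
# a simulated value will scatter around that figure.
ratio = var_mean / var_median
```

The smaller variance of the mean is what "more efficient" means here: for normal data, the mean needs fewer observations than the median to reach the same precision.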

Reference: NIST/SEMATECH e-Handbook, Section 1.3.5.1

Formulas

Sample Mean

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i

The arithmetic average of all n observations. It uses every data point and is the most commonly used measure of location.
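As a minimal sketch of the formula above, the sample mean can be computed directly from its definition. The function name `sample_mean` and the data values are illustrative assumptions.

```python
from typing import Sequence


def sample_mean(xs: Sequence[float]) -> float:
    """Arithmetic average: the sum of the observations divided by n."""
    n = len(xs)
    if n == 0:
        raise ValueError("sample mean is undefined for an empty dataset")
    return sum(xs) / n


data = [2.0, 4.0, 4.0, 5.0, 10.0]  # hypothetical values
mean_value = sample_mean(data)     # (2 + 4 + 4 + 5 + 10) / 5 = 5.0
```

In practice, Python's standard library offers `statistics.mean` and `statistics.fmean` for the same calculation.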

Sample Median

\tilde{x} = \begin{cases} x_{(k+1)} & \text{if } n = 2k+1 \\ \frac{x_{(k)} + x_{(k+1)}}{2} & \text{if } n = 2k \end{cases}

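The middle value of the ordered dataset, where x_{(i)} denotes the i-th smallest observation. For odd n it is the single middle observation; for even n it is the average of the two middle observations. It divides the data into two equal halves and is resistant to extreme values.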
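The two cases of the median formula map onto code as follows. This is a sketch with a hypothetical function name, `sample_median`; the main point is the shift from the formula's 1-based order statistics to Python's 0-based list indices.

```python
from typing import Sequence


def sample_median(xs: Sequence[float]) -> float:
    """Median from the sorted data, following the two-case definition."""
    ordered = sorted(xs)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample median is undefined for an empty dataset")
    k = n // 2
    if n % 2 == 1:
        # n = 2k + 1: the middle value x_(k+1) is ordered[k] with 0-based indexing.
        return ordered[k]
    # n = 2k: average x_(k) and x_(k+1), i.e. ordered[k - 1] and ordered[k].
    return (ordered[k - 1] + ordered[k]) / 2


odd_case = sample_median([5, 1, 3])       # sorted: 1, 3, 5 -> 3
even_case = sample_median([4, 1, 3, 2])   # sorted: 1, 2, 3, 4 -> (2 + 3) / 2 = 2.5
```

The standard library function `statistics.median` follows the same convention for even n.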

Trimmed Mean

\bar{x}_t = \frac{1}{n - 2g}\sum_{i=g+1}^{n-g} x_{(i)}

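A compromise between the mean and median, computed by discarding the g smallest and g largest of the n ordered values and averaging the remaining n - 2g. Setting g = 0 recovers the sample mean; the formula requires 2g < n so that at least one value remains.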
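A minimal sketch of the trimmed mean, assuming the trimming count g is supplied directly rather than derived from a trimming percentage. The function name `trimmed_mean` and the data are illustrative.

```python
from typing import Sequence


def trimmed_mean(xs: Sequence[float], g: int) -> float:
    """Average of the ordered data after dropping the g smallest
    and g largest values."""
    ordered = sorted(xs)
    n = len(ordered)
    if g < 0 or 2 * g >= n:
        raise ValueError("g must satisfy 0 <= g and 2*g < n")
    kept = ordered[g:n - g]  # x_(g+1) through x_(n-g) in 1-based notation
    return sum(kept) / len(kept)


data = [2.0, 4.0, 4.0, 5.0, 100.0]  # hypothetical values with one high outlier
untrimmed = trimmed_mean(data, 0)   # ordinary mean: 115 / 5 = 23.0
trimmed = trimmed_mean(data, 1)     # drops 2.0 and 100.0, averages 4, 4, 5
```

With one value trimmed from each end, the outlier at 100 no longer influences the result, which moves from 23.0 to about 4.33, close to the median of 4.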