Sampling and the Standard Error of the Mean



Note: This control assumes that you are using Microsoft's Internet Explorer as your browser (available as a free download from http://www.microsoft.com/ie/download/windows.htm) and that you are running Windows 95, 98 or NT. When asked if you want to install the sampling control, click on Yes.

When we draw a sample from a population and calculate a sample statistic such as the mean, we can ask how well the sample statistic (called a point estimate) represents the corresponding value for the population. That is, if we calculate the mean of a sample, how close will it be to the mean of the population? Of course, the answer will change depending on the particular sample that we draw. But could we develop a measure that would at least give us an indication of how well we expect the sample mean to represent the population mean?

We could subtract the population mean from the sample mean to get an idea of how close the sample mean is to the population mean. (Technically, we don't know the value of the population mean; if we knew it, there would be no sense in calculating the sample mean. But in theory, it is possible to get an arbitrarily good estimate of the population mean, and we can use that estimate as the population mean.) That is, we can calculate how much the sample mean deviates from the population mean. But is this particular sample representative of all of the samples that we could select? It may or may not be. So, we should draw another sample and determine how much it deviates from the population mean. In fact, we might want to do this many, many times. We could then calculate the average size of those deviations, to get a measure of how much the sample means typically differ from the population mean.

The standard error of the mean does basically that. To determine the standard error of the mean, many samples of the same size are selected from the population. For each sample, the mean of that sample is calculated. The standard deviation of those means is then calculated. (Remember that the standard deviation is a measure of how much the data deviate from the mean on average.) The standard deviation of the sample means is defined as the standard error of the mean. It is a measure of how well the point estimate (e.g. the sample mean) represents the population parameter (e.g. the population mean). If the standard error of the mean is close to zero, then the sample mean is likely to be a good estimate of the population mean. If the standard error of the mean is large, then the sample mean is likely to be a poor estimate of the population mean. (Note: Even with a large standard error of the mean, it is possible for the point estimate to be arbitrarily close to the population parameter. But the probability of that occurring decreases as the standard error of the mean increases.)

The following control allows you to investigate the standard error of the mean (the standard deviation of the sample means). The control will draw samples of a specified size, with replacement, from a uniform distribution with a range from 1 to 100 (that is, each of the numbers from 1 to 100 is equally likely to be drawn). The mean of the sample is then displayed in the text box at the right. Another sample of the same size is then selected, and the mean of that sample is added to the text box. The process repeats until the specified number of samples has been selected. Next, the mean of the sample means and the standard deviation of the sample means are displayed. The standard deviation of the sample means is equivalent to the standard error of the mean.
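If the control is not available in your browser, the procedure it performs can be approximated with a short script. The following is a minimal sketch in Python, not the control's actual code: the function name `simulate_sample_means`, the seed, and the choice of 10 observations per sample and 1000 samples are illustrative assumptions, and it assumes the control draws whole numbers from 1 to 100.

```python
import random
import statistics

def simulate_sample_means(sample_size, num_samples, seed=None):
    """Return the means of num_samples samples, each of sample_size values.

    Each value is a whole number from 1 to 100, drawn with replacement
    (assumption: the control draws integers, all equally likely).
    """
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        # randint includes both endpoints, so 1 and 100 can both occur
        sample = [rng.randint(1, 100) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means

# Hypothetical settings: 1000 samples of 10 observations each
means = simulate_sample_means(sample_size=10, num_samples=1000, seed=0)

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)  # simulated standard error of the mean

print(f"Mean of the sample means: {mean_of_means:.2f}")
print(f"Standard deviation of the sample means: {sd_of_means:.2f}")
```

The mean of the sample means should land near 50.5, the mean of the numbers 1 through 100. The script uses the sample standard deviation (`statistics.stdev`); whether the control divides by the number of samples or by one less is not stated, and with many samples the difference is negligible.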

What can we do to make the sample mean a good estimator of the population mean? In general, as the size of the sample increases, the sample mean becomes a better and better estimator of the population mean. That is, each additional observation that is included in the sample increases the amount of information that we have about the population. That extra information will usually help us in estimating the mean of the population. Therefore, an increase in sample size implies that the sample means will be, on average, closer to the population mean. Thus, the standard error of the mean should decrease as the size of the sample increases. Try it with the control above. Set the sample size to a small number (e.g. 1) and generate the samples. Look at the standard deviation of the sample means. Generate several more sets of samples of the same sample size, observing the standard deviation of the sample means after each generation. Increase the sample size, say to 10. Generate several sets of samples, watching the standard deviation of the sample means after each generation. In general, did the standard deviation of the sample means decrease with the larger sample size? Increase the sample size again, say to 100. Repeat the process. Did the standard deviation of the sample means decrease with the larger sample size?
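The experiment just described can also be run as a script. This is a rough sketch under stated assumptions rather than a reproduction of the control: the helper name `sd_of_sample_means`, the fixed seed, and the choice of 2000 samples per set are all hypothetical, and the population is assumed to be the whole numbers 1 to 100.

```python
import random
import statistics

def sd_of_sample_means(sample_size, num_samples=2000, seed=0):
    """Simulate num_samples samples and return the SD of their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        # Draw with replacement from the whole numbers 1..100 (assumed population)
        sample = [rng.randint(1, 100) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

# Sample sizes suggested in the text: 1, 10 and 100
results = {n: sd_of_sample_means(n) for n in (1, 10, 100)}

for n, sd in results.items():
    print(f"sample size {n:>3}: SD of sample means = {sd:.2f}")
```

Each increase in sample size should give a visibly smaller standard deviation of the sample means. Changing the seed gives slightly different numbers, just as regenerating the samples in the control does.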

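The standard error of the mean can be estimated by dividing the standard deviation of the population by the square root of the sample size:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}$$

where $\sigma_{\bar{x}}$ is the standard error of the mean, $\sigma$ is the standard deviation of the population, and $N$ is the sample size.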

Note that as the sample size (N) becomes large, the standard error of the mean becomes small, as discussed above. Notice, however, that once the sample size is reasonably large, further increases in the sample size have smaller effects on the size of the standard error of the mean. That is, the difference in the standard error of the mean for sample sizes of 1 and 10 is fairly large; the difference in the standard error of the mean for sample sizes of 10 and 20 is much smaller.
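To put numbers on this, the calculation (population standard deviation divided by the square root of the sample size) can be worked for the population used by the control. The sketch below assumes that population is the whole numbers 1 to 100, each equally likely, whose standard deviation is the square root of (100² − 1)/12; the function name `standard_error` is illustrative.

```python
import math

# Standard deviation of the whole numbers 1..100, all equally likely
# (assumption about the control's population): sqrt((100^2 - 1) / 12)
population_sd = math.sqrt((100**2 - 1) / 12)

def standard_error(sigma, n):
    """Standard error of the mean: population SD over the square root of n."""
    return sigma / math.sqrt(n)

for n in (1, 10, 20, 100):
    print(f"N = {n:>3}: standard error = {standard_error(population_sd, n):.2f}")
```

Under that assumption the standard error is roughly 28.9 for N = 1, 9.1 for N = 10, 6.5 for N = 20, and 2.9 for N = 100. Going from 1 to 10 observations removes about 20 units of standard error, while going from 10 to 20 removes less than 3.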