One-sample proportion tests

1. One-sample proportion tests

This lesson returns to testing a single population proportion against a specific hypothesized value. Later, as with means, we'll also test differences between proportions in two populations.

2. Chapter 1 recap

The hypothesis tests in Chapter 1 measured whether an unknown population proportion was equal to some value. You used a bootstrap distribution of the sample statistic to estimate its standard error. That standard error gave a standardized test statistic, the test statistic gave a p-value, and the p-value determined whether to reject the null hypothesis. Bootstrap distributions can be computationally intensive to calculate, so this time we'll calculate the test statistic without one.
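For comparison, here is a minimal sketch of the Chapter 1 bootstrap step, written in Python's standard library rather than the course's R. The sample is hypothetical (540 "under thirty" respondents out of 1,000, coded as 1s and 0s), and `bootstrap_proportions` is an illustrative helper, not a course function. It resamples with replacement, takes the standard deviation of the resampled proportions as the standard error, and compares that with the closed-form standard error of p-hat.

```python
import math
import random
import statistics

# Hypothetical sample: 1 = respondent is under thirty, 0 = not.
# These values are made up for illustration, not the Stack Overflow survey.
random.seed(42)
sample = [1] * 540 + [0] * 460
n = len(sample)
p_hat = sum(sample) / n

def bootstrap_proportions(data, n_reps=1000):
    """Resample with replacement and record the proportion in each resample."""
    size = len(data)
    return [sum(random.choices(data, k=size)) / size for _ in range(n_reps)]

boot = bootstrap_proportions(sample)

# Bootstrap standard error: the standard deviation of the bootstrap distribution.
boot_se = statistics.stdev(boot)

# Closed-form standard error of p-hat, here using p-hat as the estimate of p.
formula_se = math.sqrt(p_hat * (1 - p_hat) / n)

print(f"bootstrap SE: {boot_se:.4f}, formula SE: {formula_se:.4f}")
```

The two standard errors should land close together, which is the motivation for replacing a thousand resamples with a single square root. Under the null hypothesis, the formula uses p-zero in place of p-hat.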

3. Standardized test statistic for proportions

An unknown population parameter that is a proportion, or population proportion for short, is denoted p. The sample proportion is denoted p-hat, and the hypothesized value for the population proportion is denoted p-zero. As in Chapter 1, the standardized test statistic is a z-score: you start with the sample statistic, subtract its mean, then divide by its standard error. That is, p-hat minus the mean of p-hat, divided by the standard error of p-hat. Recall from Sampling in R that the mean of p-hat is p. Under the null hypothesis, the unknown proportion p is assumed to equal the hypothesized value p-zero, so the z-score becomes p-hat minus p-zero, divided by the standard error of p-hat.
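Written out, the chain of substitutions described above is:

```latex
\begin{aligned}
z &= \frac{\hat{p} - \operatorname{mean}(\hat{p})}{\operatorname{SE}(\hat{p})}
   = \frac{\hat{p} - p}{\operatorname{SE}(\hat{p})} \\[4pt]
  &= \frac{\hat{p} - p_{0}}{\operatorname{SE}(\hat{p})}
  \qquad \text{(under } H_0\text{: } p = p_{0}\text{)}
\end{aligned}
```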

4. Easier standard error calculations

Here's the approximate standard error for the two-sample means case from Chapter 2. For proportions, under H-naught, the standard error of p-hat is the square root of p-zero times one minus p-zero, divided by the number of observations, n. Substituting this into the z-score equation gives an expression that is easy to calculate, because it only uses p-hat and n, which come from the sample, and p-zero, which we chose.
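As a sketch of that substitution, the hypothetical Python helper below (not part of the course, which uses R) computes the z-score from p-hat, p-zero, and n alone. The inputs in the usage line are illustrative numbers, not survey results.

```python
import math

def proportion_z_score(p_hat: float, p_0: float, n: int) -> float:
    """Standardized test statistic for a one-sample proportion test.

    Under H0 the standard error of p-hat is sqrt(p_0 * (1 - p_0) / n),
    so only p_hat, p_0, and n are needed; no bootstrap is required.
    """
    if not 0 < p_0 < 1:
        raise ValueError("p_0 must be strictly between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    std_error = math.sqrt(p_0 * (1 - p_0) / n)
    return (p_hat - p_0) / std_error

# Illustrative numbers only: a sample proportion of 0.54 from 1,000 rows, tested against 0.5.
print(round(proportion_z_score(0.54, 0.5, 1000), 3))  # prints 2.53
```

Because the standard error depends only on p-zero and n, it is fixed before looking at the data, which is part of why a z-distribution suffices for this test.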

5. Why z instead of t?

You might wonder why we use a z-distribution here but used a t-distribution in Chapter 2. Consider the test statistic equation for the two-sample mean case. The sample standard deviation, s, is calculated from the sample mean, x-bar. That means x-bar is used in the numerator to estimate the population mean and, through s, in the denominator to estimate the population standard deviation. Estimating the standard deviation from the same sample adds uncertainty to the test statistic. Since t-distributions are effectively normal distributions with fatter tails, we use them to account for this extra uncertainty; in effect, the t-distribution provides extra caution against mistakenly rejecting the null hypothesis. For proportions, the standard error is computed from p-zero and n rather than from p-hat, so p-hat appears only in the numerator. That avoids the extra uncertainty, and a z-distribution is fine.
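Placing the two statistics side by side makes the contrast visible. The two-sample form below uses generic group subscripts 1 and 2; the sample standard deviations s₁ and s₂ are computed from x̄₁ and x̄₂, whereas the proportion's denominator contains only p₀ and n.

```latex
t = \frac{(\bar{x}_{1} - \bar{x}_{2}) - (\mu_{1} - \mu_{2})}
         {\sqrt{\dfrac{s_{1}^{2}}{n_{1}} + \dfrac{s_{2}^{2}}{n_{2}}}}
\qquad \text{vs.} \qquad
z = \frac{\hat{p} - p_{0}}{\sqrt{\dfrac{p_{0}(1 - p_{0})}{n}}}
```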

6. Stack Overflow age categories

Returning to the Stack Overflow survey, let's hypothesize that half the users are under thirty. I'll set a significance level of point-zero-one. In the sample, just over half the users are under thirty.
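Stated formally, with p the population proportion of users under thirty, the hypotheses are as follows. The two-sided alternative matches the "not equal to point-five" conclusion drawn at the end of this example.

```latex
H_{0}\colon\ p = 0.5 \qquad H_{A}\colon\ p \neq 0.5 \qquad \alpha = 0.01
```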

7. Variables for z

Let's get the numbers needed for the z-score. p-hat is the proportion of rows in the sample where age_cat equals under thirty. p-zero is point-five, according to the null hypothesis. n is the number of rows in the sample.
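A minimal Python sketch of gathering these three values, assuming the age_cat column is available as a plain list of labels. The eight rows and the label strings "Under 30" and "At least 30" are hypothetical stand-ins; the real survey data are not reproduced here.

```python
# Hypothetical stand-in for the age_cat column; not the real survey data.
age_cat = ["Under 30", "At least 30", "Under 30", "Under 30", "At least 30",
           "Under 30", "At least 30", "Under 30"]

# p-hat: proportion of rows where age_cat equals "Under 30"
p_hat = sum(1 for value in age_cat if value == "Under 30") / len(age_cat)

# p-zero: the value fixed by the null hypothesis
p_0 = 0.5

# n: number of rows in the sample
n = len(age_cat)

print(p_hat, p_0, n)
```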

8. Calculating the z-score

Calculating the z-score is just arithmetic; the value is three-point-five.

9. Calculating the p-value

For left-tailed alternative hypotheses, you transform the z-score into a p-value using the pnorm function with its default of lower-dot-tail equals TRUE. For right-tailed alternative hypotheses, you set lower-dot-tail to FALSE. For two-tailed alternative hypotheses, you check whether the test statistic lies in either tail, so the p-value is the sum of the left-tail and right-tail areas. Since the normal distribution is symmetric, this simplifies to twice the area in the tail where the statistic falls; here, with a positive z-score, that is twice the right-tailed p-value. The p-value is less than the significance level of point-zero-one, so we reject the null hypothesis and conclude that the proportion of users under thirty is not equal to point-five.
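The three cases can be sketched in Python with `statistics.NormalDist`, whose `cdf` method plays the role of R's pnorm with lower-dot-tail equals TRUE. The function name `p_value_from_z` and the alternative labels "less", "greater", and "two-sided" are illustrative choices, not part of the course code.

```python
from statistics import NormalDist

def p_value_from_z(z: float, alternative: str = "two-sided") -> float:
    """Convert a z-score to a p-value using the standard normal CDF.

    NormalDist().cdf(z) corresponds to R's pnorm(z) (lower.tail = TRUE);
    the right-tail area is taken as cdf(-z), which equals 1 - cdf(z) by symmetry.
    """
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    if alternative == "less":
        return std_normal.cdf(z)              # left-tail area
    if alternative == "greater":
        return std_normal.cdf(-z)             # right-tail area
    if alternative == "two-sided":
        # Twice the area of the tail the statistic falls in: 2 * P(Z <= -|z|)
        return 2.0 * std_normal.cdf(-abs(z))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

alpha = 0.01
z_score = 3.5  # the z-score reported in this example
p_value = p_value_from_z(z_score, "two-sided")
print(f"p-value = {p_value:.6f}; reject H0: {p_value < alpha}")
```

Using `cdf(-abs(z))` for the two-sided case works whichever side of zero the statistic lands on, which avoids the "twice the wrong tail" mistake.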

10. Let's practice!

Let's try an example.
