- Z-Test - Simple Example
- Assumptions
- Z-Test Formulas
- Confidence Intervals for Z-Test
- Effect Size I - Cohen’s H
- Excel Tool for Z-Tests

## Definition & Introduction

A z-test for 2 independent proportions examines if some event occurs equally often in 2 subpopulations.
Example: do equal percentages of male and female students answer some exam question correctly? The figure below sketches what the data required may look like.

## Z-Test - Simple Example

A simple random sample of n = 175 male and n = 164 female students completed 5 exam questions. The raw data -partly shown below- are in z-test-independent-proportions.xlsx.

Let's look into exam question 1 first. The raw data on this question can be summarized by the contingency table shown below.

Right, so our contingency table shows the percentages of male and female respondents who answered question 1 correctly. In statistics, however, we usually prefer proportions over percentages. Summarizing our findings, we see that

- a proportion of p1 = 0.720 out of n1 = 175 male students and
- a proportion of p2 = 0.768 out of n2 = 164 female students answered correctly.

In our sample, female students did slightly better than male students. However, sample outcomes typically differ somewhat from their population counterparts. Even if the entire male and female populations perform similarly, we may still find a small sample difference. This could easily result from drawing random samples of students. The z-test attempts to reject this hypothesis of equal population performance and thus demonstrate that the populations really *do* perform differently.

## Null Hypothesis

The null hypothesis for a z-test for independent proportions is that
the difference between 2 population proportions is zero.
If this is true, then the difference between the 2 sample proportions should be *close to* zero. Outcomes that are very different from zero are unlikely and thus argue against the null hypothesis. So exactly *how* unlikely is a given outcome? Computing this is fairly easy but it does require some assumptions.

## Assumptions

The assumptions for a z-test for independent proportions are

- independent observations and
- sufficient sample sizes.

So what are sufficient sample sizes? Most textbooks^{2,3,4,5} state that the test results are sufficiently accurate if

- \(p_1 \cdot n_1 \gt 5\),
- \((1-p_1) \cdot n_1 \gt 5\),
- \(p_2 \cdot n_2 \gt 5\),
- \((1-p_2) \cdot n_2 \gt 5\)

are *all* met. The Excel tool we'll present in a minute checks automatically if all 4 conditions are met.
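The 4 conditions above can also be checked with a few lines of code. The snippet below is a minimal sketch in Python rather than the Excel tool's actual formula; the function name `sample_sizes_sufficient` and its `minimum` parameter are made up for illustration.

```python
def sample_sizes_sufficient(p1, n1, p2, n2, minimum=5):
    """Return True if all 4 expected counts exceed `minimum` (hypothetical helper)."""
    counts = [
        p1 * n1,        # successes in sample 1
        (1 - p1) * n1,  # failures in sample 1
        p2 * n2,        # successes in sample 2
        (1 - p2) * n2,  # failures in sample 2
    ]
    return all(count > minimum for count in counts)


# Example data for exam question 1: 175 male and 164 female students.
print(sample_sizes_sufficient(0.720, 175, 0.768, 164))
```

For the example data, the smallest of the 4 products is (1 - 0.768) · 164 ≈ 38, so the sample size assumption is comfortably met.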

## Z-Test Formulas

For computing our z-test, we first simply compute the difference between our sample proportions as

$$dif = p_1 - p_2$$

For our example data, this results in
$$dif = 0.720 - 0.768 = -0.048.$$

The standard error for this difference depends on the population proportion. We obviously don't know it but we can compute the *estimated* population proportion \(\hat{p}\) as
$$\hat{p} = \frac{p_1\cdot n_1 + p_2\cdot n_2}{n_1 + n_2}$$

For our example data, that'll be
$$\hat{p} = \frac{0.720\cdot 175 + 0.768\cdot 164}{175 + 164} = 0.743$$

Note that this is simply the overall proportion of our sample who answered correctly. This is readily verified in the contingency table we presented earlier.

Anyway. We can now estimate the standard error for the difference as

$$\hat{SE}_{dif} = \sqrt{\hat{p}\cdot (1-\hat{p})\cdot(\frac{1}{n_1} + \frac{1}{n_2})}$$

For our example, that'll be
$$\hat{SE}_{dif} = \sqrt{0.743\cdot (1-0.743)\cdot(\frac{1}{175} + \frac{1}{164})} = 0.047$$

We can now readily compute our test statistic \(Z\) as

$$Z = \frac{dif - \delta}{\hat{SE}_{dif}}$$

where \(\delta\) denotes the hypothesized population difference. Our null hypothesis states that \(\delta\) = 0 (both population proportions equal). So for our example,
$$Z = \frac{-0.048 - 0}{0.047} = -1.02$$

If the z-test assumptions are met, then \(Z\) approximately follows a standard normal distribution. From this we can readily look up that

$$P(Z\lt -1.02) = 0.155$$

so our 2-tailed significance is

$$P(\text{2-tailed}) = 0.309$$

Conclusion: we **don't reject the null hypothesis**. If the population difference is zero, then finding the observed sample difference or a more extreme one is pretty likely. Our data don't contradict the claim of male and female student populations performing equally on exam question 1.
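The steps in this section can be strung together in a short script. This is a sketch in Python using only the standard library, not the implementation behind the Excel tool; the function name `z_test_independent_proportions` is an assumption made for illustration. Because it starts from the rounded proportions 0.720 and 0.768, its results may differ slightly from the hand computation in the third decimal.

```python
from math import sqrt
from statistics import NormalDist


def z_test_independent_proportions(p1, n1, p2, n2, delta=0.0):
    """Return (z, two-tailed p) for H0: population difference equals `delta`."""
    dif = p1 - p2
    # Pooled (estimated) population proportion.
    p_hat = (p1 * n1 + p2 * n2) / (n1 + n2)
    # Estimated standard error of the difference.
    se_dif = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    z = (dif - delta) / se_dif
    # Two-tailed significance from the standard normal distribution.
    p_two_tailed = 2 * NormalDist().cdf(-abs(z))
    return z, p_two_tailed


z, p = z_test_independent_proportions(0.720, 175, 0.768, 164)
print(f"Z = {z:.2f}, 2-tailed p = {p:.3f}")
```

Since the 2-tailed p-value is well above 0.05, the script leads to the same conclusion as the manual computation.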

## Confidence Intervals for Z-Test

Our data show that the difference between our sample proportions is \(dif\) = -0.048: the percentage of females who answered correctly is some 4.8 percentage points higher than that of males. Without any further information, this is our best guess for the population difference.

However, since these 4.8 percentage points are only based on a sample, this estimate is likely to be somewhat “off”. So precisely *how much* do we expect it to be “off”? We can answer this by computing a confidence interval.

First off, the previous section showed that

$$Z = \frac{dif - \delta}{\hat{SE}_{dif}}$$

so the amount that our sample difference is likely to be off is
$$dif - \delta = Z \cdot \hat{SE}_{dif}$$

So which value should we use for \(Z\) here? Well, this depends on our confidence level, which is often chosen as 95%. As illustrated below, the 95% most likely z-values roughly lie between z = -1.96 and z = 1.96. The exact values can easily be looked up in Excel or Google Sheets as shown in Normal Distribution - Quick Tutorial.

Now, any confidence interval for \(\delta\) can be constructed as
$$dif + Z_{\frac{1}{2}\alpha}\cdot \hat{SE}_{dif} \lt \delta \lt dif + Z_{1 - \frac{1}{2}\alpha}\cdot \hat{SE}_{dif}$$

Therefore, the 95% CI for our example data is

$$-0.048 + (-1.96)\cdot 0.047 \lt \delta \lt -0.048 + 1.96 \cdot 0.047$$

which results in

$$-0.141 \lt \delta \lt 0.045$$

Note that this 95% confidence interval contains zero: a zero difference between the population proportions is within a likely range. That is, males and females performing equally on exam question 1 is not an unlikely hypothesis given the data at hand. And that's why we didn't reject this null hypothesis when testing at alpha = 0.05.
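The confidence interval computation can be sketched in Python as follows. The function name `ci_difference` is hypothetical, and the sketch deliberately uses the same pooled standard error as the formulas in this tutorial; it is not taken from the Excel tool.

```python
from math import sqrt
from statistics import NormalDist


def ci_difference(p1, n1, p2, n2, level=0.95):
    """Return (lower, upper) bounds of the CI for the population difference."""
    dif = p1 - p2
    p_hat = (p1 * n1 + p2 * n2) / (n1 + n2)
    se_dif = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    # Critical z-value, roughly 1.96 for a 95% confidence level.
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return dif - z_crit * se_dif, dif + z_crit * se_dif


lower, upper = ci_difference(0.720, 175, 0.768, 164)
print(f"{lower:.3f} < delta < {upper:.3f}")
```

Passing a different `level`, such as 0.99, widens the interval accordingly.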

## Effect Size I - Cohen’s H

Our sample proportions are p1 = 0.72 and p2 = 0.77. Should we consider that a small, medium or large effect? An obvious effect size measure is simply the difference between our proportions. However, a more suitable measure is **Cohen’s H**, defined as
$$h = |\;2\cdot \arcsin\sqrt{p_1} - 2\cdot \arcsin\sqrt{p_2}\;|$$

where \(\arcsin\) refers to the arcsine function.

Basic **rules of thumb**^{7} are that

- h = **0.2** indicates a **small** effect;
- h = **0.5** indicates a **medium** effect;
- h = **0.8** indicates a **large** effect.

For our example data, Cohen’s H is

$$h = |\;2\cdot \arcsin\sqrt{0.72} - 2\cdot \arcsin\sqrt{0.77}\;|$$

$$h = |\;2\cdot 1.01 - 2\cdot 1.07\;| = 0.11$$

Our rules of thumb suggest that this effect is close to negligible.
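For those who prefer code over a pocket calculator, Cohen’s H takes one line in Python. The function name `cohens_h` below is just an illustrative choice; note that `math.asin` returns radians, which is what the formula requires.

```python
from math import asin, sqrt


def cohens_h(p1, p2):
    """Cohen's H: absolute difference between arcsine-transformed proportions."""
    return abs(2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2)))


print(round(cohens_h(0.72, 0.77), 2))
```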

## Effect Size II - Phi Coefficient

An alternative effect size measure for the z-test for independent proportions is the phi coefficient, denoted by φ (the Greek letter “phi”). This is simply a Pearson correlation between dichotomous variables.

Following the **rules of thumb** for correlations^{7}, we could propose that

- \(|\;\phi\;| = 0.1\) indicates a small effect;
- \(|\;\phi\;| = 0.3\) indicates a medium effect;
- \(|\;\phi\;| = 0.5\) indicates a large effect.

However, we feel these rules of thumb are clearly disputable: they may be overly strict because | φ | tends to be considerably smaller than | r |. Anyway. If anybody has a better idea, let me know.
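As a rough illustration, φ can be computed directly from the 4 cell counts of a 2 by 2 contingency table. The sketch below is in Python with a hypothetical function name `phi_coefficient`. The counts are an assumption: 126 correct answers in each group is what the rounded proportions 0.720 · 175 and 0.768 · 164 imply, but the actual contingency table isn't reproduced here.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table with rows (a, b) and (c, d)."""
    numerator = a * d - b * c
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


# Assumed counts: males 126 correct / 49 incorrect, females 126 correct / 38 incorrect.
phi = phi_coefficient(126, 49, 126, 38)
print(round(abs(phi), 3))
```

For these assumed counts, | φ | comes out well below 0.1, so the rules of thumb above point to a very small effect as well.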

## Excel Tool for Z-Tests

Z-tests are painfully absent from most statistical packages including SPSS and JASP. We therefore developed z-test-independent-proportions.xlsx, partly shown below.

Given 2 sample proportions and 2 sample sizes, our tool

- checks if the sample size assumption is met;
- computes the 2-tailed significance level for the z-test;
- computes (1 - β), the power for the z-test;
- computes a confidence interval for the difference between the proportions;
- computes Cohen’s H;
- computes φ.

We prefer this tool over online calculators because

- results in Excel can (and should) be saved with any other project files whereas results from online calculators usually aren't;
- all formulas used in Excel are visible and can thus be verified;
- running *many* z-tests in Excel can be done effortlessly by expanding the formula section.

SPSS users can readily create the exact right input for the Excel tool with a MEANS command such as

    *Create table with sample sizes and proportions for v1 to v5 by sex.

    means v1 to v5 by sex
    /cells count mean.

Doing so for 2+ dependent variables results in a table as shown below.

Note that all dependent variables must follow a 0-1 coding in order for this to work.

## Relation Z-Test with Other Tests

An alternative for the z-test for independent proportions is a chi-square independence test. The significance level of the latter (which is always 1-tailed) is identical to the 2-tailed significance of the former. Upon closer inspection, these tests -as well as their assumptions- are statistically equivalent. However, there are 2 reasons for preferring the z-test over the chi-square test:

- the z-test yields a confidence interval for the difference between the proportions;
- running 2 or more z-tests is easier and results in a clearer output table than 2(+) contingency tables with chi-square tests.
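The equivalence with the chi-square independence test can be checked numerically: for a 2 by 2 table, the Pearson chi-square statistic equals the squared z statistic. The Python sketch below uses hypothetical function names and assumed cell counts (126 correct answers in each group, as implied by the rounded proportions for exam question 1).

```python
from math import isclose, sqrt


def z_statistic(a, b, c, d):
    """Z for a 2x2 table: group 1 has a successes, b failures; group 2 has c, d."""
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    p_hat = (a + c) / (n1 + n2)
    se_dif = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se_dif


def chi_square_statistic(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the same 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


z = z_statistic(126, 49, 126, 38)
chi2 = chi_square_statistic(126, 49, 126, 38)
print(isclose(z ** 2, chi2))  # the two statistics agree up to floating point error
```

Because the statistics agree, so do the p-values: the upper tail of a chi-square distribution with 1 degree of freedom matches both tails of the standard normal distribution combined.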

Second, the z-test for independent proportions is *asymptotically* equivalent to the independent samples t-test: their results become more similar insofar as larger sample sizes are used. But -conversely- t-test results for proportions are “off” more insofar as sample sizes are smaller.

Other reasons for preferring the z-test over the t-test are that

- the z-test results in higher power and smaller confidence intervals insofar as smaller sample sizes are used;
- the t-test requires normally distributed dependent variables and equal population-variances whereas the z-test doesn't.

So -in short- use a z-test when appropriate. Your statistical package not including it is a poor excuse for not doing what's right.

Right, I guess that should do regarding the z-test. If you've any remarks on this tutorial or our Excel tool, please throw me a comment below.

Thanks for reading!

## References

- Van den Brink, W.P. & Koele, P. (1998). *Statistiek, deel 2* [Statistics, part 2]. Amsterdam: Boom.
- Van den Brink, W.P. & Koele, P. (2002). *Statistiek, deel 3* [Statistics, part 3]. Amsterdam: Boom.
- Warner, R.M. (2013). *Applied Statistics (2nd. Edition)*. Thousand Oaks, CA: SAGE.
- Agresti, A. & Franklin, C. (2014). *Statistics. The Art & Science of Learning from Data.* Essex: Pearson Education Limited.
- Howell, D.C. (2002). *Statistical Methods for Psychology* (5th ed.). Pacific Grove CA: Duxbury.
- Slotboom, A. (1987). *Statistiek in woorden* [Statistics in words]. Groningen: Wolters-Noordhoff.
- Cohen, J. (1988). *Statistical Power Analysis for the Behavioral Sciences (2nd. Edition)*. Hillsdale, New Jersey: Lawrence Erlbaum Associates.

## THIS TUTORIAL HAS 81 COMMENTS:

## By Aviad on August 28th, 2016

What about mod(-16, 24) => 8?

I recommend that the negative usage, in this example calculating sleep hours between 23 (11pm) and 7am, be mentioned as well.

## By YY Ma on November 5th, 2020

“Z-tests are painfully absent from most statistical packages including SPSS and JASP”.

But I wonder if the SPSS procedure "Crosstabs>Cells>z-test" does a similar thing.

## By Ruben Geert van den Berg on November 5th, 2020

Hi YY Ma!

We covered those in SPSS Chi-Square Test with Pairwise Z-Tests.

But if you're a researcher and need a z-test, would you settle for this? Where's the confidence interval for the difference? Where's the p-value? Where's the effect size? And how do I get those for 5 or 10 outcome variables?

IMHO, you can't reasonably claim these are an acceptable implementation of z-tests.

P.s. if it's only the (exact) p-value you're after, you could use a 2 by 2 chi-square independence test too but it also suffers from most of the issues I just mentioned.

## By YY Ma on November 5th, 2020

I have read the article 'SPSS Chi-Square Test with Pairwise Z-Tests'. And I totally agree with you.

SPSS does not implement z-test well.

Thanks.

## By Ruben Geert van den Berg on November 6th, 2020

...and neither does JASP. I'm curious about SAS and Stata but I don't have licenses for them.

Anyway, thanks for the recent discussions and educated comments. If you're on LinkedIn, you're more than welcome to connect with me.

Keep up the good work!