Z-Test for 2 Independent Proportions – Quick Tutorial

Definition & Introduction

A z-test for 2 independent proportions examines if some event occurs equally often in 2 subpopulations.
Example: do equal percentages of male and female students answer some exam question correctly? The figure below sketches what the data required may look like.

Z Test Independent Proportions

Z-Test - Simple Example

A simple random sample of n = 175 male and n = 164 female students completed 5 exam questions. The raw data -partly shown below- are in z-test-independent-proportions.xlsx.

Z Test Independent Proportions Example Data

Let's look into exam question 1 first. The raw data on this question can be summarized by the contingency table shown below.

Z Test Contingency Table Question 1

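Right, so our contingency table shows the percentages of male and female respondents who answered question 1 correctly. In statistics, however, we usually prefer proportions over percentages. Summarizing our findings, we see that a proportion of p1 = 0.720 out of n1 = 175 male students and a proportion of p2 = 0.768 out of n2 = 164 female students answered question 1 correctly.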

In our sample, female students did slightly better than male students. However, sample outcomes typically differ somewhat from their population counterparts.
Even if the entire male and female populations perform similarly, we may still find a small sample difference. This could easily result from drawing random samples of students. The z-test attempts to nullify this hypothesis and thus demonstrate that the populations really do perform differently.

Null Hypothesis

The null hypothesis for a z-test for independent proportions is that the difference between 2 population proportions is zero. If this is true, then the difference between the 2 sample proportions should be close to zero. Outcomes that are very different from zero are unlikely and thus argue against the null hypothesis. So exactly how unlikely is a given outcome? Computing this is fairly easy but it does require some assumptions.


The assumptions for a z-test for independent proportions are

So what are sufficient sample sizes? Most textbooks[2,3,4,5] state that the test results are sufficiently accurate if

are all met. The Excel tool we'll present in a minute checks automatically if all 4 conditions are met.

Z-Test Formulas

For computing our z-test, we first simply compute the difference between our sample proportions as
$$dif = p_1 - p_2$$
For our example data, this results in $$dif = 0.720 - 0.768 = -0.048.$$
The standard error for this difference depends on the population proportion. We obviously don't know it but we can compute the estimated population proportion \(\hat{p}\) as $$\hat{p} = \frac{p_1\cdot n_1 + p_2\cdot n_2}{n_1 + n_2}$$
For our example data, that'll be $$\hat{p} = \frac{0.720\cdot 175 + 0.768\cdot 164}{175 + 164} = 0.743$$
Note that this is simply the overall proportion of our sample who answered correctly. This is readily verified in the contingency table we presented earlier.
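
The two steps above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `pooled_proportion` is ours and not part of the Excel tool, and the inputs are the rounded proportions from the text.

```python
def pooled_proportion(p1, n1, p2, n2):
    """Sample-size weighted average of two sample proportions."""
    return (p1 * n1 + p2 * n2) / (n1 + n2)

# Rounded sample proportions and sample sizes from the example
p1, n1 = 0.720, 175  # male students
p2, n2 = 0.768, 164  # female students

dif = p1 - p2
p_hat = pooled_proportion(p1, n1, p2, n2)
print(f"dif = {dif:.3f}, p_hat = {p_hat:.3f}")
```

Because the weights are the sample sizes, the result is the same as dividing the total number of correct answers by the total number of students.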

Anyway. We can now estimate the standard error for the difference as
$$\hat{SE}_{dif} = \sqrt{\hat{p}\cdot (1-\hat{p})\cdot(\frac{1}{n_1} + \frac{1}{n_2})}$$
For our example, that'll be $$\hat{SE}_{dif} = \sqrt{0.743\cdot (1-0.743)\cdot(\frac{1}{175} + \frac{1}{164})} = 0.047$$
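
As a quick check, the standard error formula translates directly into Python. The function name `se_difference` is a hypothetical helper of ours, and the rounded \(\hat{p}\) = 0.743 from the text is used as input.

```python
import math

def se_difference(p_hat, n1, n2):
    """Estimated standard error of the difference between 2 sample proportions,
    based on the pooled proportion p_hat."""
    return math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))

se = se_difference(0.743, 175, 164)
print(f"SE = {se:.3f}")
```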
We can now readily compute our test statistic \(Z\) as
$$Z = \frac{dif - \delta}{\hat{SE}_{dif}}$$
where \(\delta\) denotes the hypothesized population difference. Our null hypothesis states that \(\delta\) = 0 (both population proportions equal). So for our example, $$Z = \frac{-0.048 - 0}{0.047} = -1.02$$
If the z-test assumptions are met, then \(Z\) approximately follows a standard normal distribution. From this we can readily look up that
$$P(Z\lt -1.02) = 0.155$$
so our 2-tailed significance is
$$P(\text{2-tailed}) = 0.309$$
Conclusion: we don't reject the null hypothesis. If the population difference is zero, then finding the observed sample difference or a more extreme one is pretty likely. Our data don't contradict the claim of male and female student populations performing equally on exam question 1.
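
For readers who'd like to verify these numbers outside Excel, the entire test can be sketched in Python using only the standard library. Note two assumptions: the function name `two_proportion_z_test` is ours, and the counts of 126 correct answers in each group are reconstructed by rounding p · n (0.720 · 175 and 0.768 · 164) rather than taken from the raw data.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2, delta=0.0):
    """Return (z, two-tailed p) for H0: p1 - p2 = delta, using the pooled SE."""
    p1, p2 = x1 / n1, x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)                      # pooled proportion
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    z = (p1 - p2 - delta) / se
    # P(|Z| > |z|) under a standard normal distribution
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_tailed

# Assumed counts: 126 of 175 males and 126 of 164 females answered correctly
z, p = two_proportion_z_test(126, 175, 126, 164)
print(f"Z = {z:.2f}, 2-tailed p = {p:.3f}")
```

Working from counts avoids rounding the proportions first, so the outcome should land close to the Z = -1.02 and p = 0.309 reported above.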

Confidence Intervals for Z-Test

Our data show that the difference between our sample proportions, \(dif\) = -0.048: the percentage of females who answered correctly is some 4.8 percentage points higher than that of males. Without any further information, this is our best guess for the population difference.

However, since our 4.8% is only based on a sample, it's likely to be somewhat “off”. So precisely how much do we expect it to be “off”? We can answer this by computing a confidence interval.

First off, the previous section showed that
$$Z = \frac{dif - \delta}{\hat{SE}_{dif}}$$
so the amount that our sample difference is likely to be off is $$dif - \delta = Z \cdot \hat{SE}_{dif}$$
So which value should we use for \(Z\) here? Well, this depends on our confidence level, which is often chosen as 95%. As illustrated below, the 95% most likely z-values roughly lie between z = -1.96 and z = 1.96. The exact values can easily be looked up in Excel or Google Sheets as shown in Normal Distribution - Quick Tutorial.

Standard Normal Distribution With Critical Values

Now, any confidence interval for \(\delta\) can be constructed as $$dif + Z_{\frac{1}{2}\alpha}\cdot \hat{SE}_{dif} \lt \delta \lt dif + Z_{1 - \frac{1}{2}\alpha}\cdot \hat{SE}_{dif}$$
Therefore, the 95% CI for our example data is
$$-0.048 + (-1.96)\cdot 0.047 \lt \delta \lt -0.048 + 1.96 \cdot 0.047$$
which, when computed from unrounded values, results in
$$-0.141 \lt \delta \lt 0.045$$
Note that this 95% confidence interval contains zero: a zero difference between the population proportions is within a likely range. That is, males and females performing equally on exam question 1 is not an unlikely hypothesis given the data at hand. And that's why we didn't reject this null hypothesis when testing at alpha = 0.05.
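
The interval can also be sketched in Python, with the critical z-value looked up via `statistics.NormalDist` from the standard library instead of a table. The function name `diff_confidence_interval` is a hypothetical helper; it follows this tutorial's formula and takes the difference and its standard error as given.

```python
from statistics import NormalDist

def diff_confidence_interval(dif, se, level=0.95):
    """Confidence interval for the population difference: dif -/+ z * SE."""
    alpha = 1 - level
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95%
    return dif - z_crit * se, dif + z_crit * se

# Rounded inputs from the text
low, high = diff_confidence_interval(-0.048, 0.047)
print(f"{low:.3f} < delta < {high:.3f}")
```

With these rounded inputs, the bounds come out around -0.140 and 0.044, which is marginally different from the -0.141 and 0.045 reported above. The conclusion is the same: the interval contains zero.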

Effect Size I - Cohen’s H

Our sample proportions are p1 = 0.72 and p2 = 0.77. Should we consider that a small, medium or large effect? A likely effect size measure is simply the difference between our proportions. However, a more suitable measure is Cohen’s H, defined as $$h = |\;2\cdot \arcsin\sqrt{p_1} - 2\cdot \arcsin\sqrt{p_2}\;|$$
where \(\arcsin\) refers to the arcsine function.

Basic rules of thumb[7] are that h = 0.2 indicates a small effect, h = 0.5 a medium effect and h = 0.8 a large effect.

For our example data, Cohen’s H is
$$h = |\;2\cdot \arcsin\sqrt{0.72} - 2\cdot \arcsin\sqrt{0.77}\;|$$
$$h = |\;2\cdot 1.01 - 2\cdot 1.07\;| = 0.11$$
Our rules of thumb suggest that this effect is close to negligible.
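
Cohen’s H is easy to compute by hand but a small Python sketch may be handy for checking. The function name `cohens_h` is ours; `math.asin` returns the arcsine in radians, which is what the formula requires.

```python
import math

def cohens_h(p1, p2):
    """Cohen's h: absolute difference between arcsine-transformed proportions."""
    phi1 = 2 * math.asin(math.sqrt(p1))
    phi2 = 2 * math.asin(math.sqrt(p2))
    return abs(phi1 - phi2)

h = cohens_h(0.72, 0.77)
print(f"h = {h:.2f}")
```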

Effect Size II - Phi Coefficient

An alternative effect size measure for the z-test for independent proportions is the phi coefficient, denoted by φ (the Greek letter “phi”). This is simply a Pearson correlation between dichotomous variables.
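
Because φ is a correlation between two 0-1 variables, it can be computed directly from the 4 cell counts of the contingency table. The sketch below is an illustration under assumptions: the function name `phi_coefficient` is ours, and the counts (126 correct and 49 incorrect for males, 126 correct and 38 incorrect for females) are reconstructed by rounding p · n for question 1.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table with rows (a, b) and (c, d)."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Rows: male, female; columns: correct, incorrect (assumed counts)
phi = phi_coefficient(126, 49, 126, 38)
print(f"phi = {phi:.3f}")
```

The sign of φ merely reflects the order of the rows and columns, so its absolute value is what matters as an effect size.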

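Following the rules of thumb for correlations[7], we could propose that | φ | = 0.1 indicates a small effect, | φ | = 0.3 a medium effect and | φ | = 0.5 a large effect.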

However, we feel these rules of thumb are clearly disputable: they may be overly strict because | φ | tends to be considerably smaller than | r |. Anyway. If anybody has a better idea, let me know.

Excel Tool for Z-Tests

Z-tests are painfully absent from most statistical packages including SPSS and JASP. We therefore developed z-test-independent-proportions.xlsx, partly shown below.

Z Test Independent Proportions Excel Tool

Given 2 sample proportions and 2 sample sizes, our tool

We prefer this tool over online calculators because

SPSS users can readily create the exact right input for the Excel tool with a MEANS command such as

*Create table with sample sizes and proportions for v1 to v5 by sex.

means v1 to v5 by sex
/cells count mean.

Doing so for 2+ dependent variables results in a table as shown below.

SPSS Means Output Table For Z Test Excel Tool

Note that all dependent variables must follow a 0-1 coding in order for this to work.
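
The reason is that the mean of a 0-1 coded variable is equal to the proportion of ones. For readers without SPSS, the same kind of table can be sketched in plain Python. Everything here is hypothetical: the helper name `count_and_mean_by_group` and the tiny toy data set are only for illustration and are not the tutorial's data.

```python
from collections import defaultdict

def count_and_mean_by_group(groups, scores):
    """Return {group: (n, mean)} for 0-1 coded scores; the mean is the proportion."""
    totals = defaultdict(lambda: [0, 0])   # group -> [n, sum of scores]
    for group, score in zip(groups, scores):
        if score not in (0, 1):
            raise ValueError("scores must be coded 0 or 1")
        totals[group][0] += 1
        totals[group][1] += score
    return {g: (n, total / n) for g, (n, total) in totals.items()}

# Toy data (hypothetical): 1 = correct answer, 0 = incorrect answer
sex = ["male", "male", "male", "male", "female", "female", "female", "female"]
v1 = [1, 0, 1, 1, 1, 0, 0, 1]
table = count_and_mean_by_group(sex, v1)
print(table)
```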

Relation Z-Test with Other Tests

An alternative for the z-test for independent proportions is a chi-square independence test. The significance level of the latter (which is always 1-tailed) is identical to the 2-tailed significance of the former. Upon closer inspection, these tests -as well as their assumptions- are statistically equivalent. However, there are 2 reasons for preferring the z-test over the chi-square test:

Second, the z-test for independent proportions is asymptotically equivalent to the independent samples t-test: their results become more similar as sample sizes increase. Conversely, t-test results for proportions are “off” more as sample sizes get smaller.
Other reasons for preferring the z-test over the t-test are that

So -in short- use a z-test when appropriate. Your statistical package not including it is a poor excuse for not doing what's right.
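
Returning to the chi-square independence test mentioned earlier in this section: its equivalence with the z-test can be checked numerically. The sketch below assumes the Pearson chi-square statistic without continuity correction and the same reconstructed counts for question 1 (126 of 175 males and 126 of 164 females correct). The function name `z_and_chi_square` is ours.

```python
import math

def z_and_chi_square(a, b, c, d):
    """2x2 counts with rows (a, b) and (c, d); columns = correct, incorrect.
    Returns (z, chi2, 2-tailed p for z, p for chi-square with df = 1)."""
    n1, n2 = a + b, c + d
    n = n1 + n2
    p_hat = (a + c) / n
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    z = (a / n1 - c / n2) / se
    # Pearson chi-square for a 2x2 table, no continuity correction
    chi2 = n * (a * d - b * c) ** 2 / (n1 * n2 * (a + c) * (b + d))
    p_z = math.erfc(abs(z) / math.sqrt(2))       # 2-tailed, standard normal
    p_chi2 = math.erfc(math.sqrt(chi2 / 2))      # upper tail, chi-square df = 1
    return z, chi2, p_z, p_chi2

z, chi2, p_z, p_chi2 = z_and_chi_square(126, 49, 126, 38)
print(f"Z squared = {z ** 2:.4f}, chi-square = {chi2:.4f}")
print(f"p (z-test) = {p_z:.4f}, p (chi-square) = {p_chi2:.4f}")
```

The squared z-value equals the chi-square value and both tests return the same p-value, apart from floating point noise.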

Right, I guess that should do regarding the z-test. If you've any remarks on this tutorial or our Excel tool, please throw me a comment below.

Thanks for reading!


  1. Van den Brink, W.P. & Koele, P. (1998). Statistiek, deel 2 [Statistics, part 2]. Amsterdam: Boom.
  2. Van den Brink, W.P. & Koele, P. (2002). Statistiek, deel 3 [Statistics, part 3]. Amsterdam: Boom.
  3. Warner, R.M. (2013). Applied Statistics (2nd. Edition). Thousand Oaks, CA: SAGE.
  4. Agresti, A. & Franklin, C. (2014). Statistics. The Art & Science of Learning from Data. Essex: Pearson Education Limited.
  5. Howell, D.C. (2002). Statistical Methods for Psychology (5th ed.). Pacific Grove CA: Duxbury.
  6. Slotboom, A. (1987). Statistiek in woorden [Statistics in words]. Groningen: Wolters-Noordhoff.
  7. Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd. Edition). Hillsdale, New Jersey: Lawrence Erlbaum Associates.
