# SPSS tutorials

# Z-Test and Confidence Interval Proportion Tool

There are two basic tests for a single proportion: the binomial test and the z-test.

For larger samples, these tests result in roughly similar significance levels. However, the binomial test in SPSS only comes up with a 1-tailed p-value unless the hypothesized proportion = 0.5. Moreover, it can't compute a confidence interval for your proportion.
The z-test does not have these two limitations and is among the more widely used statistical tests. Very oddly, however, it's absent from SPSS. Plenty of reasons for us to present this very simple tool in the remainder of this tutorial.
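
To illustrate the claim that both tests give roughly similar significance levels for larger samples, here is a minimal Python sketch. It is not part of the tool: the function names are hypothetical, the binomial p-value is computed only for the symmetric case of a hypothesized proportion of 0.5, and the z-test is shown without a continuity correction.

```python
from math import comb, erfc, sqrt


def binomial_p_two_sided_half(successes, n):
    """Exact two-sided binomial p-value for H0: p = 0.5 (symmetric case only)."""
    k = min(successes, n - successes)
    # Probability of the smaller tail under Binomial(n, 0.5), then doubled.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


def z_test_p_two_sided(successes, n, p0=0.5):
    """Two-sided z-test p-value for one proportion, no continuity correction."""
    z = (successes / n - p0) / sqrt(p0 * (1 - p0) / n)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    return erfc(abs(z) / sqrt(2))


# Made-up example: 60 "successes" out of 100 cases, tested against 0.5.
p_binomial = binomial_p_two_sided_half(60, 100)
p_z = z_test_p_two_sided(60, 100)
print(round(p_binomial, 3), round(p_z, 3))
```

For this made-up example the two p-values differ by roughly one percentage point, and the gap shrinks as the sample grows.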

## Installation

1. This tool requires SPSS version 18 or higher with the SPSS Python Essentials properly installed and tested.
3. For SPSS versions 18 through 22, select Utilities → Extension Bundles → Install Extension Bundle. For SPSS 24, select Extensions → Install Local Extension Bundle. Navigate to the confidence intervals extension (its file name ends in “.spe”, short for SPSS Extension) and install it.
4. Although you'll get a popup that the extension was successfully installed, it'll only work after you close and reopen SPSS entirely (unless you're on version 24).
5. You'll now find the tool under Utilities → Confidence Interval Proportion.

## Operations

1. The test variables must have exactly two valid values. Variables violating this requirement will be skipped when calculating results.
2. The test variables may be any mixture of numeric and string variables.
3. The p-values and confidence intervals are based on the central limit theorem. This approximation is sufficiently accurate if both p0*n and (1 - p0)*n are larger than 5, where p0 denotes the population proportion under H0 and n is its related sample size [1]. If this does not hold for one or more variables, a note will be added to the results.
4. If any SPLIT FILE is in effect, the tool will switch it off, throw a warning that it did so and then proceed as usual.
5. If a WEIGHT variable is in effect, results will be based on rounded frequencies. P-values may be biased if you're using non-integer sampling weights, but this holds for all p-values in SPSS except for those from the complex samples module [2, 3, 4].
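
The rule of thumb from point 3 can be sketched in Python as below. This is only an illustration of the condition as stated in this tutorial; the function name and the `threshold` parameter are hypothetical and not taken from the tool itself.

```python
def normal_approximation_ok(n, p0, threshold=5):
    """Return True if both p0*n and (1 - p0)*n are larger than the threshold."""
    return p0 * n > threshold and (1 - p0) * n > threshold


# With 100 cases and p0 = 0.5, both expected counts are 50.
print(normal_approximation_ok(100, 0.5))
# With 20 cases and p0 = 0.1, the expected count p0*n is only 2.
print(normal_approximation_ok(20, 0.1))
```

When the check fails, an exact test such as the binomial test is the safer choice.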

## Example

We'll now test our tool on test.sav, part of which is shown below. Our null hypothesis is that the population proportions of all dichotomous variables = 0.5.

## Data Inspection

We'll run a quick data check with FREQUENCIES to see if we need to specify any user missing values. This happens to be the case so we'll do just that.

*Show value and value labels in output.

set tnumbers both.

*Basic frequency tables.

frequencies q1 to passes.

*Set missing values.

missing values q1 to q4 (2).

*Show only value labels in output.

set tnumbers labels.

## Computing our Confidence Intervals

We'll go to Utilities → Confidence Interval Proportion and fill out the main dialog as below.

Note that TO may be used for a range of variable names.
Clicking Paste results in the syntax below.

*Note: requires confidence interval proportion tool to be properly installed in order to run.

CONFIDENCE_INTERVAL_PROPORTION VARIABLES = 'q1 to passes' TESTPROP = 0.5 LEVEL = 95.

## Results

Running our syntax results in a new dataset holding our results. Note that most variables have variable labels explaining their precise meaning. You can see them in variable view or hover over a variable’s name in data view as shown in the screenshot below.

We find our (valid) frequencies back in these results. Each test variable has 2 rows, one for each value. You'll probably need just one of these rows, but this configuration circumvents the need for specifying test values for each variable. We simply test both, whatever they may be.

Further right we find our z-test. Its p-value indicates the probability of finding a sample proportion at least as far from the test proportion as the one observed, if the population proportion is exactly equal to the test proportion. Note that a continuity correction has been used for computing the z-values and their associated p-values. Finally, the last variables in our results hold our confidence intervals and, possibly, some notes on the results.
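
For readers who want to check such results by hand, here is a minimal Python sketch of a continuity-corrected z-test and a confidence interval for one proportion. The tutorial does not state the tool's exact formulas, so this assumes the textbook versions: the correction shrinks the absolute difference between sample and test proportion by 1/(2n), and the interval is a plain Wald interval. The function and parameter names are hypothetical.

```python
from math import erfc, sqrt
from statistics import NormalDist


def z_test_proportion(successes, n, p0=0.5, level=95):
    """Continuity-corrected z-test and Wald CI for one proportion (assumed formulas)."""
    p_hat = successes / n
    se_h0 = sqrt(p0 * (1 - p0) / n)  # standard error under H0
    # Continuity correction: shrink |p_hat - p0| by 1/(2n), never below zero.
    diff = max(abs(p_hat - p0) - 0.5 / n, 0.0)
    z = diff / se_h0 if p_hat >= p0 else -diff / se_h0
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value
    # Wald interval around the sample proportion, clipped to [0, 1].
    z_crit = NormalDist().inv_cdf(0.5 + level / 200)
    se_hat = sqrt(p_hat * (1 - p_hat) / n)
    ci = (max(0.0, p_hat - z_crit * se_hat), min(1.0, p_hat + z_crit * se_hat))
    return z, p_value, ci


# Made-up example: 60 out of 100 valid cases, tested against 0.5.
z, p_value, ci = z_test_proportion(60, 100)
print(round(z, 2), round(p_value, 3), [round(bound, 3) for bound in ci])
```

If the tool uses a different interval (a Wilson interval, for instance), its bounds will differ slightly from this sketch.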

## Reporting Examples

Obviously, include your sample proportions and sample size in your report. Regarding the z-tests, we'll write something like “the proportion of people who answered q1 correctly did not differ from 0.5, as indicated by a z-test: z = 1.59, p = 0.11.” or “A z-test showed that more than 50% of our population answers q2 correctly, z = 2.0, p = 0.046.”

Thanks for reading, hope you'll like it!

## References

1. Van den Brink, W.P. & Koele, P. (2002). Statistiek, deel 3 [Statistics, part 3]. Amsterdam: Boom.
2. Fowler, F.J. (2009). Survey Research Methods. Thousand Oaks, CA: SAGE.
3. De Leeuw, E.D., Hox, J.J. & Dillman, D.A. (2008). International Handbook of Survey Methodology. New York: Lawrence Erlbaum Associates.
4. Kish, L. (1992). Weighting for Unequal Pi. Journal of Official Statistics, 8, 183-200.

# This tutorial has 9 comments

• ### By Ruben Geert van den Berg on January 11th, 2017

Hi Juan!

I retested the extension and it (still) works fine on my system. A few points, though:

- Your errors suggest that you have zero cases ("N of rows...") in your active dataset. The tool assumes you have some data open with at least some cases in it.
- If you run the tool for a second/third time, you'll need to close the newly created dataset with results. You can do so by running `DATASET CLOSE FREQS.` That should resolve "Dataset name freqs already defined."
- I'm not familiar with the PROPOR extension so I don't know if it actually uses Python or not, although it does seem rather likely. Perhaps test this anyway if the first two suggestions don't bring any improvement, ok?

Hope that helps!

Ruben

• ### By Juan Carlos Martin on January 11th, 2017

Hi Ruben

Thank you for your quick answer. I assume Python is correctly installed, as I am using the PROPOR extension right now. These are the errors that I get:

CONFIDENCE_INTERVAL_PROPORTION VARIABLES = 'sex' TESTPROP = 0.5 LEVEL = 95.
Dataset Declare
Notes
Output Created 11-JAN-2017 16:20:31
Input Filter
Weight
Split File
N of Rows in Working Data File 0
Syntax DATASET DECLARE freqs.
Resources Processor Time 00:00:00,00
Elapsed Time 00:00:00,00

Warnings

• ### By Ruben Geert van den Berg on January 10th, 2017

Hi Juan! Are you sure you have the SPSS Python Essentials properly installed and tested? And if you're sure this isn't the problem, could you provide me with the details regarding your errors?

Muchas gracias!

P.s. it's "Ruben", not "Robert".

• ### By Juan Carlos Martin on January 10th, 2017

Thanks Robert for the amazing job.

I have a problem with this extension, it is not working for me, I only get messages of errors. But the PROPOR extension that Jon Peck is proposing works.

Do you know why this could be happening?

• ### By Jon Peck on September 22nd, 2016

There is an extension command named PROPOR that takes a different approach to the data. You specify numerator and denominator counts either in the syntax or as a pair of variables. The output displays binomial and Poisson CIs and, if more than one proportion is specified, it displays the difference from the first one and the CI for the difference. The command has a dialog box and syntax help. As it is an old command, the syntax help is produced via PROPOR /HELP, not via F1.