## Summary

How do you draw one or many samples from your data in SPSS? This tutorial demonstrates some simple ways of doing so. We'll point out some tips, tricks and pitfalls along the way.

Let's get started and create some test data by running the syntax below.

## SPSS Syntax for Creating Test Data

***Create test data with 100 cases.**

data list free/id.

begin data

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

end data.

compute id = $casenum.

execute.

## Result

We now have 1 variable and 100 cases. For each case, our variable `id` contains `$casenum`, the case's sequence number in the data (1 through 100). The screenshot below shows the last handful of cases in our data.

## Simple Sampling Without Replacement

Simple random sampling without replacement means that objects that are sampled aren't “replaced” into the population and thus can't be sampled a second or a third time. This is the usual scenario for real world research. We'll discuss the alternative, simple random sampling *with* replacement, near the end of this tutorial.

## SPSS Sampling Syntax Example 1

***1. Render sampling process replicable.**

set rng mc seed 1.

***2. Draw sample.**

sample 20 from 100.

execute.

## Notes

This first example is the easiest way to draw just one sample *when we know the number of cases* in our data (100 in our example). Note that we have 20 cases left after running it.

If we rerun our sampling syntax, we usually want the exact same random sample to come up. One way of ensuring this is running `SET RNG MC SEED 1.` just prior to sampling.
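The effect of fixing a seed can be illustrated outside SPSS as well. The sketch below uses Python's standard `random` module; the helper name `draw_sample` is made up for this illustration, and Python's generator is unrelated to the SPSS one, so seed 1 will not reproduce the cases that SPSS selects.

```python
import random

def draw_sample(seed, n=20, population_size=100):
    """Draw n distinct case ids from 1..population_size with a fixed seed."""
    rng = random.Random(seed)  # private generator, seeded once per call
    # random.sample draws without replacement, so no id can come up twice.
    return sorted(rng.sample(range(1, population_size + 1), n))

first = draw_sample(seed=1)
second = draw_sample(seed=1)

print(first == second)   # True: same seed, same sample
print(len(set(first)))   # 20: every sampled case is unique
```

Rerunning with a different seed (say `draw_sample(seed=2)`) will generally give a different set of 20 cases.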

## SPSS Sampling Syntax Example 2

We first rerun our test data syntax. Next, we'll run the syntax below.

***1. Compute random numbers between 0 and 1.**

compute s1 = rv.uni(0,1).

***2. Rank random numbers.**

rank s1.

***3. Select 20 cases with lowest random numbers.**

select if rs1 <= 20.

execute.

## Notes

As we'll see later on, this second example is a first step towards repeated sampling and stratified random sampling. On top of that, it doesn't require knowing how many cases we have in our data.
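For readers who want to see the ranking logic spelled out, here is a minimal Python sketch of the same three steps. It only mirrors the idea and is not how SPSS implements RANK or SELECT IF; the names `s1` and `rs1` are borrowed from the syntax above, everything else is an assumption made for the illustration.

```python
import random

rng = random.Random(1)       # seed chosen arbitrarily for this sketch
ids = list(range(1, 101))    # case ids 1..100, like $casenum

# Step 1: one uniform random number between 0 and 1 per case.
s1 = {case: rng.random() for case in ids}

# Step 2: rank the random numbers (rank 1 = lowest number).
ordered = sorted(ids, key=lambda case: s1[case])
rs1 = {case: rank for rank, case in enumerate(ordered, start=1)}

# Step 3: keep the 20 cases with the lowest random numbers.
sample = [case for case in ids if rs1[case] <= 20]

print(len(sample))   # 20
```

Nothing in these steps refers to the total number of cases, which is why this approach also works when we don't know how many cases our data hold.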

## SPSS Sampling Syntax Example 3

Again, we'll rerun our test data syntax, followed by the syntax below.

***1. Compute random numbers between 0 and 1.**

set seed 1.

compute s1 = rv.uni(0,1).

***2. Rank random numbers.**

rank s1.

***3. Recode rank variable into filter variable.**

recode rs1 (lo thru 20 = 1)(else = 0).

***4. Switch filter on.**

filter by rs1.

***5. Inspect output.**

descriptives id.

## Notes

This third example uses FILTER rather than deleting unsampled cases. This leaves all our cases, including a variable that indicates our sample, nicely intact in our data. As shown below, the strikethrough in data view as well as the status bar tell us that a filter is in effect.
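The difference between deleting and filtering can be sketched in Python as follows. This is an illustration under assumptions rather than a description of SPSS internals: the field name `in_sample` is invented here and plays the role of the recoded `rs1` filter variable.

```python
import random
import statistics

rng = random.Random(1)   # seed chosen arbitrarily for this sketch
cases = [{"id": i, "s1": rng.random()} for i in range(1, 101)]

# The 20th lowest random number serves as the cutoff for sample membership.
cutoff = sorted(c["s1"] for c in cases)[19]

# Flag cases instead of deleting them: 1 = sampled, 0 = not sampled.
for c in cases:
    c["in_sample"] = 1 if c["s1"] <= cutoff else 0

# "Filtered" analysis: only flagged cases enter the computation.
sampled_ids = [c["id"] for c in cases if c["in_sample"] == 1]

print(len(cases), len(sampled_ids))   # all 100 cases kept, 20 of them flagged
print(statistics.mean(sampled_ids))   # mean id over the sampled cases only
```

Because no case is removed, switching the flag off again gives us back the full data for any later analysis.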

## SPSS Repeated Sampling Example 1

Repeated random sampling is the basis for most simulation studies. We presented such simulations for explaining the basic idea behind ANOVA and the chi-square test.

Simulation studies usually require looping over SPSS procedures, which are basically commands that inspect data values. The right way of doing so is with Python as shown in the syntax below. Running it requires the SPSS Python Essentials to be properly installed.

## SPSS Repeated Sampling with Python Syntax

***Requires SPSS Python Essentials: draw 10 samples of 20 cases and compute descriptives on each.**

begin program.

import spss

for sample in range(10):

    spss.Submit('''

temporary.

sample 20 from 100.

descriptives id.

''')

end program.

## Notes

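We use TEMPORARY here for drawing our repeated samples: it makes SAMPLE apply only to the next procedure (DESCRIPTIVES), so all 100 cases are available again for each subsequent draw. Note that we're basically simulating a sampling distribution of mean scores here. These mean scores (over 20 cases each) will be roughly normally distributed due to the central limit theorem.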
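To get a feel for this sampling distribution without SPSS, we can simulate it in plain Python. The sketch below is an assumption-laden illustration: it draws 1000 samples rather than the 10 in the syntax above, simply because more samples give a smoother picture, and it uses Python's own random number generator.

```python
import random
import statistics

rng = random.Random(1)             # seed chosen arbitrarily for this sketch
population = list(range(1, 101))   # the 100 id values

# Each iteration draws 20 cases without replacement and stores their mean.
sample_means = [
    statistics.mean(rng.sample(population, 20))
    for _ in range(1000)
]

print(statistics.mean(population))               # 50.5, the population mean
print(round(statistics.mean(sample_means), 2))   # should land close to 50.5
print(round(statistics.stdev(sample_means), 2))  # spread of the sample means
```

A histogram over `sample_means` should look roughly bell shaped and be centered near the population mean.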

## SPSS Repeated Sampling Example 2

The syntax below uses a different approach for repeated sampling that'll be the basis for simple random sampling *with* replacement later on. All sample variables will be left in our data, a feature we may or may not like.

## SPSS Repeated Sampling with Python Syntax

***1. Create 10 variables with random numbers.**

do repeat #s = s1 to s10.

compute #s = rv.uniform(0,1).

end repeat.

***2. Rank previous variables.**

rank s1 to s10.

***3. Convert rank variables into filter variables.**

recode rs1 to rs10 (lo thru 20 = 1)(else = 0).

***4. Run filtered descriptives.**

begin program.

import spss

for sample in range(1,11):

    spss.Submit('''

filter by rs%d.

descriptives id.

'''%sample)

end program.

***5. Switch off filter.**

filter off.

## Simple Sampling With Replacement

Strictly, most inferential statistics quietly assume that our data are obtained by simple random sampling with replacement. A textbook example is drawing a marble from a vase, writing down its color and putting it back into the vase before sampling a second (third...) marble. This way, each marble may be sampled several times.

The syntax below demonstrates simple random sampling with replacement in SPSS. It uses both WEIGHT and FILTER in order for the sample to take effect.

## Simple Random Sampling With Replacement Syntax

***1. Sampling 50 out of 100 cases with replacement.**

set seed 1.

do repeat #s = s1 to s50.

compute #s = rv.uni(0,1).

end repeat.

***2. Rank random numbers.**

rank s1 to s50.

***3. Each rank variable represents a single draw with replacement.**

recode rs1 to rs50 (2 thru hi = 0).

***4. The sum of our 50 draws is our sample variable.**

compute sample = sum(rs1 to rs50).

***5. Inspect sample variable in data view.**

sort cases by sample (d).

***6. Sample variable should sum to n = 50 cases.**

descriptives sample/statistics min max mean sum.

***7. Filter not necessary but circumvents warning when WEIGHT is used.**

filter by sample.

***8. Use sample as weight variable.**

weight by sample.

***9. Descriptives on 50 cases.**

descriptives id.
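The logic of these steps can be mirrored in a short Python sketch. It is meant as an illustration under assumptions, not as a translation of what SPSS does internally: the names `draws` and `weights` are invented here, and `weights` plays the role of the `sample` variable that we use with WEIGHT.

```python
import random
from collections import Counter

rng = random.Random(1)       # seed chosen arbitrarily for this sketch
ids = list(range(1, 101))    # case ids 1..100
n_draws = 50

draws = []
for _ in range(n_draws):
    # One fresh set of random numbers per draw, like s1 to s50.
    numbers = {case: rng.random() for case in ids}
    # The case with the lowest number (rank 1) is the one drawn this time.
    draws.append(min(ids, key=numbers.get))

# How often each case was drawn; cases never drawn are simply absent (weight 0).
weights = Counter(draws)

print(sum(weights.values()))   # 50: the weights sum to the number of draws

# Weighted mean of id, counting each case as often as it was drawn.
weighted_mean = sum(case * w for case, w in weights.items()) / n_draws
print(weighted_mean)
```

Since every draw starts from all 100 cases again, the same case can be drawn more than once, which shows up as a weight above 1.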

## This tutorial has 2 comments

## By Ruben Geert van den Berg on April 11th, 2017

Then perhaps you're doing something wrong? Make sure the variable you're actually RANKing is something like `COMPUTE TMP = RV.UNIFORM(0,1).` and *not* the variable that tells you if someone is the head of the household.

## By JB on April 11th, 2017

Hello! Thanks for your help! I followed the example in Draw a Stratified Random Sample, but I always get the head as the sample in all the households.