By Ruben Geert van den Berg on September 16, 2014 under SPSS Chi-Square Test Tutorials.

# SPSS Chi-Square Independence Test

The chi-square independence test is a procedure for testing whether two categorical variables are independent in some population. Independence holds if the frequency distribution of one variable is identical for each level of the other variable. If not, there's at least *some* relation between the 2 variables, and a table or chart will tell us what this relation looks like.

## SPSS Independent Samples Chi-Square Test Example

A marketeer wants to know the relation between the brand of smartphone people use and the brand they'd *like to* use. She'll first try to establish that these are related in the first place by testing the **null hypothesis** that the current phone brand and the desired phone brand are independent. She collects data on 150 respondents, resulting in phone_brands.sav.

## 1. Quick Data Check

It's good practice to always inspect your data before running any statistical tests. For the data at hand, a clustered bar chart is a nice option for seeing what the data basically look like.

[Screenshots: menu steps for creating a clustered bar chart with `preferred` and `current` as its two variables.]

Completing these steps results in the syntax below.

***Check data with grouped bar chart.**

GRAPH /BAR(GROUPED)=COUNT BY current BY preferred.

The main conclusion from this graph is that smartphone users are quite loyal to brands; users of every brand still prefer the brand they're using. The effect is strongest for HTC users. The four clusters of bars are far from similar; independence between `current` and `preferred` doesn't seem to hold even approximately.
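To put numbers on what the chart suggests, a crosstab with row percentages is one option. The sketch below is not part of the original tutorial; it only assumes the variable names `current` and `preferred` from the chart syntax above. If the variables were independent, the row percentages would be roughly the same in every row.

```spss
*Sketch: counts and row percentages for current by preferred brand.
CROSSTABS
  /TABLES=current BY preferred
  /CELLS=COUNT ROW.
```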

## 2. Assumptions Chi-Square Independence Test

Although the chi-square independence test will run just fine in SPSS, the credibility of its results depends on some assumptions. These are

- independent and identically distributed variables (or, less precisely, “independent observations”);
- none of the cells has an expected frequency < 5.

Assumption 1 is mainly theoretical. The precise meaning of assumption 2 is explained in the chi-square independence test tutorial. SPSS checks this assumption whenever you run this test, so we'll see the result in our output in a minute.
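If you'd like to inspect the expected frequencies cell by cell rather than rely on SPSS' summary, one option is to request them in the crosstab. This is a minimal sketch, not part of the original tutorial: adding `EXPECTED` to the `CELLS` subcommand prints each cell's expected count next to its observed count.

```spss
*Sketch: show observed and expected counts in each cell.
CROSSTABS
  /TABLES=current BY preferred
  /CELLS=COUNT EXPECTED
  /STATISTICS=CHISQ.
```

Any cell with an expected count below 5 then stands out directly in the table.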

## 3. Run SPSS Chi-Square Independence Test

[Screenshots: menu steps for the crosstabs dialog, with `current` and `preferred` as the two variables and the chi-square statistic selected.]

Completing these steps results in the syntax below.

## Syntax

***Run crosstab with chi-square independence test.**

crosstabs current by preferred
/stat chisq.

## 4. SPSS Chi-Square Independence Test Output

We'll first look at the **Crosstabulation** table. Since both variables have 4 answer categories, (4 * 4 =) 16 different combinations may occur in the data. For each combination (or “cell”), the table presents the frequency with which it occurs. We already saw a visual representation of these 16 **observed frequencies** in the graph we ran earlier.

Next, we'll inspect the **Chi-Square Tests** table. Now, the null hypothesis of independence implies that each cell should contain a given frequency. However, the observed frequencies often differ from such **expected frequencies**. The **Pearson Chi-Square** test statistic basically expresses the total difference between the 16 observed frequencies and their expected counterparts; the larger its value, the larger the difference between the data and the null hypothesis.
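In formulas, the expected frequencies and the test statistic can be sketched as follows. The notation is ours, not the tutorial's: $O_{ij}$ is the observed frequency in row $i$ and column $j$, $R_i$ and $C_j$ are the corresponding row and column totals, and $N$ is the total number of observations.

```latex
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\chi^2 = \sum_{i=1}^{4} \sum_{j=1}^{4} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
```

Each of the 16 cells contributes its squared deviation relative to its expected frequency, so $\chi^2$ is zero only if every observed frequency equals its expected counterpart.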

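The p-value, denoted by “Asymp. Sig. (2-tailed)”, is shown as .000. Since SPSS rounds it to three decimals, this means p < .0005 rather than exactly zero. In other words, there's a less than 0.05% chance of finding the observed (or a larger) degree of association between the variables in a sample if they're perfectly independent in the population.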

## 5. Reporting the Chi-Square Independence Test

We always report the crosstabulation of observed frequencies. “Contingency table” or “bivariate frequency distribution” are synonyms for crosstabulation. Regarding the significance test, we report the Pearson Chi-Square value, df (= degrees of freedom) and p-value as in *“we observed a strong association between the current and the preferred brands, χ²(9) = 131.2, p < .001.”*
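The numbers in this report can be traced back as sketched below. Two things here are our assumptions rather than statements from the tutorial: that all 150 respondents ended up in the table (no missing values), and that Cramér's V is the effect size used to back up the word “strong”. Here $r$ and $c$ are the numbers of rows and columns of the crosstab.

```latex
df = (r - 1)(c - 1) = (4 - 1)(4 - 1) = 9,
\qquad
V = \sqrt{\frac{\chi^2}{N \cdot \min(r - 1,\; c - 1)}}
  = \sqrt{\frac{131.2}{150 \cdot 3}} \approx 0.54
```

In SPSS, adding `PHI` to the `STATISTICS` subcommand of `CROSSTABS` reports Cramér's V directly, so it need not be computed by hand.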

## This Tutorial has 53 Comments

## By Abdinor Farah Yusuf on March 28th, 2017

I want to be your student

## By Wendy on March 8th, 2017

Ahh, I've finally got it. Thank you!! That tutorial is great btw. Thanks so much for your help Ruben!

## By Ruben Geert van den Berg on March 8th, 2017

Sorry, I don't entirely get it.

First off: does Group A, Group B refer to age groups? Or are three variables (group, age group, body composition) involved?

Second, what is your target population? That is: about whom or what are you trying to find out something? And do you have data on this entire population? If so, there's no point in going beyond describing your data (= your population).

However, our data are often (though not always) just a tiny sample from our population. Statistical tests basically deal with generalizing sample outcomes to larger populations: are sample differences large enough to conclude there are differences in some (much larger) population as well?