SPSS Chi-Square Independence Test
The chi-square independence test is a procedure for testing if two categorical variables are independent in some population. This holds if the frequency distribution of one variable is identical for each level of the other variable. If not, there's at least some relation between the 2 variables and a table or chart will tell us what this relation looks like.
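To make "identical frequency distribution for each level" concrete, here is a minimal Python sketch outside SPSS. The 2 x 3 table of counts is made up purely for illustration and is not taken from any data file; `row_proportions` is a hypothetical helper name.

```python
# Hypothetical counts, made up for illustration only.
# Rows: levels of one variable; columns: levels of the other variable.
counts = [
    [10, 20, 30],
    [20, 40, 60],
]

def row_proportions(table):
    """Divide every cell by its row total, giving one distribution per row."""
    return [[cell / sum(row) for cell in row] for row in table]

proportions = row_proportions(counts)
for row in proportions:
    print([round(p, 3) for p in row])

# The variables are independent in this table if every row has the
# same distribution over the columns as the first row.
independent = all(
    abs(p - q) < 1e-12
    for row in proportions[1:]
    for p, q in zip(proportions[0], row)
)
print(independent)  # True: both rows are (1/6, 1/3, 1/2)
```

In real samples the row distributions will rarely match exactly, which is why a significance test is needed to judge whether the differences could be due to sampling alone.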
SPSS Independent Samples Chi-Square Test Example
A marketeer wants to know the relation between the brand of smartphone people use and the brand they'd like to use. She'll first try to establish these are related in the first place by testing the null hypothesis that the current phone brand and the desired phone brand are independent. She collects data on 150 respondents, resulting in phone_brands.sav, part of which is shown below.
1. Quick Data Check
It's a good practice to always inspect your data before running any statistical tests. For the data at hand, a clustered bar chart is a nice option for seeing what the data basically look like. The screenshots below walk you through.
We request a clustered bar chart of counts, with current on the category axis and preferred defining the clusters. Pasting these settings results in the syntax below.
GRAPH /BAR(GROUPED)=COUNT BY current BY preferred.
The main conclusion from this graph is that smartphone users are quite loyal to brands; users of every brand still prefer the brand they're using. The effect is strongest for HTC users. The four groups of bars are far from similar; independence between current and preferred doesn't seem to hold even approximately.
2. Assumptions Chi-Square Independence Test
Although the chi-square independence test will run just fine in SPSS, the credibility of its results depends on some assumptions. These are
- independent and identically distributed variables (or, less precisely, “independent observations”);
- none of the cells has an expected frequency < 5.
Assumption 1 is mainly theoretical. The precise meaning of assumption 2 is explained in chi-square independence test. SPSS checks this assumption whenever you run this test so we'll see the result of that in a minute in our output.
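For readers who want to see what the second assumption refers to: under independence, the expected frequency of a cell is its row total times its column total, divided by the total sample size. The sketch below, in Python rather than SPSS, computes these for a small made-up table (not the phone brand data) and counts the cells below 5. The function name `expected_frequencies` is a hypothetical helper.

```python
def expected_frequencies(table):
    """Expected count per cell under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical observed counts, made up for illustration only.
observed = [
    [12, 3],
    [9, 4],
]

expected = expected_frequencies(observed)
smallest = min(min(row) for row in expected)
cells_below_5 = sum(cell < 5 for row in expected for cell in row)

print(expected)       # [[11.25, 3.75], [9.75, 3.25]]
print(smallest)       # 3.25
print(cells_below_5)  # 2, so this toy table violates the assumption
```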
3. Run SPSS Chi-Square Independence Test
In the Crosstabs dialog, we move current to the rows and preferred to the columns, and we request the chi-square statistic. Pasting these settings results in the syntax below.
crosstabs current by preferred
/statistics chisq.
4. SPSS Chi-Square Independence Test Output
We'll first look at the Crosstabulation table. Since both variables have 4 answer categories, (4 * 4 =) 16 different combinations may occur in the data. For each combination (or “cell”), the table presents the frequency with which it occurs. We already saw a visual representation of these 16 observed frequencies in the graph we ran earlier.
Next, we'll inspect the Chi-Square Tests table. Now, the null hypothesis of independence implies that each cell should contain a given frequency. However, the observed frequencies often differ from such expected frequencies. The Pearson Chi-Square test statistic basically expresses the total difference between the 16 observed frequencies and their expected counterparts; the larger its value, the larger the difference between the data and the null hypothesis.
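The "total difference" can be written out explicitly: Pearson's chi-square sums (observed - expected)² / expected over all cells, and its degrees of freedom are (rows - 1) * (columns - 1). The following Python sketch illustrates the arithmetic on a made-up 2 x 2 table; it is not the phone brand data, and `pearson_chi_square` is a hypothetical helper name.

```python
def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count for this cell if the variables are independent.
            exp = row_totals[i] * col_totals[j] / n
            chi2 += (obs - exp) ** 2 / exp

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Hypothetical observed counts, made up for illustration only.
observed = [
    [20, 10],
    [10, 20],
]

chi2, df = pearson_chi_square(observed)
print(round(chi2, 3), df)  # 6.667 1
```

For a 4 x 4 table such as ours, the same formula gives (4 - 1) * (4 - 1) = 9 degrees of freedom.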
The p-value, denoted by “Asymp. Sig. (2-sided)”, is shown as .000. Since SPSS rounds to three decimals, this means p < .0005 rather than exactly zero: if the variables are perfectly independent in the population, there's less than a 0.05% chance of finding the observed (or a larger) degree of association in a sample like ours.
5. Reporting the Chi-Square Independence Test
We always report the crosstabulation of observed frequencies. “Contingency table” or “bivariate frequency distribution” are synonyms for crosstabulation. Regarding the significance test, we report the Pearson Chi-Square value, df (= degrees of freedom) and p-value as in “we observed a significant association between the current and the preferred brands, χ²(9) = 131.2, p < .001.” Note that a p-value displayed as .000 is not reported as p = .000, since it is never exactly zero.
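As a small aid for writing up results, the reporting convention can be sketched as a Python helper. The function names are hypothetical, and the p-value passed in below is an arbitrary stand-in for "anything SPSS displays as .000"; the exact p-value is not given in the output discussed here.

```python
def format_p(p):
    """Format a p-value without a leading zero; very small values become 'p < .001'."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")

def format_chi_square(chi2, df, p):
    """Build a report string such as 'χ²(9) = 131.2, p < .001'."""
    return f"χ²({df}) = {chi2:.1f}, {format_p(p)}"

# 1e-8 is an assumed placeholder for a p-value that rounds to .000.
print(format_chi_square(131.2, 9, 1e-8))  # χ²(9) = 131.2, p < .001
print(format_p(0.05))                     # p = .050
```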