We had 5 judges taste and rank 6 different beers. The data -shown above- are in beertest.sav. For answering which beer was rated best, a Friedman test would be appropriate because our rankings are ordinal variables. A second question, however, is: to what extent do all 5 judges agree on their beer rankings?

If our judges don't agree at all on which beers were best, then we can't possibly take their conclusions very seriously. Now, we could say that “our judges agreed to a large extent” but we'd like to be more precise and express the level of agreement in a single number. This number is known as Kendall’s Coefficient of Concordance *W*.^{2,3}

## Kendall’s W - Basic Idea

Let's consider the 2 hypothetical situations depicted below: perfect agreement and perfect disagreement among our raters. I invite you to stare at it and think for a minute.

As we see, the extent to which raters agree is indicated by the **extent to which the column totals differ**. We can express the extent to which numbers differ as a number: the variance or standard deviation.

**Kendall’s W** is defined as

$$W = \frac{Variance\,over\,column\,totals}{Maximum\,possible\,variance\,over\,column\,totals}$$

As a result, Kendall’s W is **always between 0 and 1**. For instance, our perfect disagreement example has W = 0; because all column totals are equal, their variance is zero.

Our perfect agreement example has W = 1 because the variance among column totals is equal to the maximal possible variance. No matter how you rearrange the rankings, you can't possibly increase this variance any further. Don't believe me? Give it a go then.
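The ratio above can be tried out in a few lines of Python. This is a minimal sketch outside SPSS, not the tutorial's own method. It assumes complete rankings without ties, and the function name `kendalls_w` and the two small data sets are hypothetical illustrations, not the tables from the figure. It uses 4 judges rather than 5 because, with 6 untied ranks, an odd number of judges can never produce exactly equal column totals (each would have to be 5 × 3.5 = 17.5). The variance ratio is computed as a ratio of sums of squared deviations, which is equivalent because both variances share the same denominator.

```python
def kendalls_w(ranks):
    """Kendall's W for untied rankings; one row per judge, one column per item."""
    m = len(ranks)        # number of judges
    n = len(ranks[0])     # number of ranked items
    totals = [sum(col) for col in zip(*ranks)]       # column totals
    mean_total = m * (n + 1) / 2                     # mean of the column totals
    s = sum((t - mean_total) ** 2 for t in totals)   # observed sum of squares
    s_max = m ** 2 * (n ** 3 - n) / 12               # largest possible sum of squares
    return s / s_max


# Hypothetical data: 4 judges who all rank 6 items identically.
agreement = [[1, 2, 3, 4, 5, 6] for _ in range(4)]

# Hypothetical data: 2 judges rank the items 1..6, the other 2 rank them 6..1,
# so every column total equals 14.
disagreement = [
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5, 6],
    [6, 5, 4, 3, 2, 1],
    [6, 5, 4, 3, 2, 1],
]

w_agreement = kendalls_w(agreement)
w_disagreement = kendalls_w(disagreement)
print(w_agreement, w_disagreement)
```

Swapping any two ranks within a row of `agreement` and rerunning is a quick way to see that W can only go down from 1.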

So what about our actual beer data? We'll quickly find out with SPSS.

## Kendall’s W in SPSS

We'll get Kendall’s W from SPSS’ Nonparametric Tests - Legacy Dialogs - K Related Samples menu. The screenshots below walk you through.

Note: SPSS thinks our rankings are nominal variables. This is because they contain few distinct values. Fortunately, this won't interfere with the current analysis. Completing these steps results in the syntax below.

## Kendall’s W - Basic Syntax

    *Kendall's W from nonparametric tests - legacy dialogs - k related samples.

    NPAR TESTS
    /KENDALL=beer_a beer_b beer_c beer_d beer_e beer_f
    /MISSING LISTWISE.

## Kendall’s W - Output

And there we have it: **Kendall’s W = 0.78**. Our beer judges agree with each other to a reasonable but not super high extent. Note that we also get a table with the (column) mean ranks that tells us which beer was rated most favorably.

## Average Spearman Correlation over Judges

Another measure of concordance is the average over all possible Spearman correlations among all judges.^{1} It can be calculated from Kendall’s W with the following formula

$$\overline{R}_s = {kW - 1 \over k - 1}$$

where \(\overline{R}_s\) denotes the average Spearman correlation and \(k\) the number of judges.
For our example, this comes down to

$$\overline{R}_s = {5(0.781) - 1 \over 5 - 1} = 0.726$$

This result can be verified by running and averaging all possible Spearman correlations in SPSS. We'll leave that for a next tutorial, however, as doing so properly requires some highly unusual -but interesting- syntax.
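In the meantime, the relation between W and the average Spearman correlation can be checked outside SPSS. The Python sketch below is an illustration under stated assumptions: rankings contain no ties, the 3 × 4 data set is hypothetical (it is not the beer data), and all function names are made up for this example. It computes the average of all pairwise Spearman correlations directly and compares it with the value derived from W.

```python
from itertools import combinations


def kendalls_w(ranks):
    """Kendall's W for untied rankings; one row per judge."""
    m, n = len(ranks), len(ranks[0])
    totals = [sum(col) for col in zip(*ranks)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def spearman(x, y):
    """Spearman correlation between two untied rankings of the same items."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def mean_spearman_from_w(w, k):
    """Average Spearman correlation implied by W for k judges."""
    return (k * w - 1) / (k - 1)


# Hypothetical rankings: 3 judges, 4 items, no ties.
ranks = [
    [1, 2, 3, 4],
    [2, 1, 3, 4],
    [1, 3, 4, 2],
]

pairs = list(combinations(ranks, 2))
direct = sum(spearman(x, y) for x, y in pairs) / len(pairs)
via_w = mean_spearman_from_w(kendalls_w(ranks), len(ranks))
print(direct, via_w)  # both are 0.4, up to floating-point rounding

# The tutorial's own numbers: W = 0.781 with k = 5 judges.
print(round(mean_spearman_from_w(0.781, 5), 3))  # 0.726
```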

Thank you for reading.

## References

1. Howell, D.C. (2002). *Statistical Methods for Psychology* (5th ed.). Pacific Grove CA: Duxbury.
2. Slotboom, A. (1987). *Statistiek in woorden* [Statistics in words]. Groningen: Wolters-Noordhoff.
3. Van den Brink, W.P. & Koele, P. (2002). *Statistiek, deel 3* [Statistics, part 3]. Amsterdam: Boom.

## This tutorial has 13 comments

## By Eduardo on April 23rd, 2018

The information that the page gives is very valuable. I would like to ask if there is another concordance index in which the judges rate the items from 1 to 5, in 4 alternatives (for example: sufficiency, relevance, coherence and clarity)

## By Fajar Adinugraha on November 19th, 2017

Thanks...

## By Mohsin on November 12th, 2017

Thank you Ruben, for finding time to reply to my question. That was really helpful. Love and respect for the work you are doing.

Regards,

Mohsin.

## By Ruben Geert van den Berg on November 2nd, 2017

Hi Mohsin!

No, W does not at all become meaningless! Technically, you could call it an effect size measure or association measure: a number that indicates the strength of some effect or association, regardless of statistical significance. And the value of 0.78 is reasonably high.

However, W = 0.78 is not necessarily statistically significantly different from 0; this depends on the sample size. The p-value merely states that there's a 0.2% probability of finding W = 0.78 or larger in a sample if it is 0 in the entire population. Since this is a very low probability, we conclude that W is not zero: important but not super interesting.

So the p-value and W convey 2 separate -but complementary- pieces of information.

Many statistical tests distinguish a p-value (is the effect zero in the population?) and an effect size (if not zero, then how large?).

Hope that makes some sense!

## By Mohsin on November 1st, 2017

Hi Ruben!

I want to draw your attention to the value of asymptotic sig. (0.002) in the above example. It means we have to reject the null hypothesis (which SPSS states is that the distributions of beer a, b, c, d, and e are the same). In such a case, how do you interpret the W value? Does it not become meaningless?