A course was evaluated by 183 students. The data are in course_evaluation.sav, part of which is shown below. The teacher wants to know the average age of his students but we only have their date of birth.
1. Ensure Date of Birth is a Date Variable
The first thing we'll do is check if date of birth is a real date variable. We readily see in variable view that this is the case here. Sometimes dates end up in SPSS as string variables and if so, we first need to convert them to date variables. Some examples for doing so are in Convert String to Date Variable.
2. Choose a Comparison Date
Since (average) age is literally changing every second, we need to answer “age at which point in time?” The most obvious option is age at the moment the data were collected. Such a completion date may be present in your data. If it isn't, we'll make an educated guess.
3. Compute Age with Known Completion Date
Our data hold a variable cdate which contains the completion dates for the questionnaire. We'll now easily compute age with the syntax below and we'll inspect its histogram to make sure the result has a plausible distribution.
compute age = datediff(cdate,bdate,'days') / 365.25.
*Inspect if result has plausible distribution.
frequencies age
/format notable
/histogram.
*All ages between 19 and 27 years. Looks perfect.
Result
So we basically computed the number of days between date of birth and completion and divided that by 365.25, the average number of days in a year. You may wonder why we don't just use DATEDIFF(cdate,bdate,'years'). We'll get to that in a minute.
4. Compute Age with Unknown Completion Date
If we don't have a completion date in our data, we'll try and make a good guess. Let's say we guess January 1, 2015. We can convert this into an SPSS date value by using date.dmy(1,1,2015) and thus create our guessed completion date as a new variable in our dataset. Alternatively, we may insert this function directly into our age computation formula as shown below.
compute age2 = datediff(date.dmy(1,1,2015),bdate,'days') / 365.25.
execute.
Days or Years?
So why did we extract days and divide those by 365.25, the average number of days in a year? The simple reason is that SPSS truncates the outcome of DATEDIFF. This means that someone who is 20 years and 364 days old will be assigned an age of 20.00 years, which is almost an entire year off.
compute age3 = datediff(cdate,bdate,'years').
execute.
Result
This probably convinces you that extracting years directly is not a good idea: on average, we'll underestimate age by half a year by doing so.For the sake of simplicity, we'll assume that birthdays are uniformly distributed over the year, which I believe roughly holds.
Final Notes
If you don't want to see any decimal places, your best option is probably running formats age (f3). which will display all ages as integers. Alternatively, if you want ages to be integers, you could run compute age = rnd(age). but this obviously introduces some error -bad but not quite as bad as the aforementioned bias.
It guess that's about it. I hope you found this tutorial helpful. Thanks for reading!
THIS TUTORIAL HAS 6 COMMENTS:
By Linda Martell on July 15th, 2015
good
By Jon Peck on April 6th, 2017
Datediff is designed to do exact calculations in the same way that, say, the Motor Vehicle Department would: you are 20 until your 21st birthday. Dividing by 365.25 is fine for most statistical purposes but, of course, does not adjust exactly for leap years. Age in days would work for statistics, too.
By Heba on September 5th, 2017
I am trying to convert birthdate into age using exactly this. It has worked, however certain age variables keep coming out incorrect and in negative. I checked my input and it's all correct, what could be the problem or how to do I correct that?
By Ruben Geert van den Berg on September 5th, 2017
Hi Heba!
The first thing that comes to mind is that dates with 2-digit years can look different from what they are. I mean that "01-02-03" probably means 1 February 2003 but it could be 1 February 1903, 1803 or basically anything. I've encountered this problem with real life data (imported from Excel I think). You can check if this is the case by running something like
FORMATS BIRTHDAY (DATE11).
This does not change anything about your data. It just displays it differently. If that's not the problem, share a handful of your data with me and I'll look into them.
By William Peck on November 2nd, 2018
I always truncate the partial year, that's the common way of presenting age, since you're not 18 until you're 18, you're 17 ... :-)
Good explanation and I like that the datasets are like 95% match to the web page. I'm on v25.