AGGREGATE is an SPSS command for creating variables holding statistics over cases. This tutorial briefly demonstrates the most common scenarios and points out some best practices.
SPSS Aggregate Command
The SPSS AGGREGATE command typically works like so:
- One or more BREAK variables can be specified. In SPSS versions 15 and below, specifying at least one BREAK variable is mandatory. If you want statistics over all cases, use compute constant = 0. and use constant as the BREAK variable.
- All cases with the same value(s) on the break variable(s) are referred to as a break group.
- Each break group will become a single case in the aggregated data (unless MODE = ADDVARIABLES is used).
- This new case has summary statistics over the original cases as new variables. Available statistics include the frequency, mean, maximum and many others. Consult the command syntax reference for a complete overview.
- The result of AGGREGATE may be the active dataset, a new dataset or a new data file. (This last option is not available for MODE = ADDVARIABLES.) A new dataset must first be declared before it can be specified in AGGREGATE.
For a very basic demonstration, run the syntax below.
SPSS Aggregate Syntax Example
*1. Create test data.
data list free/id.
begin data
3 5 5 8 8 8 9 9 9 9
end data.
*2. Create Dataset with id counts (called 'freq' for 'frequency').
aggregate outfile *
/break id
/freq = nu.
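If the original cases should stay available, the counts can be written to a separate dataset instead of replacing the active one. The sketch below assumes an SPSS version that supports multiple datasets and re-creates the test data from the example above; the dataset names ids and id_counts are made up for illustration.

```spss
*Re-create the test data, since the example above replaced them.
data list free/id.
begin data
3 5 5 8 8 8 9 9 9 9
end data.

*Name the active dataset so it stays open when another one is activated.
dataset name ids.

*Declare the target dataset before using it in AGGREGATE.
dataset declare id_counts.

*Write one case per id value to id_counts and leave ids unchanged.
aggregate outfile = id_counts
/break id
/freq = nu.

*Inspect the aggregated data.
dataset activate id_counts.
list.
```

With these data, id_counts should hold four cases: id 3 with freq 1, id 5 with freq 2, id 8 with freq 3 and id 9 with freq 4.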
MODE = ADDVARIABLES
Except for SPSS versions 12 and below, summary statistics of break groups can be appended to a dataset without actually aggregating it. The syntax below demonstrates this. Since the previous example replaced the test data with their aggregate, re-create the test data before running it.
SPSS Aggregate Syntax Example
aggregate outfile * mode = addvariables
/break id
/freq = nu.
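A common reason for appending rather than aggregating is to compare each case with its own break group. The sketch below is only an illustration on made-up data: the variables grp, score, grp_mean and deviation are hypothetical names, not part of the examples above.

```spss
*Hypothetical test data: 2 groups holding 3 scores each.
data list free/grp score.
begin data
1 4 1 6 1 8 2 1 2 2 2 3
end data.

*Append each group's mean score to every case in that group.
aggregate outfile * mode = addvariables
/break grp
/grp_mean = mean(score).

*Each case's deviation from its own group mean.
compute deviation = score - grp_mean.
execute.
```

All six cases are retained. Cases in group 1 get a grp_mean of 6 and cases in group 2 get a grp_mean of 2, so the deviations come out as -2, 0, 2, -1, 0 and 1.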
Statistics over Multiple Variables
Summary statistics can be computed over multiple variables in one go. The TO and ALL keywords can conveniently shorten the list of variables as shown in the syntax below.
*1. Create test data.
data list free/v1 to v5.
begin data
1 2 3 4 5 6 7 8 9 10
end data.
*2. Aggregate multiple variables at once.
aggregate outfile *
/mean_1 to mean_5 = mean(v1 to v5).
Multiple Statistics
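Different summary statistics (over the same or different variables) can be specified in a single command. This is demonstrated below. The syntax uses the test data from the previous example; because that example replaced the active dataset with its aggregate, re-run its test data block first.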
aggregate outfile *
/mean_1 to mean_5 = mean(v1 to v5)
/sd_1 to sd_5 = sd(v1 to v5).
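The two examples above omit BREAK altogether, which, as noted earlier, is not possible in SPSS versions 15 and below. A minimal sketch of the constant workaround for those versions follows, using the same test data; the variable name constant is just a placeholder.

```spss
*Re-create the test data.
data list free/v1 to v5.
begin data
1 2 3 4 5 6 7 8 9 10
end data.

*All cases share the same value, so they form a single break group.
compute constant = 0.

*Statistics over all cases, with the constant as the only break variable.
aggregate outfile *
/break constant
/mean_1 to mean_5 = mean(v1 to v5)
/sd_1 to sd_5 = sd(v1 to v5).
```

The result is a single case holding constant plus the ten statistics. With these data, mean_1 to mean_5 should be 3.5, 4.5, 5.5, 6.5 and 7.5.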
Final Note
Lots of different things can be done with the AGGREGATE command. This tutorial aimed at illustrating the most common scenarios found in practice. It is by no means intended as an exhaustive overview of all options.
THIS TUTORIAL HAS 3 COMMENTS:
By Maria Jose on October 14th, 2022
Hello dears, thank you for your contributions.
I very humbly ask you for advice. I am doing an exploration of poverty data, and I would need to add data on people, but I have reviewed the different functions and in no case do they finish characterizing the household.
I would need to define something like this... I have 3 codes "0", "1" and "99". I would need the variable added by household to reflect if there is at least one person with value "1".
That is to say that if there is at least one person with "1" the aggregate is "1"; in the event that they are all "0", but there is at least one "99", assign the household "99" (because nobody knows if that 99 could not be an unknown "1").
The added value should only be "0" in the case that unfailingly all members are "0"... is it possible to add data like this? with conditionals? (if)
Thanks a lot...
By Ruben Geert van den Berg on October 15th, 2022
Hello Maria!
Please try the syntax below.
Hope that helps!
data list free/hh code.
begin data
1 0 1 1 1 99 1 0 2 0 2 99 3 1 4 1 4 1 4 1 5 0 5 1 5 1
end data.
aggregate outfile * mode addvariables
/break hh
/c01 'Household contains only ones' = pin(code 1,1).
recode c01 (100 = 1)(else = 0).
execute.
By Maria Jose on October 15th, 2022
Noted.
Thank you very much, very kind.
Thanks for your time.