SPSS Data Preparation 2 – Initial Data Checks

1. SPSS Case Count and Variable Count

(Overview and data file are found here.)

The very first thing we want to know about basically any data file is its dimensions: how many cases and how many variables does it contain? For a quick case count, select any cell in data view and press the CTRL + ↓ (down arrow) shortcut, which jumps to the last case. Alternatively, just scroll all the way down with the scroll bar.

Our file contains 601 cases. Applying the same method in variable view tells us that we have 13 variables. Since we may at some point delete some cases and/or variables, we personally like to add a comment to our syntax file on the original dimensions. The screenshot below shows what it looks like.

SPSS Data Preparation - Data Dimensions as Comment in Syntax File
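
As a rough sketch of such a comment: the line below records the dimensions reported above at the top of a syntax file. The optional SHOW N command is not part of the original tutorial; it is added here as one way to have SPSS report the unweighted case count of the active dataset, so the figure in the comment can be rechecked later.

```spss
*Original data dimensions: 601 cases, 13 variables.

*Optional check: report the unweighted number of cases in the active dataset.

show n.
```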

2. Unique Case Identifier Variable

Data files may or may not have a unique case identifier variable: a variable with a distinct value for every case. In some cases, a combination of two (or more) variables serves this purpose.
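
Whether a combination of variables identifies cases uniquely can be checked by counting how often each combination occurs. The sketch below is illustrative only: region and household are hypothetical variable names that do not occur in the tutorial's data file. If the resulting frequency table contains only the value 1, the combination is unique.

```spss
*Hypothetical example: check whether region and household jointly identify each case.

aggregate outfile = * mode = addvariables
/break = region household
/cnt_combo = n.

frequencies cnt_combo.
```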
It's a good idea to have a unique identifier for three reasons. First, if you remove variables from your data because they don't seem relevant, you can later merge them back in with MATCH FILES. Second, if a case contains some unusual value, you can correct it only if you can address this, and only this, case. Third, a single identifier may be used in various data sources containing similar records. If so, having this identifier in your data enables you to merge your (edited) data with these other data sources.
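
To illustrate the first reason, here is a minimal sketch of merging variables back in by identifier. The file name removed_vars.sav is an assumption made up for this example; it stands for a file holding id plus the variables that were removed earlier, already sorted on id.

```spss
*Hypothetical example: merge variables stored in removed_vars.sav back in by id.
*Both files must be sorted ascending on id when using the BY subcommand.

sort cases by id.
match files file = *
/file = 'removed_vars.sav'
/by id.
execute.
```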
Our data seem to contain id as a unique case identifier. But how can we be really sure that none of its values occurs more than once? The syntax below checks this by using AGGREGATE.

*1. Create cnt, holding frequencies for id.

aggregate outfile * mode addvariables
/break id
/cnt = n.

*2. If cnt contains only 1, every value of id occurs once and hence it's a unique identifier.

frequencies cnt.

Result

This frequency table tells us that the only value in cnt is 1. Hence, id is indeed a unique case identifier. If it had not been, the next best option would be to create an identifier before doing anything else with the data. The syntax below shows one option for doing so, using the row numbers in data view, which SPSS makes available as the system variable $casenum.

*Create unique identifier (not necessary for these data).

compute ident = $casenum.
execute.
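
If the frequency table for cnt had shown values greater than 1, a natural next step would be to inspect the offending cases. The sketch below is not part of the original tutorial; it assumes cnt was created by the AGGREGATE command shown earlier and uses TEMPORARY so that the selection applies only to the LIST command and no cases are deleted.

```spss
*Hypothetical follow-up: show only cases whose id occurs more than once.
*Assumes cnt was created by the AGGREGATE command shown earlier.

temporary.
select if (cnt > 1).
list id cnt.
```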

THIS TUTORIAL HAS 8 COMMENTS:

By Jeanette on October 23rd, 2015

How would you handle it if persons were allowed multiple attempts at completing a survey? Is there a way to combine their attempts such that there is only one record remaining for each person?

By Ruben Geert van den Berg on October 24th, 2015

Thanks for your comment, good question! What to do exactly depends on your data.

If you have a unique identifier variable for each respondent (not attempt!) and each attempt results in a case (row of data values in SPSS), then you can probably solve the problem with AGGREGATE. Use the respondent identifier as the BREAK variable and create the MAX over all relevant variables.
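
A minimal sketch of this suggestion, with made-up names: respondent_id is assumed to identify persons and q1 to q3 stand for the survey items. New target names are used so the original variable names are not reused. Note that this replaces the active dataset with one case per respondent.

```spss
*Hypothetical example: respondent_id identifies persons; q1 to q3 are survey items.
*Keeps one case per respondent holding the maximum of each item over all attempts.

aggregate outfile = *
/break = respondent_id
/q1_max q2_max q3_max = max(q1 q2 q3).
```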

But I'd need to see the actual data to confirm this works for your particular case.

By Maliha on November 23rd, 2015

1. How to analyze data across two files in SPSS?
2. How to compare two or more files in SPSS?

By Ruben Geert van den Berg on November 23rd, 2015

Thanks for your comment! These are rather elaborate questions but a likely strategy for both is to merge the files using ADD FILES. Make sure both files contain a source variable that indicates the source (file 1 or file 2) for each case.

Now you can readily create basic tables such as CROSSTABS and MEANS for comparing variables between the two files. Most basic graphs found under "legacy dialogs" in recent SPSS versions allow for a column variable. Specifying the source variable here results in nice graphs with two (or more) panes as shown in this illustration.
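
The strategy above might look roughly as follows. All file and variable names (file1.sav, file2.sav, gender, income) are hypothetical. Instead of adding a source variable to each file beforehand, this sketch lets the IN subcommand of ADD FILES create one during the merge, which is a slightly different route to the same result.

```spss
*Hypothetical example: file1.sav and file2.sav contain the same variables.
*The IN subcommand creates from_file1, holding 1 for cases from file1.sav and 0 otherwise.

add files file = 'file1.sav'
/in = from_file1
/file = 'file2.sav'.
execute.

*Compare a categorical and a metric variable (hypothetical names) between the files.

crosstabs gender by from_file1.
means income by from_file1.
```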

Hope that helps!

By Neha on April 1st, 2016

Impressive!!
