This tutorial walks you through opening a .csv data file in SPSS. We chose a .csv file as obtained from Google Analytics (GA). Since many readers may be unfamiliar with GA, the screenshot below gives a basic idea of what the data look like before we export them as a .csv file.
The original data file -as generated by GA- can be downloaded here. We chose it because opening it in SPSS causes some complications that you'll typically encounter with real life data. We'll take our time to explain and resolve these issues.
Right, so where do we start? Well, .csv files are typically associated with spreadsheet editors such as MS Excel or OpenOffice Calc. However, a better idea is to inspect the file with notepad++ or a similar text editor. As we'll see in a minute, this allows us to gather some essential pieces of information on the file contents. The screenshot below shows what the data look like in Notepad++.
Contents of the CSV file
Note that the first 6 lines of the file are a basic description of the data. We'll remove them after inspecting the rest of the data.
Line 7 contains variable names. Note that most of them are not suitable for SPSS because they contain spaces or start with a percent sign. We'll clean them up manually.
The actual data start on line 8, which will be the second line of data after removing the first 6 lines.
Note how data values are separated by commas (hence “comma separated values” or .csv file). These commas are known as the delimiter.
If a value contains a comma, there's double quotes around it. This is the text qualifier. These quotes indicate that commas between them are not meant as delimiters.
More Contents of the CSV file
If we quickly scroll down the file (tip: use the pgdn key on your keyboard), we see that line 318 is the last line containing raw data values.
Lines 319 and below contain summary statistics which are not part of the data we want. We therefore remove them.
Last but not least, note “UTF-8” in the status bar. This is the file encoding. It basically tells how characters are mapped onto the bits (ones and zeroes) that the file really consists of.
Cleaning Up the CSV File
We now clean up the .csv file like we just discussed. After doing so, we'll save it with a different file name. This enables us to compare our manually edited file with the original file as GA generated it. The final .csv file that we'll open in SPSS can be downloaded from browsers.csv.
Opening the CSV File in SPSS
We're now ready to open our .csv file in SPSS. For doing so, we'll first launch SPSS and navigate to.
We saw in Notepad++ that the file encoding is UTF-8. When we select the file, SPSS may switch to “Local Encoding” here. If that happens, set it back to UTF-8 before proceeding.
Click instead of .
Remember that variable names make up the first line of our .csv file.
Here we specify the delimiter we identified when we inspected the file in Notepad++.
Like we saw earlier, the only text qualifier in our raw data is a double quote.
We could specify some variable formats here. However, taking care of that later yields better results faster.
Following the previous steps results in the syntax below. Readers who'd like to copy-paste the syntax from this tutorial may either correct the
/FILE specification (second line of syntax) or create a folder
d:/project_00/edit and move browsers.csv into this folder.
Before running this syntax, we remove the last line; we didn't ask for any
DATASET NAME ... command and running it just gets in our way.
GET DATA /TYPE=TXT
DATASET NAME DataSet1 WINDOW=FRONT.
System Missing Values
After running the syntax, let's take a look at data view. Note that there's some system missing values in sessions. The reason for this is that some values contain commas. If we take a close look at the syntax, we see that it contains sessions F5.0 This means that the values of the variable sessions are interpreted according to a f format (standard numeric) which doesn't understand commas. The solution is replacing this line by sessions comma6 We do so for all standard numeric variables. Don't worry if you're not sure which variables that refers to; we'll present the entire corrected syntax in a minute.
The other variables may look fine at first, but inspecting variable view shows many of them to be string variables. If we think about the data for a minute, we conclude that screen_resolution should be the only string variable here.
This is the same problem -and the same solution- as the system missing values we saw earlier. For instance, we find perc_new_sessions A7 in our syntax. The A format indicates a string variable and SPSS has chosen this format here because of the percentage signs in the data. The solution is replacing the line by perc_new_sessions pct7.2 In a similar vein, we choose this format for all percentage variables. Again, we'll present the entire corrected syntax in a minute.
Our data contain one time variable: avg_session_duration. Our syntax addresses it as avg_session_duration A8 Our time values are in hh:mm:ss format. In SPSS, this is the time8 format (where 8 denotes the number of characters including the colons). We therefore replace the previous line by avg_session_duration time8
Final Corrected Syntax
After making all modifications as previously explained, we end up with the syntax below. This is the final syntax we'll use for opening the .csv file in SPSS.
GET DATA /TYPE=TXT
Attentive readers may notice that we didn't prevent goal_value from ending up as a string variable. This is because the previous steps couldn't prevent this. For the sake of completeness, we'll fix it with the syntax below.
As explained elsewhere, we always make sure that ALTER TYPE doesn't create any system missing values. We do so in step 3.
Syntax for Fixing Euro Signs
compute goal_value = char.substr(goal_value,2).
*2. Convert to numeric variable.
alter type goal_value(f4.2).
*3. Confirm that all 311 values are non missing.
*4. Set cca (Custom Currency A) to Euro.
set cca '-,€ ,,'.
*5. Set goal_value to cca.
Extracting Screen Width
That's it. We now have a perfect SPSS file holding our .csv data. If that's all you wanted to know, you can stop reading right here, right now.
Right, note that the values of screen_resolution consist of two numbers separated by an x. The first number is the screen width in pixels and we'll extract it by running the syntax below.
compute pos_x = char.index(screen_resolution,'x').
*2. Create new, empty string variable.
*3. Compute width = characters of screen_resolution up to position of "x".
compute width = char.substr(screen_resolution,1,pos_x - 1).
*4. Change width to numeric variable.
alter type width(f5).
*5. Make sure ALTER TYPE doesn't create system missing values.
Cumulative Percentages of Sessions per Screen Width
Finally. We're ready to create the table we're after: we'd like to know the cumulative percentages of sessions per screen width, from largest to smallest. Note that most cases represent many sessions. In order to have SPSS interpret each case as the number of sessions it represents, we'll use WEIGHT as shown below.
Finally, we'll run a FREQUENCIES table which we'll sort according to descending screen width.
SPSS Syntax for Cumulative Percentages
weight by sessions.
*2. Create frequencies table, sort by descending widths.
frequencies width/format dvalue.
Due to weighting our cases, SPSS reports 11,607 cases. This is the exact same number of sessions that we saw in the Google Analytics (GA) interface (very first screenshot). We can see this as a checksum: these numbers being identical strongly suggests that our SPSS data perfectly correspond to the original GA data.
The purpose of this study was to choose a minimal screen width that our website design should support. Due to the nature of our website, very few visitors use a small screen width (smartphone or tablet). Should we invest in supporting mobile devices instead of writing more tutorials? Our final table suggests we shouldn't.
ConclusionMost Interesting Sections from Final Table
The second table is rather large but this doesn't bother us. If we inspect the cumulative percentages and scroll down, we conclude that 69.9% of all sessions used a screen width of 1,366 pixels or more. A screen width of at least 1,280 pixels was used in 91.2% of all sessions.
Finally, if we make sure our website works well at a screen width of 1,024 pixels or more, we support 95.6% of all visits. Our final decision was to support screen widths of 800 pixels or more because it's the smallest our design could reasonably handle.