Unit Wise SPSS Concepts

   

Syllabus

Statistical Package for Social Science

Credits: 4

Theory:  4 Hours

Tutorials: -

Max Marks: 100

External: 75 Marks

Internal: 25 Marks

 

Course Objectives:

1.      Understand the main features of SPSS

2.      Use the SPSS GUI effectively

3.      Perform descriptive analyses with SPSS

4.      Perform common parametric and non-parametric tests

5.      Perform simple regressions and multivariate analyses (factor and cluster)

6.      Know where to find help

SYLLABUS

UNIT I:

 

 

SPSS Environment: data editor, output viewer, syntax editor – Data view window – SPSS Syntax – Data creation – Importing data – Variable types in SPSS and Defining variables – Creating a Codebook in SPSS.

UNIT II:

 

 

Computing Variables - Recoding (Transforming) Variables: Recoding Categorical String Variables using Automatic Recode - Rank Cases - Sorting Data - Grouping or Splitting Data.

UNIT III:

 

                       

Descriptive Statistics for Continuous Variables - The Explore procedure - Frequencies Procedure – Descriptives - Compare Means - Frequencies for Categorical Data.

UNIT IV:

 

 

Statistical tests - Inferential Statistics for Association: Pearson Correlation, Chi-square Test of Independence – Inferential Statistics for Comparing Means: One Sample t Test, Paired-Samples T Test, Independent Samples T Test, One-Way ANOVA.

UNIT V:

 

 

Correlation and regression - Linear correlation and regression - Multiple regression (linear)

Multivariate analysis - Factor analysis - Cluster analysis

Outcomes:

 

 

1.      Familiarity with the toolbox of statistical software, capacitating students to analyze complex information with the help of statistical software.

2.      A strong theoretical and empirical foundation in statistical analysis using the Statistical Package for the Social Sciences (SPSS).

Text Books:

 

 

SPSS Programming and Data Management: A Guide for SPSS and SAS Users, 3rd Edition,

by Raynald Levesque and SPSS Inc.

References

1. IBM 2016, IBM Knowledge Center: SPSS Statistics, IBM, viewed 18 May 2016, https://www.ibm.com/support/knowledgecenter/SSLVMB/welcome/

2. How to Use SPSS: A Step-by-Step Guide to Analysis and Interpretation, Brian C. Cronk, 10th edition published in 2018 by Routledge.

3. SPSS for Intermediate Statistics: Use and Interpretation, Nancy L. Leech et al., 2nd edition published in 2005 by Lawrence Erlbaum Associates, Inc.

4. Using IBM SPSS Statistics for Research Methods and Social Science Statistics, William E. Wagner, 5th edition published in 2015 by SAGE Publications, Inc.

Lab

Lab 1 and 2

Ø  A     Getting familiar with SPSS

Ø  B1    Entering data by hand

Ø  B2    Using “Variable View”

Ø  B3    Creating a frequency table

Ø  C     Creating a histogram

Ø  D     Creating a boxplot

Ø  E     Calculating mean, mode and median

Ø  F     Calculating measures of spread

Lab 3 and 4

Ø  Computing new variables using "Compute"

Ø  Changing the coding of a variable using "Recode"

Ø  Importing (reading) data from a text file without columns

Ø  Locating outliers using a boxplot

Ø  Selecting and deleting cases

Ø  Computing confidence interval for population means

Ø  Testing a population mean using t-test

Lab 5 and 6

Ø  Reading (importing) data from a text file with columns.

Ø  Assessing Normality of data (Q-Q Plot, Normal quantile plot).

Ø  Selecting a group case.

Ø  Visualizing difference between two groups with a double boxplot.

Ø  Testing the difference between two independent groups using t-test.

Ø  Testing difference between related samples using t-test.

Ø  Testing difference in increase between two different groups.

 

Lab 7 and 8

Ø  Entering data for a two-way table

Ø  Weighting cases with frequencies

Ø  Setting the meaning of a value/code

Ø  Creating a two-way table with all the occurrences

Ø  Choosing the most adequate two-way table form

Ø  Visualizing the counts from a two-way table

Ø  Significance tests and confidence intervals for proportions

Ø  Testing the association of two categorical variables using chi-square test

Lab 9 and 10

Ø  ANOVA

Ø  Non-parametric test: Wilcoxon Rank Sum Test




Unit-1 

Introduction to SPSS

Introduction:

SPSS is a powerful statistical software program with a graphical interface designed for ease of use.  Almost all commands and options can be accessed using pull down menus at the top of the window, and the program opens to a spreadsheet which looks similar to that of Microsoft Excel.  This design means that once you learn a few basic steps to access programs, it’s very easy to figure out how to extend your knowledge in using SPSS through the help files.

Originally named Statistical Package for the Social Sciences, later modified to Statistical Product and Service Solutions (SPSS) to reflect the diversity of the user base, the software was released in its first version in 1968 after being developed by Norman H. Nie, Dale H. Bent, and C. Hadlai Hull. Those principals incorporated as SPSS Inc. in 1975.

 

SPSS Inc. announced on July 28, 2009, that it was being acquired by IBM. Because of a dispute over ownership of the name "SPSS", between 2009 and 2010 the product was referred to as PASW (Predictive Analytics SoftWare). As of January 2010, it became "SPSS: An IBM Company". The transfer of the business to IBM was completed by October 1, 2010, on which date SPSS: An IBM Company ceased to exist. IBM SPSS is now fully integrated into the IBM Corporation and is one of the brands under IBM Software Group's Business Analytics Portfolio, together with IBM Algorithmics, IBM Cognos and IBM OpenPages.

Overview:

SPSS is a Windows-based program; it provides a powerful statistical analysis and data management system in a graphical environment based on a point-and-click user interface.

This program can be used to analyze data from surveys, tests, observations, etc.

It can perform a variety of data analysis and presentation functions, including statistical analysis and graphical presentation of data. Its features include descriptive statistics such as frequencies, central tendency, plots, charts, and lists, as well as sophisticated inferential and multivariate statistical procedures such as analysis of variance (ANOVA), factor analysis, cluster analysis, and categorical data analysis.

SPSS Windows and Menus:

SPSS for Windows consists of five different windows, each of which is associated with a particular SPSS file type. This document discusses the two windows most frequently used when analyzing data in SPSS, the Data Editor and the Output Viewer, and also introduces the Syntax Editor and the use of SPSS command syntax.


Data Editor Window:

The Data Editor is the window that is open at start-up and is used to enter and store data in a spreadsheet format.


 

a) Data View

The Data Editor window displays the contents of the working dataset. It is arranged in a spreadsheet format that contains variables in columns and cases in rows. There are two sheets in the window. The Data View is the sheet that is visible when you first open the Data Editor and contains the data. You can access the second sheet by clicking on the tab labelled Variable View; while the second sheet is similar in appearance to the first, it does not actually contain data. Instead, this second sheet contains information about the dataset that is stored with the dataset. Unlike most spreadsheets, the Data Editor can only have one dataset open at a time. However, in the latest versions you can open multiple Data Editors at one time, each of which contains a separate dataset. Datasets that are currently open are called working datasets, and all data manipulations, statistical functions, and other SPSS procedures operate on these datasets. The Data Editor contains several menu items that are useful for performing various operations on your data.


b) Variable View

Many of the cells in the spreadsheet contain hidden dialog boxes that can be activated by clicking on a cell. If you see a gray box appear on the right side of the cell when you first click on it, this indicates that there is a hidden dialog box which can be accessed by clicking on that box. The Name column allows you to give a name to the variable. The variable name must be unique. It must start with a letter (A-Z or a-z); the remaining characters can be letters, numbers, and underscores. Spaces and the characters +, *, / and - are not allowed. The Variable Type dialog box allows you to define the type of data for each variable.

 


Data Editor / Variable View Menus:

File. Use the File menu to create a new SPSS file, open an existing file, or read in spreadsheet or database files created by other software programs (e.g., Excel).

Edit. Use the Edit menu to modify or copy data and output files.

View. Choose which buttons are available in the window or how the window should look.

Data. Use the Data menu to make changes to SPSS data files, such as merging files, transposing variables, or creating subsets of cases for subset analysis.

Transform. Use the Transform menu to make changes to selected variables in the data file (e.g., to recode a variable) and to compute new variables based on existing variables.

Analyze. Use the Analyze menu to select the various statistical procedures you want to use, such as descriptive statistics, cross-tabulation, hypothesis testing and regression analysis.

Graphs. Use the Graphs menu to display the data using bar charts, histograms, scatterplots, boxplots, or other graphical displays. All graphs can be customized with the Chart Editor.

Utilities. Use the Utilities menu to view variable labels for each variable.

Add-ons. Information about other SPSS software.

Window. Choose which window you want to view.

Help. Index of help topics, tutorials, SPSS home page, Statistics coach, and version of SPSS.

Viewer Menu: The Viewer menu is similar to the Data Editor menu, but has two additional options:

Insert. Use the insert menu to edit your output

Format. Use the format menu to change the format of your output.

Output Viewer:

The Output Viewer opens automatically when you execute an analysis or create a graph using a dialog box or command syntax to execute a procedure. Statistical results and graphs are displayed in the Viewer window. The (output) Viewer window is divided into two panes. The right-hand pane contains all the output and the left-hand pane contains a tree structure of the results. You can use the left-hand pane for navigating through, editing and printing your results.


Syntax Editor

The Syntax Editor is used to create SPSS command syntax, for example for use with the SPSS production facility. Usually, you will be using the point-and-click facilities of SPSS, and hence you will not need to use the Syntax Editor. More information about the Syntax Editor and the use of SPSS syntax is given in the SPSS Help Tutorials under Working with Syntax. A few instructions to get you started are given later in the handout in the section Running SPSS using the Syntax Editor (or Command Language).

File Types

Data Files. A file with an extension of .sav is assumed to be a data file in SPSS for Windows format. A file with an extension of .por is a portable SPSS data file. The contents of a data file are displayed in the Data Editor window.

Viewer (Output) Files. A file with an extension of .spo is assumed to be a Viewer file containing statistical results and graphs. (In version 16 and later, Viewer files use the extension .spv.)

Syntax (Command) Files. A file with an extension of .sps is assumed to be a Syntax file containing SPSS syntax and commands.






Variable Types in SPSS

After data are in the Data Editor window, there are several things that you may want to do to describe your data. Before describing the process for defining variables, an important distinction should be made between two often confused terms: variable and value. A variable is a measurement (Height or Weight) or classification (Male or Female) scheme that can have several values. Values are the numbers or categorical classifications representing individual instances of the variable being measured. For example, a variable could be created for sports type. Each individual in the dataset would be assigned a value representing his or her Sports_Type classification. For instance, we could assign Kho-Kho the value 1, Running the value 2, Kabadi the value 3, Volleyball the value 4, and Hockey the value 5.


Defining Variables.

The default name for new variables is the prefix var and a sequential five-digit number (e.g., var00001, var00002, var00003). To change the name, format and other attributes of a variable:

1. Double click on the variable name at the top of a column, or click on the Variable View tab at the bottom of the Data Editor window.

2. Edit the variable name under the column labelled Name. (In older versions of SPSS the variable name had to be eight characters or less; recent versions allow names of up to 64 characters.)

3. You can also specify the number of decimal places (under Decimals), assign a descriptive name (under Label), define missing values (under Missing), define the level of measurement of the variable (under Measure; e.g., scale, ordinal, nominal), and define the values for nominal variables (under Values). A syntax sketch is given below.

After the data are entered (or several times during data entry), you will want to save the file as an SPSS save file. See the section on Saving Data As An SPSS Save File.
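The same attributes can be set with command syntax. The following is a minimal sketch, assuming a numeric variable named Sports_Type coded as in the example above (the user-missing code 9 is a hypothetical choice):

* Declare the variable, attach a descriptive label and value labels,
* and declare 9 as a user-missing code.
NUMERIC Sports_Type (F1.0).
VARIABLE LABELS Sports_Type 'Type of sport played'.
VALUE LABELS Sports_Type 1 'Kho-Kho' 2 'Running' 3 'Kabadi' 4 'Volleyball' 5 'Hockey'.
MISSING VALUES Sports_Type (9).
EXECUTE.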

Codebook

A codebook is a document containing information about each of the variables in the data set, such as:

Ø The name assigned to the variable

Ø What the variable represents (i.e., its label)

Ø How the variable was measured (i.e., nominal, ordinal, etc.)

Ø How the variable was actually recorded in the raw data (i.e., numeric or string, how many characters, how many decimal places, etc.)

Ø For scale variables: the variable's units of measurement

Ø For categorical variables: if coded numerically, the numeric codes and what they represent

A codebook also contains documentation about when and how the data were created. A good codebook allows you to communicate your research data to others clearly and ensures that the data are understood and interpreted properly.

Many codebooks are created manually; however, in SPSS it is possible to generate a codebook from an existing SPSS data file.

Creation of Codebook

·        Open the SPSS data file

·        Click File → Display Data File Information → Working File

·        The codebook will print to the Output Viewer window

·        The codebook is ready.
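The same information can be obtained with command syntax. A minimal sketch, assuming a data file is already open (the variable names are from the earlier example):

* Print dictionary information (names, labels, formats, value labels)
* for the working file to the Output Viewer.
DISPLAY DICTIONARY.

* In recent versions, the CODEBOOK command produces a fuller codebook
* for the listed variables.
CODEBOOK Sex Sports_Type.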




UNIT-2

Creating a New Variable

To create a new variable:

1. Display the Data Editor window (i.e., execute the following commands while in the Data Editor window displaying the data file you want to use to create a new variable).

2. Choose Transform on the menu bar

3. Choose Compute...

4. Enter the new variable name in the Target Variable box.

5. Enter the definition of the new variable in the Numeric Expression box (e.g., SQRT(visan), LN(age), or MEAN(age)) or

6. Select variable(s) and combine with desired arithmetic operations and/or functions.

7. Choose OK

After creating new variable(s), you will probably want to save them by re-saving your data using the Save command under File on the menu bar (see Saving Data as an SPSS Save File).

Example: Creating a (New) Transformed Variable (Per capita Income values to log values)

You can use the SPSS commands for creating a new variable to create a transformed variable. Suppose you have a variable indicating per capita income in the data set, and you want to transform this variable using the natural logarithm to make the distribution less skewed (i.e., you want to create a new variable which is the natural logarithm of per capita income).

After following the steps above, a new variable, which is the natural logarithm of per capita income, will be added to your data set.
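In command syntax this is a single COMPUTE statement. A minimal sketch, assuming the per capita income variable is named pci (a hypothetical name; LN requires positive values):

* Create ln_pci as the natural logarithm of per capita income.
COMPUTE ln_pci = LN(pci).
EXECUTE.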

Recoding or Combining Categories of a Variable

To recode or combine categories of a variable:

1. Display the Data Editor window (i.e., execute the following commands while in the Data Editor window displaying the data file you want to use to recode variables). 

2. Choose Transform on the menu bar

3. Choose Recode

4. Choose Into Same Variable... or Into Different Variable...

5. Select a variable to recode from the variable list on the left and then click on the arrow located in the middle of the window. This defines the input variable.

6. If recoding into a different variable, enter the new variable name in the box under Name:, then choose Change. This defines the output variable.

7. Choose Old and New Values...

8. Choose Value or Range under Old Value and enter old value(s).

9. Choose New Value and enter new value, then choose Add.

10. Repeat the process until all old values have been redefined.

11. Choose Continue

12. Choose OK

After creating new variable(s), you will probably want to save them by re-saving your data using the Save command under File on the menu bar (see Saving Data as an SPSS Save File).



Example: Recoding a Categorical Variable

One can use the commands for recoding a variable to change the coding values of a categorical variable. Suppose one may want to change a coding value for a particular category to modify which category SPSS uses as the referent category in a statistical procedure. For example, suppose you want to perform linear regression using the ANOVA (or General Linear Model) commands, and one of your independent variables is smoking status, smoke, that is coded 1 for never smoked, 2 for former smoker and 3 for current smoker. By default, SPSS will use current smoker as the referent category because current smoker has the largest numerical (code) value. If you want never smoked to be the referent category you need to recode the value for never smoked to a value larger than 3.

Although you can recode the smoking status into the same variable, it is better to recode it into a new/different variable, newsmoke, so you do not lose your original data if you make an error while recoding. A syntax sketch follows.
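A minimal syntax sketch of this recode, using the variable names from the example:

* Give 'never smoked' (code 1) the largest code so that it becomes
* the referent category; all other codes are copied unchanged.
RECODE smoke (1=4) (ELSE=COPY) INTO newsmoke.
VALUE LABELS newsmoke 2 'former smoker' 3 'current smoker' 4 'never smoked'.
EXECUTE.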

 



Example: Creating a Categorical Variable from a Numerical Variable

You can use the commands for recoding a variable to create a categorical variable from a numerical variable (i.e., group values of the numerical variable into categories). For example, suppose you have a variable that is the number of pack years smoked, packyrs, and you want to create a categorical variable with the four categories 0, >0 to 10, >10 to 30, and >30 pack years smoked.
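In syntax this is a RECODE with range specifications. Because RECODE applies the first specification that matches a given value, the ranges can be written cumulatively. A sketch, with packcat as a hypothetical name for the new variable:

* Group pack years into four categories; each value is recoded by
* the first specification it matches, so later ranges may overlap.
RECODE packyrs
  (0 = 1)
  (LOWEST THRU 10 = 2)
  (LOWEST THRU 30 = 3)
  (LOWEST THRU HIGHEST = 4)
  INTO packcat.
VALUE LABELS packcat 1 '0' 2 '>0 to 10' 3 '>10 to 30' 4 '>30'.
EXECUTE.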


Automatic Recode

To use Automatic Recode:

1. Display the Data Editor window (i.e., execute the following commands while in the Data Editor window displaying the data file you want to use to recode variables).

2. Choose Transform on the menu bar

3. Choose Automatic Recode...

4. Select the variable(s) to recode, enter a name for each new variable in the New Name box, and click Add New Name. (Automatic Recode always creates a new variable.)

5. Choose Recode Starting from – Lowest value or Highest value.

6. If you want to apply the same scheme to all the selected variables, check Use the same recoding scheme for all variables; set the remaining options as required, then choose OK. A syntax sketch is given below.
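The equivalent command is AUTORECODE. A minimal sketch, assuming a string variable named city (a hypothetical name) is to be converted to consecutive integer codes:

* Recode the string values of city into consecutive integers stored
* in the new numeric variable city_num; the original strings become
* value labels, and /PRINT lists the correspondence in the Viewer.
AUTORECODE VARIABLES=city
  /INTO city_num
  /PRINT.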


Rank Cases

Ranking cases in SPSS involves assigning a rank to each case based on the values of a specific variable. This can be useful for analyzing relative positions within your dataset.

  • Go to the Transform menu
  • Choose Rank Cases... from the drop-down menu
  • Select the variable(s) to rank: highlight the variable(s) in the list on the left and move them to the Variable(s) box by clicking the arrow button
  • Choose the rank order and options in the Rank Cases dialogue box:

        Assign Rank 1 to – Smallest value, if you want ranks to run from smallest to largest; Largest value, if you want the largest value to have rank 1

        Ties – choose whether tied values receive the mean, the lowest, the highest, or sequential ranks (with the mean option, tied cases can receive fractional ranks)

  • Rank Cases creates a new variable holding the ranks for each variable you rank (by default the original name prefixed with r); a syntax sketch is given below
  • Click OK
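These steps correspond to the RANK command. A minimal sketch, assuming a variable named score (hypothetical) is ranked in ascending order with ties given the mean rank:

* Store the rank of score in the new variable rscore;
* tied values share the average of their ranks.
RANK VARIABLES=score (A)
  /RANK INTO rscore
  /TIES=MEAN
  /PRINT=YES.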


Sorting Data

Sorting cases in SPSS is a way to arrange your data based on a specific variable in either ascending or descending order. The steps are given below, followed by a syntax sketch.

  • Open the Data menu
  • Choose Sort Cases... from the drop-down menu
  • Select the variable(s) to sort by from the list of variables in the data set
  • Choose the sort order, i.e., ascending or descending
  • Click OK.
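In syntax, sorting is a single command. A sketch, assuming hypothetical variables gender and age:

* Sort cases by gender (ascending) and, within gender, by age (descending).
SORT CASES BY gender (A) age (D).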


    Grouping and Splitting data

Grouping Data

Using the "Select Cases" Command:

  • Criteria: Define conditions based on variable values.
  • Example: Select cases where "Age" is greater than 30 and "Gender" is "Male."
  • Steps:

1.  Go to Data > Select Cases.

2.  Choose "If condition is satisfied."

3.  Enter your condition in the syntax box.

4.  Click OK.

Splitting Data

1. Using the "Split File" Command:

  • Grouping Variable: Specify a variable to split the file based on its values.
  • Example: Split the file into two groups based on "Gender."
  • Steps:

1.  Go to Data > Split File.

2.  Choose "Organize output by groups."

3.  Select the grouping variable.

4.  Click OK.
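Minimal syntax sketches of both operations, assuming hypothetical variables age (numeric) and gender (a string holding 'Male'/'Female'):

* Select cases: keep only males over 30 for subsequent analyses.
USE ALL.
COMPUTE filter_$ = (age > 30 AND gender = 'Male').
FILTER BY filter_$.
EXECUTE.

* Split file: organize output by groups of gender.
SORT CASES BY gender.
SPLIT FILE SEPARATE BY gender.
* ... run the analyses here ...
SPLIT FILE OFF.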



UNIT-3

Descriptive Statistics

A common first step in data analysis is to summarize information about variables in any dataset, such as the averages and variances of variables. SPSS descriptive statistics are designed to give us information about the distributions of our study variables. SPSS allows us to complete a number of statistical procedures including measures of central tendency, measures of variability around the mean, measures of deviation from normality, and information concerning the spread of the distribution. Several summary or descriptive statistics are available under the Descriptives option available from the Analyze and Descriptive Statistics menus:

Kinds of descriptive statistics that SPSS provides

Measures of Central Tendency:

A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. As such, measures of central tendency are sometimes called measures of central location. They are also classed as summary statistics. The mean (often called the average) is most likely the measure of central tendency that you are most familiar with, but there are others, such as the median and the mode.

The mean, median and mode are all valid measures of central tendency, but under different conditions, some measures of central tendency become more appropriate to use than others.

Mean (Average)

The mean (or average) is the most popular and well-known measure of central tendency. It can be used with both discrete and continuous data, although its use is most often with continuous data or interval data.

Median

Median is the value which occupies the middle position when all the observations are arranged in an ascending/descending order. It divides the frequency distribution exactly into two halves. Fifty percent of observations in a distribution have scores at or below the median. Hence median is the 50th percentile. Median is also known as ‘positional average’.

 

Mode

Mode is defined as the value that occurs most frequently in the data. Some data sets do not have a mode because each value occurs only once. On the other hand, some data sets can have more than one mode. This happens when the data set has two or more values of equal frequency which is greater than that of any other value. Mode is rarely used as a summary statistic except to describe a bimodal distribution. In a bimodal distribution, the taller peak is called the major mode and the shorter one is the minor mode.

Measures of Dispersion (Variability):

Measures of dispersion express quantitatively the degree of variation or dispersion of values in a population or in a sample. Along with measures of central tendency, measures of dispersion are widely used in practice as descriptive statistics. Some measures of dispersion are the standard deviation, the mean deviation, the range, and the interquartile range.

We now focus our attention on the scatter or variability of values, which is known as dispersion. Let us take the following three sets of marks:

Students     Group A     Group B     Group C
1               50          45          30
2               50          50          45
3               50          55          75
Mean            50          50          50

Thus, the three groups have the same mean, i.e. 50. In fact, the medians of groups A and B are also equal. But if one were to conclude that the students of the three groups are of equal capability, that conclusion would be wrong. Close examination reveals that in group A the students' marks are all equal to the mean, in group B the marks are very close to the mean, but in group C the marks are widely scattered. It is thus clear that a measure of central tendency alone is not sufficient to describe the data.

Measure for the size of the distribution:

    • Maximum: largest value in the distribution
    • Minimum: smallest value in the distribution
    • Range of values in the distribution
    • Sum of the scores in the distribution

Measures of stability: Standard error

    • Standard error is designed to be a measure of stability or of sampling error.
    • SPSS computes SE for the mean, the kurtosis, and the skewness
    • A small value indicates a greater stability or smaller sampling error

Measures of the shape of the distribution

Skewness: The terms "skewed" and "askew" are used to refer to something that is out of line or distorted on one side. When referring to the shape of frequency or probability distributions, "skewness" refers to asymmetry of the distribution, i.e., the extent to which a distribution of values deviates from symmetry around the mean. A skewness of zero means the distribution is symmetric; a positive skewness indicates a long right tail (a greater number of smaller values), and a negative skewness indicates a long left tail (a greater number of larger values). A skewness value of ±1 is considered very good for most social research uses, but ±2 is also usually acceptable.

Kurtosis: a measure of the "peakedness" or "flatness" of a distribution. A kurtosis value near zero indicates a shape close to normal. A positive kurtosis indicates a distribution more peaked than normal, with more of the values located in the tails of the distribution rather than around the mean; a negative kurtosis indicates a shape flatter than normal. A kurtosis value of ±1 is considered very good for most social science research uses, but ±2 is also usually acceptable.

Using SPSS for Descriptive Statistics

For most computations, you should find SPSS to be easier than Excel. In order to obtain the two most predominant measures of central tendency (the mean and median), we start in the Analyze menu. Within that menu, choose Descriptive Statistics and then Frequencies or Descriptives.

Frequencies

While the Descriptives procedure is useful for summarizing data with an underlying continuous distribution, it will not prove helpful for interpreting categorical data. Instead, the Frequencies option is more useful for investigating the number of cases that fall into various categories. For example, the Frequencies option allows you to obtain the number of people within each education level in a dataset. The Frequencies procedure is found under the Analyze menu:

Frequency Tables for Categorical Variables. To produce frequency tables and bar charts for categorical variables:

1. Choose Analyze from the menu bar

2. Choose Descriptive Statistics

3. Choose Frequencies

4. Variable(s): To select the variables you want from the source list on the left, highlight a variable by pointing and clicking the mouse and then click on the arrow located in the middle of the window. Repeat the process until you have selected all the variables you want.

5. Choose Continue

6. Choose OK
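The steps above correspond to the FREQUENCIES command. A minimal sketch, using the Sex and Sports_Type variables from the earlier example:

* Frequency tables and bar charts for two categorical variables.
FREQUENCIES VARIABLES=Sex Sports_Type
  /BARCHART FREQ
  /ORDER=ANALYSIS.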

Contingency Tables for Categorical Variables. To produce contingency tables for categorical variables:

1. Choose Analyze from the menu bar.

2. Choose Descriptive Statistics

3. Choose Crosstabs...

4. Row(s): Select the row variable you want from the source list on the left and then click on the arrow located next to the Row(s) box. Repeat the process until you have selected all the row variables you want.

5. Column(s): Select the column variable you want from the source list on the left and then click on the arrow located next to the Column(s) box. Repeat the process until you have selected all the column variables you want.

6. Choose Cells...

7. Choose the cell values (e.g., observed counts; row, column, and margin (total) percentages). Note the option is selected when the little box is not empty.

8. Choose Continue

9. Choose OK
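The steps above correspond to the CROSSTABS command. A sketch, with Sex as the row variable and Sports_Type as the column variable:

* Contingency table with observed counts and row percentages.
CROSSTABS
  /TABLES=Sex BY Sports_Type
  /CELLS=COUNT ROW.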

Descriptive Statistics for Numerical Variables. To produce descriptive statistics for numerical variables:

1. Choose Analyze on the menu bar

2. Choose Descriptive Statistics

3. Choose Frequencies...

4. Variable(s): To select the variables you want from the source list on the left, highlight a variable by pointing and clicking the mouse and then click on the arrow located in the middle of the window. Repeat the process until you have selected all the variables you want.

5. Choose Display frequency tables to turn off the option. Note that the option is turned off when the little box is empty.

6. Choose Statistics

7. Choose summary measures (e.g., mean, median, standard deviation, minimum, maximum, skewness or kurtosis).

8. Choose Continue

9. Choose OK
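In syntax, the same summaries (with the frequency table suppressed) look like this; the sketch assumes a numerical variable named Protein, as used later in the t-test example:

* Summary statistics for a numerical variable, no frequency table.
FREQUENCIES VARIABLES=Protein
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN STDDEV MINIMUM MAXIMUM SKEWNESS KURTOSIS.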

Graphing Your Data

One can produce very fancy figures and graphs in SPSS.

Bar Charts

The easiest way to produce simple bar charts is to use the Bar Chart option with the Frequencies... command, as described in Frequency Tables (& Bar Charts) for Categorical Variables above. One can produce only one bar chart at a time using the Bar command.

1. Choose Graphs (& then Legacy Dialogs, if Version 15) from the menu bar.

2. Choose Bar...

3. Choose Simple, Clustered, or Stacked

4. Choose what the data in the bar chart represent (e.g., summaries for groups of cases).

5. Choose Define

6. Select a variable from the variable list on the left and the click on the arrow next to the Category axis.

7. Choose what the bars represent (e.g., number of cases or percentage of cases)

8. Choose OK
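A minimal syntax sketch of a simple bar chart of counts, using the Sports_Type variable from the earlier example:

* Simple bar chart: number of cases in each sports type.
GRAPH /BAR(SIMPLE)=COUNT BY Sports_Type.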

Histograms

The easiest way to produce simple histograms is to use the Histogram option with the Frequencies... command. One can produce only one histogram at a time using the Histogram command.

1. Choose Graphs (& then Legacy Dialogs, if Version 15) from the menu bar

2. Choose Histogram...

3. Select a variable from the variable list on the left and then click on the arrow in the middle of the window.

4. Choose Display normal Curve if you want a normal curve superimposed on the histogram.

5. Choose OK
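A minimal syntax sketch of a histogram with a superimposed normal curve, assuming the Protein variable from the t-test example:

* Histogram of Protein with a normal curve superimposed.
GRAPH /HISTOGRAM(NORMAL)=Protein.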

Boxplots

The easiest way to produce simple boxplots is to use the Boxplot option with the Explore... command. One can produce only one boxplot at a time using the Boxplot command.

 

1. Choose Graphs (& then Legacy Dialogs, if Version 15) from the menu bar.

2. Choose Boxplot...

3. Choose Simple or Clustered

4. Choose what the data in the boxplots represent (e.g., summaries for groups of cases).

5. Choose Define

6. Select a variable from the variable list on the left and then click on the arrow next to the Variable box.

7. Select the variable from the variable list that defines the groups and then click on the arrow next to Category Axis.

8. Choose OK
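In syntax, boxplots come from the EXAMINE (Explore) procedure. A sketch of a grouped boxplot, assuming Protein is plotted by Sports_Type:

* Boxplots of Protein for each sports type; summary statistics suppressed.
EXAMINE VARIABLES=Protein BY Sports_Type
  /PLOT=BOXPLOT
  /STATISTICS=NONE
  /NOTOTAL.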

Normal Probability Plots. To produce Normal probability plots:

1. Choose Graphs from the menu bar.

2. Choose Q-Q... to get a plot of the quantiles (Q-Q plot) or choose P-P... to get a plot of the cumulative proportions (P-P plot)

3. Select the variables from the source list on the left and then click on the arrow located in the middle of the window.

4. Choose Normal as the Test Distribution. The Normal distribution is the default Test Distribution. Other Test Distributions can be selected by clicking on the down arrow and clicking on the desired Test distribution.

5. Choose OK

SPSS will produce both a Normal probability plot and a detrended Normal probability plot for each selected variable. Usually the Q-Q plot is the most useful for assessing if the distribution of the variable is approximately Normal.
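These menu choices map onto the PPLOT command. A minimal sketch of a normal Q-Q plot, assuming the Protein variable:

* Normal Q-Q plot (and detrended Q-Q plot) for Protein.
PPLOT /VARIABLES=Protein
  /TYPE=Q-Q
  /DIST=NORMAL.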

Scatter Plot. To produce a scatter plot between two numerical variables:

1. Choose Graphs (& then Legacy Dialogs, if Version 15) on the menu bar.

2. Choose Scatter/Dot...

3. Choose Simple

4. Choose Define

5. Y Axis: Select the y variable you want from the source list on the left and then click on the arrow next to the y axis box.

6. X Axis: Select the x variable you want from the source list on the left and then click on the arrow next to the x axis box.

7. Choose Titles...

8. Enter a title for the plot (e.g., y vs. x).

9. Choose Continue

10. Choose OK
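A minimal syntax sketch of a simple scatter plot, using the hypothetical variables x and y from the title example:

* Scatter plot of y against x with a title.
GRAPH /SCATTERPLOT(BIVAR)=x WITH y
  /TITLE='y vs. x'.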

UNIT-4

Chi-square test

A chi-square test is a statistical test commonly used for testing independence of attributes and goodness of fit. Pearson's chi-squared test is used for two types of comparison: tests of goodness of fit and tests of independence.

·         A test of independence assesses whether paired observations on two variables, expressed in a contingency table, are independent of each other. The test is applied when you have two categorical variables from a single population. It is used to determine whether there is a significant association between the two variables. For example, in a nutrition survey, players might be classified by Sex (male or female) and Sports_Type (Kho-Kho, Running, Kabadi, Volleyball or Hockey). We could use a chi-square test for independence to determine whether gender is related to sports type.

·         A test of goodness of fit establishes whether observed frequencies are significantly different from expected frequencies. When an analyst attempts to fit a statistical model to observed data, we have to identify how well the model actually reflects the data. How "close" are the observed values to those which would be expected under the fitted model?

Measures of Association

   The Crosstabs procedure forms two-way and multiway tables and provides a variety of tests and measures of association for two-way tables.

   If you specify a row, a column, and a layer factor (control variable), the Crosstabs procedure forms one panel of associated statistics and measures for each value of the layer factor.

The Chi Square Test of Independence

Click Analyze

        Descriptive Statistics

                Crosstabs...

A new window pops out. Select one or more row variables and one or more column variables. The Chi Square Statistic (Pearson chi-square) is used to test the hypothesis that the row and column variables are independent.


Select the two variables, Sex and Sports_Type, from the list on the left. It does not matter which variable we select as row and which as column. For illustration purposes, we select Sex as Row(s) and Sports_Type as Column(s). Now click Statistics… on the right. A new window, Crosstabs: Statistics, pops out.

Make sure that the Chi-square box at the top is checked. Click Continue; the window will then be closed. Now click OK in the original window.


In the third table, the Pearson Chi-Square value is 28.514 and the p-value is .000, so we can reject the null hypothesis and conclude that there is a relationship between Sex and Sports_Type at the 1% level of significance.
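The whole test can also be run from syntax. A minimal sketch, using the Sex and Sports_Type variables of this example:

* Chi-square test of independence between Sex and Sports_Type.
CROSSTABS
  /TABLES=Sex BY Sports_Type
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED.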

The t test is one type of inferential statistic. It is used to determine whether there is a significant difference between the means of two groups. As with all inferential statistics, we assume the dependent variable follows a normal distribution. When the difference between two population averages is being investigated, a t test is used. In other words, a t test is used when we wish to compare two means (the scores must be measured on an interval or ratio scale).

 

 One sample t-test

A one sample t-test allows us to test whether a sample mean (of a normally distributed interval variable) significantly differs from a hypothesized value.  For example, using the Home Science Data file, say we wish to test whether the average protein differs significantly from 80. 

 

The One-Sample T Test procedure tests whether the mean of a single variable differs from a specified constant.

To test the values of a quantitative variable against a hypothesized test value, choose a quantitative variable and enter a hypothesized test value.

Assumptions. This test assumes that the data are normally distributed; however, this test is fairly robust to departures from normality.

 

To Obtain a One-Sample T Test: From the menus choose

Analyze

 Compare Means

   One-Sample T Test...


Select the variable(s) that you want to test by clicking on them in the left-hand pane of the One-Sample T Test dialog box. Then click on the arrow button to move the variable into the Test Variable(s) pane. In this example, move the Protein variable into the Test Variable(s) box. Then click in the Test Value box and enter the value that you will compare to.



Enter a numeric value 80 in the Test Value box against which each sample mean is compared and click OK to see the output.

Output of one-sample t-test:


This output tells us that we have 82 observations (N), the mean Protein value is 78.41 and the standard deviation of Protein is 14.623.

The mean of the variable Protein for this particular sample of players is 78.41, which is not statistically different from the test value of 80 (the p-value is greater than 0.05, so we fail to reject the null hypothesis, i.e., there is no significant difference between the sample mean and the hypothesized mean).
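The same test in command syntax; a minimal sketch using the Protein variable and test value from this example:

* One-sample t test: does the mean of Protein differ from 80?
T-TEST
  /TESTVAL=80
  /VARIABLES=Protein
  /CRITERIA=CI(.95).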

Independent-Samples T Test (Two independent samples t-test)

The Independent-Samples T Test procedure compares means for two groups of cases. Ideally, for this test, the subjects should be randomly assigned to two groups, so that any difference in response is due to the treatment (or lack of treatment) and not to other factors.

Assumptions

For the equal-variance t test, the observations should be independent, random samples from normal distributions with the same population variance.

For the unequal-variance t test, the observations should be independent, random samples from normal distributions.

From the menus choose

Analyze

Compare Means

Independent-Samples T Test

 It produces the following dialog box:



To select variables for the analysis, first highlight them by clicking on them in the box on the left. Then move them into the appropriate box on the right by clicking on the arrow button in the center of the box. Your independent variable should go in the Grouping Variable box, which is a variable that defines which groups are being compared. For example, because sports types are being compared in this analysis, the Sports_Type variable is selected. However, because Sports_Type has more than two levels, you will need to click on Define Groups to specify the two levels of Sports_Type that you want to compare. This will produce another dialog box as is shown below:


Here, the groups to be compared are limited to the groups with the values 1 and 2, which represent the Kho-Kho and Running groups. After selecting the groups to be compared, click the Continue button, and then click the OK button in the main dialog box.

 

The above choices will produce the following output:


The columns labeled "Levene's Test for Equality of Variances" tell us whether an assumption of the t-test has been met. Look at the column labeled "Sig." under the heading "Levene's Test for Equality of Variances". In this example, the significance (p value) of Levene's test is .776. If this value is less than or equal to your α level for this test, then you can reject the null hypothesis that the variances of the two groups are equal, implying that the variances are unequal. In this example, .776 is larger than our α level of .05, so we will assume that the variances are equal and use the row of the output labeled "Equal variances assumed".

The column labeled "t" gives the observed or calculated t value. In this example, assuming equal variances, the t value is 2.066. (We can ignore the sign of t when using a two-tailed t-test.) The column labeled "df" gives the degrees of freedom associated with the t test. In this example, there are 43 degrees of freedom.

The column labeled "Sig. (2-tailed)" gives the two-tailed p value associated with the test. In this example, the p value is .045. Since this value is less than or equal to the 0.05 level for this test, we can reject the null hypothesis that the means of the two groups are equal (i.e., p < 0.05).
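A syntax sketch of this test, comparing groups 1 (Kho-Kho) and 2 (Running) of Sports_Type; the test variable Protein is an assumption here, as the example does not name it:

* Independent-samples t test for two levels of the grouping variable.
T-TEST GROUPS=Sports_Type(1 2)
  /VARIABLES=Protein
  /CRITERIA=CI(.95).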

 Paired-Samples T Test

The Paired-Samples T Test compares the means between two related groups on the same continuous, dependent variable. For example, you could use a dependent t-test to understand whether there was a difference in smokers' daily cigarette consumption before and after an 8-week hypnotherapy programme (i.e., your dependent variable would be "daily cigarette consumption", and your two related groups would be the cigarette consumption values "before" and "after" the hypnotherapy programme). A paired sample t-test is used to determine whether there is a significant difference between the average values of the before and after measurements for a single group.

To obtain a paired-samples t test:

Analyze

 Compare Means

   Paired-Samples T Test

A new window pops out. Drag the variable "Before" and "After" from the list on the left to the pair 1 variable 1 and variable 2 respectively, as shown below.



Clicking the OK button with the above variables selected will produce output for the paired-samples t test. The following output is an example of the statistics you would obtain from the above example.

Paired Samples Statistics

                        Mean       N    Std. Deviation    Std. Error Mean
Pair 1   Before      85.3333      15        7.04746            1.81965
         After       74.0000      15        6.40312            1.65328

Paired Samples Correlations

                            N    Correlation    Sig.
Pair 1   Before & After    15        .594       .020



The first table gives the descriptive statistics for each of the two groups (as defined by the pair of variables).

The second table shows that there are 15 pairs of observations (N). The correlation between the two variables is given in the Correlation column; in this example r = 0.594. The last column gives the p value for the correlation coefficient. As always, if the p value is less than or equal to the alpha level (0.05), then we can reject the null hypothesis that the population correlation coefficient (ρ) is equal to 0. In this case, p = .020, so we reject the null hypothesis and conclude that the Before and After measurements are significantly correlated at the 5% level (i.e., p < 0.05). Whether the programme itself had a significant effect is judged from the paired-samples t test table, which compares the Before and After means.
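In syntax the paired test is a single command, using the Before and After variables of this example:

* Paired-samples t test comparing the Before and After measurements.
T-TEST PAIRS=Before WITH After (PAIRED).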


UNIT-5

Correlation

Correlation is one of the most common forms of data analysis, both because it can provide an analysis that stands on its own and because it underlies many other analyses and can be a good way to support conclusions after primary analyses have been completed. Correlations are a measure of the linear relationship between two variables. A correlation coefficient has a value ranging from -1 to 1. Values that are closer to an absolute value of 1 indicate that there is a strong relationship between the variables being correlated, whereas values closer to 0 indicate that there is little or no linear relationship. The sign of a correlation coefficient describes the type of relationship between the variables being correlated. A positive correlation coefficient indicates that there is a positive linear relationship between the variables: as one variable increases in value, so does the other. An example of two variables that are likely to be positively correlated is the number of days a student attended class and test grades, because as the number of classes attended increases, so do test grades. A negative value indicates a negative linear relationship between variables: as one variable increases in value, the other variable decreases in value. The number of days students miss class and their test scores are likely to be negatively correlated, because as the number of days of missed class increases, test scores typically decrease.

To obtain a correlation in SPSS, start at the Analyze menu. Select the Correlate option from this menu. By selecting this menu item, you will see that there are three options for correlating variables: (1) Bivariate, (2) Partial, and (3) Distances. This document will cover the first two types of correlations. The bivariate correlation is for situations where you are interested only in the relationship between two variables. Partial correlations should be used when you are measuring the association between two variables but want to factor out the effect of one or more other variables.

To obtain a bivariate correlation, choose the following menu option:

    Analyze
        Correlate
            Bivariate...

This will produce the following dialog box:



To obtain correlations, first click on the variable names in the variable list on the left side of the dialog box. Next, click on the arrow between the two white boxes, which will move the selected variables into the Variables box. Each variable listed in the Variables box will be correlated with every other variable in the box. For example, with the above selections, we would obtain correlations between Education Level and Current Salary, between Education Level and Previous Experience, and between Current Salary and Previous Experience. We will maintain the default options shown in the above dialog box in this example. The first option to consider is the type of correlation coefficient. Pearson's is appropriate for continuous data, as noted in the above example, whereas the other two correlation coefficients, Kendall's tau-b and Spearman's, are designed for ranked data. The choice between a one- and two-tailed significance test in the Test of Significance box should be determined by whether the hypothesis you are testing makes a prediction about the direction of the effect between the two variables: if you are predicting a negative or positive relationship between the variables, the one-tailed test is appropriate; if you are not making a directional prediction, you should use the two-tailed test. The selections in the above dialog box will produce the following output:


This output gives us a correlation matrix for the three correlations requested in the above dialog box. Note that despite there being nine cells in the above matrix, there are only three correlation coefficients of interest: (1) the correlation between current salary and educational level, (2) the correlation between previous experience and educational level, and (3) the correlation between current salary and previous experience. The reason only three of the nine correlations are of interest is that the diagonal consists of correlations of each variable with itself, always resulting in a value of 1.00, and the values on each side of the diagonal replicate the values on the opposite side. For example, the three unique correlation coefficients show there is a positive correlation between employees' number of years of education and their current salary. This positive correlation coefficient (.661) indicates that there is a statistically significant (p < .001) linear relationship between these two variables such that the more education a person has, the larger that person's salary is. Also observe that there is a statistically significant (p < .001) negative correlation coefficient (-.252) for the association between education level and previous experience, indicating that the linear relationship between these two variables is one in which the values of one variable decrease as the other increases. The third correlation coefficient (-.097) also indicates a negative association between employees' current salaries and their previous work experience, although this correlation is fairly weak.
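A syntax sketch of the bivariate correlations described above, assuming the hypothetical variable names educ, salary, and prevexp for education level, current salary, and previous experience:

* Pearson correlations among education, current salary, and previous
* experience, with two-tailed significance tests.
CORRELATIONS
  /VARIABLES=educ salary prevexp
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.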

Regression

Regression is a technique that can be used to investigate the effect of one or more predictor variables on an outcome variable. Regression allows you to make statements about how well one or more independent variables will predict the value of a dependent variable. For example, if you were interested in investigating which variables in the employee database were good predictors of employees' current salaries, you could create a regression equation that would use several of the variables in the dataset to predict employees' salaries. By doing this you will be able to make statements about whether knowing something about variables such as employees' number of years of education, their starting salary, or their number of months on the job are good predictors of their current salaries.

To conduct a regression analysis, select the following from the Analyze menu:

    Analyze
         Regression
                Linear...

This will produce the following dialog box:


This dialog box illustrates an example regression equation. As with other analyses, you select variables from the box on the left by clicking on them, then moving them to the boxes on the right by clicking the arrow next to the box where you want to enter a particular variable. Here, employees' current salary has been entered as the dependent variable. In the Independent(s) box, several predictor variables have been entered, including education level, beginning salary, months since hire, and previous experience.
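A syntax sketch of this regression, assuming the hypothetical variable names salary (current salary), educ (education level), salbegin (beginning salary), jobtime (months since hire), and prevexp (previous experience):

* Linear regression predicting current salary from four predictors.
REGRESSION
  /STATISTICS COEFF OUTS R ANOVA
  /DEPENDENT salary
  /METHOD=ENTER educ salbegin jobtime prevexp.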












