Unit Wise SPSS Concepts

Syllabus

Statistical Package for Social Science
Credits: 4 | Theory: 4 Hours | Tutorials: -
Max Marks: 100 | External: 75 Marks | Internal: 25 Marks

Course Objectives:
1. Understand the main features of SPSS
2. Use the SPSS GUI effectively
3. Perform descriptive analyses with SPSS
4. Perform common parametric and non-parametric tests
5. Perform simple regressions and multivariate analyses (factor and cluster)
6. Know where to find help

SYLLABUS
UNIT I:
SPSS Environment: data editor, output viewer, syntax editor – Data View window – SPSS Syntax – Data creation – Importing data – Variable types in SPSS and Defining variables – Creating a Codebook in SPSS.
UNIT II:
Computing Variables - Recoding (Transforming) Variables: Recoding Categorical String Variables using Automatic Recode - Rank Cases - Sorting Data - Grouping or Splitting Data.
UNIT III:
Descriptive Statistics for Continuous Variables - The Explore Procedure - Frequencies Procedure - Descriptives - Compare Means - Frequencies for Categorical Data.
UNIT IV:
Statistical tests - Inferential Statistics for Association: Pearson Correlation, Chi-square Test of Independence - Inferential Statistics for Comparing Means: One-Sample t Test, Paired-Samples T Test, Independent-Samples T Test, One-Way ANOVA.
UNIT V:
Correlation and regression - Linear correlation and regression - Multiple regression (linear) - Multivariate analysis - Factor analysis - Cluster analysis.
Outcomes:
1. Students' familiarity with the toolbox of statistical software, building their capacity to analyze complex information with the help of statistical software such as the Statistical Package for the Social Sciences (SPSS).
2. A strong theoretical and empirical foundation in statistical analysis.
Text Books:
SPSS Programming and Data Management: A Guide for SPSS and SAS Users, 3rd Edition, by Raynald Levesque and SPSS Inc.
References:
1. IBM 2016, IBM Knowledge Center: SPSS Statistics, IBM, viewed 18 May 2016, https://www.ibm.com/support/knowledgecenter/SSLVMB/welcome/
2. How to Use SPSS: A Step-By-Step Guide to Analysis and Interpretation, Brian C. Cronk, Tenth edition, Routledge, 2018.
3. SPSS for Intermediate Statistics: Use and Interpretation, Nancy L. Leech et al., Second edition, Lawrence Erlbaum Associates, Inc., 2005.
4. Using IBM SPSS Statistics for Research Methods and Social Science Statistics, William E. Wagner, Fifth edition, SAGE Publications, Inc., 2015.
Lab

Lab 1 and 2
- A: Getting familiar with SPSS
- B1: Entering data by hand
- B2: Using "Variable View"
- B3: Creating a frequency table
- C: Creating a histogram
- D: Creating a boxplot
- E: Calculating mean, mode and median
- F: Calculating measures of spread

Lab 3 and 4
- Computing new variables using "Compute"
- Changing the coding of a variable using "Recode"
- Importing (reading) data from a text file without columns
- Locating outliers using a boxplot
- Selecting and deleting cases
- Computing confidence intervals for population means
- Testing a population mean using a t-test

Lab 5 and 6
- Reading (importing) data from a text file with columns
- Assessing normality of data (Q-Q plot, normal quantile plot)
- Selecting a group of cases
- Visualizing the difference between two groups with a double boxplot
- Testing the difference between two independent groups using a t-test
- Testing the difference between related samples using a t-test
- Testing the difference in increase between two different groups

Lab 7 and 8
- Entering data for a two-way table
- Weighting cases with frequencies
- Setting the meaning of a value/code
- Creating a two-way table with all the occurrences
- Choosing the most adequate two-way table form
- Visualizing the counts from a two-way table
- Significance tests and confidence intervals for proportions
- Testing the association of two categorical variables using the chi-square test

Lab 9 and 10
- ANOVA
- Non-parametric test: Wilcoxon Rank Sum Test
Unit-1
Introduction to SPSS
Introduction:
SPSS is a powerful statistical software program with a
graphical interface designed for ease of use.
Almost all commands and options can be accessed using pull down menus at
the top of the window, and the program opens to a spreadsheet which looks
similar to that of Microsoft Excel. This design means that once you learn a few
basic steps to access programs, it’s very easy to figure out how to extend your
knowledge in using SPSS through the help files.
Originally named the Statistical Package for the Social Sciences, and later renamed Statistical Product and Service Solutions (SPSS) to reflect the diversity of its user base, the software was first released in 1968 after being developed by Norman H. Nie, Dale H. Bent, and C. Hadlai Hull. Those principals incorporated as SPSS Inc. in 1975. SPSS Inc. announced on July 28, 2009, that it was being acquired by IBM. Because of a dispute about ownership of the name "SPSS", between 2009 and 2010 the product was referred to as PASW (Predictive Analytics SoftWare). As of January 2010, it became "SPSS: An IBM Company". The transfer of business to IBM was completed by October 1, 2010, by which date SPSS: An IBM Company ceased to exist. IBM SPSS is now fully integrated into the IBM Corporation and is one of the brands under IBM Software Group's Business Analytics Portfolio, together with IBM Algorithmics, IBM Cognos and IBM OpenPages.
Overview:
SPSS is a Windows-based program that provides a powerful statistical analysis and data management system in a graphical environment. It can be used to analyze data from surveys, tests, observations, etc., and can perform a variety of data analysis and presentation functions. Its features include descriptive statistics such as frequencies, measures of central tendency, plots, charts, and lists, as well as sophisticated inferential and multivariate statistical procedures, such as analysis of variance (ANOVA), factor analysis, cluster analysis, and categorical data analysis.
SPSS Windows and Menus:
SPSS for Windows consists of five different windows, each of which is associated with a particular SPSS file type. This document discusses the two windows most frequently used in analyzing data in SPSS, the Data Editor and the Output Viewer. In addition, it introduces the Syntax Editor and the use of SPSS command syntax.
Data Editor Window:
The Data Editor is the window that is open at start-up and is used to enter and store data in a spreadsheet format.
a) Data View
The Data Editor window displays the contents of the working dataset. It is arranged in a spreadsheet format that contains variables in columns and cases in rows. There are two sheets in the window. The Data View is the sheet that is visible when you first open the Data Editor and contains the data. You can access the second sheet by clicking on the tab labelled Variable View; while the second sheet is similar in appearance to the first, it does not actually contain data. Instead, it contains information about the dataset that is stored with the dataset. Unlike most spreadsheets, the Data Editor can only have one dataset open at a time; however, in the latest versions you can open multiple Data Editors at one time, each of which contains a separate dataset. Datasets that are currently open are called working datasets, and all data manipulations, statistical functions, and other SPSS procedures operate on these datasets. The Data Editor contains several menu items that are useful for performing various operations on your data.
b) Variable View
Many of the cells in the Variable View spreadsheet contain hidden dialog boxes that can be activated by clicking on a cell. If you see a gray box appear on the right side of a cell when you first click on it, this indicates that there is a hidden dialog box which can be accessed by clicking on that box. The Name column allows you to give the name for the variable. The variable name should be unique. It must start with a letter (A-Z or a-z); the rest can be letters, numbers, and underscores. Spaces and the characters +, *, / and - are not allowed. The Variable Type dialog box allows you to define the type of data for variables.
Data Editor / Variable View Menus:
File.
Use the File menu to create a new SPSS file, open an existing file, or read in
spreadsheet or database files created by other software programs (e.g., Excel).
Edit.
Use
the Edit menu to modify or copy data and output files.
View.
Choose which buttons are available in the window or how the window should look.
Data.
Use the Data menu to make changes to SPSS data files, such as merging files,
transposing variables, or creating subsets of cases for subset analysis.
Transform.
Use the Transform menu to make changes to selected variables in the data file
(e.g., to recode a variable) and to compute new variables based on existing
variables.
Analyze.
Use the Analyze menu to select the various statistical procedures you want to
use, such as descriptive statistics, cross-tabulation, hypothesis testing and
regression analysis.
Graphs.
Use the Graphs menu to display the data using bar charts, histograms,
scatterplots, boxplots, or other graphical displays. All graphs can be
customized with the Chart Editor.
Utilities.
Use the Utilities menu to view variable labels for each variable.
Add-ons.
Information about other SPSS software.
Window.
Choose which window you want to view.
Help.
Index of help topics, tutorials, SPSS home page, Statistics coach, and version
of SPSS.
Viewer Menu:
The Viewer menu is similar to the Data Editor menu, but has two additional options:
Insert. Use the Insert menu to insert items (such as titles, text, or page breaks) into your output.
Format. Use the Format menu to change the format of your output.
Output Viewer:
The Output Viewer opens automatically when you execute an analysis or create a graph using a dialog box or command syntax. Statistical results and graphs are displayed in the Viewer window. The Viewer window is divided into two panes: the right-hand pane contains all the output, and the left-hand pane contains a tree structure of the results. You can use the left-hand pane to navigate through, edit, and print your results.
Syntax Editor
The Syntax Editor is used to create SPSS command syntax for use with the SPSS production facility. Usually, you will be using the point-and-click facilities of SPSS, and hence you will not need to use the Syntax Editor. More information about the Syntax Editor and the use of SPSS syntax is given in the SPSS Help tutorials under Working with Syntax. A few instructions to get you started are given later in the handout in the section Running SPSS Using the Syntax Editor (or Command Language).
File Types
Data Files. A file with an extension of .sav is assumed to be a data file in SPSS for Windows format. A file with an extension of .por is a portable SPSS data file. The contents of a data file are displayed in the Data Editor window.
Viewer (Output) Files. A file with an extension of .spo (.spv in newer versions) is assumed to be a Viewer file containing statistical results and graphs.
Syntax (Command) Files. A file with an extension of .sps is assumed to be a Syntax file containing SPSS syntax and commands.
Variable Types in SPSS
After data are in the Data Editor window, there are several things that you may want to do to describe your data. Before describing the process for defining variables, an important distinction should be made between two often-confused terms: variable and value. A variable is a measurement (e.g., Height or Weight) or classification (e.g., Male or Female) scheme that can take several values. Values are the numbers or categorical classifications representing individual instances of the variable being measured. For example, a variable could be created for sports status. Each individual in the dataset would be assigned a value representing their Sports_Type classification: for instance, we could assign Kho-Kho the value 1, Running the value 2, Kabadi the value 3, Volley Ball the value 4, and Hockey the value 5.
The default name for new variables is the prefix var plus a sequential five-digit number (e.g., var00001, var00002, var00003). To change the name, format, and other attributes of a variable:
1. Double click on the variable name at the top of a column, or
2. Click on the Variable View tab at the bottom of the Data Editor window.
3. Edit the variable name under the column labelled Name. (In older versions of SPSS the name had to be eight characters or less; newer versions allow longer names.) You can also specify the number of decimal places (under Decimals), assign a descriptive name (under Label), define missing values (under Missing), define the type of variable (under Measure; e.g., scale, ordinal, nominal), and define the values for nominal variables (under Values).
After the data are entered (or several times during data entry), you will want to save them as an SPSS save file. See the section on Saving Data As An SPSS Save File.
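These attributes can also be set with SPSS command syntax. The sketch below is illustrative only: the variable name sports_type and the missing-value code 9 are assumptions, and the value labels follow the Sports_Type example used in this unit.

* Label a hypothetical variable and define its attributes.
VARIABLE LABELS sports_type 'Type of sport played'.
VALUE LABELS sports_type 1 'Kho-Kho' 2 'Running' 3 'Kabadi' 4 'Volley Ball' 5 'Hockey'.
VARIABLE LEVEL sports_type (NOMINAL).
MISSING VALUES sports_type (9).
EXECUTE.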
Codebook
A codebook is a document containing information about each of the variables in the data set, such as:
- The name assigned to the variable
- What the variable represents (i.e., its label)
- How the variable was measured (i.e., nominal, ordinal, etc.)
- How the variable was actually recorded in the raw data (i.e., numeric or string, how many characters, how many decimal places, etc.)
- For scale variables: the variable's units of measurement
- For categorical variables: if coded numerically, the numeric codes and what they represent
A codebook also contains documentation about when and how the data were created. A good codebook allows you to communicate your research data to others clearly and ensures that the data are understood and interpreted properly.
Many codebooks are created manually; however, in SPSS it is possible to generate a codebook from an existing SPSS data file.
Creation of Codebook
- Open the SPSS data file
- Click File → Display Data File Information → Working File
- The codebook will print to the Output Viewer window
- The codebook is ready.
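Equivalently, the codebook can be produced with syntax. The menu path above corresponds to the DISPLAY DICTIONARY command; newer versions also offer the more detailed CODEBOOK command. The variable names in the last line are placeholders.

* Print dictionary information for the working file.
DISPLAY DICTIONARY.
* Or, in newer versions, request a fuller codebook for chosen variables.
CODEBOOK sports_type age.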
UNIT-2
Creating a New Variable
To create a new variable:
1. Display the Data Editor window (i.e., execute the following commands
while in the Data Editor window displaying the data file you want to use
to create a new variable).
2. Choose Transform on
the menu bar
3. Choose Compute...
4. Enter the new
variable name in the Target Variable box.
5. Enter the definition
of the new variable in the Numeric Expression box (e.g., SQRT (visan), LN (age),
or MEAN (age)) or
6. Select variable(s)
and combine with desired arithmetic operations and/or functions.
7. Choose OK
After creating a new variable(s), you will probably want to save the new
variable(s) by re-saving your data using the Save command under File
on the menu bar (See Saving Data as an SPSS Save File).
Example: Creating a (New) Transformed Variable (Per capita Income values to log values)
You
can use the SPSS commands for creating a new variable to create a transformed
variable. Suppose you have a variable indicating Per capita Income in the data
set, and you want to transform this variable using the natural logarithm to
make the distribution less skewed (i.e., you want to create a new variable
which is natural logarithm of Per capita Income).
Now a new variable, which is the natural logarithm of per capita income, will be added to your data set.
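In syntax, the Compute dialog corresponds to the COMPUTE command. A minimal sketch, assuming the per capita income variable is named pci (the actual name depends on your data file):

* Create the natural logarithm of per capita income.
COMPUTE ln_pci = LN(pci).
EXECUTE.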
Recoding or Combining Categories of a Variable
To recode or combine categories of a variable:
1. Display the Data Editor window (i.e., execute the following commands while in the Data Editor window displaying the data file you want to use to recode variables).
2. Choose Transform on the menu bar
3. Choose Recode
4. Choose Into Same Variable... or Into Different Variable...
5. Select a variable to recode from the variable list on the left and then click on the arrow located in the middle of the window. This defines the input variable.
6. If recoding into a different variable, enter the new variable name in the box under Name:, then choose Change. This defines the output variable.
7. Choose Old and New Values...
8. Choose Value or Range under Old Value and enter old value(s).
9. Choose New Value and enter new value, then choose Add.
10. Repeat the process until all old values have been redefined.
11. Choose Continue
12. Choose OK
After creating a new variable(s), you will probably want to save the new variable(s) by re-saving your data using the Save command under File on the menu bar (see Saving Data as an SPSS Save File).
Example: Recoding a Categorical Variable
One can use the commands for recoding a variable to change the coding values of a categorical variable. Suppose one wants to change a coding value for a particular category to modify which category SPSS uses as the referent category in a statistical procedure. For example, suppose you want to perform linear regression using the ANOVA (or General Linear Model) commands, and one of your independent variables is smoking status, smoke, coded 1 for never smoked, 2 for former smoker and 3 for current smoker. By default, SPSS will use current smoker as the referent category because current smoker has the largest numerical (code) value. If you want never smoked to be the referent category, you need to recode the value for never smoked to a value larger than 3. Although you could recode the smoking status into the same variable, it is better to recode it into a new/different variable, new_smoke, so you do not lose your original data if you make an error while recoding.
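A syntax sketch of this recode, assuming the original variable is named smoke and the new one new_smoke; ELSE=COPY carries the remaining codes over unchanged:

* 1 = never, 2 = former, 3 = current; give 'never' a value above 3.
RECODE smoke (1=4) (ELSE=COPY) INTO new_smoke.
EXECUTE.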
Example: Creating a Categorical Variable
from a Numerical Variable
You can use the commands for recoding a variable to create a
categorical variable from a numerical variable (i.e., group values of the
numerical variable into categories). For example, suppose you have a
variable that is the number of pack years smoked, packyrs, and you
want to create a categorical variable with the four categories, 0, >0 to
10, >10 to 30, and >30 pack years smoked.
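A sketch of this grouping in syntax (packyrs is from the example; the output name packcat is an assumption). RECODE applies the first specification that matches, so the exact value 0 is caught before the 0 THRU 10 range:

RECODE packyrs (0=1) (0 THRU 10=2) (10 THRU 30=3) (30 THRU HI=4) INTO packcat.
VALUE LABELS packcat 1 '0' 2 '>0 to 10' 3 '>10 to 30' 4 '>30'.
EXECUTE.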
Automatic Recode
To use Automatic Recode:
1. Display the Data Editor window (i.e., execute the following commands while in the Data Editor window displaying the data file you want to use to recode variables).
2. Choose Transform on the menu bar
3. Choose Automatic Recode
4. Select the variable to recode and enter a name for the recoded variable under New Name
5. Choose Recode Starting from - Lowest value or Highest value
6. If you want to apply the same scheme to all the variables, check "Use the same recoding scheme for all variables", set the remaining options as required, and choose OK.
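A sketch of the corresponding AUTORECODE syntax, assuming a string variable sports_name is to be converted into consecutive integer codes stored in sports_code:

AUTORECODE VARIABLES=sports_name
  /INTO sports_code
  /PRINT.
* Add /DESCENDING to start recoding from the highest value instead.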
Rank Cases
Ranking cases in SPSS involves assigning a rank to each case based on the values of a specific variable. This can be useful for analyzing relative positions within your dataset.
- Go to the Transform menu and choose Rank Cases from the drop-down menu.
- Select the variable(s) to rank: highlight the variable in the list and move it to the Variable(s) box by clicking the arrow button.
- Choose the rank type and options in the Rank Cases dialog box:
  Rank type - Ascending if you want the smallest value to receive rank 1, or Descending if you want the largest value to receive rank 1.
  Ties - choose whether tied values receive the mean, low, high, or sequential rank.
  Fractional rank - ranks can also be expressed as fractions of the number of cases.
- SPSS creates a new variable containing the ranks (by default named by prefixing the original variable name with R).
- Click OK.
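The Rank Cases dialog pastes syntax along these lines; the variable name score is an assumption, (A) requests ascending ranks, and TIES=MEAN assigns tied cases their average rank:

RANK VARIABLES=score (A)
  /RANK
  /TIES=MEAN
  /PRINT=YES.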
Sorting Data
Sorting cases in SPSS is a way to arrange your data based on a specific variable in either ascending or descending order. The steps are:
- Open the Data menu
- Choose Sort Cases from the drop-down menu
- Select the variable(s) to sort by from the list of variables in the data set
- Choose the sort order, i.e., ascending or descending
- Click OK.
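For example, a one-line syntax equivalent, assuming a variable named salary sorted in descending order ((A) would sort ascending):

SORT CASES BY salary (D).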
Grouping and Splitting Data
Grouping Data
Using the "Select Cases" Command:
- Criteria: define conditions based on variable values.
- Example: select cases where "Age" is greater than 30 and "Gender" is "Male."
- Steps:
  1. Go to Data > Select Cases.
  2. Choose "If condition is satisfied."
  3. Enter your condition in the syntax box.
  4. Click OK.
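Running Select Cases from the dialog generates filter syntax roughly like the following sketch; the variable names and the code 1 for "Male" are assumptions about your coding scheme:

USE ALL.
COMPUTE filter_$ = (age > 30 & gender = 1).
FILTER BY filter_$.
EXECUTE.
* FILTER OFF restores all cases; SELECT IF would drop them permanently.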
Splitting Data
Using the "Split File" Command:
- Grouping Variable: specify a variable to split the file based on its values.
- Example: split the file into two groups based on "Gender."
- Steps:
  1. Go to Data > Split File.
  2. Choose "Organize output by groups."
  3. Select the grouping variable.
  4. Click OK.
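The equivalent syntax, assuming the grouping variable is named gender; the file must be sorted by the split variable first, and SEPARATE BY corresponds to "Organize output by groups":

SORT CASES BY gender.
SPLIT FILE SEPARATE BY gender.
* Procedures run here produce separate output for each group.
SPLIT FILE OFF.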
Descriptive Statistics
A common first step in
data analysis is to summarize information about variables in any dataset, such
as the averages and variances of variables. SPSS descriptive statistics are designed to give us
information about the distributions of our study variables. SPSS allows us to
complete a number of statistical procedures including measures of central
tendency, measures of variability around the mean, measures of deviation from
normality, and information concerning the spread of the distribution. Several summary or descriptive statistics are available under the Descriptives
option available from the Analyze and Descriptive Statistics
menus:
Kinds of descriptive statistics that SPSS provides
Measures of Central Tendency:
A measure of central tendency is a single value
that attempts to describe a set of data by identifying the central position
within that set of data. As such, measures of central tendency are sometimes
called measures of central location. They are also classed as summary
statistics. The mean (often called the average) is most likely the measure of
central tendency that you are most familiar with, but there are others, such as
the median and the mode.
The mean, median and mode are all valid measures of
central tendency, but under different conditions, some measures of central
tendency become more appropriate to use than others.
Mean (Average)
The mean (or average) is the most popular and
well-known measure of central tendency. It can be used with both discrete and
continuous data, although its use is most often with continuous data or
interval data.
Median
Median
is the value which occupies the middle position when all the observations are
arranged in an ascending/descending order. It divides the frequency
distribution exactly into two halves. Fifty percent of observations in a
distribution have scores at or below the median. Hence median is the 50th
percentile. Median is also known as ‘positional average’.
Mode
Mode is defined as the value that occurs most
frequently in the data. Some data sets do not have a mode because each value
occurs only once. On the other hand, some data sets can have more than one
mode. This happens when the data set has two or more values of equal frequency
which is greater than that of any other value. Mode is rarely used as a summary
statistic except to describe a bimodal distribution. In a bimodal distribution,
the taller peak is called the major mode and the shorter one is the minor mode.
Measures of Dispersion (Variability):
Measures
of dispersion express quantitatively the degree of variation or dispersion of
values in a population or in a sample.
Along with measures of central tendency, measures of dispersion are widely used in practice as descriptive statistics. Some measures of dispersion are the standard deviation, the mean deviation, the range,
the interquartile range.
From this discussion we now focus our attention on
the scatter or variability which is known as dispersion. Let us
take the following three sets of data.
Students | Group A | Group B | Group C
1        |   50    |   45    |   30
2        |   50    |   50    |   45
3        |   50    |   55    |   75
Mean     |   50    |   50    |   50
Thus, the three groups have the same mean, i.e., 50. In fact, the medians of groups A and B are also equal. Now if one were to say that the students from the three groups are of equal capability, it would be a totally wrong conclusion. Close examination reveals that in group A the students have marks equal to the mean, the students from group B are very close to the mean, but in group C the marks are widely scattered. It is thus clear that measures of central tendency alone are not sufficient to describe the data.
Measures of the size of the distribution:
- Maximum: largest value in the distribution
- Minimum: smallest value in the distribution
- Range: range of values in the distribution
- Sum: sum of the scores in the distribution
Measures of stability: Standard error
- Standard error is designed to be a measure of stability or of sampling error.
- SPSS computes the SE for the mean, the kurtosis, and the skewness.
- A small value indicates greater stability or smaller sampling error.
Measures of the shape of the distribution
Skewness: The terms "skewed" and "askew" refer to something that is out of line or distorted on one side. When referring to the shape of frequency or probability distributions, "skewness" refers to asymmetry of the distribution: the extent to which a distribution of values deviates from symmetry around the mean. A skewness value of zero means the distribution is symmetric, a positive skewness indicates a greater number of smaller values, and a negative value indicates a greater number of larger values. A skewness value of ±1 is considered very good for most social research uses, but ±2 is also usually acceptable.
Kurtosis: a measure of the "peakedness" or "flatness" of a distribution. A kurtosis value near zero indicates a shape close to normal. A positive value indicates a distribution more peaked than normal, and a negative kurtosis indicates a shape flatter than normal. An extreme positive kurtosis indicates a distribution where more of the values are located in the tails of the distribution rather than around the mean. A kurtosis value of ±1 is considered very good for most social science research uses, but ±2 is also usually acceptable.
Using SPSS for Descriptive Statistics
For most computations, you should find SPSS to be easier than Excel. In order to find the two most predominant measures of central tendency (the mean and median), we start in the Analyze menu. Within that menu, choose Descriptive Statistics and then Frequencies.
Frequencies
While the Descriptives procedure is useful for summarizing data with an underlying continuous distribution, it will not prove helpful for interpreting categorical data. Instead, the Frequencies option is more useful for investigating the number of cases that fall into various categories. For example, the Frequencies option allows you to obtain the number of people within each education level in a dataset. The Frequencies procedure is found under the Analyze menu.
Frequency Tables
for Categorical Variables. To produce frequency
tables and bar charts for categorical variables:
1. Choose Analyze from
the menu bar
2. Choose Descriptive
Statistics
3. Choose Frequencies…
4. Variable(s):
To select the variables you want from the source list on the left, highlight a
variable by pointing and clicking the mouse and then click on the arrow located
in the middle of the window. Repeat the process until you have selected all the
variables you want.
5. Choose Continue
6. Choose OK
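As a sketch, the same frequency table and a bar chart can be requested with syntax, here for a hypothetical education-level variable named educ:

FREQUENCIES VARIABLES=educ
  /BARCHART FREQ
  /ORDER=ANALYSIS.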
Contingency
Tables for Categorical Variables. To produce
contingency tables for categorical variables:
1. Choose Analyze from
the menu bar.
2. Choose Descriptive
Statistics
3. Choose Crosstabs...
4. Row(s): Select
the row variable you want from the source list on the left and then click on
the arrow located next to the Row(s) box. Repeat the process until you have
selected all the row variables you want.
5. Column(s): Select
the column variable you want from the source list on the left and then click on
the arrow located next to the Column(s) box. Repeat the process until you have
selected all the column variables you want.
6. Choose Cells...
7. Choose the cell
values (e.g., observed counts; row, column, and margin (total)
percentages). Note the option is selected when the little box is not empty.
8. Choose Continue
9. Choose OK
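A corresponding CROSSTABS sketch; the row and column variable names are placeholders, and the CELLS keywords match the cell values chosen in step 7:

CROSSTABS
  /TABLES=sex BY sports_type
  /CELLS=COUNT ROW COLUMN TOTAL.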
Descriptive
Statistics for Numerical Variables. To
produce descriptive statistics for numerical variables:
1. Choose Analyze on
the menu bar
2. Choose Descriptive
Statistics
3. Choose Frequencies...
4. Variable(s): To
select the variables you want from the source list on the left, highlight a
variable by pointing and clicking the mouse and then click on the arrow located
in the middle of the window. Repeat the process until you have selected all the
variables you want.
5. Choose Display
frequency tables to turn off the option. Note that the option is turned off
when the little box is empty.
6. Choose Statistics
7. Choose summary
measures (e.g., mean, median, standard deviation, minimum, maximum,
skewness or kurtosis).
8. Choose Continue
9. Choose OK
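In syntax, turning off "Display frequency tables" (step 5) corresponds to /FORMAT=NOTABLE. A sketch for a hypothetical numerical variable named protein:

FREQUENCIES VARIABLES=protein
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN STDDEV MINIMUM MAXIMUM SKEWNESS KURTOSIS.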
Graphing
Your Data
One can produce very
fancy figures and graphs in SPSS.
Bar Charts
The easiest way to produce simple bar charts is to use the Bar Chart option with the Frequencies... command, as in Frequency Tables for Categorical Variables above. One can produce only one bar chart at a time using the Bar command.
1. Choose Graphs (&
then Legacy Dialogs, if Version 15) from the menu bar.
2. Choose Bar...
3. Choose Simple,
Clustered, or Stacked
4. Choose what the data
in the bar chart represent (e.g., summaries for groups of cases).
5. Choose Define
6. Select a variable
from the variable list on the left and the click on the arrow next to the
Category axis.
7. Choose what the bars
represent (e.g., number of cases or percentage of cases)
8. Choose OK
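The legacy Bar dialog pastes GRAPH syntax; a minimal sketch plotting counts for an assumed categorical variable:

GRAPH
  /BAR(SIMPLE)=COUNT BY sports_type.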
Histograms
The easiest way to
produce simple histograms is to use the Histogram option with the
Frequencies... command. One can produce only one histogram at a time using the
Histogram command.
1. Choose Graphs (&
then Legacy Dialogs, if Version 15) from the menu bar
2. Choose Histogram...
3. Select a variable
from the variable list on the left and then click on the arrow in the
middle of the window.
4. Choose Display
normal Curve if you want a normal curve superimposed on the histogram.
5. Choose OK
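A GRAPH syntax sketch; (NORMAL) superimposes the normal curve from step 4, and protein is a placeholder variable name:

GRAPH
  /HISTOGRAM(NORMAL)=protein.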
Boxplots
The easiest way to
produce simple boxplots is to use the Boxplot option with the Explore...
command. One can produce only one boxplot at a time using the Boxplot command.
1. Choose Graphs (&
then Legacy Dialogs, if Version 15) from the menu bar.
2. Choose Boxplot...
3. Choose Simple or
Clustered
4. Choose what the data
in the boxplots represent (e.g., summaries for groups of cases).
5. Choose Define
6. Select a variable
from the variable list on the left and then click on the arrow next to the
Variable box.
7. Select the
variable from the variable list that defines the groups and then
click on the arrow next to Category Axis.
8. Choose OK
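The Explore procedure produces the same boxplots in syntax; here a hypothetical protein variable is plotted by group, with descriptive statistics suppressed:

EXAMINE VARIABLES=protein BY sports_type
  /PLOT=BOXPLOT
  /STATISTICS=NONE.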
Normal
Probability Plots. To produce Normal
probability plots:
1. Choose Graphs from
the menu bar.
2. Choose Q-Q... to
get a plot of the quantiles (Q-Q plot) or choose P-P... to get a plot of
the cumulative proportions (P-P plot)
3. Select the
variables from the source list on the left and then click on the arrow
located in the middle of the window.
4. Choose Normal as
the Test Distribution. The Normal distribution is the default Test
Distribution. Other Test Distributions can be selected by clicking on the down
arrow and clicking on the desired Test distribution.
5. Choose OK
SPSS will produce both
a Normal probability plot and a detrended Normal probability plot for each
selected variable. Usually the Q-Q plot is the most useful for assessing if the
distribution of the variable is approximately Normal.
Scatter
Plot. To produce a scatter plot between two
numerical variables:
1. Choose Graphs (&
then Legacy Dialogs, if Version 15) on the menu bar.
2. Choose Scatter/Dot...
3. Choose Simple
4. Choose Define
5. Y Axis:
Select the y variable you want from the source list on the left and then click
on the arrow next to the y axis box.
6. X Axis:
Select the x variable you want from the source list on the left and then click
on the arrow next to the x axis box.
7. Choose Titles...
8. Enter a title for
the plot (e.g., y vs. x).
9. Choose Continue
10. Choose OK
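A GRAPH syntax sketch for the same scatter plot; the variable names exper and salary are assumptions for x and y:

GRAPH
  /SCATTERPLOT(BIVAR)=exper WITH salary
  /TITLE='Salary vs. Experience'.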
Chi-square test
A chi-square test is a statistical test commonly used for testing independence of attributes and goodness of fit. Pearson's chi-squared test is used for two types of comparison: tests of goodness of fit and tests of independence.
- A test of independence assesses whether paired observations on two variables, expressed in a contingency table, are independent of each other. The test is applied when you have two categorical variables from a single population, and it is used to determine whether there is a significant association between the two variables. For example, in a nutrition survey, players might be classified by Sex (male or female) and Sports_Type (Kho-Kho, Running, Kabadi, Volleyball or Hockey). We could use a chi-square test for independence to determine whether gender is related to sports type.
- A test of goodness of fit establishes whether observed frequencies are significantly different from expected frequencies. When an analyst attempts to fit a statistical model to observed data, we have to identify how well the model actually reflects the data. How "close" are the observed values to those which would be expected under the fitted model?
Measures of Association
- The Crosstabs procedure forms two-way and multiway tables and provides a variety of tests and measures of association for two-way tables.
- If you specify a row, a column, and a layer factor (control variable), the Crosstabs procedure forms one panel of associated statistics and measures for each value of the layer factor.
The Chi-Square Test of Independence
Click Analyze
Descriptive Statistics
Crosstabs...
A new window pops out. Select one or more row
variables and one or more column variables. The Chi Square Statistic (Pearson
chi-square) is used to test the hypothesis that the row and column variables
are independent.
The two variables Sex and Sports_Type are shown in the list on the left. It does not matter which variable we select as the row and which as the column; for illustration purposes, we select Sex as Row(s) and Sports_Type as Column(s). Now click Statistics… on the right. A new window, Crosstabs: Statistics, pops out.
Make sure that the Chi-square box at the top is checked. Click Continue; the window will close. Now click OK in the original window.
In the third table, the Pearson Chi-Square value is 28.514 and the p-value is .000, so we can reject the null hypothesis and conclude that there is a relationship between Sex and Sports_Type at the 1% level of significance.
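The test in this example corresponds to syntax along these lines:

CROSSTABS
  /TABLES=Sex BY Sports_Type
  /STATISTICS=CHISQ
  /CELLS=COUNT.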
The t test is one type of
inferential statistics. It is used to determine whether there is a significant
difference between the means of two groups. With all inferential statistics, we
assume the dependent variable fits a normal distribution. When the difference between two population averages is being
investigated, a t test is used. In other words, a t test is
used when we wish to compare two means (the scores must be measured on an
interval or ratio scale).
One sample t-test
A one sample t-test allows us to test whether a
sample mean (of a normally distributed interval variable) significantly differs
from a hypothesized value. For example, using the Home Science Data file, say we wish to test whether the average protein differs
significantly from 80.
The One-Sample
T Test procedure tests whether the mean of a single variable differs from a
specified constant.
To test the
values of a quantitative variable against a hypothesized test value, choose a
quantitative variable and enter a hypothesized test value.
Assumptions.
This test assumes that the data are normally distributed; however, this test is
fairly robust to departures from normality.
To obtain a One-Sample T Test, from the menus choose Analyze → Compare Means → One-Sample T Test.
Select the variable(s) that you want to test by clicking on it in the left-hand pane of the One-Sample t Test dialog box. Then click on the arrow button to move the variable into the Test Variable(s) pane. In this example, move the Protein variable into the Test Variables box: Click in the Test Value box and enter the value that you will compare to.
Enter a
numeric value 80 in the Test Value box against which each sample mean is
compared and click OK to see the output.
Output of
one-sample t-test:
This output tells us that we have 82 observations (N), the mean Protein value is 78.41, and the standard deviation is 14.623.
The mean of the variable Protein for this particular sample of players is 78.41, which is not statistically different from the test value of 80 (the p-value is greater than 0.05, so we fail to reject the null hypothesis; i.e., there is no significant difference between the sample mean and the hypothesized mean).
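The dialog pastes T-TEST syntax; for this example, with the test value of 80:

T-TEST
  /TESTVAL=80
  /VARIABLES=Protein
  /CRITERIA=CI(.95).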
Independent-Samples
T Test (Two independent samples t-test)
The
Independent-Samples T Test procedure compares means for two groups of cases.
Ideally, for this test, the subjects should be randomly assigned to two groups,
so that any difference in response is due to the treatment (or lack of
treatment) and not to other factors.
Assumptions
For the
equal-variance t test, the observations should be independent, random samples
from normal distributions with the same population variance.
For the
unequal-variance t test, the observations should be independent, random samples
from normal distributions.
From the menus
choose
Analyze
Compare
Means
Independent-Samples
T Test
It produces the following dialog box:
To select
variables for the analysis, first highlight them by clicking on them in the box
on the left. Then move them into the appropriate box on the right by clicking
on the arrow button in the center of the box. Your independent variable should
go in the Grouping Variable box, which is a variable that defines which groups
are being compared. For example, because sports types are being compared in
this analysis, the Sports_Type variable is selected. However, because
Sports_Type has more than two levels, you will need to click on Define Groups
to specify the two levels of Sports_Type that you want to compare. This will
produce another dialog box as is shown below:
Here, the groups to be compared are limited to the groups with the values 1 and 2, which represent the Kho-Kho and Running groups. After selecting the groups to be compared, click the Continue button, and then click the OK button in the main dialog box.
The above
choices will produce the following output:
The columns labeled "Levene's Test for Equality of Variances" tell us whether an assumption of the t-test has been met. Look at the column labeled "Sig." under that heading. In this example, the significance (p value) of Levene's test is .776. If this value were less than or equal to your α level for this test, you would reject the null hypothesis that the variances of the two groups are equal, implying that the variances are unequal. In this example, .776 is larger than our α level of .05, so we assume that the variances are equal and use the first row of the output ("Equal variances assumed").
The column
labeled "t" gives the observed or calculated t value. In this
example, assuming equal variances, the t value is 2.066. (We can ignore the
sign of t when using a two-tailed t-test.) The column labeled "df"
gives the degrees of freedom associated with the t test. In this example, there
are 43 degrees of freedom.
The column labeled "Sig. (2-tailed)" gives the two-tailed p value associated with the test. In this example, the p value is .045. Since this value is less than or equal to our 0.05 α level, we reject the null hypothesis that the means of the two groups are equal (i.e., p < 0.05).
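A syntax sketch of this test. The grouping variable and its two codes come from Define Groups; the example does not name the test variable, so score here is an assumption:

T-TEST GROUPS=Sports_Type(1 2)
  /VARIABLES=score
  /CRITERIA=CI(.95).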
Paired-Samples T Test
The
Paired-Samples T Test compares the means between two related groups on the same
continuous, dependent variable. For example, you could use a dependent t-test to understand whether there was a difference in smokers' daily cigarette consumption before and after an 8-week hypnotherapy programme (i.e., your
dependent variable would be "daily cigarette consumption", and your
two related groups would be the cigarette consumption values "before"
and "after" the hypnotherapy programme). A paired sample t-test is
used to determine whether there is a significant difference between the average
values of the before and after measurement for a single group.
To obtain a paired-samples t test:
Analyze
Compare Means
Paired-Samples T Test
A new window pops out. Drag the variable "Before" and "After" from the list on the left to the pair 1 variable 1 and variable 2 respectively, as shown below.
Clicking the OK
button with the above variables selected will produce output for the
paired-samples t test. The following output is an example of the
statistics you would obtain from the above example.
Paired Samples Statistics
                   Mean     N   Std. Deviation   Std. Error Mean
Pair 1   Before   85.3333   15      7.04746          1.81965
         After    74.0000   15      6.40312          1.65328

Paired Samples Correlations
                           N   Correlation   Sig.
Pair 1   Before & After   15      .594       .020
The first table gives the descriptive statistics for each of the two groups (as defined by the pair of variables).
The second table shows that there are 15 pairs of observations (N). The correlation between the two variables is given in the third column; in this example r = .594. The last column gives the p value for the correlation coefficient. As always, if the p value is less than or equal to the alpha level (0.05), we reject the null hypothesis that the population correlation coefficient (ρ) is equal to 0; here p = .020, so we reject it. The Paired Samples Test table (the final table of the output) reports the t statistic and p value for the mean difference; with p = .000 there, we conclude that the fitness programme is effective at the 1% significance level (i.e., p < 0.01).
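The paired test in this example corresponds to:

T-TEST PAIRS=Before WITH After (PAIRED)
  /CRITERIA=CI(.95).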
UNIT-5
Correlation
Correlation is one of the most common forms of data analysis, both because it can provide an analysis that stands on its own and because it underlies many other analyses, and it can be a good way to support conclusions after primary analyses have been completed. Correlations are a measure
of the linear relationship between two variables. A correlation coefficient has
a value ranging from -1 to 1. Values that are closer to the absolute value of 1
indicate that there is a strong relationship between the variables being
correlated whereas values closer to 0 indicate that there is little or no
linear relationship. The sign of a correlation coefficient describes the type
of relationship between the variables being correlated. A positive correlation
coefficient indicates that there is a positive linear relationship between the
variables: as one variable increases in value, so does the other. An example of
two variables that are likely to be positively correlated are the number of
days a student attended class and test grades because, as the number of classes
attended increases in value, so do test grades. A negative value indicates a
negative linear relationship between variables: as one variable increases in
value, the other variable decreases in value. The number of days students miss
class and their test scores are likely to be negatively correlated because as
the number of days of missed classed increases, test scores typically decrease.
To obtain a correlation in SPSS, start at the Analyze menu.
Select the Correlate option from this menu. By selecting this menu item,
you will see that there are three options for correlating variables: (1) Bivariate,
(2) Partial, and (3) Distances. This document will cover the
first two types of correlations. The bivariate correlation is for
situations where you are interested only in the relationship between two
variables. Partial correlations should be used when you are measuring
the association between two variables but want to factor out the effect of one
or more other variables.
To obtain a bivariate correlation, choose the following menu
option:
Analyze
Correlate
Bivariate...
This will produce the following dialog box:
To obtain correlations, first click on the variable names in the
variable list on the left side of the dialog box. Next, click on the arrow
between the two white boxes which will move the selected variables into the Variables
box. Each variable listed in the Variables box will be correlated with
every other variable in the box. For example, with the above selections, we
would obtain correlations between Education Level and Current Salary,
between Education Level and Previous Experience, and between Current
Salary and Previous Experience. We will maintain the default options
shown in the above dialog box in this example. The first option to consider is
the type of correlation coefficient. Pearson's is appropriate for continuous
data as noted in the above example, whereas the other two correlation
coefficients, Kendall's tau-b and Spearman's, are designed for ranked data. The
choice between a one and two-tailed significance test in the Test of
Significance box should be determined by whether the hypothesis you are
testing is making a prediction about the direction of effect between the two
variables: if you are making a prediction that there is a negative or positive
relationship between the variables, then the one-tailed test is appropriate; if
you are not making a directional prediction, you should use the two-tailed test
if there is not a specific prediction about the direction of the relationship
between the variables you are correlating. The selections in the above dialog
box will produce the following output:
This output gives us a correlation matrix for the three correlations requested in the above dialog box. Note that despite there being nine cells in the matrix, there are only three correlation coefficients of interest: (1) the correlation between current salary and educational level, (2) the correlation between previous experience and educational level, and (3) the correlation between current salary and previous experience. Only three of the nine correlations are of interest because the diagonal consists of correlations of each variable with itself, always resulting in a value of 1.00, and the values on each side of the diagonal replicate the values on the opposite side.
The three unique correlation coefficients show, first, a positive correlation between employees' number of years of education and their current salary. This positive correlation coefficient (.661) indicates that there is a statistically significant (p < .001) linear relationship between these two variables, such that the more education a person has, the larger that person's salary is. There is also a statistically significant (p < .001) negative correlation coefficient (-.252) for the association between education level and previous experience, indicating that the linear relationship between these two variables is one in which the values of one variable decrease as the other increases. The third correlation coefficient (-.097) also indicates a negative association between employees' current salaries and their previous work experience, although this correlation is fairly weak.
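A CORRELATIONS sketch for the three variables in this example; the names educ, salary, and prevexp are assumptions standing in for the data file's actual variable names:

CORRELATIONS
  /VARIABLES=educ salary prevexp
  /PRINT=TWOTAIL NOSIG.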
Regression is a technique that can be used to investigate the
effect of one or more predictor variables on an outcome variable. Regression
allows you to make statements about how well one or more independent variables
will predict the value of a dependent variable. For example, if you were
interested in investigating which variables in the employee database were good
predictors of employees' current salaries, you could create a regression
equation that would use several of the variables in the dataset to predict
employees' salaries. By doing this you will be able to make statements about
whether knowing something about variables such as employees' number of years of
education, their starting salary, or their number of months on the job are good
predictors of their current salaries.
To conduct a regression analysis, select the following from the Analyze
menu:
Analyze
Regression
Linear...
This will produce the following dialog box:
This dialog box illustrates an example regression equation. As with other analyses, you select variables from the box on the left by clicking on them, then move them to the boxes on the right by clicking the arrow next to the box where you want to enter a particular variable. Here, employees' current salary has been entered as the dependent variable. In the Independent(s) box, several predictor variables have been entered, including education level, beginning salary, months since hire, and previous experience.
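A REGRESSION sketch of the equation described above; the variable names are assumptions based on the description of the employee data:

REGRESSION
  /STATISTICS COEFF OUTS R ANOVA
  /DEPENDENT salary
  /METHOD=ENTER educ salbegin jobtime prevexp.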