67 Cards in this Set
- Front
- Back
|
Exploratory Factor Analysis
|
a multivariate method that allows you to explore the underlying structure of variables
• provides the tools for analyzing the structure of the correlations among a large number of variables by defining sets of interrelated variables (factors) |
|
Exploratory
|
no a priori predictions about how the variables should be structured
|
|
Confirmatory
|
see if variables confirm your predictions about variable structure
|
|
Stage 1
|
Objectives of Factor Analysis - how the objectives fit with the research question (RQ)
- Specify Unit of Analysis
- Factor Analysis Outcomes
- Variable Selection |
|
Unit of Analysis
|
R Factor vs. Q Factor
R Factor identifies latent variables (not easily observed)
Q Factor is used to reduce people into groups |
|
Factor Analysis Outcomes:
Data Summarization |
Dimensions that describe data in a small number of concepts
|
|
Data Reduction
|
Extends summarization by providing factor score for each dimension (factor)
|
|
Variable Selection
|
Use appropriate judgement
Avoid garbage in, garbage out |
|
Stage 2: Designing a Factor Analysis
Rules of Thumb |
Calculate input data - R vs. Q
Variable selection - mostly metric; ~5 variables for each proposed dimension
Sample Size - 50 minimum; 100 to 200 preferred
- at least 5x as many subjects as proposed variables; 10:1 or 20:1 is better |
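The sample-size rules of thumb above can be checked mechanically. This is a minimal sketch; the function name `check_sample_size` and the example numbers are hypothetical and not part of the flashcard set.

```python
def check_sample_size(n_cases, n_variables):
    """Apply the sample-size rules of thumb from the card above."""
    ratio = n_cases / n_variables
    return {
        "meets_minimum_50": n_cases >= 50,
        "in_preferred_range_100_plus": n_cases >= 100,
        "cases_per_variable": ratio,
        "at_least_5_to_1": ratio >= 5,
        "at_least_10_to_1": ratio >= 10,
        "at_least_20_to_1": ratio >= 20,
    }


# Hypothetical study: 120 respondents, 15 variables.
report = check_sample_size(n_cases=120, n_variables=15)
print(report)
```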
|
Stage 3: Assumptions
Conceptual |
Some structure does exist
Patterns are appropriate
Homogeneous sample |
|
EFA Assumptions
Statistical |
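Normality
Multicollinearity is desired
- Correlations should be > .30
- Partial correlations should be small; values > .7 are considered high and signal that the variables do not share common factors |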
|
KMO Statistic
|
Predicts if data are likely to factor well, based on correlation and partial correlation
- Identifies which variables to drop |
|
KMO Rules of Thumb
|
> .5 is required to proceed
> .7 or .8 is very good
If < .5, remove the variable with the lowest KMO score, one at a time, until all KMO scores are > .5 |
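As a rough illustration of what the KMO statistic compares, the sketch below computes it from a correlation matrix as the sum of squared correlations divided by the sum of squared correlations plus squared partial correlations (off-diagonal entries only). It uses only the standard library; the helper names `invert` and `kmo` and the example matrix are hypothetical, and statistical packages may differ in details.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kmo(corr):
    """Return (overall KMO, per-variable KMO list) for a correlation matrix."""
    n = len(corr)
    inv = invert(corr)
    # Partial correlation of i and j, controlling for all other variables.
    partial = [[-inv[i][j] / (inv[i][i] * inv[j][j]) ** 0.5 for j in range(n)]
               for i in range(n)]
    r2 = [[corr[i][j] ** 2 if i != j else 0.0 for j in range(n)] for i in range(n)]
    p2 = [[partial[i][j] ** 2 if i != j else 0.0 for j in range(n)] for i in range(n)]
    per_variable = [sum(r2[i]) / (sum(r2[i]) + sum(p2[i])) for i in range(n)]
    total_r2 = sum(map(sum, r2))
    total_p2 = sum(map(sum, p2))
    return total_r2 / (total_r2 + total_p2), per_variable


# Hypothetical correlation matrix: three variables, all correlated at .5.
corr = [[1.0, 0.5, 0.5],
        [0.5, 1.0, 0.5],
        [0.5, 0.5, 1.0]]
overall, per_variable = kmo(corr)
print(round(overall, 3), [round(v, 3) for v in per_variable])
```

The per-variable values are what the "remove the lowest one at a time" rule would be applied to.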
|
Bartlett Sphericity
|
Examines entire correlation matrix
Stat. sig. at p < .05, meaning correlations exist among the variables |
|
Stage 4: Deriving Factors and Assessing Fit
|
Select Factor Method
Determining number of Factors |
|
Selecting Factors RULE OF THUMB
|
30 or more variables
> .6 communality
- Use component analysis when data reduction is necessary
- Use common factor analysis when there is a more theoretical basis |
|
Determining number of factors RULE OF THUMB
|
eigenvalues > 1.0
scree test for common variance
enough factors to meet a specified percentage of variance explained (usually > .6, i.e., 60%) |
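Two of these criteria are easy to apply once the eigenvalues are known. The sketch below assumes the eigenvalues have already been extracted by some factoring routine; the function name `factors_to_retain` and the eigenvalues are hypothetical.

```python
def factors_to_retain(eigenvalues, variance_target=0.60):
    """Return (count with eigenvalue > 1, count needed to reach the variance target)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    by_eigenvalue = sum(1 for ev in ordered if ev > 1.0)
    cumulative = 0.0
    by_variance = len(ordered)
    for k, ev in enumerate(ordered, start=1):
        cumulative += ev
        if cumulative / total >= variance_target:
            by_variance = k
            break
    return by_eigenvalue, by_variance


# Hypothetical eigenvalues for six standardized variables (they sum to 6).
eigenvalues = [2.5, 1.2, 1.1, 0.7, 0.3, 0.2]
by_eigenvalue, by_variance = factors_to_retain(eigenvalues)
print(by_eigenvalue, by_variance)
```

In this made-up example the two rules disagree, which is why the rules of thumb are usually read together with the scree plot rather than applied in isolation.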
|
Interpreting the Factors Steps
|
- Examine the factor loadings
- Identify highest loading
- Delete cross-loadings
- Assess communalities (remove <.5)
- Label the factors |
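The interpretation steps above can be sketched on a small loading matrix. A communality is taken here as the sum of a variable's squared loadings across the factors. The loadings, the function name `summarize`, and the .5 loading cutoff used to detect cross-loadings are assumptions for illustration only.

```python
# Hypothetical rotated loadings: variable -> [loading on factor 1, loading on factor 2]
loadings = {
    "x1": [0.82, 0.10],
    "x2": [0.75, 0.20],
    "x3": [0.15, 0.78],
    "x4": [0.55, 0.52],
    "x5": [0.30, 0.35],
}


def summarize(loadings, loading_cutoff=0.5, communality_cutoff=0.5):
    """Flag cross-loadings and low communalities for each variable."""
    report = {}
    for name, row in loadings.items():
        communality = sum(value ** 2 for value in row)
        high = [i + 1 for i, value in enumerate(row) if abs(value) >= loading_cutoff]
        report[name] = {
            "communality": round(communality, 3),
            "loads_on": high,                     # factor numbers with a high loading
            "cross_loading": len(high) > 1,       # high on more than one factor
            "low_communality": communality < communality_cutoff,
        }
    return report


report = summarize(loadings)
for name, info in report.items():
    print(name, info)
```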
|
Process of Interpretation
|
Estimate the factor matrix:
- Examine factor loadings - the correlation of each variable and the factor
• Higher loadings = more representative of the factor |
|
Factor Rotation
|
Reference axes are turned about the origin until another position is reached. Graphical way to see which variables correlate with which factors
- Oblique Rotation - axes not maintained at 90 degrees
- Orthogonal Rotation - axes maintained at 90 degrees
• QUARTIMAX – simplifies rows (maximizes a variable's loading on a single factor)
• VARIMAX – simplifies columns (better results; makes the number of high loadings as few as possible)
• EQUIMAX – combination of the two |
|
Factor Loading Criteria
|
Loadings beyond ±.5 are considered practically significant (consult literature for specific discipline)
Loadings beyond ±.7 are indicative of well-defined structure
Sample size should be > 100 for practical significance |
|
Validation of Factor Analysis
|
Confirmatory perspective
- Split the sample, or analyze with a separate sample
Assess factor structure stability - look at sample size and the number of cases per variable
Detect influential observations |
|
Additional Uses of Factor Analysis Results
|
Select the variable with the highest factor loading as a surrogate representative for a particular factor dimension
Replace the original variables with a smaller set of variables created from summated scales
- Cronbach's alpha > .7
- convergent validity - correlates with other scales of the same concept
- discriminant validity - differs from other, similar scales
- nomological validity - behaves as the theory that shaped it predicts |
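The Cronbach's alpha threshold above refers to a value that can be computed directly from item scores as k/(k-1) × (1 − sum of item variances / variance of the summed scale). A minimal sketch with made-up responses; the function name `cronbach_alpha` and the data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, with respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # summated scale per respondent
    item_variance = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Hypothetical responses of five people to three items of one scale.
items = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [1, 3, 3, 5, 5],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3), "acceptable" if alpha > 0.7 else "below the .7 rule of thumb")
```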
|
Why Examine Data?
|
Help with a basic understanding of data and the relationships between variables
To ensure the data has met all the requirements for the analysis (assumptions, outliers) |
|
First Step to Managing Data
|
Assess whether data was entered correctly.
Could check data against original data. |
|
Graphical Examination of Data
|
Histogram - determines shape of distribution
Scatterplot - linear or curvilinear relationship between 2 variables
Boxplot - group differences |
|
Missing Data
|
identify the patterns associated with missing data to understand why the data are missing
|
|
Impact of missing data
|
Can reduce sample size
Can distort results and introduce bias |
|
If no Pattern is found
|
Dummy code a variable - one group with missing data and one group with none
Run t-tests comparing the two groups on the other variables (as DVs)
If no difference, feel safe deleting values
If difference, there are steps to take |
|
If pattern, or too many missing data
|
Replace values with numbers from prior knowledge or an educated guess
Replace values with the variable mean
Replace with the group mean (little reduction in validity)
Use regression to predict missing values |
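Two of the replacement options above, variable-mean and group-mean substitution, can be sketched in a few lines. Missing values are represented as `None`; the function names and data are hypothetical.

```python
from statistics import mean


def impute_mean(values):
    """Replace each missing value with the mean of the observed values."""
    observed_mean = mean(v for v in values if v is not None)
    return [observed_mean if v is None else v for v in values]


def impute_group_mean(values, groups):
    """Replace each missing value with the mean of its own group's observed values."""
    by_group = {}
    for v, g in zip(values, groups):
        if v is not None:
            by_group.setdefault(g, []).append(v)
    group_means = {g: mean(vs) for g, vs in by_group.items()}
    return [group_means[g] if v is None else v for v, g in zip(values, groups)]


# Hypothetical scores with two missing entries, and a grouping variable.
values = [10, None, 14, 20, None, 24]
groups = ["a", "a", "a", "b", "b", "b"]
print(impute_mean(values))
print(impute_group_mean(values, groups))
```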
|
Identifying missing data
|
Determine type of missing data
- Ignorable (delete)
- Nonignorable (don't delete) |
|
Determine Extent of Missing Data
|
i. < 10% can generally be ignored, except when missing in a nonrandom fashion
ii. Always check that the number of cases with no missing data is sufficient to run the analysis
iii. >= 15% are candidates for deletion
iv. > 50% delete data, unless the variable is essential to the model |
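To apply these thresholds, the percentage missing per variable and the number of complete cases are needed first. A minimal sketch, assuming records are dictionaries and `None` marks a missing value; the variable names and data are hypothetical.

```python
def missing_report(rows):
    """Return (percent missing per variable, number of complete cases)."""
    n = len(rows)
    variables = rows[0].keys()
    pct_missing = {v: 100 * sum(r[v] is None for r in rows) / n for v in variables}
    complete_cases = sum(all(val is not None for val in r.values()) for r in rows)
    return pct_missing, complete_cases


# Hypothetical data set with five cases and three variables.
rows = [
    {"age": 34, "income": 52000, "score": 7},
    {"age": None, "income": None, "score": 5},
    {"age": 29, "income": None, "score": 6},
    {"age": 41, "income": 61000, "score": None},
    {"age": 38, "income": None, "score": 8},
]
pct_missing, complete_cases = missing_report(rows)
for variable, pct in pct_missing.items():
    print(variable, f"{pct:.0f}% missing")
print("complete cases:", complete_cases)
```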
|
Diagnose the Randomness of the Missing Data Processes
|
a. Missing at Random (MAR - not completely random)
i. Missing values of Y depend on X, but not on Y, e.g., one gender is significantly different than another
b. Missing Completely at Random (MCAR)
i. Cases with missing data are indistinguishable from cases with complete data. |
|
Select the Imputation Method (estimating the missing values based on the available values)
|
a. MAR data process – apply a specific modeling approach (EM approach)
b. MCAR –
i. use only valid data – listwise method, pairwise (all available data)
ii. replacement values – case substitution, hot or cold deck, mean substitution |
|
Imputation Method Rules of Thumb
|
i. < 10% - any imputation method can be applied
ii. 10% - 20% - all available, hot deck, regression
iii. > 20% - regression method for MCAR, model method for MAR |
|
Steps to Identify Missing Data
|
1. Determine the Type of Missing Data (ignorable or not)
2. Determine the Extent of the Missing Data
3. Diagnose the Randomness of the Missing Data Processes
4. Select the Imputation Method (estimating the missing values based on the available values) |
|
Outliers
|
• distinct difference from other observations/responses
• Is the observation/response representative of the population?
• Check for both univariate and multivariate outliers |
|
Reasons for Outliers
|
• Data entry mistake
• Missing value code not specified
• Outlier not a part of population
• Part of population, but is extreme:
o Delete, change to fit normality but still keep extreme, transform (if normality is met) |
|
Identifying Outliers -
Standard Score Rules |
• 80 subjects or fewer: outliers are defined at standard scores > 2.5
• Larger samples: the threshold rises to standard scores of 4
• If standard scores are not used: 2.5 to 4 SDs, depending on sample size |
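The standard-score rule can be sketched as follows. The cutoff switches on sample size as the card describes; using exactly 4 for every sample above 80 is a simplifying assumption, and the function name and data are hypothetical.

```python
from statistics import mean, stdev


def flag_outliers(values, small_sample=80, small_cutoff=2.5, large_cutoff=4.0):
    """Return (value, z score) pairs whose absolute z score exceeds the cutoff."""
    cutoff = small_cutoff if len(values) <= small_sample else large_cutoff
    m, s = mean(values), stdev(values)
    return [(x, round((x - m) / s, 2)) for x in values if abs(x - m) / s > cutoff]


# Hypothetical small sample with one extreme response.
values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 40]
flagged = flag_outliers(values)
print(flagged)
```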
|
• 90-10 split
|
If you have a dichotomous variable with an extremely uneven split (e.g., a 90-10 split: 90% say yes and 10% say no), this will produce an outlier. The only fix for this is to delete the variable.
|
|
• Univariate outliers
|
cases with very large standardized scores (z scores greater than 3.3) that are disconnected from the distribution
|
|
• Bivariate outliers
|
specific variable relationships – scatterplots with confidence intervals
|
|
• Multivariate Outliers
|
are found by first computing a Mahalanobis distance for each case; the Mahalanobis scores are then screened in the same manner as univariate outliers
|
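For intuition, the Mahalanobis distance can be computed by hand when there are only two variables, because the 2x2 covariance matrix is easy to invert. The sketch below returns the squared distance of each case from the centroid; the function name and data are hypothetical, and with more variables a matrix library would normally be used.

```python
from statistics import mean


def mahalanobis_squared_2d(points):
    """Squared Mahalanobis distance of each (x, y) case from the centroid."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    # Sample variances and covariance (n - 1 in the denominator).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # [dx dy] * inverse(covariance) * [dx dy]^T, written out for the 2x2 case
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


# Hypothetical cases; the last one breaks the overall x-y pattern.
points = [(1, 2), (2, 3), (3, 4.5), (4, 5), (5, 6.5), (6, 2)]
d2 = mahalanobis_squared_2d(points)
print([round(d, 2) for d in d2])
```

Note that the last case is not extreme on either variable alone; it stands out only because of the combination, which is what makes it a multivariate outlier candidate.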
|
Assumptions
|
Normality
- Skewness and Kurtosis
Homoscedasticity
Homogeneity of Variance
Homogeneity of Variance-Covariance Matrices |
|
Normality
|
shape of the data distribution and its correspondence to a normal distribution
• Skewness – the balance or the shift of the distribution. Can be positive (left shift) or negative (right shift).
o Must be between -1 and 1
• Kurtosis – how peaked or flat the distribution is.
o Must be less than 8 |
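Skewness and kurtosis can be computed from the central moments of the data. The sketch below uses the simple moment-based formulas (kurtosis reported as excess kurtosis, so a normal distribution is near 0); statistical packages such as SPSS apply small-sample adjustments, so their values will differ slightly. The function name and data are hypothetical.

```python
from statistics import mean


def skew_and_kurtosis(values):
    """Return (skewness, excess kurtosis) from the 2nd, 3rd and 4th central moments."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


symmetric = [1, 2, 3, 4, 5]          # hypothetical, balanced
right_tailed = [1, 1, 2, 2, 3, 10]   # hypothetical, long tail to the right

for name, data in [("symmetric", symmetric), ("right_tailed", right_tailed)]:
    skew, kurt = skew_and_kurtosis(data)
    print(name, round(skew, 2), round(kurt, 2), "skew within -1..1" if -1 <= skew <= 1 else "skew outside -1..1")
```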
|
Skewness
|
the balance or the shift of the distribution. Can be positive (left shift) or negative (right shift).
o Must be between -1 and 1 |
|
Kurtosis
|
how peaked or flat the distribution is.
o Must be less than 8 |
|
Homoscedasticity
|
• Equal variance of the DV across the values of the independent variables
• if both variables are normally distributed, then you should have homoscedasticity |
|
Homogeneity of Variance
|
variance in DV is expected to be the same for all levels of the IV.
• Important for grouped data
• SPSS gives the Levene’s test as a measure of homogeneity of variance.
• p above .05: variances are homogeneous; p below .05: heterogeneous |
|
Homogeneity of Variance-Covariance Matrices
|
used for multivariate tests
• an entry in a variance-covariance matrix using one DV should be similar to the same entry in a matrix using another DV.
• the formal test for this in SPSS is Box’s M |
|
Data Transformations
|
• Done to:
o Correct violations of assumptions
o Improve correlations between variables
• Recommended as a last resort only, because of the added difficulty of interpreting a transformed variable
• Common Transformations:
1) square root, used when there is moderate skewness/deviation
2) logarithm, used when there is substantial skewness/deviation
3) inverse, used when there is extreme skewness/deviation |
|
Transformation RULES OF THUMB
|
• impact of transformation – calculate the ratio of the variable’s mean to its SD
o noticeable effects occur when the ratio is < 4
• applied to IVs, except when correcting for heteroscedasticity
• use variables in untransformed format when interpreting results |
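The mean-to-SD ratio check, together with the three common transformations (square root, logarithm, inverse), can be sketched as below. The function names and data are hypothetical, and the transformations as written assume strictly positive values.

```python
import math
from statistics import mean, stdev


def mean_to_sd_ratio(values):
    """Ratio used in the rule of thumb: noticeable effects are expected when it is < 4."""
    return mean(values) / stdev(values)


def transform(values, kind):
    """Apply one of the three common transformations; values must be positive."""
    if kind == "sqrt":
        return [math.sqrt(v) for v in values]
    if kind == "log":
        return [math.log(v) for v in values]
    if kind == "inverse":
        return [1 / v for v in values]
    raise ValueError(f"unknown transformation: {kind}")


skewed = [2, 4, 4, 5, 7, 9, 30]      # hypothetical, right-tailed
tight = [100, 102, 98, 101, 99]      # hypothetical, small spread relative to the mean

for name, data in [("skewed", skewed), ("tight", tight)]:
    ratio = mean_to_sd_ratio(data)
    verdict = "noticeable effect expected" if ratio < 4 else "little effect expected"
    print(name, round(ratio, 2), verdict)
```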
|
Multicollinearity
|
• If you have a correlation between two variables that is .90 or greater
|
|
Singularity
|
if two variables are identical, or one is a subscale of another, they are singular
|
|
Dummy Variables
|
• nonmetric IV that has two (or more) distinct levels that are coded 0 and 1
• act as replacement variables so that nonmetric variables can be used as metric |
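A minimal sketch of dummy coding follows. It uses the common convention of k - 1 dummy variables for k levels, with one level left out as the reference category; that convention, the function name, and the data are assumptions not stated on the card.

```python
def dummy_code(values, reference):
    """Return one dict of 0/1 indicators per case, omitting the reference level."""
    levels = sorted(set(values) - {reference})
    return [{f"is_{level}": int(v == level) for level in levels} for v in values]


# Hypothetical nonmetric variable with three levels; "red" is the reference.
colors = ["red", "blue", "green", "blue"]
coded = dummy_code(colors, reference="red")
for original, row in zip(colors, coded):
    print(original, row)
```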
|
What is Multivariate Analysis?
|
• Analysis of multiple variables in a single relationship or set of relationships
• Used for measurement, predicting and explaining, and testing hypotheses |
|
Variate
|
• Linear combination of variables with empirically determined weights
• Every subject has a variate value (Y’), which is the dependent variable, or the linear combination of the entire set of variables |
|
Nonmetric Scales
|
(Qualitative)
o Nominal – labels/categories (e.g., occupation, gender, class rank)
o Ordinal – variable with a specific order (e.g., first place, second place). Relative positions in an ordered series. Distances are not equal and cannot be determined (e.g., 7-point scale) |
|
Metric Scales
|
(Quantitative)
o Interval – no natural zero point; equal differences between scale points (e.g., temperature)
o Ratio – natural zero point (e.g., money or weight) |
|
Measurement Error
|
• Degree to which the observed values do not represent the true values.
• Caused by:
o Data entry errors, imprecise measurement scales |
|
Validity
|
o Degree to which a measure accurately represents what it is supposed to represent
• Example of poor validity: measuring total income by asking for disposable income |
|
• Reliability
|
o Degree to which measure accurately represents true value AND is error free
• If repeated measures of a variable are consistent, they are reliable |
|
Type 1 Error
|
o Probability of rejecting null hypothesis when it is true
alpha, usually .05 |
|
• Type 2 Error
|
o Probability of failing to reject a null hypothesis when it is false. Chance of not finding correlation when there is a correlation.
Beta. |
|
• Power
|
o Probability of rejecting the null hypothesis when it is false.
Correctly finding a hypothesized relationship when one exists.
1 – Beta
o Should be about .8 or higher |
|
Power Determined by:
|
• Sample Size – increase sample size, power increases
• Alpha Value – increase alpha, power increases
• Effect Size – the magnitude of the effect of interest; whether the correlation between variables, or the observed relationship, is meaningful
• Small effect sizes require larger sample sizes |
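How sample size, alpha, and effect size drive power can be seen in the simplest case: a two-sided one-sample z test with a standardized effect size. This particular test is an assumption chosen for illustration; the function name and the numbers are hypothetical.

```python
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z test for a standardized effect size."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5  # how far the test statistic's mean moves under H1
    return z.cdf(shift - critical) + z.cdf(-shift - critical)


for label, power in [
    ("baseline      d=.3 n=50  alpha=.05", z_test_power(0.3, 50)),
    ("larger sample d=.3 n=100 alpha=.05", z_test_power(0.3, 100)),
    ("larger alpha  d=.3 n=50  alpha=.10", z_test_power(0.3, 50, alpha=0.10)),
    ("larger effect d=.5 n=50  alpha=.05", z_test_power(0.5, 50)),
]:
    print(label, round(power, 2), "meets .8" if power >= 0.8 else "below .8")
```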
|
Guidelines for Multivariate Analysis
|
• Establish both practical significance (“So What?”) and statistical significance
• Recognize that sample size affects all results
o Power, effect size, no generalizability (small sample size), find anything (large sample size, > 400)
• Know Your Data
o Outliers, Assumptions, Missing Data
• Strive for Model Parsimony
o Don’t put in irrelevant variables, which could lead to multicollinearity (degree to which any variable’s effect can be attributed to other variables)
• Look at Your Errors
o Assess the validity of measurement; unexplained relationships
• Validate Results
o Results could be specific to the sample
o Split sample into two subsamples and re-run analysis
o Get separate sample
o Employ bootstrapping, which draws a large number of subsamples from the sample |
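The split-sample and bootstrapping ideas under "Validate Results" can be sketched with the standard library. The statistic re-estimated here is just the mean, standing in for whatever analysis is being validated; the function names, seed, and data are hypothetical.

```python
import random
from statistics import mean


def split_sample(data, seed=0):
    """Randomly split the data into two halves for re-running an analysis."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def bootstrap_means(data, n_resamples=1000, seed=0):
    """Mean of each resample drawn with replacement, same size as the original sample."""
    rng = random.Random(seed)
    return [mean(rng.choices(data, k=len(data))) for _ in range(n_resamples)]


# Hypothetical sample of 12 scores.
data = [12, 15, 11, 18, 14, 16, 13, 17, 15, 14, 19, 12]
first_half, second_half = split_sample(data)
print("split-sample means:", mean(first_half), mean(second_half))

boot = sorted(bootstrap_means(data))
# Rough 95% percentile interval from the bootstrap distribution.
print("bootstrap interval:", boot[24], boot[974])
```

If the two halves, or the bootstrap resamples, give very different answers, the result is probably specific to the sample.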