145 Cards in this Set
Population
|
- Target group for inference
- Parameters are Numerical Characteristics - Large, unobtainable, often hypothetical |
|
Sample
|
- Sub group of population
- Statistics are Numerical Characteristics - Obtainable, Small |
|
μ (Mu)
|
Population Mean
|
|
x bar
|
Sample Mean
|
|
Regression Equation:
x |
score on predictor variable
|
|
Simple Random Sampling
|
Independent Selection
REDUCES bias in generalizations |
|
Research
|
Structured, scientific problem-solving
|
|
Regression Equation:
a |
y-intercept, value of y' when x=0
|
|
7 Topics that all Inferential Statistics have in Common
|
1) Use of Descriptive Statistics
2) Use of probability 3) Potential for estimation 4) Sampling Variability 5) Sampling Distributions 6) Use of a Theoretical Distribution 7) 2 Hypotheses, 2 Decisions, 2 Types of Errors |
|
Steps of Scientific Methods
|
1) Encounter and identify problem
2) Formulate Hypotheses and Define Variables 3) Think through consequences of hypotheses 4) Design study, run it, collect data, compute statistic, test hypothesis 5) Draw Conclusions |
|
Independent Variable (IV)
|
- Manipulated by researcher
- Researcher CHANGES values of the variable - Comes first in time |
|
Characteristics of Regression
|
1) linear only
2) generalize only for x values in your sample 3) y is different from y'; y=y'+e 4) error is e=y-y' |
|
Dependent Variable (DV)
|
- Measured by researcher
- Follows IV |
|
Extraneous Variable
|
- Should be controlled by researcher
- Competitors to IV - Influence DV |
|
Best Fitting Line
|
The statistics b and a are computed so as to minimize the sum of squared errors, Σe^2 (Least Squares Principle)
|
|
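The Least Squares Principle above has a standard closed-form solution for b and a; a minimal Python sketch (the data set is made up for illustration):

```python
def least_squares(xs, ys):
    """Compute slope b and y-intercept a that minimize the sum of
    squared errors, sum of e^2 where e = y - y' and y' = b*x + a."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    a = ybar - b * xbar
    return b, a

# Toy data lying exactly on y = 2x + 1, so b = 2 and a = 1
b, a = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(b, a)  # 2.0 1.0
```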
Random Assignment
|
- Purpose: Control EV
- When: After random sampling; form groups out of entire sample |
|
Variable
|
Entity that is free to take on different values
|
|
Partition Total Spread
|
- Total = Explained+NotExplained
- for both proportion of spread and amount of spread |
|
Ways to control Extraneous Variables
|
1) Randomization of subjects to groups
2) Keep constant for all subjects 3) Include in design |
|
Predictor Variable
|
Comes first in time, but not manipulated
|
|
Probability
|
Relative Frequency
|
|
Criterion Variable
|
Follows Predictor Variable
|
|
Sample Space
|
all possible outcomes of a research project
|
|
Operational Definition
|
Type of variable is assigned depending on how it is used in the study
|
|
Types of Relationships
|
Causal, Predictive
|
|
Causal Relationships
|
IV causes DV
Keys: a) manipulation of IV b) randomization of subjects to groups c) replication - N>1 for each group |
|
Elementary Event
|
any one data point in sample space
|
|
Predictive Relationships
|
PV predicts CV
Keys: a) no manipulation b) no randomization of subjects to groups c) have replication |
|
Event
|
any collection of Elementary Events
|
|
Types of Research
|
True Experiment
Observational Research |
|
True Experiment
|
a) Manipulation of Variable
b) Randomization of subjects to groups c) Replication |
|
Observational Research
|
a) No manipulation of Variable
b) No randomization of subjects c) Replication |
|
P(Elementary Event)
|
1 / (total # in sample space)
|
|
Quantitative Data
|
Data has numeric value
|
|
P(Event)
|
(# in event)/(total #)
|
|
Qualitative Data
|
Data has numeric label
|
|
Aspects of Data
|
Middle - central tendency, location, center
Spread - variability, dispersion Skewness - departure from symmetry Kurtosis - peakedness relative to normal curve |
|
Measures of Middle
|
mean, median, mode, T20
|
|
Conditional Probability
|
P(A|B)=(# in A and B together) / (# in B)
|
|
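The probability cards above can be checked with a small Python sketch (the die example is an illustration, not from the cards):

```python
# Sample space: one roll of a fair die
sample_space = {1, 2, 3, 4, 5, 6}
A = {x for x in sample_space if x % 2 == 0}  # event "even" = {2, 4, 6}
B = {x for x in sample_space if x > 3}       # event "> 3"  = {4, 5, 6}

def p(event, space):
    """P(Event) = (# in event) / (total # in sample space)."""
    return len(event) / len(space)

def p_given(a, b):
    """P(A|B) = (# in A and B together) / (# in B)."""
    return len(a & b) / len(b)

print(p(A, sample_space))  # 0.5
print(p_given(A, B))       # 0.666... (2 of the 3 outcomes in B are even)
```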
Measures of Spread
|
range, midrange, s*^2, s*, s^2, s
|
|
Standard/Unit Normal Distribution
|
Mu = 0
Sigma^2 = 1 |
|
Characteristics of a good measure of spread
|
1) Stat = 0 if spread is zero
2) As spread increases, stat increases 3) Stat measures just spread, not middle |
|
Midrange (MR)
|
Upper Hinge - Lower Hinge
UH - LH |
|
Median Position (MP)
|
(N + 1) / 2
|
|
Sampling Distributions Purpose
|
To get probabilities of a statistic in order to make inferences, and to get the information necessary to estimate parameters
|
|
Hinge Position (HP)
|
([MP] + 1) / 2
|
|
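The MP and HP formulas can be sketched in Python, assuming [MP] means MP with any fraction dropped (the sample sizes are made up):

```python
import math

def median_position(n):
    """MP = (N + 1) / 2; a .5 means averaging the two middle scores."""
    return (n + 1) / 2

def hinge_position(n):
    """HP = ([MP] + 1) / 2, taking [MP] as MP with any fraction dropped."""
    return (math.floor(median_position(n)) + 1) / 2

print(median_position(7), hinge_position(7))    # 4.0 2.5
print(median_position(10), hinge_position(10))  # 5.5 3.0
```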
Whiskers
|
Lines drawn from a hinge to an adjacent value
|
|
Sampling Distributions Definition
|
A distribution of a statistic that could be formed by drawing all possible samples of a given size N from some population, computing the stat for each sample, and arranging these stats in a distribution
|
|
s*^2
Sample Variance |
{Σ(x-xbar)^2} / N
|
|
3 things to know about sampling distributions of x bar
|
1) Mu of x bar = Mu
2) Sigma squared of x bar = sigma squared/N 3) Shape normal IF a) Population is normal OR b) N is large |
|
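The first two facts can be verified exactly on a toy population by enumerating every possible sample of size N (a sketch assuming ordered samples drawn with replacement; the population values are made up):

```python
from itertools import product
from statistics import mean, pvariance

# Tiny population; enumerate ALL possible samples of size N
population = [2, 4, 6]
N = 2
mu = mean(population)             # 4
sigma_sq = pvariance(population)  # 8/3

xbars = [mean(s) for s in product(population, repeat=N)]
print(mean(xbars))       # 4       -> Mu of x bar = Mu
print(pvariance(xbars))  # 1.333...-> sigma^2 of x bar = sigma^2 / N
```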
s*
Sample Standard Deviation |
√ s*^2
|
|
s^2
Unbiased Variance Estimate |
{Σ(x-xbar)^2} / (N – 1)
|
|
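The s*^2 and s^2 formulas differ only in the denominator; a minimal Python sketch with made-up scores:

```python
def s_star_sq(xs):
    """s*^2 = sum((x - xbar)^2) / N  (sample variance, biased)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / n

def s_sq(xs):
    """s^2 = sum((x - xbar)^2) / (N - 1)  (unbiased variance estimate)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

data = [2, 4, 6, 8]     # made-up scores; sum of squared deviations = 20
print(s_star_sq(data))  # 5.0
print(s_sq(data))       # 6.666...
```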
Central Limit Theorem
|
Shape is normal if N is large
|
|
s
|
√(s^2)
|
|
Outliers
|
Any real data values outside whiskers
|
|
Unbiased
|
Mu of stat = desired parameter
|
|
z-score
|
aspect of data = relative position/standing
"something minus it mean divided by its standard deviation" |
|
characteristics of z-scores
|
1) mean of a set = 0
2) variance of a set = 1 3) shape is the same as the shape of the distribution of the somethings |
|
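Characteristics 1 and 2 can be checked directly in Python (the scores are made up):

```python
from statistics import mean, pstdev

def z_scores(xs):
    """z = (something - its mean) / its standard deviation."""
    m, sd = mean(xs), pstdev(xs)
    return [(x - m) / sd for x in xs]

zs = z_scores([10, 20, 30, 40, 50])      # made-up scores
print(mean(zs))                          # 0.0 -> mean of a set of z's = 0
print(sum(z * z for z in zs) / len(zs))  # 1.0 (to floating-point precision)
```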
Characteristics of Normal Distributions
|
1) symmetric, continuous, theoretical, unimodal
2) bell-shaped 3) scores range from -infinity to +infinity 4) has 2 parameters (mu and sigma squared) |
|
Hypothesis testing
|
the process of testing tentative guesses about relationships between variables and populations
|
|
2 Keys for probability in N(0,1)
|
1) distribution symmetric
2) total area/probability = 1 |
|
test statistic
|
a statistic used only for the purpose of testing hypotheses
|
|
Correlation and Regression have in common . . .
|
1) x,y pairs of scores
2) linear relationships |
|
Correlation
|
stat = r
Purpose is to measure the degree of linear relationship |
|
Regression
|
Purpose: to measure form of function of linear relationship
Prediction Equation: y'=bx+a |
|
Assumptions
|
Conditions placed on a test statistic necessary for its valid use in hyp. testing
|
|
Characteristics of r
|
1) works with 2 variables, x&y
2) -1.00<=r<=1.00 3) measures only linear relationships 4) r^2 = proportion of variability in y that is explained by x 5) r undefined if x or y has zero spread 6) r is dimensionless |
|
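A minimal Python sketch of r computed from deviation scores (the x,y pairs are made up):

```python
from math import sqrt

def pearson_r(xs, ys):
    """r = degree of linear relationship between the x,y pairs.
    Undefined (division by zero) if x or y has zero spread."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r([1, 2, 3, 4], [2, 4, 5, 9])  # made-up pairs
print(r)       # about 0.965
print(r ** 2)  # proportion of variability in y explained by x
```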
Assumptions for z of x bar
|
1) pop. of obs. normal
2) obs. are independent |
|
Factors that Impact r (besides the Population Correlation Coefficient)
|
- restriction of range
- combining data - outliers |
|
Correlation does NOT imply . . .
|
causation
|
|
Regression Equation:
y' |
predicted score on y (criterion variable)
|
|
Null Hypothesis
|
H sub naught: The hypothesis we test (decision to reject or retain)
|
|
Regression Equation:
b |
slope
|
|
Alternative Hypothesis
|
H sub one: where we put what we believe
|
|
Significance level
|
the standard for what we mean by a small probability in hypothesis testing, alpha = .05
|
|
Directional hypothesis
|
any Hypothesis with <,>,<=,>=
|
|
Nondirectional hypotheses
|
do not specify direction, eg not equal
|
|
One-tailed Test
|
uses only one tail of the sampling distribution of the test
|
|
Critical values
|
values of the test statistic that cut off alpha in the tail(s) of the sampling distribution
|
|
Rejection Values
|
values of the test stat for which we would reject H naught.
|
|
Critical Value Decision Rules
|
Reject null Hypothesis if the test stat is more extreme than a critical value
|
|
p-value Decision Rule
|
Reject Null if both are true:
1) p-value <= alpha 2) the result (test statistic) agrees with the alternative |
|
Type I Error
|
Reject Null Hypothesis given Null is true
|
|
Type II Error
|
Retain Null Hypothesis given Alternative is true
|
|
p(Type I Error)=
|
p(rej. Null|Null true) = alpha
|
|
p(Type II Error)=
|
p(ret. Null|Alternative true) = Beta
|
|
Effect size relationship to power
|
as Effect size increases, power increases
|
|
N (sample size) relationship to power
|
as N increases, power increases
|
|
Sigma squared relationship to power
|
as sigma squared decreases, power increases
|
|
alpha's relationship to power
|
as alpha increases, power increases
|
|
directional hypothesis influence on power
|
gives best power if correct in predicting direction, but zero power if you are wrong
|
|
non-directional hypothesis influence on power
|
gives good power in both directions
|
|
properties of t-distribution
|
-A family of distributions
-Have one parameter = df -Mu of t = 0 -sigma^2 of t = df/(df-2) > 1 -Symmetric, sort of bell-shaped |
|
t=
|
(x bar-mu)/ sqroot(s^2/N)
|
|
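The one-sample t formula in Python (the scores and hypothesized mu are made up):

```python
from math import sqrt

def one_sample_t(xs, mu0):
    """t = (xbar - mu) / sqrt(s^2 / N), with df = N - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return (xbar - mu0) / sqrt(s2 / n)

t = one_sample_t([5, 7, 9, 11], mu0=6)  # made-up scores, hypothesized mu = 6
print(t)  # about 1.55
```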
degrees of freedom
|
-parameter of t-distribution
-(# of independent components - #of statistics) -for 1-sample t, =N-1 -associated with s^2 |
|
one-sample t assumptions
|
-population of observations is normal
-subjects are independent |
|
Correlation
|
-one sample
-x,y pairs Null: rho=0 |
|
Correlation statistic
|
r
|
|
Correlation degrees of freedom
|
N-2
|
|
2 ind sample t-test
|
-2 independent samples
-Null: mu1=mu2 -Sigma^2 unknown -Use when n1=n2>=15 |
|
2 ind sample t-test degrees of freedom
|
n1+n2-2
|
|
2 independent sample t-test assumptions
|
-Populations of observations are normal
-sigma1^2=sigma2^2 -observations are independent |
|
AWS t'
|
-2 samples
-independent samples -Null:mu1=mu2 -sigma^2 unknown -use when n1=n2<15 or n1!=n2 |
|
AWS t' test statistic
|
(xbar1-xbar2)/
sqroot((s1^2/n1)+(s2^2/n2)) |
|
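The AWS t' statistic in Python, using the unbiased variance of each sample (the groups are made up, with unequal n's as the card prescribes):

```python
from math import sqrt

def unbiased_var(xs):
    """s^2 = sum((x - xbar)^2) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def aws_t_prime(x1, x2):
    """t' = (xbar1 - xbar2) / sqrt(s1^2/n1 + s2^2/n2)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    return (m1 - m2) / sqrt(unbiased_var(x1) / n1 + unbiased_var(x2) / n2)

t = aws_t_prime([4, 6, 8], [1, 3, 5, 7])  # made-up groups with n1 != n2
print(t)  # about 1.15
```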
AWS t' degrees of freedom
|
(s1^2/n1 + s2^2/n2)^2 / [ (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) ] (Satterthwaite approximation)
|
|
AWS t' assumptions
|
-populations of observations are normal
-observations are independent |
|
2 dependent sample t
|
-2 samples
-dependent samples -x,x pairs -sigma^2 unknown -Null:Mu(sub d)=0 |
|
2 dependent sample t statistic
|
(dbar - 0)/sqroot(s(sub-d)^2/N)
N=Number of pairs |
|
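The dependent-sample t works on difference scores d; a Python sketch with made-up pre/post pairs:

```python
from math import sqrt

def dependent_t(pre, post):
    """t = (dbar - 0) / sqrt(s_d^2 / N), d = post - pre, N = # of pairs."""
    ds = [b - a for a, b in zip(pre, post)]
    n = len(ds)
    dbar = sum(ds) / n
    s_d_sq = sum((d - dbar) ** 2 for d in ds) / (n - 1)
    return dbar / sqrt(s_d_sq / n)

# Made-up pre/post pairs (repeated measures on the same subjects)
t = dependent_t([10, 12, 14, 16], [13, 14, 18, 19])
print(t)  # about 7.35
```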
2 dependent sample t statistic degrees of freedom
|
N-1
N=#of pairs |
|
2 dependent sample t Assumptions
|
-population of d's is normal
-d's are independent |
|
3 Ways of acquiring x,x pairs for 2 dependent sample t-test
|
1)researcher produced-researcher matches on extraneous variable
2)naturally occurring-come to researcher already paired 3)repeated measures-pre & post measurements |
|
Robustness assumptions
|
1)normality-not met-robust
2)sigma1^2=sigma2^2-not met-robust if n1=n2>=15 3)independence-met-not robust |
|
Robustness Definitions
|
1)the quality of a test stat when assumption is not met
2)the sampling distribution is well-fit by the theoretical distribution 3)alpha(true)~=alpha(set)=0.05: .04<=alpha(true)<=.06 |
|
1-way ANOVA
|
-for comparing multiple samples
-J=#groups, n=#obs/group, N=nJ -Null: mu1=mu2=...=mu(sub-J) -Alternative: any difference in the mu(sub-j)s |
|
Logic of the Anova
Part 1 |
1)We find 2 sample variances, one based on the x-bars, and one based on observations within groups: these 2-sample variances should estimate sigma^2 if Null is true, but estimate different quantities if alternative is true
|
|
Logic of the Anova
Part 2 |
2)We form an f-stat by putting the variance based on xbars in the numerator and variances based on observations in denominator.
If Null true, f~1. If Alternative true, f>1. |
|
ANOVA based on xbar
|
n*s(sub-xbar)^2
Null: estimates sigma^2 Alter:est. sigma^2+positive quantity |
|
ANOVA based on observations
|
s(sub-pooled)^2=sum(s(sub-j)^2)/J
Null: est. sigma^2 Alternative: est. sigma^2 |
|
ANOVA F
|
F = n*s(sub-xbar)^2 /
[sum(s(sub-j)^2)/J] Null: expect F~1 Alternative: expect F>1 |
|
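The two variance estimates and the F-ratio can be sketched in Python for equal-n groups (the data are made up, with the third group's mean shifted so the alternative is true):

```python
from statistics import mean, variance

def one_way_f(groups):
    """F = n * s_xbar^2 / (sum of s_j^2 / J), for J groups of equal size n."""
    n = len(groups[0])
    xbars = [mean(g) for g in groups]
    between = n * variance(xbars)               # based on the xbars
    within = mean(variance(g) for g in groups)  # pooled within-group variance
    return between / within

# Made-up data: three groups of n = 3; the third mean is shifted up
f = one_way_f([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(f)  # 21.0, much greater than 1 as expected under the alternative
```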
s*^2 Sampling Distribution
|
u=((N-1)/N)*sigma^2
shape: positively skewed |
|
r Sampling Distribution
|
u = rho if rho = 0
shape: symmetric, not normal |
|
s^2 Sampling Distribution
|
u=sigma^2
shape: positively skewed |
|
MCP
|
Multiple Comparison Procedure
|
|
Pairwise Comparisons
For J means, there are ... pair wise comparisons |
C=J(J-1)/2
|
|
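The pairwise-comparison count in Python:

```python
def n_pairwise(j):
    """C = J(J - 1) / 2 pairwise comparisons among J means."""
    return j * (j - 1) // 2

# For J = 4 means: (1,2) (1,3) (1,4) (2,3) (2,4) (3,4)
print(n_pairwise(4))  # 6
```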
MCP hypotheses
|
Null: u(sub-j)=u(sub-j')
Alt: u(sub-j)!=u(sub-j') |
|
Error Rates for MCP
α'=alpha for each comp c=# pair wise comparisons |
p() = p(at least one Type I error)
--goal to keep p() small α'<=p()<=1-(1-α')^c<=c(α') |
|
Error Rate per comp. (gives good power)
|
-Set α'=.05
-p() can be large |
|
Error rate family-wise
|
Controls p() at α=.05 for c Comparisons by making α' small
|
|
Tukey MCP situation/hypothesis
|
J>=2 ind samples
H0: u(sub-j)=u(sub-j') All p.w. comparisons equal n's >=15 |
|
Tukey AND Fisher-Hayter test stat
|
t = (xbar(sub-j) - xbar(sub-j')) /
sqrt((2*MeanSquareWithin)/n) |
|
Fisher-Hayter MCP situation/hypothesis
|
Overall ANOVA F must be significant
J>=2 ind samples H0: u(sub-j)=u(sub-j') All p.w. comparisons equal n's >=15 |
|
Tukey and Fisher-Hayter assumptions
|
1.Pops. of obs. are normal
2.sigma(sub-j)^2=sigma(sub-j')^2 3.obs. are ind. |
|
Tukey MCP distribution
|
q/sqrt(2)
J, degrees of freedom within, and alpha=.05 |
|
Fisher-Hayter MCP
|
q/sqrt(2)
J-1, degrees of freedom within, and alpha=.05 |
|
Tukey MCP robustness
|
very similar to 2 independent sample t
|
|
2-Way ANOVA Situation/hypothesis
|
2 Factors, J=# levels of A, K=# levels of B, n obs/cell, N=nJK
A)H0:u1=u2...=uJ B)H0:u1=u2...=uK AB)H0:No Interaction Effect |
|
2-Way ANOVA test stat
|
FA=MSA/MSW=(SSA/dfA)/(SSW/dfW)
(FB and FAB, substitute B and AB for A, respectively) dfA=J-1, dfW=JK(n-1) dfB=K-1, dfAB=(J-1)(K-1) |
|
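The degrees-of-freedom bookkeeping for the 2-way ANOVA can be sketched in Python (the 3x2 design with 5 obs/cell is made up); note the four df values sum to N - 1:

```python
def two_way_dfs(J, K, n):
    """Degrees of freedom for the 2-way ANOVA F-ratios."""
    return {
        "A": J - 1,               # dfA
        "B": K - 1,               # dfB
        "AB": (J - 1) * (K - 1),  # dfAB
        "W": J * K * (n - 1),     # dfW
    }

dfs = two_way_dfs(J=3, K=2, n=5)  # made-up design: 3x2 cells, 5 obs/cell
print(dfs)                        # {'A': 2, 'B': 1, 'AB': 2, 'W': 24}
print(sum(dfs.values()))          # 29 = N - 1, with N = nJK = 30
```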
2-Way ANOVA distribution
|
FA~F(sub-(J-1,JK(n-1)))
FB~F(sub-(K-1,JK(n-1))) FAB~F(sub-((J-1)(K-1),JK(n-1))) |
|
2-Way ANOVA assumptions
|
1.Pops of obs are normal
2.equal population variances for each cell 3.obs. are independent |
|
Factor
|
a variable that classifies/groups subj.
|
|
1-Way ANOVA situation/hypothesis
|
1-factor
J>=2 independent groups H0: u1=u2=...=uJ H1: any difference in the u(sub-j)s |
|
1-Way ANOVA test statistic
|
F=MSB/MSW=(SSB/dfB)/(SSW/dfw)
dfB=J-1 dfW=N-J=J(n-1) |
|
1-Way ANOVA distribution
|
F~F(sub-(J-1, N-J))
|
|
1-Way ANOVA assumptions
|
1.pops of obs are normal
2.sigma1^2=sigma2^2...=sigmaJ^2 3.obs are independent |
|
Levels
|
a value of a factor (One in 1-Way Anova)
|