|
What is the item difficulty index (p)?
|
indicates the proportion of examinees in the sample who answered the item correctly
in most situations p=.50 is optimal, except for true/false tests, where the optimal p=.75
the closer p is to .50, the better the item differentiates among examinees
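The index is a simple proportion, so it can be sketched in a few lines. The function name and the sample data below are hypothetical, chosen only for illustration.

```python
def item_difficulty(responses):
    """Proportion of examinees who answered the item correctly.

    `responses` is a list of 0/1 item scores (1 = correct).
    """
    if not responses:
        raise ValueError("responses must not be empty")
    return sum(responses) / len(responses)


# Hypothetical data: 10 examinees, 6 of whom answered correctly.
scores = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
p = item_difficulty(scores)
print(p)  # 0.6
```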
|
|
|
What is item discrimination?
|
extent to which a test item discriminates between examinees who obtain high versus low scores on a test
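One common way to quantify this (an assumption here, since the card does not name an index) is D, the difference between the proportion correct in a high-scoring group and in a low-scoring group. A minimal sketch with hypothetical data:

```python
def discrimination_index(upper_group, lower_group):
    """D = p(upper) - p(lower) for one item.

    Each argument is a list of 0/1 item scores for examinees in the
    high-scoring and low-scoring groups on the total test.
    How the groups are formed is left to the caller.
    """
    p_upper = sum(upper_group) / len(upper_group)
    p_lower = sum(lower_group) / len(lower_group)
    return p_upper - p_lower


# Hypothetical data: 4 of 5 high scorers and 1 of 5 low scorers pass the item.
d = discrimination_index([1, 1, 1, 0, 1], [0, 1, 0, 0, 0])
```

D ranges from -1 to +1; values near +1 mean the item is passed mostly by high scorers.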
|
|
|
What is the basis of classical test theory?
|
views an obtained test score as reflecting a combination of truth and error
|
|
|
What is the problem with classical test theory?
|
item and test statistics are dependent upon the original sample
inability to compare scores obtained on different tests
|
|
|
What is the basis of item response theory?
|
involves the use of an item characteristic curve, which relates an examinee's level on the trait measured by the test to the probability that the examinee will respond correctly to the item
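The card does not say which model generates the curve. As an illustration only, the sketch below assumes the two-parameter logistic (2PL) form, where `a` (discrimination) and `b` (difficulty) are the item parameters and `theta` is the examinee's trait level.

```python
import math


def icc_2pl(theta, a=1.0, b=0.0):
    """Probability of a correct response under an assumed 2PL model.

    theta: examinee's trait level
    a:     item discrimination (slope of the curve)
    b:     item difficulty (trait level at which P = .50)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Trace the curve for a hypothetical item with a = 1.5, b = 0.0.
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(icc_2pl(theta, a=1.5, b=0.0), 3))
```

The probability rises with trait level and equals .50 where theta equals b.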
|
|
|
What are the 3 advantages of item response theory?
|
sample invariant
possible to equate test scores
easier to develop computer-adaptive tests
|
|
|
According to classical test theory, what are the components of an examinee's obtained test score?
|
a true score (T) plus an error component (E)
obtained score (X) = true score (T) + error (E)
|
|
|
What does the error component represent in classical test theory?
|
represents measurement error which is due to factors that are irrelevant to what is being measured and have an unsystematic effect on the score
|
|
|
What is norm-referenced interpretation?
|
transform raw scores into a norm-referenced score (percentile rank, z-score, T score)
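The three transformations can be sketched as follows. The norm-group data are hypothetical, the T score uses the usual mean of 50 and standard deviation of 10, and percentile rank is taken here as the percent of the norm group scoring below the raw score (other definitions exist).

```python
from statistics import mean, pstdev


def z_score(raw, norm_scores):
    """Distance from the norm-group mean in standard deviation units."""
    return (raw - mean(norm_scores)) / pstdev(norm_scores)


def t_score(raw, norm_scores):
    """z-score rescaled to a mean of 50 and a standard deviation of 10."""
    return 50 + 10 * z_score(raw, norm_scores)


def percentile_rank(raw, norm_scores):
    """Percent of the norm group scoring below `raw` (one common definition)."""
    below = sum(1 for s in norm_scores if s < raw)
    return 100 * below / len(norm_scores)


# Hypothetical norm group.
norm = [40, 45, 50, 55, 60]
print(z_score(50, norm), t_score(50, norm), percentile_rank(55, norm))
```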
|
|
|
What is criterion referenced interpretation?
|
score interpreted in terms of the total amount of test content mastered (% correct) or in terms of some external criterion
|
|
|
What is reliability?
|
extent to which test performance is immune to the effects of measurement error
|
|
|
What is a reliability coefficient?
|
indicates whether the attribute measured by the test is being assessed in a consistent, precise way
|
|
|
How do you interpret a reliability coefficient?
|
the proportion of variability in obtained test scores that reflects true score variability
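the reliability coefficient is interpreted directly; it is never squared
r(xx) = proportion of variability due to true score variability
1-r(xx) = proportion of variability due to error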
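A small arithmetic sketch of this interpretation, assuming true scores and errors are uncorrelated so that obtained-score variance is the sum of the two. The variances are made-up numbers.

```python
def reliability_from_variances(true_variance, error_variance):
    """r(xx) as the share of obtained-score variance due to true scores.

    Assumes obtained variance = true variance + error variance.
    """
    obtained_variance = true_variance + error_variance
    return true_variance / obtained_variance


# Hypothetical variances: 80 units of true score variance, 20 of error.
r_xx = reliability_from_variances(80, 20)
error_share = 1 - r_xx
print(r_xx)  # 0.8
```

Here 80% of the variability in obtained scores reflects true score variability and the remaining share reflects error; no squaring is involved.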
|
|
|
What are the different forms of reliability?
|
test-retest (coefficient of stability)
alternate forms (coefficient of equivalence)
split-half (coefficient of internal consistency)
coefficient alpha (coefficient of internal consistency)
inter-rater reliability (coefficient of concordance)
|
|
|
What type of reliability is appropriate to measure time sampling error?
|
test-retest (coefficient of stability)
appropriate for tests that measure attributes that are relatively stable over time
|
|
|
What type of reliability is appropriate to measure time sampling and content sampling errors?
|
alternate forms (coefficient of equivalence)
not appropriate when attribute measured is expected to fluctuate over time
most rigorous and best method for estimating reliability
|
|
|
Why is alternate forms reliability often not assessed?
|
difficulty in developing forms that are truly equivalent
|
|
|
What are 2 methods for evaluating internal consistency?
|
split-half
coefficient alpha
|
|
|
What is the problem with using split-half reliability?
|
the reliability coefficient is based on test scores from one-half of the entire test
reliability tends to decrease as the length of a test decreases, so split-half usually underestimates the test's true reliability
|
|
|
How can you correct for the problems with split-half reliability?
|
use the Spearman-Brown prophecy formula, which provides an estimate of what the reliability coefficient would have been if it had been based on the full length of the test
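The correction can be sketched with the general Spearman-Brown formula, r_new = n·r / (1 + (n-1)·r), where n is the factor by which the test is lengthened (n = 2 for the split-half case). The function name and the example coefficient are illustrative.

```python
def spearman_brown(r, length_factor=2.0):
    """Estimated reliability after changing test length by `length_factor`.

    r:             reliability of the current (for example, half-length) test
    length_factor: new length divided by current length; 2 corrects a
                   split-half coefficient to full length
    """
    return (length_factor * r) / (1 + (length_factor - 1) * r)


# Hypothetical split-half coefficient of .60 corrected to full length.
full_length_estimate = spearman_brown(0.60)
```

With a half-test coefficient of .60, the full-length estimate works out to about .75.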
|
|
|
When do you use the Kuder-Richardson Formula 20 (KR-20)?
|
when test items are measured dichotomously
variation of coefficient alpha
not appropriate for speeded tests
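A minimal sketch of the computation, using the standard form KR-20 = (k/(k-1)) · (1 - Σpq / variance of total scores), with population variance for the totals. The response matrix is hypothetical.

```python
from statistics import pvariance


def kr20(item_scores):
    """KR-20 for dichotomous (0/1) items.

    item_scores: list of rows, one row per examinee, one 0/1 entry per item.
    Uses the population variance of total scores, matching the p*q item
    variances.
    """
    n_examinees = len(item_scores)
    n_items = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    total_variance = pvariance(totals)

    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_scores) / n_examinees
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)


# Hypothetical data: 5 examinees by 4 items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
estimate = kr20(data)
```

For this matrix, Σpq is .80 and the total-score variance is 2, so the estimate is about .80.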
|
|
|
What is a drawback of using coefficient alpha?
|
it provides only a lower-bound estimate of a test's reliability
|
|
|
What is the purpose of using coefficient alpha?
|
measure inter-item consistency
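A sketch of the usual computing formula, alpha = (k/(k-1)) · (1 - Σ item variances / variance of total scores). The rating data are hypothetical, and sample variances are used throughout (using population variances consistently gives the same result).

```python
from statistics import variance


def coefficient_alpha(item_scores):
    """Coefficient alpha from an examinee-by-item score matrix.

    item_scores: list of rows, one row per examinee, one score per item.
    """
    n_items = len(item_scores[0])
    item_variances = [
        variance([row[j] for row in item_scores]) for j in range(n_items)
    ]
    total_variance = variance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 examinees rating 3 items on a 1-5 scale.
ratings = [
    [4, 5, 4],
    [3, 4, 3],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
]
alpha = coefficient_alpha(ratings)
```

Because the three items rise and fall together across examinees, alpha for this matrix is high (about .97).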
|
|
|
When is it appropriate to use inter-rater reliability?
|
whenever test scores depend on a rater's judgement
|
|
|
When is a kappa coefficient used?
|
as the reliability coefficient for inter-rater reliability
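As an illustration, the sketch below computes Cohen's kappa for two raters assigning nominal categories: kappa = (observed agreement - chance agreement) / (1 - chance agreement). The card does not specify which kappa variant is meant, so the two-rater form and the ratings are assumptions.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters (Cohen's kappa).

    Undefined (division by zero) when chance agreement equals 1, for
    example when both raters use a single category for every case.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    n = len(ratings_a)
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Agreement expected by chance from each rater's category proportions.
    p_chance = sum((counts_a[c] / n) * (counts_b[c] / n) for c in counts_a)
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical ratings of 10 cases by two raters.
rater_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
rater_b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
kappa = cohens_kappa(rater_a, rater_b)
```

The raters agree on 8 of 10 cases, but .52 agreement is expected by chance, so kappa is about .58 rather than .80.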
|
|
|
What are the factors that affect the reliability coefficient?
|
test length
range of test scores
guessing
|
|
|
What is the acceptable level of a reliability coefficient?
|
.80 or larger
|
|
|
What is the standard error of measurement?
|
an index of the amount of error that can be expected in obtained scores due to the unreliability of the test
used to calculate a confidence interval around an obtained score
|
|
|
What is the formula for the standard error of measurement?
|
SEM = SD x square root of (1 - r(xx)), where SD is the standard deviation of test scores and r(xx) is the reliability coefficient
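The formula translates directly into code. The standard deviation and reliability values below are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r(xx))."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical test: SD = 15, r(xx) = .91.
sem = standard_error_of_measurement(15, 0.91)
```

With SD = 15 and r(xx) = .91, the square root of .09 is .3, so the SEM is about 4.5 score points.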
|
|
|
What affects the magnitude of the standard error?
|
standard deviation of test scores and the test's reliability coefficient
the lower the test's standard deviation and the higher its reliability coefficient, the smaller the standard error of measurement
|
|
|
How can you interpret the standard error of measurement?
|
it is a type of standard deviation
interpret it in terms of areas under the normal curve
approximately 68%, 95%, and 99% confidence intervals correspond to 1, 2, and 3 standard errors on either side of the obtained score
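A short sketch of building these intervals around an obtained score. The obtained score and SEM are hypothetical, and the 1, 2, and 3 multipliers follow the rule of thumb on this card rather than exact normal-curve values.

```python
def confidence_interval(obtained_score, sem, n_sems):
    """Interval of `n_sems` standard errors on each side of the obtained score."""
    margin = n_sems * sem
    return (obtained_score - margin, obtained_score + margin)


# Hypothetical obtained score of 100 with an SEM of 4.5.
for label, n_sems in (("68%", 1), ("95%", 2), ("99%", 3)):
    low, high = confidence_interval(100, 4.5, n_sems)
    print(label, low, high)
```

For example, the approximate 95% interval here runs from 91 to 109.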
|
|
|
What is validity?
|
test's accuracy in providing information it was designed to provide
|
|
|
What are the 3 categories of validity?
|
content validity
construct validity
criterion-related validity
|
|
|
What type of validity is important when scores on a test provide information on how much each examinee knows about a domain?
|
content validity
|
|
|
What type of validity is important when scores on a test provide information on each examinee's status with regard to the trait being measured?
|
construct validity
|
|
|
What type of validity is important when scores will be used to predict scores on some other measure and you are interested in the predicted scores?
|
criterion-related validity
|
|
|
What is content validity?
|
extent to which test items adequately sample the content or behavior domain the test was designed to measure
|
|
|
How do you establish content validity?
|
through the judgement of experts
|
|
|
What type of tests consider content validity to be important?
|
achievement-type tests
work samples
|
|
|
What additional evidence supports good content validity?
|
large coefficient of internal consistency
high correlations with other tests that measure the same domain
pre/post test evaluations of a program designed to increase familiarity with the material show the expected changes
|
|
|
What is construct validity?
|
extent to which the test measures the theoretical trait or construct it was designed to measure
|
|
|
What are some methods to establish construct validity?
|
assess internal consistency
study group differences (do scores adequately distinguish between groups?)
hypothesis testing (do the scores change following the experiment?)
assess convergent (high correlations with measures of the same trait) and divergent (low correlations with measures of different traits) validity
assess factorial validity
|
|
|
What are monotrait-monomethod coefficients?
|
same trait-same method
correlation between a measure and itself
reliability coefficients; should be large
|
|
|
What are monotrait-heteromethod coefficients?
|
same trait-different method
correlation between different measures of the same trait
convergent validity when large
|
|
|
What are heterotrait-monomethod coefficients?
|
different trait-same method
correlation between different traits measured by the same method
discriminant (divergent) validity when small
|
|
|
What are heterotrait-heteromethod coefficients?
|
different trait-different method
correlation between different traits measured by different methods
discriminant validity when small
|
|
|
What do factor loadings in factor analysis measure?
|
the correlation between a test and a factor
square the loading to determine the proportion of variability in test scores explained by the factor
|
|
|
What is communality in factor analysis?
|
common variance
amount of variability in test scores that is due to the factors that the test shares in common to some degree with the other tests included in the analysis
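When the factors are orthogonal (an assumption of this sketch), a test's communality is the sum of its squared factor loadings. The loadings below are hypothetical.

```python
def communality(loadings):
    """Sum of squared factor loadings for one test (orthogonal factors assumed)."""
    return sum(loading ** 2 for loading in loadings)


# Hypothetical test loading .60 on Factor I and .50 on Factor II.
h2 = communality([0.60, 0.50])
```

Here Factor I explains 36% of the variability in test scores and Factor II explains 25%, so the communality is about .61.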
|
|
|
From the perspective of factor analysis, what are the components of a test's reliability?
|
communality and specificity (together these make up the reliable variance)
error is the remaining, unreliable variance
|
|
|
What is the relationship between reliability and communality?
|
communality is a lower-limit estimate of a test's reliability coefficient
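The lower-limit relationship follows from the usual decomposition of total test variance into communality, specificity, and error, with reliability equal to communality plus specificity. A sketch of that arithmetic, using made-up values and treating the decomposition as an assumption of the example:

```python
def variance_components(reliability, communality):
    """Split total variance (1.0) into communality, specificity, and error.

    Assumes reliability = communality + specificity, so communality can
    never exceed reliability.
    """
    if not 0.0 <= communality <= reliability <= 1.0:
        raise ValueError("expected 0 <= communality <= reliability <= 1")
    return {
        "communality": communality,
        "specificity": reliability - communality,
        "error": 1.0 - reliability,
    }


# Hypothetical test: reliability .80, communality .61.
parts = variance_components(0.80, 0.61)
```

With these values, specificity is about .19 and error is about .20; the three components sum to 1.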
|
|