77 Cards in this Set
- Front
- Back
issue with guessing
|
it leads to problems in understanding what one's true test score is - especially on achievement tests |
|
Abbott's formula for blind guessing |
corrected score = R - W / (K - 1), where R = correct responses, W = wrong responses, and K = number of alternatives per item |
|
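The correction-for-guessing formula on the card can be checked numerically. A minimal sketch; the function name and the example numbers (40 correct, 12 wrong, 4 alternatives) are invented for illustration:

```python
def corrected_score(r, w, k):
    """Correct a raw score for blind guessing.

    r: number of correct responses
    w: number of wrong responses (omitted items are not counted as wrong)
    k: number of alternatives per item
    """
    return r - w / (k - 1)

# Example: 40 correct and 12 wrong on a 4-alternative multiple-choice test.
# The 12 wrong answers suggest roughly 12/3 = 4 lucky guesses among the 40.
print(corrected_score(40, 12, 4))  # 36.0
```

Note that omitted items are ignored, which is why the card advises examinees to attempt every question: under this correction, a blind guess has an expected value of zero, not a penalty.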
To overcome the influence of blind guessing - one should advise examinees to |
attempt every question |
|
Items that are clear in multiple choice formats may be confusing in |
short answer formats |
|
According to Ebel, a better way to increase test reliability is |
to add more items |
|
The best way to calculate reliability for speeded tests is to |
do a split-half reliability on the test |
|
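The split-half procedure named on the card can be sketched with simulated data: split the items into odd and even halves, correlate the two half-scores, then apply the Spearman-Brown correction to estimate full-length reliability. The simulation below is invented for illustration (200 examinees, 20 dichotomous items driven by a single ability):

```python
import numpy as np

def split_half_reliability(scores):
    """Odd-even split-half reliability with the Spearman-Brown correction.

    scores: 2-D array, rows = examinees, columns = item scores (0/1).
    """
    scores = np.asarray(scores, dtype=float)
    odd = scores[:, 0::2].sum(axis=1)   # total score on odd-numbered items
    even = scores[:, 1::2].sum(axis=1)  # total score on even-numbered items
    r = np.corrcoef(odd, even)[0, 1]    # correlation between the two halves
    return 2 * r / (1 + r)              # step up to full test length

# Simulated data: pass probability tracks a latent ability.
rng = np.random.default_rng(0)
ability = rng.normal(size=200)
items = (rng.random((200, 20)) < 1 / (1 + np.exp(-ability[:, None]))).astype(int)
print(round(split_half_reliability(items), 2))
```

One caveat worth remembering alongside this card: on a heavily speeded test, an odd-even split from a single timed administration can inflate the estimate, so the halves are often timed separately.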
Halo Effect |
rater's tendency to perceive an individual who is high (or low) in one area as also high (or low) in other areas |
|
general-impression model |
tendency of a rater to allow an overall impression of an individual to influence judgment of that person's performance (e.g., a rater may find a reporter "impressive" and thus also rate his/her speech as strong) |
|
Salient Dimension model |
When the rating of one quality affects the rating of another independent quality (e.g., people rated as attractive are also rated as more honest) |
|
Simpson's Paradox |
aggregating data can change the meaning of the data - it can obscure the conclusions because of a third variable |
|
In terms of minority hiring - minorities applied to two levels of positions: clerical and executive. Overall hiring rates found that only 11% (110/1010) of the minority group were hired, as compared to 14% (85/600) of the majority group. What is this scenario an example of? |
Simpson's Paradox |
|
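The hiring example can be made concrete. The card gives only the aggregate totals (110/1010 and 85/600); the per-level split below is invented for illustration, chosen so that minorities have the *higher* hiring rate within each level, yet the lower rate overall - the signature of Simpson's Paradox:

```python
# Hypothetical within-level breakdown; the column totals match the card's
# aggregate figures, but the clerical/executive split is invented.
minority = {"clerical": (9, 10),  "executive": (101, 1000)}  # (hired, applied)
majority = {"clerical": (75, 100), "executive": (10, 500)}

for level in ("clerical", "executive"):
    m_h, m_a = minority[level]
    j_h, j_a = majority[level]
    print(f"{level}: minority {m_h/m_a:.1%} vs majority {j_h/j_a:.1%}")

# Aggregating across levels reverses the comparison.
m_rate = sum(h for h, _ in minority.values()) / sum(a for _, a in minority.values())
j_rate = sum(h for h, _ in majority.values()) / sum(a for _, a in majority.values())
print(f"overall: minority {m_rate:.1%} vs majority {j_rate:.1%}")
```

The third variable here is position level: most minority applicants applied for the executive jobs, where hiring rates are low for everyone.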
There is a debate about whether clinical judgment is superior to |
mechanical judgment |
|
mechanical judgment |
statistical predictions, or predictions based on some type of quantitative index |
|
Marital relationship satisfaction was determined based on higher sex-versus-argument ratios - people tend to rate relationships higher if they have more sex and fewer fights. This is an example of what kind of mechanical decision making? |
crude |
|
Mechanical or quantitative prediction can only work when |
people highlight what variables to examine to determine prediction |
|
In terms of prediction, people are not as good as mechanical methods at |
integrating the data in unbiased ways |
|
Our belief in prediction is reinforced by the |
isolated incidents we can access |
|
Factor Analysis |
a statistical tool that is used to mathematically determine which items are associated with various latent constructs |
|
Factor analysis requires that one come up with |
number of items |
|
Steps in factor analysis |
1. sample items on 200-500 subjects 2. input how the sample rated each item 3. run factor analysis and then look at the pattern of where items load and then name the factor |
|
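The three steps above can be sketched end to end. This is a simplified stand-in: real factor-analysis software uses iterated extraction and rotation, while the sketch below simulates ratings from two latent factors (sample size and loadings invented for illustration) and applies the Kaiser eigenvalue-greater-than-1 rule to the item correlation matrix to count factors:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300  # step 1: sample size within the 200-500 range the card suggests

# Step 2 (simulated): ratings on 6 items driven by two latent factors.
f1, f2 = rng.normal(size=(2, n))
noise = 0.5 * rng.normal(size=(n, 6))
items = np.column_stack([f1, f1, f1, f2, f2, f2]) + noise  # items 1-3 load
                                                           # on factor 1,
                                                           # items 4-6 on 2

# Step 3 (simplified): eigendecompose the correlation matrix and retain
# components with eigenvalues > 1, then inspect loadings to name factors.
corr = np.corrcoef(items, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
n_factors = int((eigvals > 1).sum())
print(n_factors)  # two factors recovered
```

Inspecting which items load on each retained component is the "name the factor" step from the card.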
When doing item development for factor analysis, you need to have ___________ items because they give you greater ability to tap into multiple aspects of the construct |
more |
|
Facets |
well-defined, homogeneous item clusters that directly map onto the higher-order factors |
|
Dichotomous item response formats cannot be used for factor analysis because |
they can cause a serious disturbance in the correlation matrix |
|
When utilizing factor analysis, more response options per item generate a greater amount of |
variance |
|
For well defined factors, you can use a sample size of _____________ for factor analysis |
100-200 |
|
If factors are not well defined you may need a sample size of up to _________ for factor analysis |
500 |
|
4 Reasons for Conducting Factor Analysis |
1. Developing and Identifying Hierarchical Factor Structure 2. Improving Psychometric Properties of a Test 3. Developing Items that Discriminate between Samples 4. Developing more unique items |
|
All tests with sound items should have a strong |
internal consistency |
|
Factor analysis can help developers determine items to remove, revise or add in order to improve |
internal consistency |
|
2 Primary Objections to Short Form Development |
1. Rigorous and comprehensive evaluation is crucial and short form cannot give the level of information that is required for an appropriate assessment 2. Short forms are often developed without careful and thorough examination of the new form's validity |
|
2 General Problems for Short Forms |
1. Assumption that all the reliability and validity of the long form automatically applies to the abbreviated form 2. Assumption that the new shorter measure requires less validity evidence |
|
7 Problems in Regard to Empirical Evidence for Short Forms |
1. Researchers found that if the long form does not have good validity, neither will the short one. 2. Found that by reducing the items, content coverage may be compromised - very few short form designers performed content domain checks. 3. Found significant reduction in reliability coefficients. 4. Found that many times researchers do not run another factor analysis on the short form to see if the same factor structure is present. 5. Need to administer the short form to an independent sample to determine validity - not the sample that the long form was developed on. 6. Need to use the short form to classify clinical populations and compare whether it is as accurate as the long form. 7. Need to establish if there are genuine time and money savings with a short form |
|
Item Analysis |
general term for a set of methods used to evaluate test items |
|
2 Types of Item Analysis |
Item Difficulty vs. Item Discriminability |
|
Item Difficulty |
defined by the proportion of people who get a particular item correct |
|
Item difficulty should usually fall between |
.3 and .7 |
|
When developing item difficulty, you need to consider whom |
you are testing (like medical students vs. disabled students) |
|
test floor |
a sufficient number of easy items |
|
test ceiling |
a sufficient number of hard items |
|
Item Discriminability |
determines whether the people who have done well on a particular item have also done well on the entire test |
|
Extreme group method for Item Discriminability |
compares people who have done very well with those who have done very poorly on a test |
|
discrimination index in extreme group method |
the difference between the proportions of people in each group who got the item correct |
|
Item Difficulty Formula |
(U + M + L) / N, where U, M, and L are the numbers answering correctly in the upper, middle, and lower scoring groups and N is the total number of examinees |
|
Item Discrimination Formula |
U - L (the proportion correct in the upper group minus the proportion correct in the lower group) |
|
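The two formulas can be sketched together. This assumes the standard proportion forms of the extreme-group statistics (the card writes the formulas without denominators); the function names and the example counts are invented for illustration:

```python
def item_difficulty(u, m, l, n_total):
    """Proportion answering the item correctly: (U + M + L) / N,
    where U, M, L are the numbers correct in the upper, middle,
    and lower scoring groups and N is the total number tested."""
    return (u + m + l) / n_total

def discrimination_index(u, l, group_size):
    """Extreme-group discrimination: proportion correct in the
    upper group minus proportion correct in the lower group."""
    return (u - l) / group_size

# Hypothetical item: 100 examinees, extreme groups of 27 each.
print(item_difficulty(22, 30, 8, 100))   # 0.6 -> inside the .3-.7 band
print(discrimination_index(22, 8, 27))   # positive -> upper group did better
```

A difficulty near .6 with a clearly positive discrimination index is the pattern a well-functioning item should show.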
2 Methods of Item Discriminability |
1) Extreme Group Method 2) Point Biserial Method |
|
Point Biserial Method for Item Discriminability |
find the correlation between performance on the item and performance on the entire test |
|
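The point biserial method is just a correlation between a dichotomous item and the total score, so it can be sketched directly. The examinee data below is invented for illustration:

```python
import numpy as np

def point_biserial(item, total):
    """Correlation between a 0/1 item score and the total test score.

    A point biserial is a Pearson correlation where one variable is
    dichotomous, so np.corrcoef gives the same value.
    """
    return np.corrcoef(item, total)[0, 1]

# Hypothetical data: 8 examinees' responses to one item, plus their totals.
item = np.array([1, 1, 1, 0, 1, 0, 0, 0])
total = np.array([95, 90, 86, 80, 78, 65, 60, 40])
r = point_biserial(item, total)
print(round(r, 2))  # positive: high scorers tended to get the item right
```

In practice the item's own contribution is often removed from the total before correlating, so that the item does not inflate its own discrimination estimate.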
Item Response Theory (IRT) is a collection of mathematical and statistical models that do these 3 things: |
1. analyze items and scales 2. measure psychological constructs 3. compare individuals on psychological constructs |
|
The basic unit of IRT is |
item response function |
|
item response function is a mathematical function describing |
the relation between where an individual falls on the continuum of a given construct, such as depression, and the probability that he/she will give a particular response to a scale item designed to measure that construct |
|
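An item response function can be written down concretely. The two-parameter logistic (2PL) form below is one common choice, not the only one the cards could mean; the parameter values are invented for illustration:

```python
import math

def irf(theta, a=1.0, b=0.0):
    """2-parameter logistic item response function.

    theta: the person's level on the latent variable (e.g., depression)
    a: item discrimination (steepness of the curve)
    b: item difficulty/severity (location of the curve)
    Returns the probability of endorsing the item.
    """
    return 1 / (1 + math.exp(-a * (theta - b)))

# A person located exactly at the item's difficulty endorses it 50% of the time.
print(irf(0.0, a=1.5, b=0.0))            # 0.5
# Higher trait levels give higher endorsement probabilities.
print(round(irf(2.0, a=1.5, b=0.0), 2))
```

This also makes two later cards concrete: shifting b right moves the curve away from the Y axis (a more difficult/severe item), and increasing a steepens the slope (a more discriminating item).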
In IRT, a construct is called a |
latent variable |
|
in terms of item difficulty, the higher the number, the |
easier the question |
|
Point biserial ranges from |
-1 to +1 |
|
A positive point biserial tells us that |
the item discriminates well, because those who scored higher on the test also got the question correct |
|
The closer a point biserial is to +1, the more _______________________ it has |
discrimination power |
|
Discrimination power means that |
it does well at discriminating between upper and lower ranges |
|
A negative point biserial generally indicates that people in the higher scoring ranges got the item _________, as compared to those in the lower scoring range. |
wrong, |
|
A negative point biserial means that there is something wrong with |
your question, but we don't know what. |
|
Classical Test Theory (CTT) is limited by |
only 2 sources of error- random and systematic |
|
True Score Model from Classical Test Theory |
X (Observed Score) = T (True Score) + E (Error) |
|
Random Error |
fluctuations in the measurement based purely on chance |
|
Systematic Error |
error that affects a score because of some particular characteristic of the person or the test that has nothing to do with the construct being measured |
|
CTT recognizes only two sources of variance, and cannot adequately estimate |
individual sources of error influencing a measurement |
|
Generalizability Theory acknowledges that |
multiple factors may affect the error associated with measurement of one’s true score |
|
Generalizability Theory allows researchers to estimate the total variance or error in terms of |
individual factors that vary in terms of the assessment, setting, time, items, and raters |
|
Dependability |
is the test taker's score dependable across a myriad of conditions? |
|
Reliability is dependent on |
the inferences (generalizations) that the investigator wishes to make with the data from the measurement |
|
2 Types of Error Analyses |
1. G-Study 2. D-Study |
|
G-Study (Generalizability Study) |
to provide as much information as possible about the sources of variation in the measurement |
|
D-Study |
uses G-Study information to evaluate the effectiveness of alternative designs for minimizing error and maximizing reliability |
|
Generalizability coefficient |
A reliable measure is one where the observed value closely estimates the expected score over all acceptable observations |
|
dependability coefficient |
how dependable are the measures from one judge to the next |
|
Reliability Formula |
X = T + E; Reliability = Var(T) / Var(X) = Var(T) / (Var(T) + Var(E)) |
|
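The reliability formula can be verified with a quick simulation of the true score model: generate true scores and error separately, add them to form observed scores, and check that Var(T)/Var(X) lands near the theoretical Var(T)/(Var(T)+Var(E)). The distribution parameters are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
true_scores = rng.normal(loc=50, scale=10, size=10_000)  # T, Var(T) = 100
error = rng.normal(loc=0, scale=5, size=10_000)          # E, Var(E) = 25
observed = true_scores + error                           # X = T + E

# Reliability = Var(T) / Var(X) = Var(T) / (Var(T) + Var(E))
reliability = true_scores.var() / observed.var()
print(round(reliability, 2))  # close to 100 / (100 + 25) = 0.8
```

Because T and E are independent, the observed-score variance decomposes cleanly, which is exactly what makes the two forms of the ratio on the card equivalent.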
In IRT, item difficulty is synonymous with severity, which means the |
more severe the person's diagnosis is, the more likely they will be to endorse that item. |
|
*Item difficulty is indicated by the curve that is furthest away |
from the Y axis |
|
Item discrimination is determined by the steepness of |
the slope |
|
Generalizability coefficients are interpreted as |
.8-1.0 = good generalizability; .6-.8 = marginal generalizability; <.6 = poor generalizability |
|
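The interpretation bands above reduce to a small lookup, which is handy when screening many coefficients at once. A minimal sketch using the card's cutoffs (the function name is invented):

```python
def classify_generalizability(g):
    """Bucket a generalizability coefficient using the card's cutoffs."""
    if g >= 0.8:
        return "good"
    if g >= 0.6:
        return "marginal"
    return "poor"

print(classify_generalizability(0.85))  # good
print(classify_generalizability(0.70))  # marginal
print(classify_generalizability(0.50))  # poor
```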
The biggest advantage of IRT over CTT is that |
you can map differential severity patterns for each item. You can look at individual items and look at differential scoring patterns to determine levels of severity as well as discriminability, independent of test bias, which can prevent a clinician from overpathologizing. |