5 Steps to a 5 AP Psychology, 2010-2011 Edition

Authors: Laura Lincoln Maitland

Types of tests

Ethics and standards in testing

Intelligence

Intelligence testing

Kinds of intelligence

Heredity/environment and intelligence

Human diversity

Standardization and Norms

Psychometrics is the measurement of mental traits, abilities, and processes. Psychometricians are involved in test development in order to measure some construct or behavior that distinguishes among people. Constructs are ideas that help summarize a group of related phenomena or objects; they are hypothetical abstractions related to behavior and defined by groups of objects or events. For example, we can’t measure happiness, honesty, or intelligence in feet or meters. If someone tells the truth in a wide variety of situations, however, we might consider that person honest. Although we cannot observe happiness, honesty, or intelligence directly, they are useful concepts for understanding, describing, and predicting behavior.

Psychological tests include tests of abilities, interests, creativity, personality, and intelligence. A good test is standardized, reliable, and valid. After many questions for a test have been written, edited, and pretested, questions are thrown out if nearly everyone answers them correctly or if very few answer them correctly, because these types of questions do not tell us anything about individual differences. Tests are then assembled from questions that differentiate among test takers and that fairly test all aspects of the behavior to be assessed. They are then administered to a sample of hundreds or thousands of people who fairly represent all of the people who are likely to take the test. This sample is used to standardize the test.
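The screening of pretested questions described above can be sketched in a few lines of Python. This is only an illustration: the response data are invented, and the cutoffs of 10 percent and 90 percent correct are assumed values, since the text does not say how many correct answers count as "nearly everyone" or "very few."

```python
# Hypothetical pretest data: each row is one test taker, each column one
# question; 1 = correct, 0 = incorrect. The numbers are invented.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]


def proportion_correct(responses):
    """Return the fraction of test takers who answered each question correctly."""
    n_takers = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_takers for i in range(n_items)]


def keep_items(responses, low=0.1, high=0.9):
    """Return the indexes of questions that are neither too hard nor too easy.

    The cutoffs `low` and `high` are assumptions made for this sketch.
    """
    return [
        i
        for i, p in enumerate(proportion_correct(responses))
        if low <= p <= high
    ]


print(proportion_correct(responses))  # [1.0, 0.6, 0.0, 0.6]
print(keep_items(responses))          # [1, 3]
```

In this made-up sample, everyone answers the first question correctly and no one answers the third correctly, so neither question separates one test taker from another and both are dropped.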
Standardization is a two-part test development procedure that first establishes test norms from the test results of the large representative sample who initially took the test, then assures that the test is both administered and scored uniformly for all test takers. Norms are scores established from the test results of the representative sample, which are then used as a standard for assessing the performances of subsequent test takers; more simply, norms are standards used to compare scores of test takers. For example, the mean score for the SAT is 500 and the standard deviation is 100, whereas the mean score for the Wechsler Adult Intelligence Scale (IQ test) is 100 and the standard deviation is 15, based on the “standardization” sample. When administering a standardized test, all proctors must give the same directions and time limits and provide the same conditions as all other proctors. All scorers must use the same scoring system, applying the same standards to rate responses as all other scorers. Thus, we should earn the same test score no matter where we take the test or who scores it.
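One way to see how norms let us compare test takers is to express a score as the number of standard deviations it lies above or below the mean of the standardization sample. The short Python sketch below uses the SAT and Wechsler figures quoted above; the function names `z_score` and `rescale` are hypothetical, chosen only for this illustration.

```python
def z_score(score, mean, sd):
    """Return how many standard deviations a score lies from the norm mean."""
    return (score - mean) / sd


def rescale(score, from_mean, from_sd, to_mean, to_sd):
    """Express a score's relative standing on a scale with different norms."""
    return to_mean + z_score(score, from_mean, from_sd) * to_sd


# Norms quoted in the text: SAT mean 500, SD 100; Wechsler mean 100, SD 15.
print(z_score(600, 500, 100))           # 1.0
print(z_score(115, 100, 15))            # 1.0
print(rescale(600, 500, 100, 100, 15))  # 115.0
```

A 600 on the SAT and a 115 on the Wechsler scale both sit one standard deviation above their respective means. The comparison describes only relative standing within each norm group; it does not imply that the two tests measure the same thing.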

Reliability and Validity

Not only must a good test be standardized, it must also be reliable and valid.

Reliability

If a test is reliable, we should obtain the same score no matter where, when, or how many times we take it (if other variables remain the same). Several methods are used to determine if a test is reliable. In the test-retest method, the same exam is administered to the same group on two different occasions and the scores compared. The closer the correlation coefficient is to 1.0, the more reliable the test. The problem with this method of determining reliability or consistency is that performance on the second test may be better because test takers are already familiar with the questions. In the split-half method, the score on one half of the test questions is correlated with the score on the other half of the questions to see if they are consistent. One way to do that might be to compare the score of all the odd-numbered questions to the score of all the even-numbered questions. In the alternate form method or equivalent form method, two different versions of a test on the same material are given to the same test takers, and the scores are correlated. The SAT given on Saturday is different from the SAT given on Sunday in October; there are different questions on each form. Although this does not happen, if the same people took both exams and the tests were highly reliable, the scores should be the same on both tests. This would also necessitate high interrater reliability, the extent to which two or more scorers evaluate the responses in the same way.
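The test-retest and split-half methods both come down to computing a correlation coefficient between two sets of scores. The following Python sketch shows one way to do that with the Pearson correlation; all of the scores are invented, and the helper names `pearson_r` and `split_half_scores` are assumptions made for this example.

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation coefficient between two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_scores(item_scores):
    """Return each test taker's totals on odd- and even-numbered questions."""
    odd = [sum(row[0::2]) for row in item_scores]   # questions 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # questions 2, 4, 6, ...
    return odd, even


# Test-retest: the same (hypothetical) group tested on two occasions.
first_occasion = [10, 12, 14, 16, 18]
second_occasion = [11, 13, 15, 17, 19]
print(pearson_r(first_occasion, second_occasion))

# Split-half: hypothetical right/wrong answers (1/0) on a six-question test.
item_scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
]
odd_totals, even_totals = split_half_scores(item_scores)
print(pearson_r(odd_totals, even_totals))
```

In the test-retest data every person scores exactly one point higher the second time, so the ranking of test takers is unchanged and the coefficient is at its maximum even though the scores themselves rose. That is the practice effect noted above: a correlation captures consistency of relative standing, not identical scores.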

Validity

Tests can be very reliable, but if they are not also valid, they are useless for measuring the particular construct or behavior. Psychometricians must present data to show that a test measures what it is supposed to measure accurately, and that the results can be used to make accurate decisions. Because there are no universal standards against which test scores can be compared, validation is most frequently accomplished by obtaining high correlations between the test and other assessments. Validity is the extent to which an instrument accurately measures or predicts what it is supposed to measure or predict. Just as there are several methods for measuring reliability, there are also several methods for measuring validity.

Face validity is a measure of the extent to which the content of the test measures all of the knowledge or skills that are supposed to be included within the domain being tested, according to the test takers. For example, we expect the AP Psychology exam to ask between five and seven questions dealing with testing and individual differences on the multiple-choice section of the test, as defined by the content outline for the course, which sets the structure and boundaries for the content domain.

Content validity is a measure of the extent to which the content of the test measures all of the knowledge or skills that are supposed to be included within the domain being tested, according to expert judges.

Criterion-related validity is a measure of the extent to which a test’s results correlate with other accepted measures of what is being tested.

Predictive validity is a measure of the extent to which the test accurately forecasts a specific future result. For example, the SAT is designed to predict how well someone will succeed in his/her freshman year in college. High scores on the SAT should predict high grades for the first year in college.

Construct validity, which some psychologists consider the true measure of validity, is the extent to which the test actually measures the hypothetical construct or behavior it is designed to assess. The MMPI-2 (described in Chapter 14) has a clinical trial set of questions for schizophrenia. This test has construct validity if this subset of questions successfully discriminates people with schizophrenia from other subjects taking the MMPI-2. Many people question whether intelligence tests have construct validity for measuring intelligence.

Types of Tests

Ask different psychometricians to categorize types of tests and they may give different answers, because tests can be categorized along many dimensions.

Performance, Observational, and Self-Report Tests

Psychological tests can be sorted into the three categories of performance tests, observational tests, and self-report tests. For a performance test, the test taker knows what he/she should do in response to questions or tasks on the test, and it is assumed that the test taker will do the best he/she can to succeed. Performance tests include the SATs, AP tests, Wechsler intelligence tests, Stanford-Binet intelligence tests, and most classroom tests, including finals, as well as computer tests and road tests for a driver’s license. Observational tests differ from performance tests in that the person being tested does not have a single, well-defined task to perform, but rather is assessed on typical behavior or performance in a specific context. Employment interviews and formal on-the-job observations for evaluation by supervisors are examples of observational tests. Self-report tests require the test taker to describe his/her feelings, attitudes, beliefs, values, opinions, physical state, or mental state on surveys, questionnaires, or polls. The MMPI-2 (described in Chapter 14) exemplifies the self-report test.
