View the list of assessment psychology research paper topics. Read about the history of assessment psychology.
Assessment Psychology Research Paper Topics
- The Assessment Process
- Clinical Judgment and Mechanical Prediction
- Psychometric Characteristics of Assessment Procedures
- Bias in Psychological Assessment
- Testing and Assessment in Cross-Cultural Psychology
- Psychological Assessment in Treatment
- Computerized Psychological Assessment
- Ethical Issues in Psychological Assessment
- Education and Training in Psychological Assessment
- Psychological Assessment in Adult Mental Health Settings
- Psychological Assessment in Child Mental Health Settings
- Psychological Assessment in School Settings
- Psychological Assessment in Medical Settings
- Psychological Assessment in Industrial/Organizational Settings
- Psychological Assessment in Forensic Settings
- Psychological Assessment in Correctional Settings
- Psychological Assessment in Geriatric Settings
- Assessment of Intellectual Functioning
- Assessment of Neuropsychological Functioning
- Assessment of Interests
- Assessing Personality and Psychopathology with Interviews
- Assessment of Psychopathology with Behavioral Approaches
- Assessing Personality and Psychopathology with Projective Methods
- Assessing Personality and Psychopathology with Self-Report Inventories
History of Assessment Psychology
In Act I, Scene ii of Julius Caesar, Caesar observes one of his colleagues from afar and says to Marc Antony, “Yon Cassius has a lean and hungry look; He thinks too much: such men are dangerous . . . seldom he smiles . . . such men as he never be at heart’s ease whiles they behold a greater than themselves, and therefore they are very dangerous.” In penning these words, William Shakespeare captured the essence of psychological assessment, which consists of translating observations of a person into inferences about the person’s nature and how he or she is likely to behave in various situations. In more formal terms, assessment psychology is the field of behavioral science concerned with methods of identifying similarities and differences among people in their personal characteristics, functioning capacities, and action tendencies. Assessment methods are accordingly designed to identify what people are like and how they can be expected to conduct themselves, specifically with respect to their disposition to think, feel, and act in certain ways.
Assessment Psychology Research Paper Examples:
- Assessment Process Research Paper
- Clinical Judgment and Mechanical Prediction Research Paper
- Psychometric Characteristics of Assessment Procedures Research Paper
- Bias in Psychological Assessment Research Paper
- Assessment in Cross-Cultural Psychology Research Paper
- Psychological Assessment in Treatment Research Paper
- Computerized Psychological Assessment Research Paper
- Ethical Issues in Psychological Assessment Research Paper
- Education and Training in Psychological Assessment Research Paper
- Psychological Assessment in Adult Mental Health Settings Research Paper
- Psychological Assessment in Child Mental Health Settings Research Paper
- Psychological Assessment in School Settings Research Paper
- Psychological Assessment in Medical Settings Research Paper
- Psychological Assessment in Industrial/Organizational Settings Research Paper
- Psychological Assessment in Forensic Settings Research Paper
This research paper begins by identifying the origins of assessment psychology and then traces the development of assessment methods for serving four purposes: the evaluation of intellectual ability; the identification of personality characteristics and psychopathology; the monitoring of neuropsychological functioning; and the measurement of aptitudes, achievement, and interests. The paper concludes with comments concerning issues currently confronting assessment psychology and bearing on its future prospects.
Origins of Assessment Psychology
Over time in recorded history and for diverse reasons, methods of assessment have been used to classify, select, diagnose, advise, and plan services for people in all walks of life. Just as Caesar used observation to classify Cassius as an overly ideational and envious person not to be trusted, Gideon in a Bible story from the Book of Judges chose his troops for battle by observing how they drank water from a stream. Those soldiers who used one hand to bring water to their mouth while keeping their other hand on their weapon were chosen to fight; those who put down their weapon and used both hands to drink were sent home.
Informal decision-making procedures of this kind define the province of assessment psychology, but the transformation of such informal procedures into the standardized methodology that constitutes contemporary assessment psychology became possible only following a scientific prehistory during which the fledgling discipline of psychology gradually began to address individual differences. Scientific attention to individual differences was inspired by Charles Darwin (1859), who in The Origin of Species encouraged systematic study of how varying characteristics between species and within members of species could influence which of them survive and prosper. Intrigued by these notions of evolution and heredity, and interested particularly in the origins of human genius, Sir Francis Galton (1869, 1883) proposed that differences between people in their intellectual ability could be measured by their performance on sensorimotor tasks like reaction time, grip strength, weight discrimination, and visual acuity. Galton established a laboratory in London to study psychophysical variations in performance, and his creativity and initiative in this work led to the emergence of scientific study of human capacities. With good reason, Boring (1950, p. 487) in his History of Experimental Psychology credited Galton as being the founder of individual psychology.
Subsequent progression from individual psychology to assessment psychology came with the contribution of James McKeen Cattell (1860–1944), who as a graduate student in 1883 presented himself at Wilhelm Wundt’s laboratory in Leipzig and asked to be taken on as an assistant. The founding of Wundt’s laboratory in 1879 marks the inception of psychology as a scientific discipline, and Wundt’s goals as a scientific psychologist were to formulate universal principles of behavior that would account for response patterns common to all people. Like other behavioral scientists past and present operating with this nomothetic perspective, Wundt had little affinity for measuring differences among people, which he regarded as a troublesome error variance. Fortunately for assessment psychology, he nevertheless allowed Cattell to conduct dissertation research on individual variations in reaction time. Returning home after completing his doctorate in Leipzig, Cattell sought to extend the methods of Galton, whose laboratory he had visited briefly while lecturing at Cambridge in 1888. He did so with enormous energy and success while serving as head of the Psychology Laboratory at Columbia University from 1891 to 1917. Cattell (1890) introduced the term mental test to the psychological literature, and, during a long career that included serving as the fourth president of the American Psychological Association (APA), he pioneered mental testing and generated scientific interest in psychological tests. More than anyone else, Cattell deserves the title “father” of assessment psychology.
In the twentieth-century wake of Cattell’s generativity, the formal pursuit of methods of identifying similarities and differences among people was more often than not stirred by some practical purpose needing to be served. Assessment consequently developed as an applied rather than a basic field in psychology. Its theoretical underpinnings and the extensive research it has generated notwithstanding, assessment psychology has been taught, learned, and practiced mainly as a means of facilitating decisions based in part on the needs, desires, capacities, and behavioral tendencies observed in persons being assessed.
Evaluating Intellectual Ability
The history of intellectual assessment can be traced sequentially through five developments: the emergence of the Binet scales, the construction of group-administered tests, the evolution of the Wechsler scales, the appearance of the Kaufman scales, and the quest for brief methods of measuring intelligence. The sections that follow discuss each of these instruments and describe surveys concerning the frequency with which these and other tests are used.
The Binet Scales
In 1904, the Minister of Public Instruction in Paris became concerned about the presence in public school classrooms of “mentally defective” children who could not benefit from regular instruction. The Minister’s information indicated that these “subnormal” children were detracting from the quality of the education that elementary school teachers were able to provide their other students and required special educational programs tailored to “subnormal” children’s needs and capabilities. Acting on this information necessitated some method of identifying intellectually subnormal children, which led the Minister to appoint a commission charged with developing such a method. Among those asked to serve on the commission was Alfred Binet (1857–1911), a distinguished experimental psychologist of the day well known for his interest in higher mental processes and his research on the nature of intelligence (Binet, 1903).
Binet accepted appointment to this commission and, in collaboration with physician colleague Theodore Simon (1873–1961), designed a series of verbal and perceptual motor tasks for measuring whether students’ mental abilities fell substantially below expectation for their age. The Binet-Simon instrument debuted in 1905 (Binet & Simon, 1905), was revised in 1908 to arrange these tasks according to mental age level, and was expanded in 1911 to include adult as well as childhood levels of expectation. Word spread rapidly concerning the utility of this new instrument, which was soon translated into several English versions. The most important of these translations emerged from an extensive revision and standardization project directed by Lewis Terman (1877–1956) at Stanford University and was published in 1916 as the Stanford Revision and Extension of the Binet-Simon Intelligence Scale, soon to become known as the Stanford-Binet (Terman, 1916). Subsequent modifications and restandardization over the years produced several further versions of this measure, the most recent of which was published as the Fourth Edition Stanford-Binet in 1986 (Thorndike, Hagen, & Sattler, 1986).
Central to the conceptual basis and empirical standardization of the Stanford-Binet is a focus on normative age-related expectations for performance on its component tasks, which makes it possible to translate successes and failures on these tasks into a mental-age equivalent. While Terman was collecting his standardization data, William Stern (1871–1938) advanced the notion that a “mental quotient” could be calculated for respondents by dividing their mental age by their chronological age and multiplying the result by 100 (Stern, 1914). Terman endorsed this notion and included Stern’s calculation in the 1916 Stanford-Binet. However, he decided to rename this number an “intelligence quotient,” introducing the term IQ into the language of psychology and into vocabularies worldwide.
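As a rough illustration of the ratio IQ, the following minimal sketch divides mental age by chronological age and multiplies the result by 100. The function name `ratio_iq` and the example ages are hypothetical, chosen only to show the arithmetic; they are not drawn from the Stanford-Binet materials.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return 100.0 * mental_age / chronological_age


# Hypothetical ages in years, invented for illustration only.
print(ratio_iq(10, 8))  # performance above age expectation -> 125.0
print(ratio_iq(8, 8))   # performance matching age expectation -> 100.0
print(ratio_iq(6, 8))   # performance below age expectation -> 75.0
```

Because mental age levels off in adulthood while chronological age keeps rising, a ratio of this kind becomes hard to interpret for adult respondents.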
Just one year after publication of the Stanford-Binet, public duty once more shaped the development of intelligence testing. The entry of the United States into World War I in 1917 generated a pressing need to draft and train a large number of young men who could quickly be transformed from city boys and farm boys into the “doughboys” who served in the trenches. It would facilitate this process to have a measure of intelligence that could be administered to large numbers of recruits at a single sitting and help screen out those whose intellectual limitations would prevent them from functioning competently in the military, while also identifying those with above average abilities who could be trained for positions of responsibility. Robert Yerkes (1876–1956), then president of the American Psychological Association, responded to the war effort by chairing a Committee on the Psychological Examination of Recruits, on which Terman was asked to serve. Coincidentally, one of Terman’s graduate students, Arthur Otis (1886–1963), had been working to develop a group intelligence test. Otis shared his work with Yerkes’ committee, which drew heavily on it to produce what came to be known as the Army Alpha test. The Army Alpha test was the first group-administered intelligence test and, as noted by Haney (1981), it was constructed quickly enough to be given to almost two million recruits by war’s end.
As a language-based instrument that required respondents to read instructions, however, the Army Alpha was not suitable for assessing recruits who were illiterate or, being recent immigrants to the United States, had little command of English. This limitation of the Army Alpha led to creation of the Army Beta, which was based on testing procedures previously developed for use with deaf persons and consisted of nonverbal tasks that could be administered through pantomime instructions, without use of language. The Army Beta’s attention to groups with special needs foreshadowed later attention to culture-related sources of bias in psychological assessment and to the importance of multicultural sensitivity in developing and using tests (see Dana, 2000; Suzuki, Ponterotto, & Meller, 2000). Following the war, group testing of intelligence continued in the form of several different measures adapted for civilian use, one of the first of which, fittingly enough, was the Otis Classification Test (Otis, 1923).
The Wechsler Scales
The Stanford-Binet was the first systematically formulated and standardized measure of intelligence, and for many years it was by far the most commonly used method of evaluating intelligence in young people and adults as well. The kinds of tasks designed by Binet have continued to the present day to provide the foundation on which most other tests of intelligence have been based. Beginning in the late 1930s, however, a new thread in the history of intelligence testing was woven by David Wechsler (1896–1981), then chief psychologist at Bellevue Hospital in New York City. Wechsler saw shortcomings in defining intelligence by the ratio of mental age to chronological age, especially in the evaluation of adults, and he developed instead a method of determining IQ on the basis of comparing test scores with the normative distribution of these scores among people in various age groups. The instrument he constructed borrowed subtests from the Stanford-Binet, the Army Alpha and Beta, and some other existing scales, and thus it was not new in substance. What was new was the statistical formulation of IQ as having a mean of 100 and a standard deviation of 15, which in turn led to the widely accepted convention of translating IQ scores into percentile ranks.
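The deviation approach described above can be sketched in a few lines. This is a simplified illustration, not Wechsler's actual scoring procedure: it assumes a single raw score, a known mean and standard deviation for the respondent's age group, and normally distributed scores. The function names and the norm values in the example are hypothetical.

```python
from statistics import NormalDist

IQ_MEAN = 100.0  # mean of the deviation IQ scale
IQ_SD = 15.0     # standard deviation of the deviation IQ scale


def deviation_iq(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """Rescale a raw score to the IQ metric using age-group norms."""
    z = (raw_score - norm_mean) / norm_sd  # distance from the age-group mean
    return IQ_MEAN + IQ_SD * z


def percentile_rank(iq: float) -> float:
    """Percent of the normative group expected to score at or below this IQ,
    assuming scores follow a normal distribution."""
    return 100.0 * NormalDist(mu=IQ_MEAN, sigma=IQ_SD).cdf(iq)


# Hypothetical age-group norms: mean raw score 50, standard deviation 10.
iq = deviation_iq(raw_score=60, norm_mean=50, norm_sd=10)
print(iq)                             # 115.0
print(round(percentile_rank(iq), 1))  # 84.1
```

Because the score is anchored to the respondent's own age group rather than to a mental-age ratio, the same IQ value carries the same relative standing at any age.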
Also innovative was Wechsler’s belief that intellectual capacities constitute an integral feature of personality functioning, from which it followed that a well-designed intelligence test could provide useful information beyond the implications of an overall IQ score. Wechsler postulated that the pattern of relative strengths and weaknesses across subtests measuring different kinds of mental abilities could be used to identify normal and abnormal variations in numerous cognitive characteristics and coping capacities. Published as the Wechsler-Bellevue, Wechsler’s (1939) test gradually replaced the Stanford-Binet as the most widely used measure of adult intelligence. In addition, because of the profile of subtest scores it offered, compared to the single IQ score or mental age equivalent available from the Stanford-Binet, the Wechsler-Bellevue found applications in clinical health settings as a measure not only of intellectual ability but also of features of neuropsychological impairment and disordered thinking.
A revised Wechsler-Bellevue-II appeared in 1946, and three further revisions of the test were published as the Wechsler Adult Intelligence Scale (WAIS), the most recent being the WAIS-III (Wechsler, 1997). The basic format and individual subtests were also extended downward to provide versions for use with young people: the Wechsler Intelligence Scale for Children (WISC) (Wechsler, 1949), the most recent version of which is the WISC-III (Wechsler, 1991), and the Wechsler Preschool and Primary Scale of Intelligence (WPPSI) (Wechsler, 1967), with its most recent version being the WPPSI-R (Wechsler, 1989).
The Kaufman Scales
Although numerous other intelligence tests employing Binet’s mental age concept or Wechsler’s statistical approach have appeared, none has approached the visibility or popularity of these two measures. Perhaps most notable after Binet and Wechsler among intelligence test developers is Alan Kaufman, who in addition to writing extensively about the assessment of intelligence (Kaufman, 1990, 1994) developed his own general intelligence measures for children, the Kaufman Assessment Battery for Children (K-ABC; Kaufman & Kaufman, 1983), and for adolescents and adults, the Kaufman Adolescent and Adult Intelligence Test (KAIT; Kaufman & Kaufman, 1993). Kaufman’s measures differed in two important respects from their predecessors. First, reflecting a theoretical rather than an empirical approach, tasks were chosen not by testing how trial participants would respond to them, but by formulating certain constructs concerning the nature of intellectual functioning and using tasks that were considered likely to assess these constructs. Second, Kaufman included subtests designed to provide achievement as well as IQ scores, including assessment of abilities in reading and arithmetic.
Along with developing full-length measures, Kaufman stimulated contemporary efforts to construct brief tests of intelligence. A quest for brief methods has long been common to all types of psychological assessment, and intelligence testing provided especially fertile ground for developing short forms of existing measures and constructing new measures that were short to begin with. The structure of Wechsler’s scales offered examiners obvious possibilities for replacing the full WAIS or WISC with a selection of subtests they believed would be sufficient for their purposes. As reviewed by Campbell (1998) and Kaufman (1990), many such beliefs became formalized as short forms comprising from two to six subtests and achieving varying success in estimating Wechsler IQ. The most promising compromises between saving time and obtaining sufficient data have been (a) the utilization of seven-subtest short forms for the WAIS-R and the WAIS-III, which have shown correlations in the high .90s with Full Scale IQ and provide dependable estimates of Verbal and Performance IQ as well (Ryan & Ward, 1999; Ward, 1990); and (b) the selection of an eight-subtest short form of the WISC-III that yields dependable estimates of both the IQ and Index Scores calculated for this measure (Donders, 1997).
Kaufman influenced these developments by constructing a new measure, the Kaufman Brief Intelligence Test (K-BIT), which includes tasks measuring verbal facility and nonverbal reasoning and provides a composite score that can be used to estimate intellectual functioning for persons age 4 to 90 (Kaufman & Kaufman, 1990). The K-BIT became sufficiently popular among practitioners to stimulate construction of numerous other new measures consisting of a small number of traditional kinds of subtests, the most visible of these being the Wechsler Abbreviated Scale of Intelligence (WASI; Psychological Corporation, 1999) and the Wide-Range Intelligence Test (WRIT; Glutting, Adams, & Sheslow, 1999).
Frequency of Test Use
The frequency information given about the use of the Stanford-Binet and Wechsler scales derives from extensive survey data. Attention to the frequency with which various tests are used has characterized assessment psychology at least as far back as surveys conducted in 1934 and 1946 (see Loutit & Browne, 1947). Sundberg (1961) expanded on these earlier surveys with a nationwide sampling of test usage across a variety of clinical agencies and institutions, and his methodology was later repeated on a larger scale (Brown & McGuire, 1976; Lubin, Larsen, & Matarazzo, 1984; Lubin, Wallis, & Paine, 1971; and Piotrowski & Keller, 1989).
Other informative surveys have queried individual psychologists rather than agencies concerning the frequency with which they use various tests, including large samples of clinical psychologists (Archer & Newsom, 2000; Camara, Nathan, & Puente, 2000; Watkins, Campbell, Nieberding, & Hallmark, 1995), neuropsychologists (Butler, Retzlaff, & Vanderploeg, 1991; Camara et al., 2000; Lees-Haley, Smith, Williams, & Dunn, 1995), school psychologists (Kamphaus, Petoskey, & Rowe, 2000; Stinnett, Havey, & Oehler-Stinnett, 1994; Wilson & Reschly, 1996), and forensic psychologists doing criminal evaluations (Borum & Grisso, 1995), personal injury evaluations (Boccaccini & Brodsky, 1999), and custody evaluations (Ackerman & Ackerman, 1997; LaFortune & Carpenter, 1998). Surveys have recently been undertaken outside of the United States as well, as illustrated in a report by Muñiz, Prieto, Almeida, and Bartram (1999) on test use in Spain, Portugal, and Latin American countries. Without always repeating these reference citations, subsequent comments in this research paper about test use frequency are based on the findings they report.
Identifying Personality Characteristics and Psychopathology
Standardized assessment of personality characteristics and psychopathology emerged from four separate threads of history differentiated by their distinctive procedures. A first thread involves relatively structured procedures in which respondents reply to a fixed number of specific questions by selecting their answer from a prescribed list of alternatives (e.g., Question: “Do you feel unhappy?” Answers: “Most of the time,” “Occasionally,” “Hardly ever”). Such relatively structured measures are commonly referred to as self-report methods, given that the data they provide constitute what people are able and willing to say about themselves.
A second thread consists of relatively unstructured procedures in which respondents are presented with somewhat ambiguous test stimuli and given rather vague instructions concerning what they should say about or do with these stimuli (e.g., shown a picture of a boy looking at a violin, the respondent is asked to make up a story that has a beginning and an end and includes how the boy is feeling and what he is thinking about). Measures of this kind have traditionally been called “projective” tests, because they invite respondents to attribute characteristics to test stimuli that are based on their own impressions rather than known fact (e.g., “The boy is feeling sad”) or give them considerable latitude to complete tasks in whatever manner they prefer (asked by respondents about how they should proceed on these measures, examiners typically answer with statements like “It’s up to you” or “Any way you like”).
However, most so-called projective tests have some clearly defined as well as ambiguous aspects and include specific as well as vague instructions (a violin is a violin, and “What will happen to him?” is a precise request for information). Accordingly, instead of being labeled “projective” measures, these relatively unstructured assessment instruments are probably more appropriately classified as belonging to a category of “performance-based” measures, as has been proposed by the American Psychological Association Work Group on Psychological Assessment (Kubiszyn et al., 2000; Meyer et al., 2001). By contrast with self-report data, the data obtained by performance-based measures consist not of what people say about themselves, but of the manner in which they deal with various tasks they are given to do.
A third thread in the history of methods for assessing personality characteristics and psychopathology comprises interview procedures. Assessment interviews are similar to self-report measures, in that respondents are asked directly what the assessor wants to know. Unlike relatively structured tests, however, which are typically taken in written form and involve little interaction with the examiner, interviews are interactive oral procedures in which the participants engage in a conversational exchange. Moreover, assessment interviews include a performance-based as well as a self-report component, in that interviewers typically base their impressions not only on what respondents say about themselves, but also on how they say it and how they conduct themselves while being interviewed.
The fourth thread consists of behavioral procedures that epitomize performance-based assessment. In behavioral assessment, the manner in which respondents conduct themselves is not an ancillary source of information, but instead constitutes the core data being obtained. Respondents are asked to perform tasks selected or designed to mimic certain real-world situations as closely as possible, and their performance on these tasks is taken as a representative sample of behavior that should be predictive of how they will act in the real-world situation. Gideon’s previously mentioned method of selecting his troops exemplifies assessment based on observing behavior in representative circumstances. As elaborated next, behavioral assessment, like the other three threads of personality assessment history, has a unique lineage with respect to how, why, and by whom it became established.
Relatively Structured Tests
The entry of the United States into World War I influenced assessment psychology by creating an urgent need to evaluate not only the intellectual level of draftees, as noted earlier, but their emotional stability as well. Reports from France in 1917 indicated that the war effort was being hampered by the presence in the ranks of mentally fragile soldiers who could not tolerate the psychological stress of combat. In response to these reports, Robert Woodworth (1869–1962), a prominent experimental psychologist who had done his doctoral work with Cattell and later succeeded him as department head at Columbia, designed the Personal Data Sheet (Woodworth, 1920). The Personal Data Sheet consisted of a written list of questions concerning presumed symptoms of psychological disturbance (e.g., “Are you happy most of the time?”), which were to be answered by checking “Yes” or “No.” Although intended for use as a screening device to deselect emotionally unstable draftees, Woodworth’s measure was not completed in time to serve this purpose. Following the war, however, the Personal Data Sheet was put to civilian use as a measure of adjustment, and as such it was the first formal self-report personality assessment questionnaire to become generally available.
Although limited in scope and superficial in design, Woodworth’s measure served as the model on which later generations of adjustment and personality inventories were based. Before continuing with that history, a historical footnote to World War I should be noted. The development of the Personal Data Sheet as a model for an enduring tradition in assessment psychology (i.e., personality inventories), like the development of the Binet-Simon and Army Alpha before it as models of other enduring traditions (i.e., individual and group intelligence tests), bears witness to the impetus of war and public need in evoking formal methods of psychological assessment. The tides of war inevitably have their dark side, however, for those caught in the civilian crossfire as well as for those coming under military attack. In an event with broad sociopolitical implications, James McKeen Cattell, after 26 years as a senior faculty member at Columbia University, was, according to Boring (1950, p. 535), dismissed from his position in 1917 after taking a pacifist stance with respect to the United States entry into World War I.
Returning to the history of self-report measures, the next major development following the Personal Data Sheet was the publication by Robert Bernreuter (1901–1995) of a new Personality Inventory (Bernreuter, 1931). Unlike Woodworth’s measure, which yielded just a single score for overall level of adjustment, the Bernreuter was a multidimensional self-report instrument with separate scales for several different personality characteristics, such as neurotic tendencies, ascendance-submission, and introversion-extraversion. This was the first multidimensional personality assessment measure to appear and, although the era in which it was widely used and recognized is long past, the Bernreuter’s place in history is assured by its having set the stage for a bevy of similarly designed instruments that came to constitute a cornerstone of assessment psychology. Among these many multidimensional personality questionnaires, six currently prominent instruments are notable for illustrating different motivations and methodologies that have been involved in developing such measures: the Minnesota Multiphasic Personality Inventory (MMPI), the California Psychological Inventory (CPI), the Millon Clinical Multiaxial Inventory (MCMI), the Sixteen Personality Factors Questionnaire (16PF), the NEO Personality Inventory (NEO-PI), and the Personality Assessment Inventory (PAI).
The Minnesota Multiphasic Personality Inventory was constructed during the late 1930s by Starke Hathaway (1903–1984), a psychologist, and J. Charnley McKinley, a psychiatrist, while they worked together at the University of Minnesota hospitals. Hathaway and McKinley undertook this task for the purpose of developing a group-administered pencil-and-paper measure that would assist in assigning patients to diagnostic categories. The measure they produced was first published in finished form in 1943 (Hathaway & McKinley, 1943) and has since then become the most widely used and researched of all personality assessment instruments. The manner in which Hathaway and McKinley constructed the MMPI was noteworthy for their total reliance on empirical keying in the selection of test items. Empirical keying was a radical departure from the logical keying approach that had characterized construction of the Woodworth and Bernreuter tests and other early adjustment scales and trait measures as well. In logical keying, items are selected or devised on the basis of some reasonable expectation or subjective impression that they are likely to measure a particular personality characteristic. Empirical keying, by contrast, involves selecting items according to how well in fact they differentiate among groups of people previously identified as having various psychological disorders or personality characteristics.
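The logic of empirical keying can be illustrated with a small sketch. It is not Hathaway and McKinley's procedure, whose statistical criteria are not described here. It assumes true/false items, a previously identified criterion group and a comparison group, and an arbitrary cutoff (`min_difference`) on the difference in endorsement rates. All names and response data are hypothetical.

```python
def endorsement_rate(responses, item):
    """Fraction of respondents who answered True to the given item."""
    return sum(person[item] for person in responses) / len(responses)


def empirically_keyed_items(criterion_group, comparison_group, min_difference=0.25):
    """Keep items whose endorsement rates differ between the two groups.

    Returns a dict mapping each retained item to its keyed direction:
    True if the criterion group endorses it more often, False if less often.
    Item content plays no role in the selection.
    """
    keyed = {}
    for item in criterion_group[0]:
        difference = (endorsement_rate(criterion_group, item)
                      - endorsement_rate(comparison_group, item))
        if abs(difference) >= min_difference:
            keyed[item] = difference > 0
    return keyed


# Invented responses for three items from four people per group.
criterion = [
    {"item_a": True, "item_b": True, "item_c": False},
    {"item_a": True, "item_b": False, "item_c": False},
    {"item_a": True, "item_b": True, "item_c": False},
    {"item_a": False, "item_b": False, "item_c": True},
]
comparison = [
    {"item_a": False, "item_b": True, "item_c": True},
    {"item_a": False, "item_b": False, "item_c": True},
    {"item_a": True, "item_b": False, "item_c": True},
    {"item_a": False, "item_b": True, "item_c": True},
]

print(empirically_keyed_items(criterion, comparison))
# {'item_a': True, 'item_c': False}
```

Under logical keying, by contrast, an item would be retained because its wording appears relevant to the characteristic being measured, whether or not the groups actually answer it differently.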
The original MMPI of Hathaway and McKinley was expanded over the years by the addition of many new scales and subscales, and an extensive revision and re-norming process produced the MMPI-2 (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989) and an adolescent version, the MMPI-A (Butcher et al., 1992). Having been developed with patient populations and for clinical purposes, the MMPI/MMPI-2/MMPI-A is generally regarded as being more suited for evaluating psychological disturbance than for elucidating normal variations in personality characteristics. Nevertheless, the instrument has proved valuable in a variety of contexts and is often used by psychologists doing forensic, neuropsychological, and personnel evaluations as well as mental health assessments.
A significant spin-off of the MMPI resulted from the efforts of Harrison Gough, who was interested less in identifying patterns of psychopathology among patients than in assessing personality characteristics in nonclinical populations. Using a combination of empirical and logical keying methods, and borrowing from the MMPI many items that were interpersonal in nature and not symptom-oriented, Gough began in 1948 to develop scales that were published as the California Psychological Inventory, currently in its third edition (Gough, 1957; Gough & Bradley, 1996). Whereas the MMPI scales had been named with diagnostic labels (e.g., depression, schizophrenia), Gough named his scales with commonly used terms that most people would be likely to recognize and understand (e.g., independence, responsibility). The essence of Gough’s purpose was captured in a review by Thorndike (1959), who referred to the CPI as “the sane man’s MMPI.”
Whereas the MMPI has been used primarily in clinical, forensic, and health care settings, the CPI has been applied mainly in counseling, educational, and organizational settings, as a way of facilitating decisions concerning career choice, academic planning, personnel selection, and the resolution of normal range adjustment problems. The CPI has also found considerable use as a research tool in studies of personality dimensions associated with achievement, leadership, and creativity.
In a mode similar to Gough’s, Theodore Millon developed the Millon Clinical Multiaxial Inventory (MCMI) using a combination of empirical and logical keying procedures. As a major difference from both the CPI and the MMPI, however, Millon’s scales were derived from a comprehensive theory of personality and psychopathology that he had formulated prior to turning his attention to developing a measuring instrument (Millon, 1969). First published in 1977 (Millon, 1977), the MCMI was standardized on patients receiving mental health care and, like the MMPI, is intended for purposes of psychodiagnostic screening and clinical assessment, rather than for use with nonpatient populations. Unlike the MMPI, however, which was designed primarily to measure symptomatic concerns corresponding to Axis I disorders in the Diagnostic and Statistical Manual (DSM) of the American Psychiatric Association (2000), the MCMI is scaled mainly to reflect disorders in personality function as categorized on Axis II of the DSM. Although some symptom-related scales are included in the MCMI, and some personality disorder scales are available for the MMPI, these are not usually regarded as a strength of either, and many practitioners have found some advantage in using both instruments together in complementary fashion.
The original MCMI has been revised twice, with the current version, the MCMI-III, having been published in 1994 (Millon, 1994; see also Millon, 1996). Millon also extended his test downward to include an adolescent form, originally known as the Millon Adolescent Personality Inventory (MAPI) and currently in revised form as the Millon Adolescent Clinical Inventory (MACI) (Millon & Davis, 1993; Millon, Green, & Meagher, 1982).
As an approach to constructing self-report inventories entirely different from the empirical and logical keying that characterized the MMPI, CPI, MCMI, and their predecessors, Raymond Cattell (1905–1998; no relation to J. McK. Cattell) began in the 1940s to apply factor analytic methods to personality test construction. After drawing on a large pool of adjectives describing personality characteristics to build a long list of trait names, he obtained ratings on these traits from samples of nonpatient adults. By factor analyzing these ratings, he extracted 15 factors that he identified as “the source traits of personality.” To these 15 factors, he added a short measure of intelligence to produce the Sixteen Personality Factors Questionnaire (16PF), which was originally published in 1949 and most recently revised in 1993 (R. B. Cattell, Cattell, & Cattell, 1993).
From Cattell’s perspective, his factors captured the entire domain of trait characteristics that underlie human personality and, in common with Gough, he intended his test to serve as a measure of normal personality functioning, and not of the presence or extent of psychopathology. Nevertheless, as demonstrated by Karson and O’Dell (1989), the 16-PF can be used by practitioners to identify aspects of personality in disturbed as well as normally functioning persons.
Cattell’s factor analytic approach from the 1940s, in addition to being still visible in continued use of the 16-PF, had a contemporary renaissance in the work of Paul Costa and Robert McCrae. Like Millon, Costa and McCrae were guided in their test construction by a theoretical formulation of personality functioning, in this case the Five Factor Model (FFM), sometimes referred to as the “Big Five.” The FFM emerged from various factor analyses of personality test and rating scale data that recurrently identified four to six factors to which individual differences in personality could be attributed (see Digman, 1990). Selecting self-report items related to their preferred five-factor formulation, Costa and McCrae developed a questionnaire that yields scores along five trait dimensions, which they called “domain scales”: neuroticism, extraversion, openness, agreeableness, and conscientiousness. Their effort resulted in the 1985 publication of the NEO Personality Inventory, currently available in revised form as the NEO PI-R (Costa & McCrae, 1992).
Like the 16-PF, the NEO PI-R was intended as a measure of normal personality characteristics but has proved useful in evaluating personality problems in disturbed persons (see Piedmont, 1998). Although time has yet to tell how the NEO PI-R will eventually fare with respect to its frequency of use, there is already an extensive literature on the Five Factor Model to suggest that it will become a well-established assessment instrument.
The last of these six self-report questionnaires to become well-known assessment instruments is the Personality Assessment Inventory (PAI) developed by Leslie Morey (1991, 1996). The PAI is intended to provide information relevant to clinical diagnosis, treatment planning, and screening for adult psychopathology, and in this respect it is closely modeled after the MMPI. Drawing on methodology used in constructing other inventories, however, Morey formulated his scales in terms of theoretical constructs and used rational as well as quantitative criteria in selecting his items. The PAI clinical scales are primarily symptom-oriented and, as in the case of the MMPI, more likely to assist in Axis I than Axis II diagnosis. In addition, however, the PAI features several scales directly related to aspects of treatment planning.
Relatively Unstructured Tests
Unlike formal tests of intelligence and self-report methods of assessing personality, which arose in response to public needs, relatively unstructured personality assessment methods came about largely as the product of intellectual curiosity. The best known and most widely used of these are the Rorschach Inkblot Method (RIM) and a variety of picture-story, figure drawing, and sentence completion methods, the most prominent of these being the Thematic Apperception Test (TAT), the Draw-a-Person (DAP), and the Rotter Incomplete Sentences Blank (RISB).
Rorschach Inkblot Method
As a schoolboy in late nineteenth-century Switzerland, Hermann Rorschach (1884–1922) was known among his classmates for his skill at a popular parlor game of the day, which consisted of making blots of ink and suggesting what they look like. Rorschach’s parlor game creativity reflected his artistic bent, because he was a talented painter and craftsman. Some of his work is permanently displayed in the Rorschach Archives and Museum in Bern, Switzerland. Later on, serving as a staff psychiatrist in a large mental hospital, Rorschach pondered whether he could learn something about his patients’ personality characteristics and adaptive difficulties by studying the perceptual style they showed in looking at inkblots. His curiosity and scientific bent led him to develop a standard series of inkblots and to collect responses to them from several hundred patients and from nonpatient respondents as well. Rorschach’s analyses of the data he obtained culminated in the 1921 publication of Psychodiagnostics (Rorschach, 1921/1942), which introduced the Rorschach Inkblot Method (RIM) in the form that the test stimuli have retained since that date.
Following Rorschach’s death at age 37, just one year after his monograph appeared, many different systems were developed both in the United States and around the world for administering, coding, and interpreting Rorschach protocols. Recognizing the potential clinical and psychometric benefit of integrating the most informative and dependable features of these various systems into a standardized procedure, John Exner (1993) developed the Rorschach Comprehensive System, which since its original publication in 1974 has become the predominant way of administering and coding this instrument. The currently most common approach to interpreting Rorschach data combines attention to respondents’ perceptual style in formulating what they see in the inkblots with analyses of the thematic imagery contained in their responses and the behavioral style with which they produce these responses (see Weiner, 1998). These three data sources are then used as a basis for inferring adaptive strengths and weaknesses in how people manage stress, exercise their cognitive functions, deal with affect, view themselves, and regard other people.
Periodically issues have been raised in the literature concerning the psychometric soundness and utility of Rorschach assessment, and this matter is presently the subject of some debate. With due respect for differences of opinion, however, the weight of empirical evidence documents the validity of the RIM when used appropriately for its intended purposes (Hiller, Rosenthal, Bornstein, Berry, & Brunell-Neuleib, 1999; Meyer & Archer, 2001; Rosenthal, Hiller, Bornstein, Berry, & Brunell-Neuleib, 2001; Viglione & Hilsenroth, 2001; Weiner, 2001), and the previously referenced surveys of test usage attest to its continued widespread use in practice settings.
During the mid-1930s, Henry Murray (1893–1988), a psychoanalytically trained physician with a doctorate in biochemistry who was then serving as director of the Harvard Psychological Clinic, formulated a theory of personality that stressed the role of idiographic needs and attitudes in determining individual differences in human behavior. In collaboration with Christiana Morgan, Murray also considered the possibility of identifying needs and attitudes, especially those that people were reluctant to admit or unable to recognize, by examining the fantasies they produced when asked to tell stories about pictures they were shown. These notions led to a seminal article about picture-story methods of studying fantasy (Morgan & Murray, 1935), a classic and highly influential book called Explorations in Personality (Murray, 1938), and eventually the publication of the Thematic Apperception Test (TAT) (Murray, 1943/1971).
To the extent that the content of imagined stories can provide clues to a respondent’s inner life, TAT data are expected to shed light on the particular hierarchy of a person’s needs and the nature of his or her underlying conflicts, concerns, and interpersonal attitudes. As was the case for the inkblot method following Rorschach, Murray’s picture-story method gave rise to numerous systems of coding and interpretation. The approaches that became most commonly employed in clinical practice were variations of an “inspection technique” proposed by Leopold Bellak that consists of reading through respondents’ stories to identify repetitive themes and recurring elements that appear to fall together in meaningful ways (see Bellak & Abrams, 1997). The popularity of such a strictly qualitative and uncoded approach to TAT data has limited efforts to demonstrate the psychometric soundness of the instrument or to develop a substantial normative database for it.
On the other hand, several quantified TAT scales designed to measure specific personality characteristics for clinical or research purposes have shown that the instrument can generate reliable and valid findings when it is used in a standardized manner. Three noteworthy cases in point are scoring systems developed by McClelland, Atkinson, and their colleagues to measure needs for achievement, affiliation, and power (Atkinson & Feather, 1966; McClelland, Atkinson, Clark, & Lowell, 1953); a defense preference scale developed by Cramer (1999); and a measure of capacity for adaptive interpersonal relationships, the Social Cognition and Object Relations Scale (SCORS) developed by Westen, Lohr, Silk, Kerber, and Goodrich (1985).
The original TAT also spawned numerous extensions and spin-offs of the picture-story method intended to broaden its scope. Two variations developed by Bellak to expand the age range for respondents are the Children’s Apperception Test (CAT), which portrays animal rather than human characters in the pictures, and the Senior Apperception Test (SAT), which depicts primarily elderly people and circumstances common in the lives of older persons (see Bellak & Abrams, 1997). As an effort to enhance multicultural sensitivity, the TAT approach was used to develop the Tell-Me-A-Story Test (TEMAS), which portrays conflict situations involving African American and Latino characters and has been found to elicit fuller responses from minority respondents than the all-Caucasian TAT pictures (Costantino, Malgady, & Rogler, 1988). Finally of note is the Roberts Apperception Test for Children (RATC), which was designed specifically to improve on the TAT and CAT as measures for use with children by portraying children and adolescents in everyday interactions, rather than either adult or animal figures; by providing an alternate set of cards showing African American young people in similar scenes; and by using a standardized scoring system (McArthur & Roberts, 1990).
Together with the emergence of specific quantifiable scores for the TAT, the publication of the RATC signaled movement in picture-story assessment toward achieving psychometric respectability, much in the manner that Exner’s Comprehensive System for Rorschach assessment moved the inkblot method in that direction. Although the TAT still lags well behind the RIM and most relatively structured assessment instruments in empirical validation, it has long been and remains one of the most frequently used methods for assessing personality functioning. Moreover, as found in a literature survey by Butcher and Rouse (1996), the volume of research articles published on the TAT in the 20-year period from 1974 to 1994 numbered 998, which was third largest among personality measures, exceeded only by the MMPI (4,339 articles) and the Rorschach (1,969 articles).
Figure Drawing Methods
It is difficult to say who first suggested that what people choose to draw and how they draw it reveal features of their personality, whether the drawing is a prehistoric sketch found on the wall of a cave, a painting by a great master, or the doodles of an ordinary citizen. Whoever it was, it was long before Florence Goodenough (1886–1959) introduced the first formal application of figure drawings in psychological assessment in 1926. Seeking a nonverbal measure of intellectual development in children, Goodenough (1926) developed the Draw-a-Man test, in which intellectual maturity is measured by the amount of accurate detail in a young person’s drawing of a human figure. The Draw-a-Man was later revised by Harris (1963), who suggested having respondents draw pictures of a woman and of themselves, in addition to drawing a man, and expanded Goodenough’s scoring system and standardization. Most recently the Goodenough-Harris was further updated by Naglieri (1988) to include representative norms for assessing cognitive development in young people age 5 to 17.
The Draw-a-Man was adapted for purposes of personality assessment by Karen Machover (1902–1996), who in 1948 rechristened the measure as the Draw-a-Person (DAP) and introduced the notion that human figure drawings convey in symbolic ways aspects of a respondent’s underlying needs, attitudes, conflicts, and concerns. She believed that for persons of all ages and not just children, significant meaning can be attached to structural features of drawings (e.g., where figures are placed on the page) and the manner in which various parts of the body are drawn (e.g., a disproportionately large head). Whereas Machover’s approach to DAP interpretation consisted of qualitative hypotheses concerning the symbolic significance of figure drawing characteristics, subsequent developments that were focused mainly on refining this instrument for use in evaluating young people provided quantitative scoring schemes for the instrument. Notable among these were a formulation of 30 specific indicators of emotional disturbance (Koppitz, 1968) and the construction of a Screening Procedure for Emotional Disturbance (SPED; Naglieri, McNeish, & Bardos, 1991). The DAP-SPED is an actuarially derived and normatively based system comprising 55 scorable items and intended as a screening test for classifying young people age 6 to 17 with respect to their likelihood of having adjustment difficulties that call for further evaluation.
Particular interest in the assessment of young people was reflected in several other variations of Goodenough’s original method, two of which have become fairly widely used. One of these is the House-Tree-Person (HTP) test devised by Buck (1948), in which children are asked to draw a picture of a house and a tree as well as a person, in the expectation that drawings of all three objects provide symbolic representations of important aspects of a young person’s world. The other is the Kinetic Family Drawing (KFD) formalized by Burns and Kaufman (1970), in which respondents are instructed to draw a picture of their whole family, including themselves, doing something.
Also of note is a commonly used procedure suggested by Machover in which people taking any of these figure drawing tests are asked in addition to make up a story about the people they have drawn or to answer specific questions about them (e.g., “What is this person like?”). When this procedure is followed, figure drawings take on some of the characteristics of picture-story techniques. Like picture stories, and despite recent efforts at quantification, they are most commonly interpreted in practice by an inspection technique in which personality characteristics are inferred primarily from subjective impressions of noteworthy or unusual features of the figures drawn. As a consequence, figure drawings are a largely unvalidated assessment method that has remained popular despite having thus far shown limited psychometric soundness (see Handler, 1995).
Sentence Completion Methods
Sentence completion methods of assessing personality and psychopathology originated in the earliest efforts to develop tests of intelligence. Hermann Ebbinghaus (1897), the pioneering figure in formal study of human memory, developed a sentence completion test for the purpose of measuring intellectual capacity and reasoning ability in children, and Binet and Simon included a version of Ebbinghaus’ sentence completion task in their original 1905 scale. Sentence completions have been retained in the Stanford-Binet, and a variety of sentence completion tasks have also found use to the present day as achievement test measures of language skills.
The extension of the sentence completion method to assess personality as well as intellectual functioning was stimulated by Carl Jung (1916), the well-known Swiss psychoanalyst and one-time close colleague of Freud who founded his own school of thought, known as “analytic psychology,” and whose writings popularized his use of a “word association” technique for studying underlying aspects of a person’s inner life. This technique was formalized in the United States by Grace Kent and Aaron Rosanoff (1910), who developed a standard 100-item list called the Free Association Test and compiled frequency tables for different kinds of responses given by a sample of 1,000 nonpatient adults.
The apparent richness of word association tasks in revealing personality characteristics suggested to many assessors that replacing the word-word format with full sentences written as completions to brief phrases (e.g., “I like . . .”; “My worst fear is . . .”) would result in an even more informative assessment instrument. Numerous sentence completion tests were constructed during the 1920s and 1930s and used for a variety of purposes, but with little systematic effort or standardization. The first carefully constructed and validated measure of this kind was developed in the late 1930s by Amanda Rohde and, like other performance-based tests of personality, was intended to “reveal latent needs, sentiments, feelings, and attitudes which subjects would be unwilling or unable to recognize or to express in direct communication” (Rohde, 1946, p. 170). The Rohde Sentence Completion Test served as a model for many similar instruments developed subsequently, and, as described by Rohde (1948), use of those that were available during the 1940s was stimulated by the impact of World War II. It has already been noted that the impetus for designing performance-based personality assessment instruments was largely intellectual curiosity rather than civilian or military needs, and such was the case with sentence completion tests. However, as a brief self-administered measure that provided relatively unstructured assessment of personality characteristics, the sentence completion was found to be extremely helpful in evaluating and planning treatment for the vast number of psychological casualties seen in military installations during the war and cared for in its aftermath in Veterans Administration hospitals.
For many years, the best known and most widely used sentence completion has been the Rotter Incomplete Sentences Blank (RISB), which was developed by Julian Rotter in the late 1940s and first published in 1950, and for which adult, college, and high school forms are available (Rotter, Lah, & Rafferty, 1992). The authors provide a scoring system for the RISB that yields an overall adjustment score, but in practice the instrument is most commonly interpreted by the inspection method that characterizes the typical application of picture-story and figure-drawing instruments; that is, examiners read the content of the items and form impressions of what respondents’ completions might signify concerning their personality characteristics. Beyond published studies demonstrating modest validity of the RISB as a measure of adjustment, there has been little accumulation of empirical evidence to support inferring any specific personality characteristics from it, nor has there been much progress in documenting the reliability of RISB findings and establishing normative standards for them.
Psychological assessment is a data-gathering process that involves integrating information gleaned not only from the types of tests discussed thus far, but also from interview methods, behavioral observations, collateral reports, and historical documents. Of these, interviewing and observing people are the most widely used assessment methods for attempting to learn something about them. Although being discussed here in relation to identifying personality characteristics and psychopathology, interview methods are also commonly employed in assessing intellectual and neuropsychological functioning and aptitudes, achievement, and interests. Unlike psychological testing, interviewing is not a method uniquely practiced by psychologists, but rather an evaluative procedure employed by many different kinds of professionals for various purposes and by people in general who have some reason to assess another person, like a father interviewing a suitor for his daughter’s hand to gauge his suitability as a son-in-law.
By including both a self-report component, consisting of what people say about themselves, and a performance-based component, consisting of how they go about saying it, assessment interviews provide abundant clues to what a person is like. As a source of important assessment information, no battery of psychological tests can fully replace oral interactions between respondents and skilled interviewers, and most assessment professionals consider the interview an essential element of a psychological evaluation. In their historical development, formal interview methods emerged first in a relatively unstructured format and subsequently in relatively structured formats as well.
Relatively Unstructured Formats
More than most persons using interviews for evaluative purposes, psychologists and other mental health professionals have traditionally favored relatively unstructured interviewing methods. The popularity of unstructured inquiry can be credited to the influence of two of the most significant figures in the history of psychotherapy, Sigmund Freud (1856–1939) and Carl Rogers (1902–1987). Freud (1913/1958) recommended a free association method for conducting psychoanalytic treatment sessions that consists of instructing people to report whatever thoughts or feelings come to mind. Rogers (1942, 1951) proposed a nondirective method for conducting client-centered therapy in which the therapist’s interventions consist mainly of reflecting clients’ statements back to them. Although based on markedly different ways of conceptualizing human behavior and the psychotherapeutic process, free association and nondirective methods share in common an open-ended approach that provides minimal guidance to people concerning what or how much they should say.
Although developed for treatment purposes, free association and nondirective techniques subsequently proved valuable as well for obtaining information in assessment interviews. Even though both techniques must usually be supplemented with focused questions to clarify specific points of information, they typically elicit ideas, attitudes, and recollections that would not have emerged in response to direct questioning. The psychoanalytic tradition has generated a substantial literature on psychodynamic approaches to assessment interviewing, perhaps the best known and most highly respected of which is Sullivan’s (1954) The Psychiatric Interview. Rogers’ attention to the interviewing process fostered not only advances in practice but also new developments in research. Unlike tests, which entail a test form or written protocol that remains available for future review, interviews do not produce any written record other than whatever process notes may be made during or following them. Recognizing that such notes are largely inadequate for research purposes, Rogers, while serving as Director of the Counseling Center at the University of Chicago, began making tape recordings of clinical interviews as a means of obtaining reliable data concerning their exact content. In the research program developed by Rogers and his colleagues, tape recordings were examined for various patterns of verbal interaction between interviewer and interviewee during treatment sessions. This research on interactive processes in clinical interviews stimulated extensive studies of what became known as the “anatomy of the interview” (Matarazzo & Wiens, 1972; Pope, 1979), and Rogers’ innovative work was seminal as well in fostering systematic psychotherapy research.
Because open-ended interviews require some supplementation to serve assessment purposes adequately, various formal procedures and guidelines have been inserted over time into otherwise unstructured interviews. The most notable of these is the Mental Status Examination (MSE), first proposed in 1902 by Adolf Meyer (1866–1950), a distinguished psychiatrist best known for championing a humane and “common-sense” approach to seriously disturbed persons that included thorough inquiry into their personal history and current circumstances. The MSE took form as a series of specific questions and tasks intended to provide a brief but standardized assessment of a person’s attention, memory, reasoning ability, social judgment, fund of knowledge, and orientation in time and space. As elaborated by Trzepacz and Baker (1993), a contemporary MSE also includes observations concerning a person’s general appearance, interpersonal conduct, prevailing mood, sense of reality, thought processes, self-awareness, and intellectual level.
The MSE has become a standard mental health assessment tool that is considered an integral part of diagnostic evaluations by most psychiatrists and is often used by psychologists as well, especially when they are not including any other formal tests among their procedures. Paralleling the previously mentioned interest in short forms of intelligence tests, the MSE has been particularly popular in an 11-item version developed in the 1970s as the Mini-Mental State Examination (Folstein, Folstein, & McHugh, 1975). Whatever the length of an MSE, however, the information it provides emerges in fuller and more reliable form in a psychodiagnostic test battery, and psychological assessors who are including formal testing among their evaluation procedures rarely find use for it.
Along with the development of the MSE as a semiformal addendum, relatively unstructured assessment interviews have been shaped by numerous interviewing outlines or schedules that identify topics to be covered (e.g., nature and history of presenting complaint, educational and occupational history) and specific items of information that should regularly be obtained (e.g., basic demography, current medications, and history of substance use, suicidal behavior, and physical or sexual abuse). Such interview guides have long been standard topics in interviewing textbooks for mental health professionals (e.g., Craig, 1989; Morrison, 1993; Othmer & Othmer, 1994). From a historical perspective, one of the most comprehensive and psychologically sensitive but frequently forgotten contributions of this kind was made by George Kelly (1905–1967), who is known primarily for developing personal construct theory and a personality assessment instrument he based on it, the Role Construct Repertory Test. In a classic book, The Psychology of Personal Constructs, Kelly (1955) included several chapters on conducting assessment interviews that provide excellent guidance by today’s standards as well as those of a half century ago.
Relatively Structured Formats
However rich the information obtainable from unstructured interviews, and despite the flexibility of an unstructured approach in adapting to unpredictable variations in how interviewees may present themselves, these formats lack sufficiently standardized procedures to ensure replicable and reliable data collection. Mounting concerns that the unreliability of diagnostic interviews in clinical settings was impeding mental health research led in the 1970s to the development of the Research Diagnostic Criteria (RDC), which comprised a set of clearly specified descriptive behavioral criteria for assigning participants in research studies to one of several diagnostic categories (Spitzer, Endicott, & Robins, 1978). This descriptive behavioral approach noticeably improved the interrater reliability achieved by diagnostic interviewers, and the RDC format, including many of its specific criteria, was subsequently incorporated into the Diagnostic and Statistical Manual (DSM) of the American Psychiatric Association, beginning with DSM-III in 1980 and extending to the present DSM-IV-TR (American Psychiatric Association, 2000).
The RDC criteria also lent themselves well to formulating questions to be asked in diagnostic interviews, and they soon gave rise to a new genre of assessment method, the relatively structured interview that consists entirely or in large part of specific items of inquiry. Simultaneously with the publication of the RDC criteria, Endicott and Spitzer (1978) introduced the best known and most frequently used instrument of this kind, the Schedule for Affective Disorders and Schizophrenia (SADS). Intended to assist in identifying a broad range of symptomatic disorders in addition to affective disorders and schizophrenia, the SADS is a semistructured interview guide that requires professional judgment and serves clinical as well as research purposes. Following on its heels came the Diagnostic Interview Schedule (DIS), which is entirely structured and was designed for use by nonprofessional interviewers in research studies (Robins, Helzer, Croughan, & Ratcliff, 1981). Both of these measures were extended downward for use with young people, as the Kiddie SADS (K-SADS; Puig-Antich & Chambers, 1978) and the Diagnostic Interview Schedule for Children (DISC; Costello, Edelbrock, & Costello, 1985). The most comprehensive measure of this kind to emerge has been the Structured Clinical Interview for the DSM (SCID), which includes forms for identifying personality as well as symptomatic disorders (Spitzer, Williams, & Gibbon, 1987; see also R. Rogers, 2001).
The prescientific history of psychology aside, the formal implementation of behavioral methods for assessing personality is usually traced to the World War II activities of the United States Office of Strategic Services (OSS), the predecessor organization to the Central Intelligence Agency. Once again the winds of war instigated advances in the methods of behavioral science, just as they have in the biological and physical sciences. To aid in selecting operatives for covert intelligence missions, the OSS observed how recruits behaved in a variety of contrived problem-solving and stress-inducing situations and on this basis predicted the likely quality of their performance in the field (Office of Strategic Services Assessment Staff, 1948; see also Handler, 2001). A gap of more than 20 years followed before the OSS methods led to a clearly defined approach to assessment, mainly because the emergence of systematic behavioral assessment techniques had to await new ways of conceptualizing personality for assessment purposes.
Of many contributions to the literature that reconceptualized personality in ways that fostered the development of behavioral assessment, two can be singled out for their clarity and influence. In 1968, Walter Mischel published Personality and Assessment (Mischel, 1968), a book in which he argued that personality traits are semantic fictions, that continuity in behavior across time and place exists only as a function of similarity across situations, and that assessment of behavior should accordingly focus on its situational determinants. A few years later, Goldfried and Kent (1972) drew a sharp distinction between “traditional” and “behavioral” assessment procedures with respect to how personality is viewed. From a traditional assessment perspective, these authors pointed out, personality consists of characteristics that lead people to behave in certain ways, and understanding a person’s actions is a product of examining his or her underlying tendencies or dispositions. From a behavioral perspective, by contrast, personality “is defined according to the likelihood of an individual manifesting certain behavioral tendencies in the variety of situations that comprise his day-to-day living” (Goldfried & Kent, 1972, p. 412). Behaviorally speaking, then, personality is not an a priori set of concrete action tendencies that people have and carry around with them, but is rather a convenient abstraction for summarizing after the fact how people have been observed to interact with their environment.
These innovative conceptions of personality, echoed in numerous other books and articles, led during the 1970s and 1980s to a dramatic growth of interest in developing assessment methods in which the obtained data would consist of representative samples of behavior that could be objectively evaluated for their implications after the fact, as contrasted with test responses to be interpreted inferentially as signs of underlying states or traits they are presumed before the fact to measure. The core techniques used to achieve this purpose of behavioral assessment included (a) observational ratings of a person’s responses in natural and contrived situations, as suggested by the OSS methods and by situations devised by Paul (1966) to assess the effectiveness of systematic desensitization; (b) observed conduct in role-playing exercises, based on procedures developed by Rotter and Wickens (1948); (c) self-report instruments focused on specific behavioral interactions, as had earlier been exemplified by measures like Geer’s (1965) Fear Survey Schedule; (d) psychophysiological measurements, which were suggested by the successful employment of such techniques in the then emerging field of behavioral medicine research (see Kallman & Feuerstein, 1977); and (e) behavioral interviews specifically focused on how people respond to certain kinds of situations in their lives.
The late 1980s saw gradual moderation of the original conceptual underpinnings of behavioral assessment and considerable broadening of its focus. It is currently widely recognized that people are not as “trait-less” as Mischel argued, nor are traditional and behavioral methods of assessment as distinct and mutually exclusive as Goldfried and Kent originally suggested. In the case of Mischel’s argument, behavioral assessors rediscovered Lewin’s classic maxim that how people behave is an interactive function of their dispositional nature and the environmental circumstances in which they find themselves, and the advent of cognitive perspectives in behavioral approaches encouraged behavioral assessors to attend to what people are thinking and feeling as well as what they are doing. As for the Goldfried and Kent distinction, behavioral assessors recognized that they could extend the practical applications of their approach by supplementing behavioral observations with judicious utilization of clinical judgment. As reflected in the behavioral assessment literature that ushered in the 1990s, strictly behavioral methods became appreciated as having some limitations, and traditional methods as having some strengths; correspondingly, behavioral assessment evolved into a multifaceted process comprising a broader range of techniques and levels of evaluation than had been its legacy (see Bellack & Hersen, 1988; Ciminero, Calhoun, & Adams, 1986; Haynes & O’Brien, 2000).
Monitoring Neuropsychological Functioning
As summarized by Boll (1983), neuropsychology emerged both as a discipline and as an area of professional practice. As a discipline, neuropsychology is the field of science concerned with the study of relationships between brain functions and behavior. As applied practice, neuropsychology consists primarily of using various assessment procedures to measure the development and decline of brain functions and their impairment as a consequence of head injury, cerebrovascular accidents (stroke), neoplastic disease (tumors), and other illnesses affecting the central nervous system, of which Alzheimer’s disease is the most prevalent. The historical highlights of formal neuropsychological assessment cluster around the development of the Bender Visual Motor Gestalt Test and the subsequent emergence of neuropsychological test batteries.
Best known among the earliest formal psychological assessment methods constructed to measure brain functions was the Bender Visual Motor Gestalt Test, first described by Lauretta Bender (1897–1987) in 1938 (Bender, 1938). Historical lore has it that Bender, then a psychiatrist at Bellevue Hospital in New York, became intrigued by psychomotor differences she observed among children as they made chalk drawings on the city sidewalks in preparation for playing hopscotch. She noted that some of the children were more skillful than others in executing these drawings. By and large, older children were better at it than younger ones, but some older children appeared to have persistent difficulty in drawing the hopscotch designs accurately. These observations led Bender to conclude that Gestalt principles of visual organization and perception, as reflected in the drawing of designs, could be applied to identifying individual differences in maturation and detecting forms of organic brain disease and psychopathology. Selecting for her test nine designs that had been developed by Wertheimer, she presented in her 1938 text illustrations of how these designs were likely to be copied by normally developing children age 4 to 11 and by normal, brain-damaged, and emotionally disturbed adults.
The Bender Gestalt test has fared both well and poorly since 1946, when the stimulus cards were first published separately from Bender’s book and made generally available for professional use. Among important refinements of the test, Pascal and Suttell (1951) developed an extensive scoring system for identifying brain dysfunction in adults, and Koppitz (1975) undertook a large standardization study in the 1960s to construct a scoring scheme that would measure both cognitive maturation and neuropsychological impairment in children. Lacks (1998) later proposed a simplified 12-item criterion list that has proved fairly accurate in differentiating brain-damaged from neuropsychologically intact adults. The Bender Gestalt also became and has remained very popular among assessment psychologists as a screening device for brain dysfunction in adults and for developmental delay in young people. In the recent test use surveys mentioned previously, this instrument was ranked fifth in frequency of use among samples of clinical psychologists (Camara et al., 2000) and experienced professionals conducting child custody evaluations (Ackerman & Ackerman, 1997), and seventh among forensic examiners experienced in neuropsychology (Lees-Haley et al., 1995).
On the other hand, with respect to its faring poorly, the Bender was reported as being used by only 27% of sampled members of the International Neuropsychological Society (Butler et al., 1991), and a sample of the National Academy of Neuropsychologists membership ranked the Bender 25th in frequency among the measures they use (Camara et al., 2000). The apparent disrepute of the Bender among mainstream neuropsychologists, despite its extensive research base, may have several origins. These include (a) its having been developed prior to the emergence of neuropsychological assessment as a well-defined practice specialty, which began in the 1950s; (b) its having typically been interpreted by practitioners on the basis of their subjective impressions rather than one of the available scoring systems for it; and (c) its frequently having been given more credence than was warranted as a definitive and stand-alone indicator of cognitive insufficiency or brain dysfunction. Particularly relevant in this last regard is the fact that, although the Bender provides useful information concerning aspects of visual organization and perceptual-motor coordination, it does not encompass the broad range of cognitive processes that constitute neuropsychological functioning. Sufficiently broad measurement to warrant neuropsychological inferences awaited the development of test batteries designed for this purpose.
Neuropsychological Test Batteries
The inception of broadly based and multifaceted test batteries for assessing neuropsychological functioning can be credited to the efforts of Ward Halstead (1908–1969), who in 1935 established a laboratory at the University of Chicago for the purpose of studying the effects of brain damage. Halstead’s observations convinced him that brain damage produces a wide range of cognitive, perceptual, and sensorimotor deficits that cannot be identified by any single psychological test. He accordingly devised numerous tasks for measuring various aspects of cerebral functioning. In subsequent collaboration with one of his graduate students, Ralph Reitan, he gradually reduced the number of these tasks to seven for which empirically determined cutoff scores showed good promise for distinguishing normal from impaired brain functioning. This set of tasks became formalized as the Halstead-Reitan Neuropsychological Test Battery (HRB) in the 1950s and continues to have a major place in neuropsychological assessment (see Reitan & Wolfson, 1993). Developed originally with adults, the HRB was later extended downward for children age 9 to 15 (Halstead Neuropsychological Test Battery for Children and Allied Procedures) and age 5 to 9 (Reitan-Indiana Neuropsychological Test Battery for Children).
The primarily quantitative approach to neuropsychological assessment represented by the HRB stimulated considerable research and attracted to assessment practice a substantial contingent of brain-behavior scientists who might not otherwise have become directly involved in clinical work. Also exerting a lasting influence on assessment methods was a qualitative approach to identifying neuropsychological impairment, which stemmed from the work of Alexander Luria (1902–1977) in the Soviet Union. Luria believed that more could be learned from behavioral features of how people deal with test materials than from the scores they earn, and he accordingly emphasized measures designed to maximize opportunities for respondents to demonstrate various kinds of behavior he considered relevant in diagnosing brain dysfunction.
In Luria’s approach, conclusions are based less on psychometric data than on an examiner’s observations and inferences. Although Luria’s testing methods and his theoretical formulation of functional systems in the brain date from the 1930s, it was not until his work was first translated into English in the 1960s that his seminal contributions to neuropsychology became widely appreciated. The initial organization of his procedures into a formal test manual was published in the 1970s (Christensen, 1975), and further standardization and validation of his measures during the 1980s resulted in publication of the Luria-Nebraska Neuropsychological Battery (LNNB; Golden, Purisch, & Hammeke, 1985).
The face of neuropsychological assessment and the uses to which it is put have gradually changed since the early work that led to the Halstead-Reitan and Luria-Nebraska batteries. Consistent with the underlying premise of both batteries that identification of brain dysfunction requires assessment of a range of cognitive functions, many specifically focused measures of concept formation, memory, psychomotor, language, and other related capacities were designed for use instead of or as supplements to these batteries. The specific measures most commonly used by contemporary neuropsychologists include the Wechsler Memory Scale, the Boston Naming Test, the Verbal Fluency Test, the Wisconsin Card Sorting Test, the California Verbal Learning Test, the Rey-Osterrieth Complex Figure Test, the Stroop Neuropsychological Screening Test, and two components of the HRB, the Finger Tapping Test and the Trail Making Test (Butler et al., 1991; Camara et al., 2000; for further information concerning these and other neuropsychological assessment instruments, see Lezak, 1995; Spreen & Strauss, 1998).
Along with benefiting from the availability of increasingly refined measures, neuropsychological examiners began as early as the 1950s to move beyond what had been their original focus in applied practice, which was helping to determine whether a patient’s complaints were “functional” in nature (i.e., psychologically determined) or “organic” (i.e., resulting from central nervous system dysfunction). Instead of inferring from test data merely the likelihood of a patient’s having a brain lesion, skilled neuropsychologists became proficient in identifying which side of the brain and which lobe were likely to contain the lesion. Over time, however, the development of sophisticated radiographic techniques for determining the presence, location, and laterality of brain damage rendered neuropsychological tests all but superfluous for this purpose, except as screening measures. Concurrently, on the other hand, contemporary neuropsychological assessment became increasingly valuable in professional practice by reverting to the purpose Halstead originally had in mind back in the 1930s: namely, evaluating an individual’s strengths and weaknesses across a broad range of perceptual, cognitive, language, and sensorimotor functions.
With its current focus on the measurement of functioning capacities, neuropsychological assessment provides useful information concerning what people can be expected to do in educational, occupational, and other everyday life activities. Armed with this information, psychologists and the people with whom they consult can predict degrees of success and failure in these activities, identify what kinds of skill improvements are needed to enhance success level, and propose types of intervention or training that will be likely to enhance these deficient skills in the particular person being evaluated. In addition to basing performance predictions and treatment plans on the nature and extent of functioning deficits associated with brain damage from whatever source, neuropsychological examiners can use retesting data to monitor changes in functioning capacity over time. Refined measures of neuropsychological functioning can help to assess the rate and amount of declining capacity in conditions that involve progressive deterioration, and they can likewise quantify the pace of progress in persons recovering from brain disease or injury. Neuropsychological assessment has consequently become common practice in diverse applied settings ranging from forensic consultation to rehabilitation planning.
Measuring Achievement, Aptitudes, and Interests
As noted in previous sections of the paper, intellectual and personality assessment emerged largely out of a perceived necessity for administrators to make decisions about people, specifically with respect to their educational requirements and their eligibility for military service. By contrast, methods of assessing achievement, aptitudes, and interests were developed primarily to help people make decisions about themselves. To be sure, measures of what a person is able to do or is interested in doing can be used to determine class placement in the schools or personnel selection in organizations. More commonly, however, these measures have been used to help people plan their educational and vocational future on the basis of what appear to be their abilities and interests.
Early formulations identified tests of achievement as ways of measuring the effects of learning, as distinguished from “native ability” that was independent of learning and measured by aptitude and intelligence tests. There remains a general consensus that aptitude tests serve to predict a person’s potential for improved performance following education or training in some endeavor, whereas achievement tests serve to evaluate the performance level attained at a particular point in time. It is also widely agreed, however, that “aptitude test” scores are influenced by learning and life experience as well as inborn talents, and that “achievement test” scores identify future potential as well as present accomplishment. Accordingly, what respondents display on both kinds of tests is the extent to which they have developed certain kinds of abilities, and little purpose is served by rigid distinctions between these types of measures (see Anastasi & Urbina, 1997, chap. 17). With this in mind, the discussion that follows traces briefly the development of four measures of achievement/aptitude and interest that have deep roots in the history of assessment psychology and enjoy continued widespread use: the Wide-Range Achievement Test, the Strong Interest Inventory, the Kuder Occupational Interest Survey, and the Holland Self-Directed Search.
Wide-Range Achievement Test
In the United States, formal achievement testing began in the schools during the early 1920s. Tests of specific competencies (e.g., spelling) had been developed prior to that time, but group-administered batteries for assessing a broad range of academic skills began with the 1923 publication of the Stanford Achievement Test (SAT), which was designed for use with elementary school students. This was followed in 1925 by the Iowa High School Content Examination, later called the Iowa Test of Basic Skills, designed for use with older students. Contemporary versions of the Stanford and Iowa scholastic achievement measures remain widely used for group testing in elementary and secondary schools.
Individual assessment of academic skills can be traced to the late 1930s, when Joseph Jastak (1901–1979), then at Columbia University, became acquainted with David Wechsler’s work on developing scales for the Wechsler-Bellevue. Jastak came to the conclusion that fully adequate assessment of cognitive functioning required supplementing Wechsler’s scales with some measures of basic learning skills, especially reading, writing, and calculating. To this end, he began constructing measures that involved recognition and pronunciation of words, a written spelling test, and a written arithmetic test. An instrument comprising these three measures was published as the Wide-Range Achievement Test (WRAT) in 1946 (Jastak, 1946). Later versions of this instrument, consisting of essentially the same reading, spelling, and arithmetic tests as the original, have appeared as the WRAT-R (Jastak & Wilkinson, 1984) and the WRAT3 (Wilkinson, 1993).
In common with most of the other measures discussed in this research paper, the WRAT has been remarkable for its longevity and widespread use. Its normative data make it applicable for age 5 through adulthood, and it has become a standard assessment tool not only in academic settings but in clinical and neuropsychological practice. The previously cited survey of test usage by Camara et al. (2000) shows the WRAT as the seventh most frequently used test by clinical psychologists and the ninth most frequently used test by neuropsychologists.
Strong Interest Inventory
During the academic year 1919–1920, E. K. Strong Jr. (1884–1963) attended a graduate seminar on interest measurement at the Carnegie Institute of Technology. What he learned in this seminar piqued his curiosity about whether interests could be measured in ways that would predict what kinds of occupations a person would find enjoyable. In pursuit of this goal, Strong first developed a list of statements about various activities that test respondents could endorse as something they liked or disliked to do. He then keyed these statements to different occupations on the basis of how people employed in these occupations responded to them. This latter procedure introduced empirical keying methodology to interest measurement, just as Hathaway and McKinley would later introduce it to personality measurement in constructing the MMPI. Several years of developmental work resulted in the publication of the Strong Vocational Interest Blank (SVIB) (Strong, 1927). For persons taking this test, the results provided direct information concerning the extent to which their patterns of interests were similar to or different from those of people working as lawyers, teachers, production managers, and the like.
Like other self-report inventories that have found an enduring place in assessment psychology, the SVIB has been extensively revised since its original publication. The number of occupations in its empirical base has been increased substantially, its initially strictly empirical approach to interpreting the implications of its scale scores has been amplified by theoretical perspectives on the classification of occupational interests, and its name has evolved into the Strong Interest Inventory (SII) (Hansen & Campbell, 1985; Harmon, Hansen, Borgen, & Hammer, 1994). Stable since its inception, however, has been the status of Strong’s instrument as the most frequently used among all interest inventories.
Kuder Occupational Interest Survey
Frederic Kuder (1903–2000) set about measuring occupational interests differently from Strong in two respects. First, instead of presenting individual items to be endorsed as “like” or “dislike,” he constructed groups of three alternative activities and asked respondents to indicate which of each triad they would most prefer to do. Second, instead of scoring respondents’ preferences for their relevance to specific occupations, he developed scales for relating them to general areas of interest, including Outdoor, Mechanical, Computational, Scientific, Persuasive, Artistic, Literary, Musical, Social Service, and Clerical. A measure embodying these characteristics was published as the Kuder Personal Preference Record (Kuder, 1939) with scales for seven areas of interest. As an alternative to the Strong, the Kuder pointed less directly to specific occupations that respondents should consider but provided more information about personal characteristics that would be likely to have a bearing on whether they would enjoy certain kinds of work.
Kuder’s measure was expanded in subsequent revisions to feature 20 broad interest areas, a downward extension for use with elementary and high school students, and its current name, the Kuder Occupational Interest Survey (KOIS; Kuder & Zytowski, 1991). Paralleling the evolution of the SVIB from a strictly occupationally scaled measure to one that incorporates as well a theoretically based classification of occupational interests, the KOIS now includes some occupational as well as basic interest scores.
Holland Self-Directed Search
Like Strong and Kuder before him, John Holland began his work on measuring vocational interests as an empiricist, concerned with collecting data on likes, dislikes, and preferences that would have predictive value for successful occupational choice. Early on, however, he opted for a rational-empirical approach to scale construction in which variables are selected on the basis of some guiding concepts and empirical testing with criterion groups is employed only secondarily to refine and revise item content. Holland’s guiding concepts were rooted in his belief that occupational preferences derive from a person’s self-concept and personality style, and the first product of his approach was the Vocational Preference Inventory (VPI; Holland, 1953). The VPI yielded scale scores related to broad aspects of personality styles or attitudes, and in subsequent revisions the core VPI scales evolved into the following six: Realistic (R), Investigative (I), Artistic (A), Social (S), Enterprising (E), and Conventional (C) (Holland, 1985). Some additional empirically derived scales were added to the instrument, but the RIASEC group became the model on which Holland elaborated an influential personality-based theory of career choice and satisfaction (Holland, 1966). Holland postulated that every individual’s personality comprises some combination of these six styles, and he maintained that the extent to which each style is present provides a personality description that has direct implications for career planning.
Holland later used this model to design the Self-Directed Search (SDS), which generates scale scores for the RIASEC components and offers suggestions concerning the kinds of occupations for which persons with various scale combinations might find themselves suitable (Holland, 1979; Holland, Fritzsche, & Powell, 1994). A unique feature of the SDS is a manual that instructs respondents not only in how to self-administer the test but also in how to interpret the results for themselves. Although in actual practice SDS results are typically reviewed with an assessment professional, the self-interpretation guidelines have the advantage of enriching a respondent’s engagement in and understanding of a vocational counseling process.
Having opened with the words of one English author, this research paper can fittingly close with the words of another: “It was the best of times, it was the worst of times,” wrote Charles Dickens in beginning A Tale of Two Cities. Assessment psychology has arrived at the best and worst of times following a long and distinguished history. As has been noted, the roots of scientific and professional interest in assessing individual differences reach almost as far back as the inception of psychology as a science and preceded its initial use in applied practice. Advances in assessment methods were psychology’s main way of responding to public and national needs during the first half of the twentieth century, and applied psychology was largely defined during this time by assessment conducted in clinical, educational, and organizational settings. Students interested in practicing or studying aspects of applied psychology were routinely trained in assessment methods of various kinds, and being a competent assessor was generally considered an integral part of being a competent psychological practitioner.
Applied psychology and the place of assessment in it changed dramatically during the second half of the twentieth century. Practicing psychologists embraced many new roles as therapists and consultants, and their primary work settings evolved from a narrow range of institutions into a broad panoply of attractive opportunities in independent practice and in forensic, health care, governmental, and other agencies that came to appreciate the knowledge and skills that psychologists can bring to bear. Consonant with these new directions in practice, assessment came to play a lesser part than before in what applied psychologists did, and many practitioners chose not to include assessment among the services they offered.
Despite reducing the predominance of assessment, however, these practice changes did not bring bad times with them. To the contrary, the beginning of the twenty-first century is in many respects the best of times for assessment psychology, which more than ever before is a progressive, dynamic, intriguing, challenging, and potentially rewarding field of scientific and professional endeavor. A recent survey by the American Psychological Association Practice Directorate has indicated that, after psychotherapy, assessment is the second most frequent service provided by psychologists across various practice settings. Respondents to this survey working in independent practice or in health care or government settings reported spending 15% to 23% of their time doing assessment, and there appears to be a stable cadre of persons in both academic and practice positions who identify themselves primarily as assessment psychologists (Phelps, Eisman, & Kohout, 1998). Organizations like the Society for Personality Assessment with more than 2,500 members and the National Academy of Neuropsychologists with more than 3,000 members are flourishing, as are practice specialties in which assessment plays a central role, including not only neuropsychology but forensic psychology and school psychology as well.
The thriving test publishing business bears further witness to widespread use of many different kinds of assessment methods. There is a steady stream of new instruments, revisions of older instruments, updated normative reference data, and advances in computer-based test interpretation with which assessment psychologists must keep current. Competence in assessment cannot be maintained by employing yesterday’s methods; only by incorporating rapidly emerging improvements in assessment methods can practitioners meet ethical standards for competent practice (see Weiner, 1989).
The present-day vigor of assessment psychology is reflected not only in its applications but in a burgeoning literature as well. There are more quality journals, textbooks, and handbooks concerned with assessment available now than at any time in the past. The subscriber-selected journals presently abstracted in the American Psychological Association’s PsycSCAN: Clinical Psychology include in alphabetical order Assessment, Journal of Clinical and Experimental Neuropsychology, Journal of Clinical Neuropsychology, Journal of Personality Assessment, and Psychological Assessment, and also widely referenced are the journals Archives of Clinical Neuropsychology, Behavioral Assessment, and Journal of Behavioral Assessment. The literature includes an international array of publications as well (e.g., the European Journal of Psychological Assessment, official organ of the European Association of Psychological Assessment, and the International Journal of Testing, official organ of the International Test Commission), and published research findings are constantly expanding knowledge concerning the psychometric foundations of psychological assessment methods and the benefits that derive from their appropriate use. Noteworthy in this latter regard are detailed reports by the previously mentioned American Psychological Association Psychological Assessment Work Group that document the validity of a broad range of assessment methods and their utility in clinical health care and other applied settings (Kubiszyn et al., 2000; Meyer et al., 2001).
And yet these are also trying times for assessment, due primarily to negative forces operating from outside psychology and from within our own ranks as well. From the outside, psychological assessment practice has been buffeted by the priorities placed by managed care agencies on delivering health services in the quickest and least expensive way possible. Such priorities severely restrict support for complex and time-consuming evaluation procedures conducted by doctoral level professionals. In common with other health care professionals specializing in evaluation procedures, assessment psychologists doing primarily clinical work have had their practices curtailed by the advent of managed care, and there has in recent years been some decline in the frequency with which comprehensive multimethod assessments using full-length measures are conducted (Eisman et al., 2000; Piotrowski, 1999; Piotrowski, Belter, & Keller, 1998).
Within psychology’s ranks, contemporary trends in graduate education have compromised the caliber of assessment training provided in many psychology programs. Striving to achieve breadth and diversity in a crowded curriculum, graduate faculty have been prone to undervalue assessment skills, to disregard the unique significance of assessment for psychology’s professional identity, and to consider internship centers responsible for assessment training. These attitudes have been reflected in reduced course offerings and decreased requirements in assessment, sometimes consisting of little more than exposure to the mechanics of a few selected tests, without hands-on experience in integrating assessment data collected from multiple sources into carefully crafted written reports. Recent surveys of internship directors identify considerable dissatisfaction on their part with the assessment training students are receiving in many graduate programs, and they report that the majority of graduate students arriving at their centers come poorly prepared to conduct evaluations (see Clemence & Handler, 2001; Stedman, Hatch, & Schoenfeld, 2000).
What lies ahead for assessment psychology? Although definitely wounded by managed care, the field does not appear to have sustained any life-threatening injuries. Although health maintenance organizations have posed a distinct threat to the viability of comprehensive assessment and disrupted the professional lives of many psychologists, there is reason to believe that both quality assessment and its practitioners are succeeding in weathering this storm.
Of greater concern than managed care is the matter of how and where the next generation of potential researchers and practitioners will be trained in assessment psychology. No matter how well-intended, the argument that assessment training belongs in internships rather than in graduate programs poses a more serious threat to the future of assessment psychology than issues of how fees for service will be paid. Taking assessment out of the graduate curriculum separates it from its academic base and discourages students from becoming involved in or enthusiastic about assessment-related research. Relegating assessment training to the internship, which means in many cases that the internship center must provide basic instruction in assessment methods before interns can even begin to conduct comprehensive evaluations, restricts the time available for students to develop even minimal competence as assessors. A further argument sometimes heard, that assessment competence is a specialized skill to be acquired by interested students in postdoctoral programs or workshops, is even more ill-advised. Learning assessment mainly as a postdoctoral specialty would divorce the field even further from its research base and remove it even further from the core content of psychology with which graduate students are made familiar.
Needed now and in the years ahead, then, to perpetuate the scientific and professional advancement of assessment psychology, is enlightened orchestration of graduate education. Graduate programs should be carefully crafted to acquaint students with the nature of assessment psychology and its place in psychology’s history; to provide opportunities for students to become involved in assessment research and to gain appreciation for the practical value of good assessment; and, for students in applied areas, to include pre-internship experience in conducting multimethod psychological evaluations and integrating the data obtained from them. Only then will assessment psychology be able in the future as in the past to contribute to expanded understanding of human behavior and the delivery of helpful psychological services.
References
- Ackerman, M. J., & Ackerman, M. C. (1997). Custody evaluation practices: A survey of experienced professionals (revisited). Professional Psychology, 28, 137–145.
- American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed., text rev.). Washington, DC: Author.
- Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Prentice-Hall.
- Archer, R. P., & Newsom, C. R. (2000). Psychological test usage with adolescent clients: A survey. Assessment, 7, 227–235.
- Atkinson, J. W., & Feather, N. T. (1966). A theory of achievement motivation. New York: Wiley.
- Bellack, A. S., & Hersen, M. (Eds.). (1988). Behavioral assessment: A practical handbook (3rd ed.). New York: Pergamon Press.
- Bellak, L., & Abrams, D. M. (1997). The T.A.T., C.A.T., & S.A.T. in clinical use (6th ed.). Boston: Allyn & Bacon.
- Bender, L. (1938). A visual motor Gestalt test and its clinical uses (Research Monographs No. 3). New York: American Orthopsychiatric Association.
- Bernreuter, R. G. (1931). The Personality Inventory. Palo Alto, CA: Consulting Psychologists Press.
- Binet, A. (1903). L’étude expérimentale de l’intelligence [The experimental study of intelligence]. Paris: Schleicher.
- Binet, A., & Simon, T. (1905). Méthodes nouvelles pour le diagnostic du niveau intellectuel des anormaux [New methods for the diagnosis of abnormal intellectual level]. L’Année Psychologique, 11, 193–244.
- Boccaccini, M. T., & Brodsky, S. L. (1999). Diagnostic test usage by forensic psychologists in emotional injury cases. Professional Psychology, 30, 253–259.
- Boll, T. J. (1983). Neuropsychological assessment. In I. B. Weiner (Ed.), Clinical methods in psychology (2nd ed., pp. 282–330). New York: Wiley.
- Boring, E. G. (1950). A history of experimental psychology. New York: Appleton-Century-Crofts.
- Borum, R., & Grisso, T. (1995). Psychological test use in criminal forensic evaluations. Professional Psychology, 26, 465–473.
- Brown, W. R., & McGuire, J. M. (1976). Current psychological assessment practices. Professional Psychology, 7, 475–484.
- Buck, J. N. (1948). The H-T-P technique: A qualitative and quantitative method. Journal of Clinical Psychology, 4, 317–396.
- Burns, R. C., & Kaufman, S. H. (1970). Kinetic Family Drawings (K-F-D): An introduction to understanding children through kinetic drawings. New York: Brunner/Mazel.
- Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A., & Kaemmer, B. (1989). Minnesota Multiphasic Personality Inventory (MMPI-2): Manual for administration and scoring. Minneapolis: University of Minnesota Press.
- Butcher, J. N., & Rouse, S. V. (1996). Personality: Individual differences and clinical assessment. Annual Review of Psychology, 47, 87–111.
- Butcher, J. N., Williams, C. L., Graham, J. R., Archer, R., Tellegen, A., Ben-Porath, Y. S., et al. (1992). MMPI-A manual for administration, scoring, and interpretation. Minneapolis: University of Minnesota Press.
- Butler, M., Retzlaff, P., & Vanderploeg, R. (1991). Neuropsychological test usage. Professional Psychology, 22, 510–512.
- Camara, W. J., Nathan, J. S., & Puente, A. E. (2000). Psychological test usage: Implications in professional psychology. Professional Psychology, 31, 141–154.
- Campbell, J. M. (1998). Internal and external validity of seven Wechsler Intelligence Scale for Children: 3rd ed. short forms in a sample of psychiatric inpatients. Psychological Assessment, 10, 431–434.
- Cattell, J. M. (1890). Mental tests and measurements. Mind, 15, 373–380.
- Cattell, R. B., Cattell, A. K., & Cattell, H. E. (1993). Sixteen Personality Factors Questionnaire (5th ed.). Champaign, IL: Institute for Personality and Abilities Testing.
- Christensen, A. L. (1975). Luria’s neuropsychological investigation: Text, manual, and test cards. New York: Spectrum.
- Ciminero, A. R., Calhoun, K. S., & Adams, H. E. (Eds.). (1986). Handbook of behavioral assessment (2nd ed.). New York: Wiley.
- Clemence, A. J., & Handler, L. (2001). Psychological assessment on internships: A survey of training directors and their expectations for students. Journal of Personality Assessment, 76, 18–47.
- Costa, P. T., Jr., & McCrae, R. R. (1992). Revised NEO Personality Inventory (NEO PI–R) and NEO Five-Factor Inventory (NEO-FFI). Odessa, FL: Psychological Assessment Resources.
- Costantino, G., Malgady, R. G., & Rogler, L. H. (1988). TEMAS (Tell-Me-A-Story) manual. Los Angeles: Western Psychological Services.
- Costello, E. J., Edelbrock, C. S., & Costello, A. J. (1985). Validity of the NIMH Diagnostic Interview Schedule for Children: A comparison between psychiatric and pediatric referrals. Journal of Abnormal Child Psychology, 13, 579–595.
- Craig, R. J. (Ed.). (1989). Clinical and diagnostic interviewing. Northvale, NJ: Aronson.
- Cramer, P. (1999). Future directions for the Thematic Apperception Test. Journal of Personality Assessment, 72, 74–92.
- Dana, R. H. (Ed.). (2000). Handbook of cross-cultural and multicultural personality assessment. Mahwah, NJ: Erlbaum.
- Darwin, C. (1859). The origin of species. London: Murray.
- Digman, J. M. (1990). Personality structure: Emergence of the Five-Factor Model. Annual Review of Psychology, 41, 417–440.
- Donders, J. (1997). A short form of the WISC-III for clinical use. Psychological Assessment, 9, 15–20.
- Ebbinghaus, H. (1897). Über eine neue methode zur prüfung geistiger fähigkeiten und ihre anwendung bei schulkindern [On a new method for the testing of intellectual capacity and its application by school children]. Zeitschrift für Psychologie und Physiologie der Sinnesorgane, 13, 451–457.
- Eisman, E. J., Dies, R. R., Finn, S. E., Eyde, L. D., Kay, G. G., Kubiszyn, T. W., et al. (2000). Problems and limitations in using psychological assessment in the contemporary health care delivery system. Professional Psychology, 31, 131–140.
- Endicott, J., & Spitzer, R. L. (1978). A diagnostic interview: The schedule for affective disorders and schizophrenia. Archives of General Psychiatry, 35, 837–844.
- Exner, J. E., Jr. (1993). The Rorschach: A comprehensive system. I. Basic foundations (3rd ed.). New York: Wiley.
- Folstein, M. F., Folstein, S. E., & McHugh, P. R. (1975). Mini-mental state. Journal of Psychiatric Research, 12, 189–198.
- Freud, S. (1958). On beginning the treatment (further recommendations on the technique of psychoanalysis-I). In J. Strachey (Ed. & Trans.), The standard edition of the complete psychological works of Sigmund Freud (Vol. 12, pp. 123–144). London: Hogarth Press. (Original work published 1913)
- Galton, F. (1869). Hereditary genius: An inquiry into its laws and consequences. London: Macmillan.
- Galton, F. (1883). Inquiry into human faculty and its development. London: Macmillan.
- Geer, J. H. (1965). The development of a scale to measure fear. Behaviour Research and Therapy, 3, 45–53.
- Glutting, J., Adams, W., & Sheslow, D. (1999). Wide Range Intelligence Test. Wilmington, DE: Wide Range.
- Golden, C. J., Purisch, A. D., & Hammeke, T. A. (1985). Luria-Nebraska Neuropsychological Battery: Forms I and II Manual. Los Angeles: Western Psychological Services.
- Goldfried, M. R., & Kent, R. N. (1972). Traditional vs. behavioral assessment: A comparison of methodological and theoretical assumptions. Psychological Bulletin, 77, 409–420.
- Goodenough, F. L. (1926). Measurement of intelligence by drawings. New York: Harcourt, Brace & World.
- Gough, H. G. (1957). California Psychological Inventory manual. Palo Alto, CA: Consulting Psychologists Press.
- Gough, H. G., & Bradley, P. (1996). CPI manual (3rd ed.). Palo Alto, CA: Consulting Psychologists Press.
- Handler, L. (1995). The clinical use of drawings. In C. Newmark (Ed.), Major psychological assessment instruments (pp. 206–293). Boston: Allyn & Bacon.
- Handler, L. (2001). Assessment of men: Personality assessment goes to war by the Office of Strategic Services assessment staff. Journal of Personality Assessment, 76, 558–578.
- Haney, W. (1981). Validity, vaudeville, and values: A short history of social concerns over standardized testing. American Psychologist, 36, 1021–1034.
- Hansen, J. C., & Campbell, D. P. (1985). Manual for the SVIB–SII (4th ed.). Stanford, CA: Stanford University Press.
- Harmon, L. W., Hansen, J. C., Borgen, F. H., & Hammer, A. L. (1994). Strong Interest Inventory: Applications and technical guide. Palo Alto, CA: Consulting Psychologists Press.
- Harris, D. B. (1963). Children’s drawings as a measure of intellectual maturity. New York: Harcourt, Brace & World.
- Hathaway, S. R., & McKinley, J. C. (1943). The Minnesota Multiphasic Personality Inventory manual. New York: Psychological Corporation.
- Haynes, S. N., & O’Brien, W. O. (2000). Principles of behavioral assessment. New York: Kluwer Academic/Plenum.
- Hiller, J. B., Rosenthal, R., Bornstein, R. F., Berry, D. T. R., & Brunell-Neuleib, S. (1999). A comparative meta-analysis of Rorschach and MMPI validity. Psychological Assessment, 11, 278–296.
- Holland, J. L. (1953). Manual for the Vocational Preference Inventory. Palo Alto, CA: Consulting Psychologists Press.
- Holland, J. L. (1966). The psychology of vocational choice. Waltham, MA: Blaisdell.
- Holland, J. L. (1979). The Self-Directed Search professional manual. Palo Alto, CA: Consulting Psychologists Press.
- Holland, J. L. (1985). Vocational Preference Inventory: Professional manual. Odessa, FL: Psychological Assessment Resources.
- Holland, J. L., Fritzsche, B. A., & Powell, A. B. (1994). The Self-Directed Search (SDS) Technical manual. Odessa, FL: Psychological Assessment Resources.
- Jastak, J. (1946). Wide Range Achievement Test. Wilmington, DE: C. L. Story.
- Jastak, J., & Wilkinson, G. (1984). Wide Range Achievement Test-Revised. Wilmington, DE: Jastak Associates.
- Jung, C. G. (1916). The association method. American Journal of Psychology, 21, 219–269.
- Kallman, W. M., & Feuerstein, M. (1977). Psychophysiological procedures. In A. R. Ciminero, K. S. Calhoun, & H. E. Adams (Eds.), Handbook of behavioral assessment (pp. 329–366). New York: Wiley.
- Kamphaus, R. W., Petoskey, M. D., & Rowe, E. W. (2000). Current trends in psychological testing of children. Professional Psychology, 31, 155–164.
- Karson, S., & O’Dell, J. W. (1989). The 16 PF. In C. S. Newmark (Ed.), Major psychological assessment instruments (Vol. 2, pp. 45–66). Boston: Allyn & Bacon.
- Kaufman, A. S. (1990). Assessing adolescent and adult intelligence. Boston: Allyn & Bacon.
- Kaufman, A. S. (1994). Intelligent testing with the WISC-III. New York: Wiley.
- Kaufman, A. S., & Kaufman, N. L. (1983). Interpretive manual for the Kaufman Assessment Battery for Children. Circle Pines, MN: American Guidance Service.
- Kaufman, A. S., & Kaufman, N. L. (1990). Kaufman Brief Intelligence Test. Circle Pines, MN: American Guidance Service.
- Kaufman, A. S., & Kaufman, N. L. (1993). Interpretive manual for the Kaufman Adolescent and Adult Intelligence Test. Circle Pines, MN: American Guidance Service.
- Kelly, G. A. (1955). The psychology of personal constructs. New York: Norton.
- Kent, G. H., & Rosanoff, A. (1910). A study of association in insanity. American Journal of Insanity, 67, 37–96, 317–390.
- Koppitz, E. M. (1968). Psychological evaluation of children’s human figure drawings. New York: Grune & Stratton.
- Koppitz, E. M. (1975). The Bender Gestalt Test for young children. 2: Research and applications. New York: Grune & Stratton.
- Kubiszyn, T. W., Finn, S. E., Kay, G. G., Dies, R. R., Meyer, G. J., Eyde, L. D., et al. (2000). Empirical support for psychological assessments in clinical health care settings. Professional Psychology, 31, 119–130.
- Kuder, G. F. (1939). Kuder Preference Record: Form A. Chicago: University of Chicago Press.
- Kuder, G. F., & Zytowski, D. G. (1991). Kuder Occupational Interest Survey Form DD: General manual (3rd ed.). Monterey, CA: CTB Macmillan/McGraw-Hill.
- Lacks, P. (1998). Bender Gestalt screening for brain dysfunction (2nd ed.). New York: Wiley.
- LaFortune, K. A., & Carpenter, B. N. (1998). Custody evaluations: A survey of mental health professionals. Behavioral Sciences and the Law, 16, 207–224.
- Lees-Haley, P. R., Smith, H. H., Williams, C. W., & Dunn, J. T. (1995). Forensic neuropsychological test usage: An empirical survey. Archives of Clinical Neuropsychology, 11, 45–51.
- Lezak, M. D. (1995). Neuropsychological assessment (3rd ed.). New York: Oxford University Press.
- Loutit, C. M., & Browne, C. G. (1947). Psychometric instruments in psychological clinics. Journal of Consulting Psychology, 11, 49–54.
- Lubin, B., Larsen, R. M., & Matarazzo, J. D. (1984). Patterns of psychological test usage in the United States: 1935–1982. American Psychologist, 39, 451–454.
- Lubin, B., Wallis, R. R., & Paine, C. (1971). Patterns of psychological test usage in the United States: 1935–1969. Professional Psychology, 2, 70–74.
- Matarazzo, J. D., & Wiens, A. N. (1972). The interview: Research on its anatomy and structure. Chicago: Aldine-Atherton.
- McArthur, D. S., & Roberts, G. E. (1990). Roberts Apperception Test for Children manual. Los Angeles: Western Psychological Services.
- McClelland, D. C., Atkinson, J. W., Clark, R. A., & Lowell, E. L. (1953). The achievement motive. New York: Appleton-Century-Crofts.
- Meyer, G. J., & Archer, R. P. (2001). The hard science of Rorschach research: What do we know and where do we go? Psychological Assessment, 13, 486–562.
- Meyer, G. J., Finn, S. E., Eyde, L. D., Kay, G. G., Moreland, K. L., Dies, R. R., et al. (2001). Psychological testing and psychological assessment: A review of evidence and issues. American Psychologist, 56, 128–165.
- Millon, T. (1969). Modern psychopathology: A biosocial approach to maladaptive learning and functioning. Philadelphia: Saunders.
- Millon, T. (1977). Minneapolis, MN: National Computer Systems.
- Millon, T. (1994). Manual for the MCMI–III. Minneapolis, MN: National Computer Systems.
- Millon, T. (1996). The Millon inventories. New York: Guilford Press.
- Millon, T., & Davis, R. (1993). Millon Adolescent Clinical Inventory (MACI). Minneapolis, MN: National Computer Systems.
- Millon, T., Green, C. J., & Meagher, R. B., Jr. (1982). Millon Adolescent Personality Inventory manual. Minneapolis, MN: National Computer Systems.
- Mischel, W. (1968). Personality and assessment. New York: Wiley.
- Morey, L. C. (1991). The Personality Assessment Inventory professional manual. Odessa, FL: Psychological Assessment Resources.
- Morey, L. C. (1996). An interpretive guide to the Personality Assessment Inventory (PAI). Odessa, FL: Psychological Assessment Resources.
- Morgan, C. D., & Murray, H. A. (1935). A method for investigating fantasies. Archives of Neurology and Psychiatry, 34, 389–406.
- Morrison, J. (1993). The first interview: A guide for clinicians. New York: Guilford Press.
- Muñiz, J., Prieto, G., Almeida, L., & Bartram, D. (1999). Test use in Spain, Portugal, and Latin American countries. European Journal of Psychological Assessment, 15, 151–157.
- Murray, H. A. (1938). Explorations in personality. New York: Oxford University Press.
- Murray, H. A. (1971). Thematic Apperception Test manual. Cambridge, MA: Harvard University Press. (Original work published 1943)
- Naglieri, J. A. (1988). Draw-a-Person: A quantitative scoring system. New York: Psychological Corporation.
- Naglieri, J. A., McNeish, T. J., & Bardos, A. N. (1991). Draw-a-Person: Screening procedure for emotional disturbance. Austin, TX: Pro-Ed.
- Office of Strategic Services Assessment Staff. (1948). Assessment of men. New York: Rinehart.
- Othmer, E., & Othmer, S. C. (1994). The clinical interview using DSM-IV. Vol. 1: Fundamentals. Washington, DC: American Psychiatric Association.
- Otis, J. A. (1923). Otis Classification Test. Yonkers, NY: World.
- Pascal, G. R., & Suttell, B. J. (1951). The Bender Gestalt Test: Quantification and validity for adults. New York: Grune & Stratton.
- Paul, G. L. (1966). Insight vs. desensitization in psychotherapy. Stanford, CA: Stanford University Press.
- Phelps, R., Eisman, E. J., & Kohout, J. (1998). Psychological practice and managed care: Results of the CAPP practitioner survey. Professional Psychology, 29, 31–36.
- Piedmont, R. L. (1998). The Revised NEO Personality Inventory: Clinical and research applications. New York: Plenum Press.
- Piotrowski, C. (1999). Assessment practices in the era of managed care: Current status and future directions. Journal of Clinical Psychology, 55, 787–796.
- Piotrowski, C., Belter, R. W., & Keller, J. W. (1998). The impact of “managed care” on the practice of psychological testing: Preliminary findings. Journal of Personality Assessment, 70, 441–447.
- Piotrowski, C., & Keller, J. W. (1989). Psychological testing in outpatient mental health facilities: A national study. Professional Psychology, 20, 423–425.
- Pope, B. (1979). The mental health interview: Research and application. New York: Pergamon Press.
- Psychological Corporation. (1999). Wechsler Abbreviated Scale of Intelligence. San Antonio, TX: Author.
- Puig-Antich, J., & Chambers, W. (1978). The schedule for affective disorders and schizophrenia for school age children. New York: New York State Psychiatric Institute.
- Reitan, R. M., & Wolfson, D. (1993). The Halstead-Reitan Neuropsychological Test Battery: Theory and clinical interpretation. Tucson, AZ: Neuropsychology Press.
- Robins, L. N., Helzer, J. E., Croughan, J. L., & Ratcliff, K. S. (1981). National Institute of Health Diagnostic Interview Schedule. Archives of General Psychiatry, 38, 381–389.
- Rogers, C. R. (1942). Counseling and psychotherapy. Boston: Houghton Mifflin.
- Rogers, C. R. (1951). Client-centered therapy. Boston: Houghton Mifflin.
- Rogers, R. (2001). Handbook of diagnostic and structured interviewing. New York: Guilford Press.
- Rohde, A. R. (1946). Explorations in personality by the sentence completion method. Journal of Applied Psychology, 30, 169–181.
- Rohde, A. R. (1948). A note regarding the use of the sentence completion test in military installations since the beginning of World War II. Journal of Consulting Psychology, 12, 190–193.
- Rorschach, H. (1942). Psychodiagnostics: A diagnostic test based on perception. New York: Grune & Stratton. (Original work published 1921)
- Rosenthal, R., Hiller, J. B., Bornstein, R. F., Berry, D. T. R., & Brunell-Neuleib, S. (2001). Meta-analytic methods, the Rorschach, and the MMPI. Psychological Assessment, 13, 449–551.
- Rotter, J. B., Lah, M. I., & Rafferty, J. E. (1992). Manual: Rotter Incomplete Sentences Blank (2nd ed.). Orlando, FL: Psychological Corporation.
- Rotter, J. B., & Wickens, D. D. (1948). The consistency and generality of ratings of social aggressiveness made from observations of role playing situations. Journal of Consulting Psychology, 12, 234–239.
- Ryan, J. J., & Ward, L. C. (1999). Validity, reliability, and standard errors of measurement for two seven-subtest forms of the Wechsler Adult Intelligence Scale–III. Psychological Assessment, 11, 207–211.
- Spitzer, R. L., Endicott, J., & Robins, E. (1978). Research diagnostic criteria: Rationale and reliability. Archives of General Psychiatry, 35, 773–782.
- Spitzer, R. L., Williams, J. B. W., & Gibbon, M. (1987). Structured clinical interview for DSM-III-R. New York: New York State Psychiatric Institute.
- Spreen, O., & Strauss, E. (1998). A compendium of neuropsychological tests (2nd ed.). New York: Oxford.
- Stedman, J. M., Hatch, J. P., & Schoenfeld, L. S. (2000). Preinternship preparation in psychological testing and psychotherapy: What internship directors say they expect. Professional Psychology, 31, 321–326.
- Stern, W. (1914). The psychological methods of testing intelligence. Baltimore: Warwick & York.
- Stinnett, T. A., Havey, J. M., & Oehler-Stinnett, J. (1994). Current test usage by practicing school psychologists: A national survey. Journal of Psychoeducational Assessment, 12, 331–350.
- Strong, E. K., Jr. (1927). Vocational Interest Blank. Stanford, CA: Stanford University Press.
- Sullivan, H. S. (1954). The psychiatric interview. New York: Norton.
- Sundberg, N. D. (1961). The practice of psychological testing in clinical services in the United States. American Psychologist, 16, 79–83.
- Suzuki, L. A., Ponterotto, J. G., & Meller, P. J. (Eds.). (2000). The handbook of multicultural assessment (2nd ed.). New York: Wiley.
- Terman, L. M. (1916). The measurement of intelligence. Boston: Houghton Mifflin.
- Thorndike, R. L. (1959). The California Psychological Inventory: A review. In O. K. Buros (Ed.), Fifth mental measurements yearbook (pp. 742–744). Highland Park, NJ: Gryphon Press.
- Thorndike, R. L., Hagen, E. P., & Sattler, J. M. (1986). Stanford-Binet Intelligence Scale: Guide for administering and scoring the 4th ed. Chicago: Riverside.
- Trzepacz, P. T., & Baker, R. W. (1993). The psychiatric mental status examination. New York: Oxford University Press.
- Viglione, D. J., & Hilsenroth, M. J. (2001). The Rorschach: Facts, fictions, and future. Psychological Assessment, 13, 452–471.
- Ward, L. C. (1990). Prediction of Verbal, Performance, and Full Scale IQs from seven subtests of the WAIS–R. Journal of Clinical Psychology, 46, 436–440.
- Watkins, C. E., Jr., Campbell, V. L., Nieberding, R., & Hallmark, R. (1995). Contemporary practice of psychological assessment by clinical psychologists. Professional Psychology, 26, 54–60.
- Wechsler, D. (1939). Measurement of adult intelligence. Baltimore: Williams & Wilkins.
- Wechsler, D. (1949). Manual for the Wechsler Intelligence Scale for Children. New York: Psychological Corporation.
- Wechsler, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. New York: Psychological Corporation.
- Wechsler, D. (1989). Manual for the Wechsler Preschool and Primary Scale of Intelligence–Revised. New York: Psychological Corporation.
- Wechsler, D. (1991). Manual for the Wechsler Intelligence Scale for Children (3rd ed.). New York: Psychological Corporation.
- Wechsler, D. (1997). WAIS–III administration and scoring manual. San Antonio, TX: Psychological Corporation.
- Weiner, I. B. (1989). On competence and ethicality. Journal of Personality Assessment, 53, 827–831.
- Weiner, I. B. (1998). Principles of Rorschach interpretation. Mahwah, NJ: Erlbaum.
- Weiner, I. B. (2001). Advancing the science of psychological assessment: The Rorschach Inkblot Method as exemplar. Psychological Assessment, 13, 423–432.
- Westen, D., Lohr, N., Silk, K., Kerber, K., & Goodrich, S. (1985). Object relations and social cognition TAT scoring manual. Ann Arbor: University of Michigan.
- Wilkinson, G. S. (1993). Wide Range Achievement Test–3. Wilmington, DE: Wide Range.
- Wilson, M. S., & Reschly, D. J. (1996). Assessment in school psychology training and practice. School Psychology Review, 25, 9–23.
- Woodworth, R. S. (1920). Personal data sheet. Chicago: