Assessment psychology is the field of behavioral science concerned with methods of identifying similarities and differences among people in their personal characteristics and capacities. As such, psychological assessment comprises a variety of procedures that are employed in diverse ways to achieve numerous purposes. Assessment has sometimes been equated with testing, but the assessment process goes beyond merely giving tests. Psychological assessment involves integrating information gleaned not only from test protocols, but also from interview responses, behavioral observations, collateral reports, and historical documents. The Standards for Educational and Psychological Testing (American Educational Research Association [AERA], American Psychological Association, and National Council on Measurement in Education, 1999) specify in this regard that
the use of tests provides one method of collecting information within the larger framework of a psychological assessment of an individual. . . A psychological assessment is a comprehensive examination undertaken to answer specific questions about a client’s psychological functioning during a particular time interval or to predict a client’s psychological functioning in the future. (p. 119)
The diverse ways in which assessment procedures are employed include many alternative approaches to obtaining and combining information from different sources, and the numerous purposes that assessment serves arise in response to a broad range of referral questions raised in such companion fields as clinical, educational, health, forensic, and industrial/organizational psychology.
This research paper sets the stage by conceptualizing assessment as a three-stage process comprising an initial phase of information input, a subsequent phase of information evaluation, and a final phase of information output. Information input involves collecting assessment data of appropriate kinds and in sufficient amounts to address referral questions in meaningful and useful ways. Information evaluation consists of interpreting assessment data in a manner that provides accurate descriptions of respondents’ psychological characteristics and behavioral tendencies. Information output calls for utilizing descriptions of respondents to formulate conclusions and recommendations that help to answer referral questions. Each of these phases of the assessment process requires assessors to accomplish some distinctive tasks, and each involves choices and decisions that touch on critical issues in conducting psychological assessments.
Collecting Assessment Information
The process of collecting assessment information begins with a formulation of the purposes that the assessment is intended to serve. A clear sense of why an assessment is being conducted helps examiners select tests and other sources of information that will provide an adequate basis for arriving at useful conclusions and recommendations. Additionally helpful in planning the data collection process is attention to several examiner, respondent, and data management issues that influence the nature and utility of whatever findings are obtained.
Psychological assessments are instigated by referrals that pose questions about aspects of a person’s psychological functioning or likely future behavior. When clearly stated and psychologically relevant, referral questions guide psychologists in determining what kinds of assessment data to collect, what considerations to address in examining these data, and what implications of their findings to emphasize in their reports. If referral questions lack clarity or psychological relevance, some reformulation is necessary to give direction to the assessment process. For example, a referral in a clinical setting that asks vaguely for personality evaluation or differential diagnosis needs to be specified in consultation with the referring person to identify why a personality evaluation is being sought or what diagnostic possibilities are at issue. Assessment in the absence of a specific referral question can result in a sterile exercise in which neither the data collection process nor the psychologist’s inferences can be focused in a meaningful way.
Even when adequately specified, referral questions are not always psychological in nature. Assessors doing forensic work are frequently asked to evaluate whether criminal defendants were insane at the time of their alleged offense. Sanity is a legal term, however, not a psychological term. There are no assessment methods designed to identify insanity, nor are there any research studies in which being insane has been used as an independent variable. In instances of this kind, in order to help assessors plan their procedures and frame their reports, the referral must be translated into psychological terms, as in defining insanity as the inability to distinguish reality from fantasy.
As a further challenge in formulating assessment goals, specific and psychologically phrased referral questions may still lack clarity as a consequence of addressing complex and multidetermined patterns of behavior. In employment evaluations, for example, a referring person may want to know which of three individuals is likely to perform best in a position of leadership or executive responsibility. To address this type of question effectively, assessors must first be able to identify psychological characteristics that are likely to make a difference in the particular circumstances, as by proceeding, in this example, in the belief that being energetic, decisive, assertive, self-confident, and reasonably unflappable contribute to showing effective and responsible leadership. Then the data collection process can be planned to measure these characteristics, and the eventual report can be focused on using them as a basis for recommending a hiring decision.
The multiple sources of assessment information previously noted include the results of formal psychological testing with standardized instruments; responses to questions asked in structured and unstructured interviews; observations of behavior in various types of contrived situations and natural settings; reports from relatives, friends, employers, and other collateral persons concerning an individual’s previous life history and current characteristics and behavioral tendencies; and documents such as medical records, school records, and written reports of earlier assessments. Individual assessments vary considerably in the availability and utility of these diverse sources of information. Assessments may sometimes be based entirely on record reviews and collateral reports, because the person being assessed is unwilling to be seen directly by an examiner or is for some reason prevented from doing so. Some persons being assessed are quite forthcoming when interviewed but are reluctant to be tested; others find it difficult to talk about themselves but are quite responsive to testing procedures; and in still other cases, in which both interview and test data are ample, there may be a dearth of other information sources on which to draw.
There is little way to know before the fact which sources of information will prove most critical or valuable in an assessment process. What collateral informants say about a person in a particular instance may be more revealing and reliable than what the person says about him- or herself, and in some instances historical documents may prove more informative and dependable than either first-person or collateral reports. Behavioral observations and interview data may sometimes contribute more to an adequate assessment than standardized tests, or may even render testing superfluous; whereas in other instances formal psychological testing may reveal vital diagnostic information that would otherwise not have been uncovered.
The fact that psychological assessment can proceed effectively without psychological testing helps to distinguish between these two activities. The terms psychological assessment and psychological testing are sometimes used synonymously, as noted earlier, but psychological testing is only one among many sources of information that may be utilized in conducting a psychological assessment. Whereas testing refers to the administration of standardized measuring instruments, assessment involves multiple data collection procedures leading to the integration of information from diverse sources. Thus the data collection procedures employed in testing contribute only a portion of the information that is typically utilized in the complex decision-making process that constitutes assessment. This distinction between assessment and testing has previously been elaborated by Fernandez-Ballesteros (1997), Maloney and Ward (1976, chapter 3), and Matarazzo (1990), among others.
Nonetheless, psychological testing stands out among the data collection procedures employed in psychological assessment as the one most highly specialized, diverse, and in need of careful regulation. Psychological testing brings numerous issues to the assessment process, beginning with selection of an appropriate test battery from among an extensive array of available measuring instruments (see Conoley & Impara, 1995, and Fischer & Corcoran, 1994). The chief considerations that should determine the composition of a test battery are the psychometric adequacy of the measures being considered; the relevance of these measures to the referral questions being addressed; the likelihood that these measures will contribute incremental validity to the decision-making process; and the additive, confirmatory, and complementary functions that individual measures are likely to serve when used jointly.
As elaborated by Anastasi and Urbina (1997) and in the Standards for Educational and Psychological Testing (AERA et al., 1999, chapters 1, 2, & 5), the psychometric adequacy of an assessment instrument consists of the extent to which it involves standardized test materials and administration procedures, can be coded with reasonably good interscorer agreement, demonstrates acceptable reliability, has generated relevant normative data, and shows valid corollaries that serve the purposes for which it is intended. Assessment psychologists may at times choose to use tests with uncertain psychometric properties, perhaps for exploratory purposes or for comparison with a previous examination using these tests. Generally speaking, however, formal testing as part of a psychological assessment should be limited to standardized, reliable, and valid instruments for which there are adequate normative data.
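As a rough illustration of what "reasonably good interscorer agreement" can mean in practice, the sketch below computes Cohen's kappa for two scorers who have coded the same set of responses. The source does not name a particular agreement statistic; kappa is used here only as one common choice, and the category labels and codings are hypothetical.

```python
from collections import Counter


def cohens_kappa(scorer_a, scorer_b):
    """Chance-corrected agreement between two scorers coding the same items."""
    n = len(scorer_a)
    if n == 0 or n != len(scorer_b):
        raise ValueError("both scorers must code the same non-empty set of items")

    # Proportion of items on which the two scorers assigned the same code.
    observed = sum(a == b for a, b in zip(scorer_a, scorer_b)) / n

    # Agreement expected by chance, from each scorer's marginal code frequencies.
    counts_a = Counter(scorer_a)
    counts_b = Counter(scorer_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)

    if expected == 1.0:
        # Both scorers used a single identical code throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codings of ten responses into categories "A" and "B".
first_scorer = ["A", "A", "B", "B", "A", "B", "A", "A", "B", "A"]
second_scorer = ["A", "A", "B", "A", "A", "B", "A", "B", "B", "A"]

print(f"{cohens_kappa(first_scorer, second_scorer):.2f}")  # prints 0.58
```

Here the scorers agree on 8 of 10 responses, but because some of that agreement would be expected by chance, kappa is lower than the raw 80% figure. What counts as an acceptable value is a judgment that depends on the instrument and its intended use.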
The tests selected for inclusion in an assessment battery should provide information relevant to answering the questions that have been raised about the person being examined. Questions that relate to personality functions (e.g., What kind of approach in psychotherapy is likely to be helpful to this person?) call for personality tests. Questions that relate to educational issues (e.g., Does this student have a learning disability?) call for measures of intellectual abilities and academic aptitude and achievement. Questions that relate to neuropsychological functions (e.g., Are there indications of memory loss?) call for measures of cognitive functioning, with special emphasis on measures of capacities for learning and recall.
These examples of relevance may seem too obvious to mention. However, they reflect an important and sometimes overlooked guiding principle that test selection should be justifiable for each measure included in an assessment battery. Insufficient attention to justifying the use of particular measures in specific instances can result in two ill-advised assessment practices: (a) conducting examinations with a fixed and unvarying battery of measures regardless of what questions are being asked in the individual case, and (b) using favorite instruments at every opportunity even when they are unlikely to serve any central or unique purpose in a particular assessment. The administration of minimally useful tests that have little relevance to the referral question is a wasteful procedure that can result in warranted criticism of assessment psychologists and the assessment process. Likewise, the propriety of charging fees for unnecessary procedures can rightfully be challenged by persons receiving or paying for services, and the competence of assessors who give tests that make little contribution to answering the questions at issue can be challenged in such public forums as the courtroom (see Weiner, 2002).
Incremental validity in psychological assessment refers to the extent to which new information increases the accuracy of a classification or prediction above and beyond the accuracy achieved by information already available. Assessors pay adequate attention to incremental validity by collecting the amount and kinds of information they need to answer a referral question, but no more than that. In theory, then, familiarity with the incremental validity of various measures when used for certain purposes, combined with test selection based on this information, minimizes redundancy in psychological assessment and satisfies both professional and scientific requirements for justifiable test selection.
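The definition above can be made concrete with a small sketch that treats incremental validity as the gain in classification accuracy when a new source of information is added to what is already available. The function names, the case data, and the choice of simple accuracy as the index are illustrative assumptions, not a procedure taken from the source.

```python
def accuracy(predicted, actual):
    """Proportion of cases classified correctly."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def incremental_validity(baseline_pred, augmented_pred, actual):
    """Accuracy gained by adding new information to the baseline information."""
    return accuracy(augmented_pred, actual) - accuracy(baseline_pred, actual)


# Hypothetical data for ten cases: 1 = condition present, 0 = condition absent.
actual_status = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
# Classifications based on interview and records alone (hypothetical).
interview_only = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]
# Classifications after adding a test to the same information (hypothetical).
interview_plus_test = [1, 0, 1, 1, 0, 1, 1, 0, 1, 0]

gain = incremental_validity(interview_only, interview_plus_test, actual_status)
print(f"{gain:.2f}")  # prints 0.20
```

In this made-up example the added test raises accuracy from 70% to 90%, so it shows incremental validity for this purpose. A gain near zero would suggest that the test was redundant with the information already in hand.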
In practice, however, strict adherence to incremental validity guidelines often proves difficult and even disadvantageous to implement. As already noted, it is difficult to anticipate which sources of information will prove to be most useful. Similarly, with respect to which instruments to include in a test battery, there is little way to know whether the tests administered have yielded enough data, and which tests have contributed most to understanding the person being examined, until after the data have been collected and analyzed. In most practice settings, it is reasonable to conduct an interview and review previous records as a basis for deciding whether formal testing would be likely to help answer a referral question, that is, whether it will show enough incremental validity to warrant its cost in time and money. Likewise, reviewing a set of test data can provide a basis for determining what kind of additional testing might be worthwhile. However, it is rarely appropriate to administer only one test at a time, to choose each subsequent test on the basis of the preceding one, and to schedule a further testing session for each additional test administration. For this reason, responsible psychological assessment usually consists of one or two testing sessions comprising a battery of tests selected to serve specific additive, confirmatory, and complementary functions.
Additive, Confirmatory, and Complementary Functions of Tests
Some referral questions require selection of multiple tests to identify relatively distinct and independent aspects of a person’s psychological functioning. For example, students receiving low grades may be referred for an evaluation to help determine whether their poor academic performance is due primarily to limited intelligence or to personality characteristics that are fostering negative attitudes toward achieving in school. A proper test battery in such a case would include some measure of intelligence and some measure of personality functioning. These two measures would then be used in an additive fashion to provide separate pieces of information, both of which would contribute to answering the referral question. As this example illustrates, the additive use of tests serves generally to broaden understanding of the person being examined.
Other assessment situations may create a need for confirmatory evidence in support of conclusions based on test findings, in which case two or more measures of the same psychological function may have a place in the test battery. Assessors conducting a neuropsychological examination to address possible onset of Alzheimer’s disease, for example, ordinarily administer several memory tests. Should each of these tests identify memory impairment consistent with Alzheimer’s, then from a technical standpoint, only one of them would have been necessary and the others have shown no incremental validity. Practically speaking, however, the multiple memory measures taken together provide confirmatory evidence of memory loss. Such confirmatory use of tests strengthens understanding and helps assessors present conclusions with confidence.
The confirmatory function of a multitest battery is especially useful when tests of the same psychological function measure it in different ways. The advantages of multimethod assessment of variables have long been recognized in psychology, beginning with the work of Campbell and Fiske (1959) and continuing with contemporary reports by the American Psychological Association’s (APA’s) Psychological Assessment Work Group, which stress the improved validity that results when phenomena are measured from a variety of perspectives (Kubiszyn et al., 2000; Meyer et al., 2001):
The optimal methodology to enhance the construct validity of nomothetic research consists of combining data from multiple methods and multiple operational definitions . . . Just as effective nomothetic research recognizes how validity is maximized when variables are measured by multiple methods, particularly when the methods produce meaningful discrepancies . . . the quality of idiographic assessment can be enhanced by clinicians who integrate the data from multiple methods of assessment. (Meyer et al., p. 150)
Such confirmatory testing is exemplified in applications of the Minnesota Multiphasic Personality Inventory (MMPI, MMPI-2) and the Rorschach Inkblot Method (RIM), which are the two most widely researched and frequently used personality assessment instruments (Ackerman & Ackerman, 1997; Butcher & Rouse, 1996; Camara, Nathan, & Puente, 2000; Watkins, Campbell, Nieberding, & Hallmark, 1995). As discussed later in this research paper, the MMPI-2 is a relatively structured self-report inventory, whereas the RIM is a relatively unstructured measure of perceptual-cognitive and associational processes (see also Exner, 2003; Graham, 2000; Greene, 2000; Weiner, 1998). Because of differences in their format, the MMPI-2 and the RIM measure normal and abnormal characteristics in different ways and at different levels of a person’s ability and willingness to recognize and report them directly. Should a person display some type of disordered functioning on both the MMPI-2 and the RIM, this confirmatory finding becomes more powerful and convincing than having such information from one of these instruments but not the other, even though technically in this instance no incremental validity derives from the second instrument.
Confirmatory evidence of this kind often proves helpful in professional practice, especially in forensic work. As described by Blau (1998), Heilbrun (2001), Shapiro (1991), and others, multiple sources of information pointing in the same direction bolster courtroom testimony, whereas conclusions based on only one measure of some characteristic can result in assessors’ being criticized for failing to conduct a thorough examination.
Should multiple measures of the same psychological characteristics yield different rather than confirmatory results, these results can usually serve valuable complementary functions in the interpretive process. At times, apparent lack of agreement between two purported measures of the same characteristic has been taken to indicate that one of the measures lacks convergent validity. This negative view of divergent test findings fails to take adequate cognizance of the complexity of the information provided by multimethod assessment and can result in misleading conclusions. To continue with the example of conjoint MMPI-2 and RIM testing, suppose that a person’s responses show elevation on indices of depression on one of these measures but not the other. Inasmuch as indices on both measures have demonstrated some validity in detecting features of depression, the key question to ask is not which measure is wrong in this instance, but rather why the measures have diverged.
Perhaps, as one possible explanation, the respondent has some underlying depressive concerns that he or she does not recognize or prefers not to admit to others, in which case depressive features might be less likely to emerge in response to the self-report MMPI-2 methodology than on the more indirect Rorschach task. Or perhaps the respondent is not particularly depressed but wants very much to give the impression of being in distress and needing help, in which case the MMPI-2 might be more likely to show depression than the RIM. Or perhaps the person generally feels more relaxed and inclined to be forthcoming in relatively structured than relatively unstructured situations, and then the MMPI-2 is more likely than the RIM to reveal whether the person is depressed.
As these examples show, multiple measures of the same psychological characteristic can complement each other when they diverge, with one measure sometimes picking up the presence of a characteristic (a true positive) that is missed by the other (a false negative). Possible reasons for the false negative can contribute valuable information about the respondent’s test-taking attitudes and likelihood of behaving differently in situations that differ in the amount of structure they provide. The translation of such divergence between MMPI-2 and RIM findings into clinically useful diagnostic inferences and individual treatment planning is elaborated by Finn (1996) and Ganellen (1996). Whatever measures may be involved in weighing the implications of divergent findings, this complementary use of test findings frequently serves to deepen understanding gleaned from the assessment process.
The amount and kind of data collected in psychological assessments depend in part on two issues concerning the examiners who conduct these assessments. The first issue involves the qualifications and competence of examiners to utilize the procedures they employ, and the second has to do with ways in which examiners’ personal qualities can influence how different kinds of people respond to them.
Qualifications and Competence
There is general consensus that persons who conduct psychological assessments should be qualified by education and training to do so. The Ethical Principles and Code of Conduct promulgated by the APA (1992) offers the following general guideline in this regard: “Psychologists provide services, teach, and conduct research only within the boundaries of their competence, based on their education, training, supervised experience, or appropriate professional experience” (Ethical Code 1.04[a]). Particular kinds of knowledge and skill that are necessary for test users to conduct adequate assessments are specified further in the Test User Qualifications endorsed by the APA (2001). Finally of note with respect to using tests in psychological assessments, the Standards for Educational and Psychological Testing (AERA et al., 1999) identify who is responsible for the proper use of tests: “The ultimate responsibility for appropriate test use and interpretation lies predominantly with the test user. In assuming this responsibility, the user must become knowledgeable about a test’s appropriate uses and the populations for which it is suitable” (p. 112).
Despite the clarity of these statements and the considerable detail provided in the Test User Qualifications, two persistent issues in contemporary assessment practice remain unresolved. First, adequate psychological testing qualifications are typically inferred for any examiners holding a graduate degree in psychology, being licensed in their state, and presenting themselves as competent to practice psychological assessment. Until such time as the criteria proposed in the Test User Qualifications become incorporated into formal accreditation procedures, qualification as an assessor will continue to be conferred automatically on psychologists obtaining licensure. Unfortunately, being qualified by license to use psychological tests does not ensure being competent in using them. Being competent in psychological testing requires familiarity with the latest revision of whatever instruments an assessor is using, with current research and the most recent normative data concerning these instruments, and with the manifold interpretive complexities they are likely to involve. Assessment competence also requires appreciation for a variety of psychometric, interpersonal, sociocultural, and contextual issues that affect not only the collection but also the interpretation and utilization of assessment information (see Sandoval, Frisby, Geisinger, & Scheuneman, 1990). The steady output of new or revised measures, research findings, and practice guidelines makes assessment psychology a dynamic and rapidly evolving field with a large and burgeoning literature. Only by keeping reasonably current with these developments can psychological assessors become and remain competent, and only by remaining competent can they fulfill their ethical responsibilities (Kitchener, 2000, chapter 9; Koocher & Keith-Spiegel, 1998; Weiner, 1989).
The second persistent issue concerns assessment by persons who are not psychologists and are therefore not bound by this profession’s ethical principles or guidelines for practice. Nonpsychologist assessors who can obtain psychological tests are free to use them however they wish. When easily administered measures yield test scores that seem transparently interpretable, as in the case of an elevated Borderline scale on the Millon Clinical Multiaxial Inventory–III (MCMI-III; Choca, Shanley, & Van Denberg, 1997) or an elevated Acquiescence scale on the Holland Vocational Preference Inventory (VPI; Holland, 1985), unqualified examiners can draw superficial conclusions that take inadequate account of the complexity of these instruments, the interactions among their scales, and the limits of their applicability. It accordingly behooves assessment psychologists not only to maintain their own competence, but also to call attention in appropriate circumstances to assessment practices that fall short of reasonable standards of competence.
Assessors can influence the information they collect by virtue of their personal qualities and by the manner in which they conduct a psychological examination. In the case of self-administered measures such as interest surveys or personality questionnaires, examiner influence may be minimal. Interviews and interactive testing procedures, on the other hand, create ample opportunity for an examiner’s age, gender, ethnicity, or other characteristics to make respondents feel more or less comfortable and more or less inclined to be forthcoming. Examiners accordingly need to be alert to instances in which such personal qualities may be influencing the nature and amount of the data they are collecting.
The most important personal influence that examiners cannot modify or conceal is their language facility. Psychological assessment procedures are extensively language-based, either in their content or in the instructions that introduce nonverbal tasks, and accurate communication is therefore essential for obtaining reliable assessment information. It is widely agreed that both examiners and whomever they are interviewing or testing should be communicating either in their native language or in a second language in which they are highly proficient (AERA et al., 1999, chapter 9). The use of interpreters to circumvent language barriers in the assessment process rarely provides a satisfactory solution to this problem. Unless an interpreter is fully conversant with idiomatic expressions and cultural referents in both languages, is familiar with standard procedures in psychological assessment, and is a stranger to the examinee (as opposed to a friend, relative, or member of the same closely knit subcultural community), the obtained results may be of questionable validity. Similarly, in the case of self-administered measures, instructions and test items must be written in a language that the respondent can be expected to understand fully. Translations of pencil-and-paper measures accordingly require close attention to the idiomatic vagaries of each new language and to culture-specific contents of individual test items, in order to ensure equivalence of measures in the cross-cultural applications of tests (Allen & Walsh, 2000; Dana, 2000a).
Unlike their fixed qualities, the manner in which examiners conduct the assessment process is within their control, and untoward examiner influence can be minimized by appropriate efforts to promote full and open response to the assessment procedures. To achieve this end, an assessment typically begins with a review of its purposes, a description of the procedures that will be followed, and efforts to establish a rapport that will help the person being evaluated feel comfortable and willing to cooperate with the assessment process. Variations in examiner behavior while introducing and conducting psychological evaluations can substantially influence how respondents perceive the assessment situation, for example, whether they see it as an authoritarian investigative process intended to ferret out defects and weaknesses, or as a mutually respectful and supportive interaction intended to provide understanding and help. Even while following closely the guidelines for a structured interview and adhering faithfully to standardized procedures for administering various tests, the examiner needs to recognize that his or her manner, tone of voice, and apparent attitude are likely to affect the perceptions and comfort level of the person being assessed and, consequently, the amount and kind of information that person provides (see Anastasi & Urbina, 1997; Masling, 1966, 1998).
Examiner influence in the assessment process inevitably interacts with the attitudes and inclinations of the person being examined. Some respondents may feel more comfortable being examined by an older person than a younger one, for example, or by a male than a female examiner, whereas other respondents may prefer a younger and female examiner. Among members of a minority group, some may prefer to be examined by a person with a cultural or ethnic background similar to theirs, whereas others are less concerned with the examiner’s background than with his or her competence. Similarly, with respect to examiner style, a passive, timid, and dependent person might feel comforted by a warm, friendly, and supportive examiner approach that would make an aloof, distant, and mistrustful person feel uneasy; conversely, an interpersonally cautious and detached respondent might feel safe and secure when being examined in an impersonal and businesslike manner that would be unsettling and anxiety provoking to an interpersonally needy and dependent respondent. With such possibilities in mind, skilled examiners usually vary their behavioral style with an eye to conducting assessments in ways that will be likely to maximize each individual respondent’s level of comfort and cooperation.
Two other respondent issues that influence the data collection process concern a person’s right to give informed consent to being evaluated and his or her specific attitudes toward being examined. With respect to informed consent, the introductory phase of conducting an assessment must ordinarily include not only the explanation of purposes and procedures mentioned previously, which informs the respondent, but also an explicit agreement by the respondent or persons legally responsible for the respondent to undergo the evaluation. As elaborated in the Standards for Educational and Psychological Testing (AERA et al., 1999), informed consent can be waived only when an assessment has been mandated by law (as in a court-ordered evaluation) or when it is implicit, as when a person applies for a position or opportunity for which being assessed is a requirement (e.g., a job for which all applicants are being screened psychologically; see also Kitchener, 2000). Having given their consent to be evaluated, moreover, respondents are entitled to revoke it at any time during the assessment process. Hence, the prospects for obtaining adequate assessment data depend not only on whether respondents can be helped to feel comfortable and be forthcoming, but even more basically on whether they consent in the first place to being evaluated and remain willing during the course of the evaluation.
Issues involving a respondent’s specific attitudes toward being examined typically arise in relation to whether the assessment is being conducted for clinical or for administrative purposes. When assessments are being conducted for clinical purposes, the examiner is responsible to the person being examined, the person being examined is seeking some type of assistance, and the examination is intended to be helpful to this person and responsive to his or her needs. As common examples in clinical assessments, people concerned about their psychological well-being may seek an evaluation to learn whether they need professional mental health care, and people uncertain about their educational or vocational plans may look for help in determining what their abilities and interests suit them to do. In administrative assessments, by contrast, examiners are responsible not to the person being examined, but to some third party who has requested the evaluation to assist in arriving at some judgment about the person. Examiners in an administrative assessment are ethically responsible for treating the respondent fairly and with respect, but the evaluation is being conducted for the benefit of the party requesting it, and the results may or may not meet the respondent’s needs or serve his or her best interests. Assessment for administrative purposes occurs commonly in forensic, educational, and organizational settings when evaluations are requested to help decide such matters as whether a prison inmate should be paroled, a student should be admitted to a special program, or a job applicant should be hired (see Monahan, 1980).
As for their attitudes, respondents being evaluated for clinical purposes are relatively likely to be motivated to reveal themselves honestly, whereas those being examined for administrative purposes are relatively likely to be intent on making a certain kind of impression. Respondents attempting to manage the impression they give are likely to show themselves not as they are, but as they think the person requesting the evaluation would view favorably. Typically such efforts at impression management take the form of denying one’s limitations, minimizing one’s shortcomings, attempting to put one’s very best foot forward, and concealing whatever might be seen in a negative light. Exceptions to this general trend are not uncommon, however. Whereas most persons being evaluated for administrative purposes want to make the best possible impression, some may be motivated in just the opposite direction. For example, a plaintiff claiming brain damage in a personal injury lawsuit may see benefit in making the worst possible impression on a neuropsychological examination. Some persons being seen for clinical evaluations, despite having come of their own accord and recognizing that the assessment is being conducted for their benefit, may nevertheless be too anxious or embarrassed to reveal their difficulties fully. Whatever kind of impression respondents may want to make, the attitudes toward being examined that they bring with them to the assessment situation can be expected to influence the amount and kind of data they produce. These attitudes also have a bearing on the interpretation of assessment data, and the further implications of impression management for malingering and defensiveness are discussed later in the paper.
Data Management Issues
A final set of considerations in collecting assessment information concerns appropriate ways of managing the data that are obtained. Examiners must be aware in particular of issues concerning the use of computers in data collection; the responsibility they have for safeguarding the security of their measures; and their obligation, within limits, to maintain the confidentiality of what respondents report or reveal to them.
Computerized Data Collection
Software programs are available to facilitate the data collection process for most widely used assessment methods. Programs designed for use with self-report questionnaires typically provide for online administration of test items, automated coding of item responses to produce scale scores, and quantitative manipulation of these scale scores to yield summary scores and indices. For instruments that require examiner administration and coding (e.g., a Wechsler intelligence test), software programs accept test scores entered by the examiner and translate them into the test’s quantitative indices (e.g., the Wechsler IQ and Index scores). Many of these programs store the test results in files that can later be accessed or exported, and some even provide computational packages that can generate descriptive statistics for sets of test records held in storage.
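The scoring steps described above (coding item responses into scale scores, then summarizing stored records) can be sketched in a few lines of Python. This is only an illustrative sketch: the scale names, the item key, the 0 to 4 rating format, and the function names `score_scales` and `describe_records` are all hypothetical and do not correspond to any published instrument or commercial scoring program.

```python
from statistics import mean

# Hypothetical scoring key (an assumption for illustration only):
# scale name -> list of (item number, reverse-keyed?)
SCALE_KEY = {
    "distress": [(1, False), (3, False), (5, True)],
    "sociability": [(2, False), (4, True)],
}
MAX_RATING = 4  # assumed 0-4 rating format


def score_scales(responses, key=SCALE_KEY, max_rating=MAX_RATING):
    """Convert raw item ratings into scale scores (sums of keyed items)."""
    scores = {}
    for scale, items in key.items():
        total = 0
        for item, reverse in items:
            rating = responses[item]
            if not 0 <= rating <= max_rating:
                raise ValueError(f"item {item}: rating {rating} out of range")
            # Reverse-keyed items are flipped before being summed.
            total += (max_rating - rating) if reverse else rating
        scores[scale] = total
    return scores


def describe_records(records, key=SCALE_KEY):
    """Mean scale score across a set of stored test records."""
    scored = [score_scales(r, key) for r in records]
    return {scale: mean(s[scale] for s in scored) for scale in key}


record_a = {1: 3, 2: 1, 3: 4, 4: 0, 5: 1}
record_b = {1: 1, 2: 3, 3: 0, 4: 4, 5: 3}
print(score_scales(record_a))                  # {'distress': 10, 'sociability': 5}
print(describe_records([record_a, record_b]))  # {'distress': 6, 'sociability': 4}
```

Because the same key is applied mechanically to every record, the arithmetic is identical on each run; what such a sketch cannot guarantee is that the responses entered into it were coded correctly in the first place.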
These features of computerized data management bring several benefits to the process of collecting assessment information. Online administration and coding of responses help respondents avoid mechanical errors in filling out test forms manually, and they eliminate errors that examiners sometimes make in scoring these responses (see Allard & Faust, 2000). For measures that require examiner coding and data entry, the utility of the results depends on accurate coding and entry, but once the data are entered, software programs eliminate examiner error in calculating summary scores and indices from them. The data storage features of many software programs facilitate assessment research, particularly for investigators seeking to combine databases from different sources, and they can also help examiners meet requirements in most states and many agencies for keeping assessment information on file for some period of time. For such reasons, the vast majority of assessment psychologists report that they use software for test scoring and feel comfortable doing so (McMinn, Ellens, & Soref, 1999).
Computerized collection of assessment information has some potential disadvantages as well, however. When assessment measures are administered online, first of all, the reliability of the data collected can be compromised by a lack of equivalence between an automated testing procedure and the noncomputerized version on which it is based. As elaborated by Butcher, Perry, and Atlis (2000), Honaker and Fowler (1990), and Snyder (2000), the extent of such equivalence is currently an unresolved issue. Available data suggest fairly good reliability for computerized administrations based on pencil-and-paper questionnaires, especially those used in personality assessment. With respect to the MMPI, for example, a meta-analysis by Finger and Ones (1999) of all available research comparing computerized with booklet forms of the instrument has shown them to be psychometrically equivalent. On the other hand, good congruence with the original measures has yet to be demonstrated for computerized versions of structured clinical interviews and for many measures of visual-spatial functioning used in neuropsychological assessment. Among software programs available for test administration, moreover, very few have been systematically evaluated with respect to whether they obtain exactly the same information as would emerge in a standard administration of the measure on which they are based.
A second potential disadvantage of computerized data collection derives from the ease with which it can be employed. Although frequently helpful to knowledgeable assessment professionals and thus to the persons they examine, automated procedures also simplify psychological testing for untrained and unqualified persons who lack assessment skills and would not be able to collect test data without the aid of a computer. The availability of software programs thus creates some potential for assessment methods to be misused and respondents to be poorly served. Such outcomes are not an inescapable by-product of computerized assessment procedures, however. They constitute instead an abuse of technology by uninformed and irresponsible persons.
Test security refers to restricting the public availability of test materials and answers to test items. Such restrictions address two important considerations in psychological assessment. First, publicly circulated information about tests can undermine their validity, particularly in the case of measures comprising items with right and wrong or more or less preferable answers. Prior exposure to tests of this kind and information about correct or preferred answers can affect how persons respond to them and prevent an examiner from being able to collect a valid protocol. The validity of test findings is especially questionable when a respondent’s prior exposure has included specific coaching in how to answer certain questions. As for relatively unstructured assessment procedures that have no right or wrong answers, even on these measures various kinds of responses carry particular kinds of interpretive significance. Hence, the possibility exists on relatively unstructured measures as well that persons intent on making a certain kind of impression can be helped to do so by pretest instruction concerning what various types of responses are taken to signify. However, the extent to which public dissemination of information about the inferred meaning of responses does in fact compromise the validity of relatively unstructured measures has not yet been examined empirically and is a subject for further research.
Second, along with helping to preserve the validity of obtained results, keeping assessment measures secure protects test publishers against infringement of their rights by pirated or plagiarized copies of their products. Ethical assessors respect copyright law by not making or distributing copies of published tests, and they take appropriate steps to prevent test forms, test manuals, and assessment software from falling into the hands of persons who are not qualified to use them properly or who feel under no obligation to keep them secure. Both the Ethical Principles and Code of Conduct (APA, 1992, Section 2.10) and the Standards for Educational and Psychological Testing (AERA et al., 1999, p. 117) address this professional responsibility in clear terms.
These considerations in safeguarding test security also have implications for the context in which psychological assessment data are collected. Assessment data have become increasingly likely in recent years to be applied in forensic settings, and litigious concerns sometimes result in requests to have a psychological examination videotaped or observed by a third party. These intrusions on traditional examination procedures pose a threat to the validity of the obtained data in two respects. First, there is no way to judge or measure the impact of the videotaping or the observer on what the respondent chooses to say and do. Second, the normative standards that guide test interpretation are derived from data obtained in two-person examinations, and there are no comparison data available for examinations conducted in the presence of a camera or an observer. Validity aside, exposure of test items to an observer or through a videotape poses the same threat to test security as distributing test forms or manuals to persons who are under no obligation to keep them confidential. Psychological assessors may at times decide for their own protection to audiotape or videotape assessments when they anticipate legal challenges to the adequacy of their procedures or the accuracy of their reports. They may also use recordings on occasion as an alternative to writing a long and complex test protocol verbatim. For purposes of test security, however, recordings made for other people to hear or see, like third-party observers, should be avoided.
A third and related aspect of appropriate data management pertains to maintaining the confidentiality of a respondent’s assessment information. Like certain aspects of safeguarding test security, confidentiality is an ethical matter in assessment psychology, not a substantive one. The key considerations in maintaining the confidentiality of assessment information, as specified in the Ethical Principles and Code of Conduct (APA, 1992, Section 5) and elaborated by Kitchener (2000, chapter 6), involve (a) clarifying the nature and limits of confidentiality with clients and patients prior to undertaking an evaluation; (b) communicating information about persons being evaluated only for appropriate scientific or professional purposes and only to an extent relevant to the purposes for which the evaluation was conducted; (c) disclosing information only to persons designated by respondents or other duly authorized persons or entities, except when otherwise permitted or required by law; and (d) storing and preserving respondents’ records in a secure fashion.
Interpreting Assessment Information
Following the collection of sufficient relevant data, the process of psychological assessment continues with a phase of evaluation in which these data are interpreted. The interpretation of assessment data consists of drawing inferences and forming impressions concerning what the findings reveal about a respondent’s psychological characteristics. Accurate and adequately focused interpretations result in summary descriptions of psychological functioning that can then be utilized in the final phase of the assessment process as a foundation for formulating conclusions and recommendations that answer referral questions. Reaching this output phase requires consideration during the evaluation phase of the basis on which inferences are drawn and impressions formed, the possible effects on the findings of malingering or defensiveness, and effective ways of integrating data from diverse sources.
Basis of Inferences and Impressions
The interpretation of assessment data involves four sets of alternatives with respect to how assessors go about drawing inferences and forming impressions about what these data indicate. Interpretations can be based on either empirical or conceptual approaches to decision making; they can be guided either by statistically based decision rules or by clinical judgment; they can emphasize either nomothetic or idiographic characteristics of respondents; and they can include more or less reliance on computer-generated interpretive statements. Effective assessment usually involves informed selection among these alternatives and some tailoring of the emphasis given each of them to fit the particular context of the individual assessment situation.
Empirical and Conceptual Guidelines
The interpretation of assessment information can be approached in several ways. In what may be called an intuitive approach, assessment decisions stem from impressions that have no identifiable basis in the data. Instead, interpretations are justified by statements like “It’s just a feeling I have about her,” or “I can’t say where I get it from, but I just know he’s that way.” In what may be called an authoritative approach, interpretations are based on the pronouncements of well-known or respected assessment psychologists, as in saying, “These data mean what they mean because that’s what Dr. Expert says they mean.” The intuition of unusually empathic assessors and reliance on authority by well-read practitioners who choose their experts advisedly may on occasion yield accurate and useful impressions. Both approaches have serious shortcomings, however. Unless intuitive assessors can identify specific features of the data that help them reach their conclusions, their diagnostic sensitivity cannot be taught to other professionals or translated into scientifically verifiable procedures. Unless authoritative assessors can explain in their own words the basis on which experts have reached the conclusions being cited, they are unlikely to impress others as being professionally knowledgeable themselves or as knowing what to think in the absence of being told by someone else what to think.
Moreover, neither intuitive nor authoritative approaches to interpreting assessment information are likely to be as consistently reliable as approaches based on empirical and conceptual guidelines. Empirical guidelines to decision making derive from the replicated results of methodologically sound research. When a specific assessment finding has repeatedly been found to correlate highly with the presence of a particular psychological characteristic, it is empirically sound to infer the presence of that characteristic in a respondent who displays that assessment finding. Conceptual guidelines to decision making consist of psychological constructs that provide a logical bridge between assessment findings and the inferences drawn from them. If subjectively felt distress contributes to a person’s remaining in and benefiting from psychotherapy (for which there is considerable evidence; see Garfield, 1994; Greencavage & Norcross, 1990; Mohr, 1995), and if a test includes a valid index of subjectively felt distress (which many tests do), then it is reasonable to expect that a positive finding on this test index will increase the predicted likelihood of a favorable outcome in psychotherapy.
Both empirical and conceptual guidelines to interpretation bring distinct benefits to the assessment process. Empirical perspectives are valuable because they provide a foundation for achieving certainty in decision making. The adequacy of psychological assessment is enhanced by quantitative data concerning the normative distribution and other psychometric properties of measurements that reflect dimensions of psychological functioning. Lack of such data limits the confidence with which assessors can draw conclusions about the implications of their findings. Without being able to compare an individual’s test responses with normative expectations, for example, or without a basis for estimating false positive and false negative possibilities in the measures they have used, assessors can only be speculative in attaching interpretive significance to their findings. Similarly, the absence of externally validated cutting scores detracts considerably from the certainty with which assessors can translate test scores into qualitative distinctions, such as whether a person is mildly, moderately, or severely depressed.
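To make the point about false positive and false negative possibilities concrete, the following Python sketch tallies the errors a cutting score produces in a validation sample and then uses Bayes' rule to show how the meaning of a positive finding shifts with the base rate. The scores, criterion classifications, cutoff, and function names are invented for illustration and are not drawn from any actual instrument or study.

```python
def error_rates(scores, has_condition, cutoff):
    """Tally decisions made by the rule 'flag when score >= cutoff'.

    Assumes the sample contains at least one person with and one person
    without the condition; otherwise a rate would be undefined.
    """
    tp = fp = tn = fn = 0
    for score, present in zip(scores, has_condition):
        flagged = score >= cutoff
        if flagged and present:
            tp += 1
        elif flagged and not present:
            fp += 1
        elif present:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }


def positive_predictive_value(sensitivity, specificity, base_rate):
    """Bayes' rule: probability the condition is present given a positive finding."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)


# Invented validation sample: test scores and known criterion status.
scores = [12, 25, 18, 30, 9, 22, 15, 27]
has_condition = [False, True, False, True, False, False, True, True]
print(error_rates(scores, has_condition, cutoff=20))

# The same test finding means different things at different base rates.
print(round(positive_predictive_value(0.8, 0.9, 0.10), 2))  # 0.47
print(round(positive_predictive_value(0.8, 0.9, 0.50), 2))  # 0.89
```

In this hypothetical case, a measure that detects 80% of true cases and clears 90% of non-cases still yields a positive finding that is wrong more often than right when only 1 person in 10 has the condition, which is why normative and base-rate data bear so directly on interpretive confidence.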
Conceptual perspectives are valuable in the assessment process because they provide some explanation of why certain findings are likely to identify certain kinds of psychological characteristics or predict certain kinds of behavior. Having such explanations in hand offers assessors the pleasure of understanding not only how their measures work but also why they work as they do; they help assessors focus their attention on aspects of their data that are relevant to the referral question to which they are responding; and they facilitate the communication of results in terms that address characteristics of the person being examined and not merely those of the data obtained. As a further benefit of conceptual formulations of assessment findings, they foster hypotheses concerning previously unknown or unexplored linkages between assessment findings and dimensions of psychological functioning and thereby help to extend the frontiers of knowledge.
Empirical guidelines are thus necessary to the scientific foundations of assessment psychology, as a basis for certainty in decision making, but they are not sufficient to bring assessment psychology to its full potential. Conceptual guidelines do not by themselves provide a reliable basis for drawing conclusions with certainty. However, by enriching the assessment process with explanatory hypotheses, they point the way to advances in knowledge.
For the purposes that each serves, then, both empirical and conceptual guidelines have an important place in the interpretation of assessment information. At times, concerns about preserving the scientific respectability of assessment have led to assertions that only empirical guidelines constitute an acceptable basis for decision making and that unvalidated conceptual guidelines have no place in scientific psychology. McFall and Treat (1999), for example, maintain that “the aim of clinical assessment is to gather data that allow us to reduce uncertainty concerning the probability of events” (p. 215). From their perspective, the information value of assessment data resides in scaled numerical values and conditional probabilities.
As an alternative point of view, let it be observed that the river of scientific discovery can flow through inferential leaps of deductive reasoning that suggest truths long before they are confirmed by replicated research findings. Newton grasped the reason that apples fall from trees well in advance of experiments demonstrating the laws of gravity, Einstein conceived his theory of relativity with full confidence that empirical findings would eventually prove him correct, and neither has suffered any challenges to his credentials as a scientist. Even though empirical guidelines are, on the average, more likely to produce reliable conclusions than are conceptual formulations, as already noted, logical reasoning concerning the implications of clearly formulated concepts can also generate conclusions that serve useful purposes and stand the test of time.
Accordingly, the process of arriving at conclusions in individual case assessment can involve creative as well as confirmatory aspects of scientific thinking, and the utilization of assessment to generate hypotheses and fuel speculation may in the course of scientific endeavor increase rather than decrease uncertainty in the process of identifying new alternative possibilities to pursue. This perspective is echoed by DeBruyn (1992) in the following comment: “Both scientific decision making in general, and diagnostic decision making in particular, have a repetitive side, which consists of formulas and algorithmic procedures, and a constructive side, which consists of generating hypotheses and theories to explain things or to account for unexpected findings” (p. 192).
Statistical Rules and Clinical Judgment
Empirical guidelines for decision making have customarily been operationalized by using statistical rules to arrive at conclusions concerning what assessment data signify. Statistical rules for interpreting assessment data comprise empirically derived formulas, or algorithms, that provide an objective, actuarial basis for deciding what these data indicate. When statistical rules are applied to the results of a psychological evaluation, the formula makes the decision concerning whether certain psychological characteristics are present (as in deciding whether a respondent has a particular trait or disorder) or whether certain kinds of actions are likely to ensue (as in predicting the likelihood of a respondent’s behaving violently or performing well in some job). Statistical rules have the advantage of ensuring that examiners applying a formula correctly to the same set of data will always arrive at the same conclusion concerning what these data mean. As a disadvantage, however, the breadth of the conclusions that can be based on statistical rules and their relevance to referral questions are limited by the composition of the database from which they have been derived.
For example, statistical rules may prove helpful in determining whether a student has a learning disability, but say nothing about the nature of this student’s disability; they may predict the likelihood of a criminal defendant’s behaving violently, but offer no clues to the kinds of situations that are most likely to evoke violence in this particular criminal defendant; or they may help identify the suitability of a person for one type of position in an organization, but be mute with respect to the person’s suitability for other types of positions in the same organization. In each of these instances, moreover, a statistical rule derived from a group of people possessing certain demographic characteristics (e.g., age, gender, socioeconomic status, cultural background) and having been evaluated in a particular setting may lack validity generalization to persons with different demographic characteristics evaluated in some other kind of setting. Garb (2000) has similarly noted in this regard that “statistical-prediction rules are of limited value because they have typically been based on limited information that has not been demonstrated to be optimal and they have almost never been shown to be powerful” (p. 31).
In other words, the scope of statistical rules is restricted to findings pertaining to the particular kinds of persons, psychological characteristics, and circumstances that were anticipated in building them. For many of the varied types of people seen in actual assessment practice, and for many of the complex and specifically focused referral questions raised about these people, statistical rules that by themselves provide fully adequate answers may be in short supply.
As a further limitation of statistical rules, they share with all quantified assessment scales some unavoidable artificiality that accompanies translating numerical scores into qualitative descriptive categories. On the Beck Depression Inventory (BDI; Beck, Steer, & Garbin, 1988), for example, a score of 14 to 19 is taken to indicate mild depression and a score of 20 to 28 indicates moderate depression. Hence two people who have almost identical BDI scores, one with a 19 and the other with a 20, will be described much differently by the statistical rule, one as mildly depressed and the other as moderately depressed. Likewise, in measuring intelligence with the Wechsler Adult Intelligence Scale–III (WAIS-III; Kaufman, 1990) a Full Scale IQ score of 109 calls for describing a person’s intelligence as average, whereas a person with almost exactly the same level of intelligence and a Full Scale IQ of 110 falls in the high average range. According to the WAIS-III formulas, a person with a Full Scale IQ of 91 and a person with a Full Scale IQ of 119 would also be labeled, respectively, as average and high average. Some assessors minimize this problem by adding some further specificity to the WAIS-III categories, as in labeling a 109 IQ as the high end of the average range and a 110 IQ as the low end of the high average range. Although additional categorical descriptions for more narrowly defined score ranges can reduce the artificiality in the use of statistical rules, there are limits to how many quantitative data points on a scale can be assigned a distinctive qualitative designation.
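The boundary effect just described can be reproduced with a minimal Python sketch of a cutting-score rule. Only the score bands quoted in the preceding paragraph are encoded; the lower bound of 90 for the average IQ range is an assumption based on the conventional Wechsler classification (the text itself mentions only scores of 91 and 109 as average), and the function name `classify` is hypothetical.

```python
# Bands quoted in the text: (lowest score, highest score, label).
BDI_BANDS = [
    (14, 19, "mild depression"),
    (20, 28, "moderate depression"),
]
WAIS_BANDS = [
    (90, 109, "average"),        # lower bound of 90 is an assumption
    (110, 119, "high average"),
]


def classify(score, bands):
    """Return the label of the band containing the score, or None."""
    for low, high, label in bands:
        if low <= score <= high:
            return label
    return None  # score falls outside the bands encoded here


# Adjacent scores receive different labels ...
print(classify(19, BDI_BANDS), "/", classify(20, BDI_BANDS))
# mild depression / moderate depression
print(classify(109, WAIS_BANDS), "/", classify(110, WAIS_BANDS))
# average / high average

# ... while scores 28 points apart receive the same pair of labels.
print(classify(91, WAIS_BANDS), "/", classify(119, WAIS_BANDS))
# average / high average
```

The rule is perfectly consistent, since every examiner applying it to the same score obtains the same label, yet the label alone conceals that 109 and 110 are nearly indistinguishable while 91 and 109 are not.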
Conceptual guidelines for decision making have been operationalized in terms of clinical judgment, which consists of the cumulative wisdom that practitioners acquire from their experience. Clinical guidelines may come to represent the shared beliefs of large numbers of practitioners, but they emerge initially as impressions formed by individual practitioners. In contrast to the objective and quantitative features of statistical rules, clinical judgments constitute a subjective and qualitative basis for arriving at conclusions. When clinical judgment is applied to assessment data, decisions are made by the practitioner, not by a formula. Clinical judgments concerning the interpretive significance of a set of assessment data are consequently less uniform than actuarial decisions and less likely to be based on established fact. On the other hand, the applicability of clinical judgments is infinite, and their breadth and relevance are limited not by any database, but only by the practitioner’s capacity to reason logically concerning possible relationships between psychological characteristics identified by the assessment data and psychological characteristics relevant to addressing referral questions, whatever their complexity and specificity.
The relative merit of statistical rules and clinical judgment in the assessment process has been the subject of considerable debate since this distinction was first formulated by Meehl (1954) in his book Clinical Versus Statistical Prediction. Subsequent publications of note concerning this important issue include articles by Grove and Meehl (1996), Grove, Zald, Lebow, Snitz, and Nelson (2000), Holt (1958, 1986), Karon (2000), Meehl (1986), and Swets, Dawes, and Monahan (2000), and a book by Garb (1998) entitled Studying the Clinician. Much of the literature on this topic has consisted of assertions and rebuttals concerning whether statistical methods generally produce more accurate assessment results than clinical methods. In light of the strengths and weaknesses inherent in both statistical prediction and clinical judgment, such debate serves little purpose and is regrettable when it leads to disparagement of either approach to interpreting assessment data.
As testimony to the utility of both approaches, it is important to note that the creation of good statistical rules for making assessment decisions typically begins with clinically informed selection of both (a) test items, structured interview questions, and other measure components to be used as predictor variables, and (b) psychological conditions, behavioral tendencies, and other criterion variables to which the predictor variables are expected to relate. Empirical methods of scale construction and cross-validation are then employed to shape these clinically relevant assessment variables into valid actuarial measures of these clinically relevant criterion variables. Hence good statistical rules should almost always produce more accurate results than clinical judgment, because they encompass clinical wisdom plus the sharpening of this wisdom by replicated research findings. Clinical methods of assessment at their best depend on the impressions and judgment of individual practitioners, whereas statistical methods at their best constitute established fact that has been built on clinical wisdom. To rely only on clinical judgment in decision-making situations for which adequate actuarial guidelines are available is tantamount to playing cards with half a deck. Even the best judgment of the best practitioner can at times be clouded by inadvertent bias, insufficient awareness of base rates, and other sources of influence discussed in the final section of this research paper. When one is given a reasonable choice, then, assessment decisions are more advisedly based on established fact than on clinical judgment.
On the other hand, the previously noted diversity of people and of the circumstances that lead to their being referred for an evaluation means that assessment questions regularly arise for which there are no available statistical rules, and patterns of assessment data often resemble but do not quite match the parameters for which replicated research has demonstrated certain correlates. When statistical rules cannot fully answer questions being asked, what are assessors to do in the absence of fully validating data? Decisions could be deferred, on the grounds that sufficient factual basis for a decision is lacking, and recommendations could be delayed, pending greater certainty about what recommendation to make. Alternatively, assessors in a situation of uncertainty can supplement whatever empirical guidelines they do have at their disposal with logical reasoning and cumulative clinical wisdom to arrive at conclusions and recommendations that are more responsive and at least a little more likely to be helpful than saying nothing at all.
As these observations indicate, statistical rules and clinical judgment can properly be regarded as complementary components of effective decision making, rather than as competing and mutually exclusive alternatives. Each brings value to assessment psychology and has a respectable place in it. Geisinger and Carlson (2002) comment in this regard that the time has come “to move beyond both purely judgmental, speculative interpretation of test results as well as extrapolations from the general population to specific cases that do not much resemble the remainder of the population” (p. 254).
Assessment practice should accordingly be subjected to and influenced by research studies, lest it lead down blind alleys and detract from the pursuit of knowledge and the delivery of responsible professional service. Concurrently, however, lack of unequivocal documentation should not deter assessment psychologists from employing procedures and reaching conclusions that in their judgment will assist in meeting the needs of those who seek their help. Commenting on balanced use of objective and subjective contributions to assessment decision making, Swets et al. (2000) similarly note that “the appropriate role of the SPR [Statistical Prediction Rule] vis-à-vis the diagnostician will vary from one context to another” and that the most appropriate roles of each “can be determined for each diagnostic setting in accordance with the accumulated evidence about what works best” (p. 5). Putting the matter in even simpler terms, Kleinmuntz (1990) observed that “the reason why we still use our heads, flawed as they may be, instead of formulas is that for many decisions, choices and problems, there are as yet no available formulas” (p. 303).
Nomothetic and Idiographic Emphasis
Empirical guidelines and statistical rules constitute a basically nomothetic approach to interpreting assessment information, whereas conceptual guidelines and clinical judgment underlie a basically idiographic approach. Nomothetic interpretations address ways in which people resemble other kinds of people and share various psychological characteristics with many of them. Hence, these interpretations involve comparisons between the assessment findings for the person being examined and assessment findings typically obtained from groups of people with certain known characteristics, as in concluding that “this person’s responses show a pattern often seen in people who feel uncomfortable in social situations and are inclined to withdraw from them.” The manner in which nomothetic interpretations are derived and expressed is thus primarily quantitative in nature and may even specify the precise frequency with which an assessment finding occurs in particular groups of people.
Idiographic interpretations, by contrast, address ways in which people differ from most other kinds of people and show psychological characteristics that are fairly unique to them and their particular circumstances. These interpretations typically comprise statements that attribute person-specific meaning to assessment information on the basis of general notions of psychological processes, as in saying that “this person gives many indications of being a passive and dependent individual who is more comfortable being a follower than a leader and will as a consequence probably have difficulty functioning effectively in an executive position.” Deriving and expressing idiographic interpretations is thus a largely qualitative procedure in which examiners are guided by informed impressions rather than by quantitative empirical comparisons.
In the area of personality assessment, both nomothetic and idiographic approaches to interpretation have a long and distinguished tradition. Nomothetic perspectives derive from the work of Cattell (1946), for whom the essence of personality resided in traits or dimensions of functioning that all people share to some degree and on which they can be compared with each other. Idiographic perspectives in personality theory were first clearly articulated by Allport (1937), who conceived the essence of personality as residing in the uniqueness and individuality of each person, independently of comparisons to other people. Over the years, assessment psychologists have at times expressed different convictions concerning which of these two traditions should be emphasized in formulating interpretations. Practitioners typically concur with Groth-Marnat (1997) that data-oriented descriptions of people rarely address the unique problems a person may be having and that the essence of psychological assessment is an attempt “to evaluate an individual in a problem situation so that the information derived from the assessment can somehow help with the problem” (p. 32). Writing from a research perspective, however, McFall and Townsend (1998) grant that practitioners must of necessity provide idiographic solutions to people’s problems, but maintain that “nomothetic knowledge is a prerequisite to valid idiographic solutions” (p. 325). In their opinion, only nomothetic variables have a proper place in the clinical science of assessment.
To temper these points of view in light of what has already been said about statistical and clinical prediction, there is no reason that clinicians seeking solutions to idiographic problems cannot or should not draw on whatever nomothetic guidelines may help them frame accurate and useful interpretations. Likewise, there is no reason that idiography cannot be managed in a scientific fashion, nor is a nomothetic-idiographic distinction between clinical science and clinical practice likely to prove constructive in the long run. Stricker (1997) argues to the contrary, for example, that science incorporates an attitude and a set of values that can characterize office practitioners as well as laboratory researchers, and that “the same theoretical matrix must generate both science and practice activities” (p. 442).
Issues of definition aside, then, there seems little to be gained by debating whether people can be described better in terms of how they differ from other people or how they resemble them. In practice, an optimally informative and useful description of an individual’s psychological characteristics and functioning will encompass the person’s resemblance to and differences from other people in similar circumstances about whom similar referral questions have been posed. Nomothetic and idiographic perspectives thus complement each other, and a balanced emphasis on both promotes the fullest possible understanding of a person being examined.
Computer-Generated Interpretive Statements
Most published tests include software programs that not only assist in the collection of assessment data, as already discussed, but also generate interpretive statements describing the test findings and presenting inferences based on them. Like computerized data collection, computer-based test interpretation (CBTI) brings some distinct advantages to the assessment process. By virtue of its automation, CBTI guarantees a thorough scan of the test data and thereby eliminates human error that results from overlooking items of information in a test protocol. CBTI similarly ensures that a pattern of test data will always generate the same interpretive statement, uniformly and reliably, thus eliminating examiner variability and bias as potential sources of error. CBTI can also facilitate the teaching and learning of assessment methods, by using computer-generated narratives as an exercise requiring the learner to identify the test variables likely to have given rise to particular statements. The potential benefits of computerizing test interpretations, as well as some drawbacks of doing so, are elaborated by Butcher (2002). Four limitations of CBTI have a particular bearing on the extent to which examiners should rely on computer-generated statements in formulating and expressing their impressions.
First, although test software generates interpretive statements by means of quantitative algorithmic formulas, these computer programs are not entirely empirically based. Instead, they typically combine empirically validated correlates of test scores with clinical judgments about what various patterns of scores are likely to signify, and many algorithms involve beliefs as well as established fact concerning what these patterns mean. Different test programs, and even different programs for the same test, vary in the extent to which their interpretive statements are research based. Although CBTI generally increases the validity and utility of test interpretations, then, considerable research remains to be done to place computerized interpretation on a solid empirical basis (see Garb, 2000). In the meantime, computer-generated interpretations will embody at least some of the strengths and weaknesses of both statistical and clinical methods of decision making.
Second, the previously noted limitation of statistical rules with respect to designating quantitative score ranges with qualitative descriptors carries over into CBTI algorithms. Cutting points must be established, below which one kind or degree of descriptive statement is keyed and above which a different kind or degree of description will be generated. As a consequence, two people who show very similar scores on some index or scale may be described by a computer narrative in very different terms with respect to psychological characteristics measured by this index or scale.
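The cutting-point effect can be illustrated with a minimal sketch. The scale, the cutting point of 65, and the wording of the two descriptors below are hypothetical, chosen only for illustration and not drawn from any actual test software.

```python
# Hypothetical cutting point and descriptors; not taken from any real CBTI program.
CUTOFF = 65

def describe(score: int, cutoff: int = CUTOFF) -> str:
    """Return the narrative statement keyed to a scale score."""
    if score >= cutoff:
        return "reports marked discomfort in social situations"
    return "reports no unusual discomfort in social situations"

# Two respondents whose scores differ by a single point
for score in (64, 65):
    print(score, "->", describe(score))
```

Although the two scores are nearly identical, they fall on opposite sides of the cutting point and so receive different narrative statements, which is the limitation described above.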
Third, despite often referring specifically to the person who took the test (i.e., using the terms he, she, or this person) and thus giving the appearance of being idiographic, computer-generated interpretations do not describe the individual person who was examined. Instead, these interpretations describe test protocols, in the sense that they indicate what research findings or clinical wisdom say about people in general who show the kinds of test scores and patterns appearing in the protocol being scanned. Hence computer narratives are basically nomothetic, and most of them phrase at least some interpretive statements in terms of normative comparisons or even, as previously noted, specific frequencies with which the respondent’s test patterns occur in certain groups of people. However, because no two people are exactly alike and no one person matches any comparison group perfectly, some computer-generated interpretive statements may not describe an individual respondent accurately. For this reason, well-developed test software narratives include a caveat indicating that (a) the interpretive statements to follow describe groups of people, not necessarily the person who took the test; (b) misleading and erroneous statements may occur as a reflection of psychological characteristics or environmental circumstances unique to the person being examined and not widely shared within any normative group; and (c) other sources of information and the assessor’s judgment are necessary to determine which of the statements in an interpretive narrative apply to the respondent and which do not.
Fourth, the availability of computer-generated interpretive statements raises questions concerning their proper utilization in the preparation of an assessment report. Ideally, assessors should draw on computer narratives for some assistance, as for example in being sure that they have taken account of all of the relevant data, in checking for discrepancies between their own impressions and the inferences presented by the machine, and perhaps in getting some guidance on how best to organize and what to emphasize in their report. Less ideal is using CBTI not merely for supportive purposes but as a replacement for assessors’ being able and willing to generate their own interpretations of the measures they are using. Most of the assessment psychologists responding to the previously mentioned McMinn et al. (1999) survey reported that they never use CBTI as their primary resource for case formulation and would question the ethicality of doing so.
Even among ethical assessors, however, CBTI can present some temptations, because many computerized narratives present carefully crafted sentences and paragraphs that communicate clearly and lend themselves to being copied verbatim into a psychological report. Professional integrity would suggest that assessors relying on computer-generated conclusions should either express them in their own words or, if they are copying verbatim, should identify the copied material as a quotation and indicate its source. Beyond ethicality and integrity, unfortunately, the previously mentioned software accessibility that allows untrained persons to collect and score test protocols by machine also makes it possible for them to print out narrative interpretations and reproduce them fully or in part as a report, passing them off as their own work without any indication of source. Aside from representing questionable professional ethics, the verbatim inclusion of computer-generated interpretations in assessment reports is likely to be a source of confusion and error, because of the fact that these printouts are normatively rather than idiographically based and hence often include statements that are not applicable to the person being examined.
Malingering and Defensiveness
Malingering and defensiveness consist of conscious and deliberate attempts by persons being examined to falsify the information they are giving and thereby to mislead the examiner. Malingering involves intent to present oneself as being worse off psychologically than is actually the case and is commonly referred to as faking bad. Defensiveness involves seeking to convey an impression of being better off than one actually is and is commonly called faking good. Both faking bad and faking good can range in degree from slight exaggeration of problems and concerns or of assets and capabilities, to total fabrication of difficulties never experienced or accomplishments never achieved. These two types of efforts to mislead examiners arise from different kinds of motivation, but both of them can usually be detected from patterns of inconsistency that appear in the assessment data unless respondents have been carefully coached to avoid them.
Identifying Motivations to Mislead
People who fake bad during psychological assessments are usually motivated by some specific reason for wanting to appear less capable or more disturbed than they really are. In clinical settings, for example, patients who are concerned about not getting as much help or attention as they would like to receive may exaggerate or fabricate symptoms in order to convince a mental health professional that they should be taken into psychotherapy, that they should be seen more frequently if they are already in outpatient treatment, or that they should be admitted to an inpatient facility (or kept in a residential setting if they are already in one). In forensic settings, plaintiffs seeking damages in personal injury cases may malinger the extent of their neuropsychological or psychosocial impairments in hopes of increasing the amount of the settlement they receive, and defendants in criminal actions may malinger psychological disturbance in hopes of being able to minimize the penalties that will be imposed on them. In employment settings, claimants may malinger inability to function in order to begin or continue receiving disability payments or unemployment insurance.
People who fake good during psychological assessments, in an effort to appear more capable or better adjusted than they really are, also show a variety of motivations related to the setting in which they are being evaluated. Defensive patients in clinical settings may try to conceal the extent of their difficulties when they hope to be discharged from a hospital to which they were involuntarily committed, or when they would like to be told or have others told that they do not have any significant psychological problems for which they need treatment. In forensic settings, making the best possible impression can be a powerful inducement to faking good among divorced parents seeking custody of their children and among prison inmates requesting parole. In personnel settings, applicants for positions, candidates for promotion, and persons asking for reinstatement after having been found impaired have good reasons for putting their best foot forward during a psychological evaluation, even to the extent of overstating their assets and minimizing their limitations.
Detecting Malingering and Defensiveness
Attempts to mislead psychological assessors usually result in patterns of inconsistency that provide reliable clues to malingering and defensiveness. In the case of efforts to fake bad, these inconsistencies are likely to appear in three different forms. First, malingerers often produce inconsistent data within individual assessment measures. Usually referred to as intratest scatter, this form of inconsistency involves failing relatively easy items on intelligence or ability tests while succeeding on much more difficult items of the same kind, or responding within the normal range on some portions of a personality test but in an extremely deviant manner on other portions of the same test.
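One simple way to make the idea of intratest scatter concrete is to count how often a respondent fails an item yet passes a harder one. The sketch below assumes that items are already ordered from easiest to hardest; the function name and the sample response patterns are hypothetical, and the count is offered as an illustration rather than as an established validity index.

```python
def scatter_count(responses: list[bool]) -> int:
    """Count (easier item failed, harder item passed) pairs.

    `responses` is ordered from easiest to hardest item; True means passed.
    """
    failed_so_far = 0
    inversions = 0
    for passed in responses:
        if passed:
            # Each earlier (easier) failure paired with this pass is one inversion.
            inversions += failed_so_far
        else:
            failed_so_far += 1
    return inversions

# Expected pattern: passes the easy items, fails the hard ones
consistent = [True, True, True, True, False, False]
# Scattered pattern: fails easy items while passing harder ones
scattered = [False, True, False, False, True, True]

print(scatter_count(consistent))  # 0
print(scatter_count(scattered))   # 7
```

A count of zero means the pattern of successes and failures follows item difficulty; larger counts indicate more scatter, although how much scatter should raise suspicion is a matter for the examiner's judgment and the relevant research.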
A second form of inconsistency frequently found in the assessment data of malingerers occurs between test results and the examiner’s behavioral observations. In some instances, for example, people who appear calm and relaxed during an interview, talk clearly and sensibly about a variety of matters, and conduct themselves in a socially appropriate fashion then produce test protocols similar to those seen in people who are extremely anxious or emotionally upset, incapable of thinking logically and coherently, out of touch with reality, and unable to participate comfortably in interpersonal relationships. Such discrepancies between test and interview data strongly suggest the deployment of deceptive tactics to create a false impression of disturbance.
The third form of inconsistency that proves helpful in detecting malingering consists of a sharp discrepancy between the interview and test data collected by the examiner and the respondent’s actual circumstances and past history as reported by collateral sources or recorded in formal documents. In these instances, the person being evaluated may talk and act strangely during an interview and give test responses strongly suggestive of serious psychological disturbance, but never previously have seen a mental health professional, received counseling or psychotherapy, been prescribed psychotropic medication, or been considered by friends, relatives, teachers, or employers to have any emotional problems. Such contrasts between serious impairments or limitations suggested by the results of an examination and a life history containing little or no evidence of these impairments or limitations provide good reason to suspect malingering.
Defensiveness in an effort to look good is similarly likely to result in inconsistencies in the assessment data that help to detect it. Most common in this regard are guarded test protocols and minimally informative interview responses that fall far short of reflecting a documented history of psychological disorder or problem behavior. Although being guarded and tight-lipped may successfully conceal difficulties, it also alerts examiners that a respondent is not being forthcoming and that the data being obtained probably do not paint a full picture of the person’s psychological problems and limitations. As another possibility, fake-good respondents may, instead of being guarded and closed-mouthed, become quite talkative and expansive in an effort to impress the examiner with their admirable qualities and many capabilities, in which case the assessment information becomes noteworthy for claims of knowledge, skills, virtues, and accomplishments that far exceed reasonable likelihood. These and other guidelines for the clinical detection of efforts to mislead assessors by faking either good or bad are elaborated by Berry, Wetter, and Baer (2002), McCann (1998, chapters 3–4), and Rogers (1997a).
Most self-report inventories include validity scales that are based on inconsistent and difficult-to-believe responses and that can often help to identify malingering and defensiveness (Greene, 1997). A variety of specific interview, self-report, and ability measures have also been developed along these lines to assist in identifying malingering, including the Structured Interview of Reported Symptoms (SIRS; Rogers, Gillis, Dickens, & Bagby, 1991; see also Rogers, 1997b), the M test for detecting efforts to malinger schizophrenia (Beaber, Marston, Michelli, & Mills, 1985; see also Smith, 1997), and the Test of Memory Malingering (TOMM; Tombaugh, 1997; see also Pankratz & Binder, 1997). Commonly used projective and other expressive measures do not include formal validity scales, but they are nevertheless quite sensitive to inconsistencies in performance that suggest malingering or defensiveness (Schretlen, 1997). Moreover, because relatively unstructured expressive measures convey much less meaning to respondents than self-report questionnaires concerning what their responses might signify, there is reason to believe that they may be less susceptible to impression management or even that the fakability of an assessment instrument is directly related to its face validity (Bornstein, Rossner, Hill, & Stepanian, 1994). This does not mean that unstructured measures like the Rorschach Inkblot Method and Thematic Apperception Test are impervious to malingering and defensiveness, which they are not, but only that efforts to mislead may be more obvious and less likely to convey a specific desired impression on these measures than on relatively structured measures.
A companion issue to the ease or difficulty of faking assessment measures is the extent to which respondents can be taught to deceive examiners with a convincingly good-looking or bad-looking performance. Research findings indicate that even psychologically naive participants who are given some information about the nature of people with certain disorders or characteristics can shape their test behaviors to make themselves resemble a target group more closely than they would have without such instruction. Misleading results are even more likely to occur when respondents are coached specifically in how to answer certain kinds of questions and avoid elevating validity scales (Ben-Porath, 1994; Rogers, Gillis, Bagby, & Monteiro, 1991; Storm & Graham, 2000). The group findings in these research studies have not yet indicated whether a generally instructed or specifically coached respondent can totally mislead an experienced examiner in actual practice, without generating any suspicion that the obtained results may not be valid, and this remains a subject for further investigation.
With further respect to individual assessments in actual practice, however, there are reports in the literature of instances in which attorneys have coached their clients in how to answer questions on self-report inventories (e.g., Lees-Haley, 1997; Wetter & Corrigan, 1995; Youngjohn, 1995), and a Web site available on the Internet claims to provide a list of supposed good and bad responses for each of the 10 Rorschach inkblots. As previously mentioned in discussing test security, prior knowledge of test questions and answers can detract from the practical utility of psychological assessment methods that feature right and wrong answers. The confounding effect of pretest information on unstructured measures, for which correct or preferable answers are difficult to specify out of context, may be minimal, but the susceptibility of these measures to successful deception by well-coached respondents is another topic for future research. Less uncertain are the questionable ethics of persons who coach test-takers in dishonesty and thereby thwart the legitimate purposes for which these respondents are being evaluated.
Integrating Data Sources
As noted at the beginning of this research paper, psychological assessment information can be derived from administering tests, conducting interviews, observing behavior, speaking with collateral persons, and reviewing historical documents. Effective integration of data obtained from such multiple sources calls for procedures based on the previously described additive, confirmatory, and complementary functions served by a multimethod test battery. In some instances, for example, a respondent may during an interview report a problem for which there is no valid test index (e.g., having been sexually abused), and may demonstrate on testing a problem that is ordinarily not measured by interview data (e.g., poor perceptual-motor coordination). These two data sources can then be used additively to identify that the person has both a history of sexual abuse and a neuropsychological impairment. In another instance, a person who describes himself or herself during an interview as being a bright, well-educated individual with good leadership skills and a strong work ethic, and who then produces reliable documents attesting these same characteristics, offers assessors an opportunity for confirmatory use of these different data sources to lend certainty to a positive personnel report.
A third and somewhat more complicated set of circumstances may involve a respondent who behaves pleasantly and deferentially toward the assessor, reports being a kindly and even-tempered person, and produces limited and mostly conventional test responses that fall in the normal range. At the same time, however, the respondent is described by friends and relatives as a rageful and abusive person, and police reports show an arrest record for assault and domestic violence. Familiar to forensic psychologists consulting in the criminal justice system, this pattern of discrepant data can usually be explained by using them in a complementary fashion to infer defensiveness and a successful fake-good approach to the interviewing and testing situations. As a further example in educational settings, a student whose poor grades suggest limited intelligence but whose test performance indicates considerable intelligence gives assessors a basis for drawing in a complementary fashion on the divergent data to infer the likelihood of psychologically determined underachievement.
Because of the increased understanding of people that can accrue from integrating multiple sources of information, thorough psychological evaluation utilizes all of the available data during the interpretation phase of the assessment process. This consideration in conducting psychological assessments touches on the question of how much data should be collected in the first place. Theoretically, there can never be too much information in an assessment situation. There may be redundant information that provides more confirmatory evidence than is needed, and there may be irrelevant information that serves no additive function in answering the referral question, but examiners can choose to discard the former and ignore the latter. Moreover, all test, interview, and observational data that may be collected reflect some psychological characteristics of the person showing this behavior and therefore signify something potentially helpful to know about the person being assessed.
On the other hand, there are practical limits to how much assessment information should be collected to guide the formulation of interpretations. Above all, psychological assessors are responsible for conducting evaluations in a cost-effective manner that provides adequate responses to referral questions with the least possible expense of time and money. As noted previously, practitioners who provide and charge for services that they know will make little difference are exploiting the recipients of their services and jeopardizing their own professional respectability. Assessment psychologists may differ in the amount and kind of data they regard as sufficient to conduct a fully adequate evaluation, but they generally recognize their ethical obligations to avoid going beyond what they genuinely believe will be helpful.
With further respect to providing answers to referral questions, two additional guidelines can help assessment psychologists in drawing wisely and constructively on the assessment data at their disposal. First, by taking full account of indications of both psychological strengths and weaknesses in people they examine, assessors can present a balanced description of their assets and liabilities. Psychological assessment has often addressed mainly what is wrong with people while giving insufficient attention to their adaptive capacities, positive potentials, and admirable qualities. In keeping with contemporary trends in psychology toward emphasizing wellness, happiness, optimism, and other positive features of the human condition (see Seligman & Csikszentmihalyi, 2000), assessment psychology serves its purposes best when the interpretive process gives full measure to adaptive capacities as well as functioning limitations.
Second, by recognizing that the inferences and impressions they derive from assessment data are likely to vary in the strength of the evidence supporting them, examiners can couch their interpretive statements in language that conveys their level of confidence in what they have to say. Most respondents provide clear and convincing evidence of at least some psychological characteristic, which examiners can then appropriately report in what may be called the language of certainty. The language of certainty states in direct terms what people are like and how they are likely to conduct themselves, as in saying, “This student has a marked reading disability,” or “Mr. A. appears to be an impulsive person with limited self-control,” or “Ms. B. is an outgoing and gregarious person who seeks out and enjoys interpersonal relationships.” For other characteristics of a person being evaluated, the evidence may be fragmentary or suggestive rather than compelling and conclusive, in which case impressions are properly reported in what may be called the language of conjecture. Conjectural language suggests or speculates about possible features of a person’s nature or likely behavior, as in saying, “There is some evidence to suggest that this child may have an auditory processing deficit,” or “She occasionally shows tendencies to be inflexible in her approach to solving problems, which might limit the creativity of her decision-making as an executive,” or “The data provide some basis for speculating that his lack of effort represents a passive-aggressive way of dealing with underlying anger and resentment he feels toward people who have demanded a lot from him.”
Utilizing Assessment Information
The assessment process culminates in the utilization of descriptions of psychological characteristics and behavioral tendencies as a basis for formulating conclusions and recommendations. Interpretations of assessment information are now translated into their implications for various decisions, and the overall purpose and eventual goal of assessment can accordingly be conceived as a way of facilitating decision making about classification, selection, placement, diagnosis, and treatment of people being evaluated. In this output phase, however, account must be taken of the fact that assessment data and the descriptions to which they give rise may have different implications for different kinds of people living in different circumstances. Most important in this regard are possible sources of bias, applicable base rates, value judgments calling for cutting-score adjustments, and the cultural background and social context of the person being evaluated. Good assessment decisions depend on recognizing these considerations and preventing them from exerting undue influence on conclusions and recommendations.
Bias and Base Rates
Bias occurs in the utilization of assessment information when examiners allow preconceived notions and previously held beliefs to influence how they view the implications of their data. Assessment bias may arise either inadvertently, from attitudes of which examiners are unaware, or consciously, as purposeful intent on their part. Whether inadvertent or intentional, assessment bias takes the form of expectations that affect the meaning assigned to a set of findings, and most of these expectations originate in turn from demographic beliefs, environmental impressions, and epidemiological notions.
As an example of demographic beliefs, an assessor who thinks that older people and males are generally likely to perform better as managers than younger people and females may advise hiring a 45-year-old man and not hiring a 30-year-old woman for a managerial position, even if their psychological assessment information would be seen by most examiners as comparable or even favoring the female candidate. Similarly, an assessor who harbors a conviction that blue-collar African Americans are generally less likely to respond to psychotherapy than white-collar Caucasians may discourage psychotherapy for the former and recommend it for the latter, even when looking at assessment information showing equivalent treatment accessibility.
Environmental impressions as a source of biased expectations refer to the setting in which assessors are conducting an evaluation. Psychologists working in an inpatient facility in which a large percentage of patients are psychotically disturbed come to expect most of the people they examine to be psychotic, at least on admission, and they may accordingly be inclined to infer psychosis from a set of assessment data that would not have led them to this conclusion had they obtained it in an outpatient clinic in which psychotic disturbance is rarely seen. Similarly, psychologists assessing prison inmates, among whom antisocial personality disorder is commonly found, may be more likely to expect and diagnose this disorder than they would if they were working with similar data in a university counseling center.
As for epidemiological notions, examiners may be consciously or inadvertently influenced in the conclusions they draw by how they view the nature and incidence of various conditions. Those who believe that borderline personality disorder is widespread are likely to diagnose this condition more frequently than those who think this diagnostic category lacks precision and is used too frequently. Those who believe that attention-deficit/hyperactivity disorder (ADHD) occurs mainly in boys, and adolescent anorexia mainly in girls, are relatively unlikely to diagnose ADHD in girls and anorexia in boys.
In all such instances of possible influence derived from demographic, environmental, and epidemiological expectations, the challenge for assessment psychologists is to recognize their personal biases and prevent them as much as possible from exerting inappropriate influence on the conclusions and recommendations they derive from their assessment data. On the other hand, the previous examples were chosen to indicate that epidemiological and environmental expectations may have some basis in fact. There are more psychotic patients in hospital than in clinic populations, there are more antisocial individuals in prison than on college campuses, and there are substantial gender differences in the incidence of ADHD and anorexia. From a strictly actuarial point of view, then, being hospitalized does increase the probability of being psychotic, being incarcerated does increase the probability of being antisocial, and being male or female does increase the probability of being attention disordered or anorexic, respectively. Taking adequate account of such actual setting and group differences, while preventing them from resulting in biased conclusions, involves being alert to whatever base-rate information may be available in the individual case.
Base-rate information refers to the expected frequency of a characteristic or behavior in particular persons or circumstances. Attention to applicable base rates provides a way of estimating the utility of assessment procedures, particularly with respect to their efficiency in assessing rare events. As first identified by Meehl and Rosen (1955), base rates can become problematic for measuring instruments when the expected frequency of an event falls very far below 50%. For example, in a clinical setting in which 10% of the patients are suicidal, a valid test of suicidality that has a hit rate of 60% (i.e., is correct 60% of the time in identifying people in general as suicidal or nonsuicidal) is technically less efficient than simply calling all of the patients nonsuicidal, which would be correct 90% of the time.
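The arithmetic behind this example can be laid out in a few lines. The following is only an illustrative sketch using the two figures given above (a 10% base rate and a 60% hit rate); the function names are hypothetical and are not drawn from Meehl and Rosen (1955).

```python
def blanket_accuracy(base_rate: float) -> float:
    """Proportion correct when everyone is given the more common label.

    If the condition is rare, calling everyone negative is right
    (1 - base_rate) of the time; if it is common, calling everyone
    positive is right base_rate of the time.
    """
    return max(base_rate, 1.0 - base_rate)


def improves_on_base_rate(hit_rate: float, base_rate: float) -> bool:
    """True when a measure's overall hit rate beats the blanket decision."""
    return hit_rate > blanket_accuracy(base_rate)


# Figures taken from the example in the text.
base_rate = 0.10   # 10% of patients are suicidal
hit_rate = 0.60    # the test classifies 60% of patients correctly

print(f"Blanket 'nonsuicidal' call is correct {blanket_accuracy(base_rate):.0%} of the time")
print(f"Test is correct {hit_rate:.0%} of the time")
print("Test improves on base rate:", improves_on_base_rate(hit_rate, base_rate))
```

The comparison concerns overall efficiency only; it says nothing about which errors are made or how costly they are.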
Although technically correct from a psychometric perspective, this type of base-rate emphasis on efficiency does not always satisfy priorities in actual assessment practice. Assessment methods that are inefficient in assessing suicidality, given its low base rate even in most patient populations, may nevertheless correctly identify a subgroup of patients in whom suicidal behavior is relatively likely to occur. An examiner can then use this information to recommend suicide precautions for this apparently suicidal subgroup, which is preferable to overlooking the self-destructive potential of the high-risk group by exercising the technically more efficient option of calling all of the patients nonsuicidal.
The base-rate problem can also be minimized by focusing assessment efforts on restricted populations in which the expected frequency of the characteristic being evaluated is less rare than in the general population. Kamphuis and Finn (2002) note in this regard that the more closely a base rate approximates 50%, the better prospects a valid measure has of improving on the efficiency of concluding that either everyone or no one has a certain characteristic or behavioral tendency. As an example of increasing the base rate by restricting the population, efficient prediction of violent behavior among people in general is difficult to achieve, because most people are nonviolent. In a population of criminal offenders, however, many of whom have a history of violence, a valid measure of violence potential may prove quite efficient in identifying those at greatest risk for violent behavior in the future.
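A small sketch may help show why moving the base rate toward 50% matters. The sensitivity and specificity values below (0.80 each) and the two base rates are hypothetical assumptions chosen for illustration, not figures from Kamphuis and Finn (2002); the positive predictive value is computed with Bayes' rule.

```python
def measure_accuracy(base_rate: float, sensitivity: float, specificity: float) -> float:
    """Overall proportion of correct classifications made by the measure."""
    return base_rate * sensitivity + (1.0 - base_rate) * specificity


def blanket_accuracy(base_rate: float) -> float:
    """Accuracy of concluding that everyone, or no one, has the characteristic."""
    return max(base_rate, 1.0 - base_rate)


def positive_predictive_value(base_rate: float, sensitivity: float, specificity: float) -> float:
    """Probability that a person flagged by the measure truly has the characteristic."""
    true_positives = base_rate * sensitivity
    false_positives = (1.0 - base_rate) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical measure: correct for 80% of true cases and 80% of non-cases.
SENSITIVITY = 0.80
SPECIFICITY = 0.80

# Hypothetical base rates: a general population versus a restricted one.
for label, base_rate in [("general population", 0.02), ("restricted population", 0.40)]:
    gain = measure_accuracy(base_rate, SENSITIVITY, SPECIFICITY) - blanket_accuracy(base_rate)
    ppv = positive_predictive_value(base_rate, SENSITIVITY, SPECIFICITY)
    print(f"{label}: base rate {base_rate:.0%}, gain over blanket call {gain:+.2f}, PPV {ppv:.2f}")
```

Under these assumed values the same measure does worse than a blanket decision when the characteristic is rare and better when the population is restricted, and a much larger share of the people it flags are true cases.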
Value Judgments and Cutting Scores
Value judgments in the present context refer to the purposes for which a respondent is being evaluated in relation to the frequency of false-positive and false-negative outcomes that an assessment variable is likely to produce. False-positive outcomes result in decisions based on assuming that people have certain conditions and tendencies that they in fact do not, whereas false-negative outcomes result in inferring that people lack certain conditions and tendencies that in actuality do characterize them. When assessments are being conducted to assist in making decisions about psychological characteristics and their consequences that most people would regard as undesirable, like being suicidal or homicidal, false positives may be of less concern than false negatives. A false-positive decision concerning dangerousness might result in a person’s being unnecessarily supervised or even restrained, which is a regrettable but not a fatal outcome. A false-negative decision, on the other hand, by failing to identify dangerousness to oneself or others, can result in loss of life.
Conversely, false-positive outcomes may be more problematic than false-negative outcomes when referral questions concern desirable characteristics and consequences, like whether a person should be given a scholarship, a job, a promotion, or a parole. False negatives in this kind of assessment situation may result in denying people opportunities for which they are qualified and deserving, which is disadvantageous and perhaps unfair to them as individuals. However, when false positives result in promotion of personnel to positions of responsibility that exceed their capacities, or the parole of felons whose criminal tendencies have not abated, then many people other than the individual are likely to suffer serious consequences.
In relation to such value judgments, then, a set of assessment data may have different implications in different assessment circumstances and thereby call for assessors to select carefully the cutting scores they utilize in formulating their conclusions and recommendations. For quantifiable dimensions of assessment that correlate positively with the presence of a characteristic or behavioral tendency, moving up the numerical scale produces a progressively decreasing percentage of false positives, and moving down the scale produces a progressively decreasing percentage of false negatives; just the opposite will be the case for assessment dimensions that are inversely correlated with what they measure. As a way of deciding the implications of assessment findings in a particular circumstance, cutting scores can thus be selected to minimize the likelihood of false-positive outcomes in examinations concerned with desirable consequences and minimize false-negative outcomes in the estimation of undesirable consequences.
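The trade-off described here can be illustrated with a toy example. Everything in the sketch below is hypothetical: the ten scores, the true statuses, the candidate cutting scores, and the cost weights are invented for illustration, and the scale is assumed to correlate positively with the characteristic being assessed. Errors are reported as counts rather than percentages.

```python
def count_errors(scores, has_condition, cutoff):
    """Return (false_positives, false_negatives) when scores >= cutoff are called positive."""
    false_pos = sum(1 for s, truth in zip(scores, has_condition) if s >= cutoff and not truth)
    false_neg = sum(1 for s, truth in zip(scores, has_condition) if s < cutoff and truth)
    return false_pos, false_neg


def choose_cutoff(scores, has_condition, cutoffs, cost_fp, cost_fn):
    """Pick the candidate cutoff with the lowest weighted error cost.

    cost_fp and cost_fn are value judgments supplied by the assessor,
    not quantities that the data can determine.
    """
    def weighted_cost(cutoff):
        fp, fn = count_errors(scores, has_condition, cutoff)
        return cost_fp * fp + cost_fn * fn
    return min(cutoffs, key=weighted_cost)


# Hypothetical scores and true statuses for ten respondents.
scores = [35, 42, 48, 51, 55, 58, 62, 66, 71, 77]
has_condition = [False, False, False, True, False, True, False, True, True, True]
cutoffs = [40, 50, 60, 70]

for cutoff in cutoffs:
    fp, fn = count_errors(scores, has_condition, cutoff)
    print(f"cutoff {cutoff}: false positives = {fp}, false negatives = {fn}")

# Undesirable consequence (e.g., dangerousness): misses weighted more heavily.
low_cut = choose_cutoff(scores, has_condition, cutoffs, cost_fp=1, cost_fn=10)
# Desirable consequence (e.g., promotion): false alarms weighted more heavily.
high_cut = choose_cutoff(scores, has_condition, cutoffs, cost_fp=10, cost_fn=1)
print("cutoff chosen when false negatives are costlier:", low_cut)
print("cutoff chosen when false positives are costlier:", high_cut)
```

With these invented data, raising the cutting score steadily trades false positives for false negatives, so weighting misses more heavily selects a lower cutting score and weighting false alarms more heavily selects a higher one.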
Culture and Context
Just as assessment information may have different implications for people examined in different settings and for different purposes, it may also vary in its significance for respondents coming from different cultures or living in different social contexts. Hence the utilization phase of the assessment process must always take account of how characteristics of individuals identified in the interpretive phase are likely to affect their psychological functioning in their particular circumstances. Attention to cross-cultural influences has a long history in assessment psychology (see, e.g., Hallowell, 1951; Lindzey, 1961) and has seen a recent resurgence of interest, as described by Dana (1993, 2000b), Kazarian and Evans (1998), Suzuki, Ponterotto, and Meller (2000), and Williams, Satterwhite, and Saiz (1998).
The distinction drawn in this overview of the assessment process between interpreting and utilizing assessment information provides some useful guidelines for a two-step process in taking account of background and situational differences among respondents. The interpretive phase of assessment provides the first step, which consists of arriving at descriptive statements that identify a respondent’s psychological characteristics as they exist independently of his or her cultural context and circumstances. Having superior intelligence, being orderly and compulsive, experiencing memory loss, being emotionally reserved, having an assertive and competitive bent, and being prone to acute anxiety in unfamiliar surroundings are examples of characteristics that define the nature of the individual. As revealed by assessment data, such characteristics will be present in people regardless of where they live, from whence they come, and in what they are involved. The utilization phase of the assessment process provides the second step, which involves being sufficiently sensitive to respondents’ cultural and experiential contexts to estimate accurately the implications of their psychological characteristics in their particular life circumstances. Especially important in this regard is determining whether their psychological characteristics are likely to prove adaptive or maladaptive in their everyday world and what kinds of successful or unsuccessful adaptation might result from these characteristics in their particular circumstances.
Research findings document that cultural differences can lead to cross-cultural variation in modal psychological characteristics, and that the demands and expectations people face often determine the implications and consequences of particular characteristics, especially with respect to how adaptive they are (see Kazarian & Evans, 1998). For example, a generally passive, dependent, agreeable, and acquiescent person may be prone to adjustment difficulties in a cultural context that values autonomy, self-reliance, assertiveness, and competitiveness. Conversely, a fiercely independent and highly competitive person might feel comfortable and flourish psychologically in a subculture that values assertiveness, but might feel alienated and adapt poorly in a society that subordinates individual needs and preferences to the wishes and welfare of the group, and in which a passive and acquiescent person would get along very well.
These contextual influences on the implications of psychological characteristics extend to specific circumstances in persons’ lives as well as their broader sociocultural contexts. A modest level of intelligence can be a source of comfort and success to young people whose personal and family expectations are simply that they graduate from high school, but a source of failure and dismay to those for whom graduation from a prestigious college is a minimum expectation. Similarly, a person with good coping skills and abundant adaptive capacities who is carrying a heavy burden of responsibilities and confronting numerous obstacles to meeting them may be susceptible to anxiety, irritability, and other consequences of a stress overload, whereas a person with limited coping skills and few adaptive capacities who is leading a narrowly restricted life involving very few demands may be able to maintain a comfortable psychological equilibrium and experience little in the way of subjectively felt distress. Likewise, a contemplative person who values being as careful as possible in completing tasks and arriving at conclusions may perform well in a job situation that calls for accuracy and thoroughness and involves relatively little time pressure, but may perform poorly in a position involving strict deadlines or requiring quick decisions on the basis of sketchy information, and in which a more decisive and action-oriented person would function more effectively.
As illustrated by the final example and those that have preceded it in this research paper, psychological assessment is a complex process. Diverse perspectives and attention to interacting variables are necessary in assessment psychology as elsewhere in behavioral science to expand knowledge and guide its practical application, and there is little to be gained from doctrinaire pronouncements of unidimensional approaches.
Bibliography:
- Ackerman, M. J., & Ackerman, M. C. (1997). Custody evaluation practices: A survey of experienced professionals (revisited). Professional Psychology, 28, 137–145.
- Allard, G., & Faust, D. (2000). Errors in scoring objective personality tests. Assessment, 7, 119–129.
- Allen, J., & Walsh, J. A. (2000). A construct-based approach to equivalence: Methodologies for cross-cultural/multicultural personality assessment research. In R. H. Dana (Ed.), Handbook of cross-cultural and multicultural personality assessment (pp. 63–86). Mahwah, NJ: Erlbaum.
- Allport, G. W. (1937). Personality: A psychological interpretation. New York: Holt.
- American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
- American Psychological Association. (1992). Ethical principles of psychologists and code of conduct. American Psychologist, 47, 1597–1611.
- American Psychological Association. (2001). APA’s guidelines for test user qualifications. American Psychologist, 56, 1099–1113.
- Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Englewood Cliffs, NJ: Prentice-Hall.
- Beaber, R. J., Marston, A., Michelli, J., & Mills, M. J. (1985). A brief test for measuring malingering in schizophrenic individuals. American Journal of Psychiatry, 142, 1478–1481.
- Beck, A. T., Steer, R. A., & Garbin, M. A. (1988). Psychometric properties of the Beck Depression Inventory: Twenty-five years of evaluation. Clinical Psychology Review, 8, 177–200.
- Ben-Porath, Y. S. (1994). The ethical dilemma of coached malingering research. Psychological Assessment, 6, 14–15.
- Berry, D. T. R., Wetter, M. W., & Baer, R. A. (2002). Assessment of malingering. In J. N. Butcher (Ed.), Clinical personality assessment (2nd ed., pp. 269–302). New York: Guilford.
- Blau, T. H. (1998). The psychologist as expert witness (2nd ed.). New York: Wiley.
- Bornstein, R. F., Rossner, S. C., Hill, E. L., & Stepanian, M. L. (1994). Face validity and fakability of objective and projective measures of dependency. Journal of Personality Assessment, 63, 363–386.
- Butcher, J. N. (2002). How to use computer-based reports. In J. N. Butcher (Ed.), Clinical personality assessment (2nd ed., pp. 109–125). New York: Oxford University Press.
- Butcher, J. N., Perry, J. N., & Atlis, M. M. (2000). Validity and utility of computer-based interpretations. Psychological Assessment, 12, 6–18.
- Butcher, J. N., & Rouse, S. V. (1996). Personality: Individual differences and clinical assessment. Annual Review of Psychology, 47, 87–111.
- Camara, W. J., Nathan, J. S., & Puente, A. E. (2000). Psychological test usage: Implications in professional psychology. Professional Psychology, 31, 141–154.
- Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
- Cattell, R. B. (1946). Description and measurement of personality. New York: World Book.
- Choca, J. P., Shanley, L. A., & Van Denberg, E. (1997). Interpretive guide to the Millon Clinical Multiaxial Inventory (2nd ed.). Washington, DC: American Psychological Association.
- Conoley, J. C., & Impara, J. (Eds.). (1995). The twelfth mental measurements yearbook. Lincoln: University of Nebraska Press.
- Dana, R. H. (1993). Multicultural assessment perspectives for professional psychology. Boston: Allyn and Bacon.
- Dana, R. H. (2000a). An assessment-intervention model for research and practice with multicultural populations. In R. H. Dana (Ed.), Handbook of cross-cultural and multicultural personality assessment (pp. 5–17). Mahwah, NJ: Erlbaum.
- Dana, R. H. (Ed.). (2000b). Handbook of cross-cultural and multicultural personality assessment. Mahwah, NJ: Erlbaum.
- De Bruyn, E. E. J. (1992). A normative-prescriptive view on clinical psychodiagnostic decision making. European Journal of Psychological Assessment, 8, 163–171.
- Exner, J. E., Jr. (2003). The Rorschach: A comprehensive system. Vol. 1. Foundations (4th ed.). New York: Wiley.
- Fernandez-Ballesteros, R. (1997). Guidelines for the assessment process (GAP). European Psychologist, 2, 352–355.
- Finger, M. S., & Ones, D. S. (1999). Psychometric equivalence of the computer and booklet forms of the MMPI: A meta-analysis. Psychological Assessment, 11, 58–66.
- Finn, S. E. (1996). Assessment feedback integrating MMPI-2 and Rorschach findings. Journal of Personality Assessment, 67, 543–557.
- Fischer, J., & Corcoran, K. J. (1994). Measures for clinical practice: A sourcebook (2nd ed., Vols. 1–2). New York: Macmillan.
- Ganellen, R. J. (1996). Integrating the Rorschach and the MMPI in personality assessment. Mahwah, NJ: Erlbaum.
- Garb, H. N. (1998). Studying the clinician. Washington, DC: American Psychological Association.
- Garb, H. N. (2000). Computers will become increasingly important for psychological assessment: Not that there’s anything wrong with that! Psychological Assessment, 12, 31–39.
- Garfield, S. L. (1994). Research on client variables in psychotherapy. In A. E. Bergin & S. L. Garfield (Eds.), Handbook of psychotherapy and behavior change (4th ed., pp. 190–228). New York: Wiley.
- Geisinger, K. F., & Carlson, J. F. (2002). Standards and standardization. In J. N. Butcher (Ed.), Clinical personality assessment (2nd ed., pp. 243–256). New York: Guilford.
- Graham, J. R. (2000). MMPI-2: Assessing personality and psychopathology (3rd ed.). New York: Oxford University Press.
- Greencavage, L. M., & Norcross, J. C. (1990). What are the commonalities among the therapeutic factors? Professional Psychology, 21, 372–378.
- Greene, R. L. (1997). Assessment of malingering and defensiveness by multiscale inventories. In R. Rogers (Ed.), Clinical assessment of malingering and deception (2nd ed., pp. 169–207). New York: Guilford.
- Greene, R. L. (2000). The MMPI-2: An interpretive manual (2nd ed.). Boston: Allyn and Bacon.
- Groth-Marnat, G. (1997). Handbook of psychological assessment (3rd ed.). New York: Wiley.
- Grove, W. M., & Meehl, P. E. (1996). Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithmic) prediction procedures: The clinical-statistical controversy. Psychology, Public Policy, and Law, 2, 293–323.
- Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C. (2000). Clinical versus mechanical prediction: A meta-analysis. Psychological Assessment, 12, 19–30.
- Hallowell, A. I. (1951). The use of projective techniques in the study of the socio-psychological aspects of acculturation. Journal of Projective Techniques, 15, 27–44.
- Heilbrun, K. (2001). Principles of forensic mental health assessment. New York: Kluwer Academic/Plenum Publishers.
- Holland, J. L. (1985). Vocational Preference Inventory (VPI). Odessa, FL: Psychological Assessment Resources.
- Holt, R. R. (1958). Clinical and statistical prediction: A reformulation and some new data. Journal of Abnormal and Social Psychology, 56, 1–12.
- Holt, R. R. (1986). Clinical and statistical prediction: A retrospective and would-be integrative perspective. Journal of Personality Assessment, 50, 376–385.
- Honaker, L. M., & Fowler, R. D. (1990). Computer-assisted psychological assessment. In G. Goldstein & M. Hersen (Eds.), Handbook of psychological assessment (2nd ed., pp. 521–546). New York: Pergamon Press.
- Kamphuis, J. H., & Finn, S. E. (2002). Incorporating base rate information in daily clinical decision making. In J. N. Butcher (Ed.), Clinical personality assessment (2nd ed., pp. 257–268). New York: Oxford University Press.
- Karon, B. P. (2000). The clinical interpretation of the Thematic Apperception Test, Rorschach, and other clinical data: A reexamination of statistical versus clinical prediction. Professional Psychology, 31, 230–233.
- Kaufman, A. S. (1990). Assessing adolescent and adult intelligence. Boston: Allyn and Bacon.
- Kazarian, S., & Evans, D. R. (Eds.). (1998). Cultural clinical psychology. New York: Oxford University Press.
- Kitchener, K. S. (2000). Foundations of ethical practice, research, and teaching in psychology. Mahwah, NJ: Erlbaum.
- Kleinmuntz, B. (1990). Why we still use our heads instead of the formulas: Toward an integrative approach. Psychological Bulletin, 107, 296–310.
- Koocher, G. P., & Keith-Spiegel, P. (1998). Ethics in psychology. New York: Oxford University Press.
- Kubiszyn, T. W., Finn, S. E., Kay, G. G., Dies, R. R., Meyer, G. J., Eyde, L. D., et al. (2000). Empirical support for psychological assessment in clinical health care settings. Professional Psychology, 31, 119–130.
- Lees-Haley, P. R. (1997). Attorneys influence expert evidence in forensic psychological and neuropsychological cases. Assessment, 4, 321–324.
- Lindzey, G. (1961). Projective techniques and cross-cultural research. New York: Appleton-Century-Crofts.
- Maloney, M., & Ward, M. P. (1976). Psychological assessment: A conceptual approach. New York: Oxford University Press.
- Masling, J. M. (1966). Role-related behavior of the subject and psychologist and its effect upon psychological data. In D. Levine (Ed.), Nebraska symposium on motivation (pp. 67–104). Lincoln: University of Nebraska Press.
- Masling, J. M. (1998). Interpersonal and actuarial dimensions of projective testing. In L. Handler & M. J. Hilsenroth (Eds.), Teaching and learning personality assessment (pp. 119–135). Mahwah, NJ: Erlbaum.
- Matarazzo, J. D. (1990). Psychological assessment versus psychological testing. American Psychologist, 45, 999–1017.
- McCann, J. T. (1998). Malingering and deception in adolescents. Washington, DC: American Psychological Association.
- McFall, R. M., & Townsend, J. T. (1998). Foundations of psychological assessment: Implications for cognitive assessment in clinical science. Psychological Assessment, 10, 316–330.
- McFall, R. M., & Treat, T. A. (1999). Quantifying the information value of clinical assessments with signal detection theory. Annual Review of Psychology, 50, 215–241.
- McMinn, M. F., Ellens, B. M., & Soref, E. (1999). Ethical perspectives and practice behaviors involving computer-based test interpretations. Assessment, 6, 71–77.
- Meehl, P. E. (1954). Clinical versus statistical prediction. Minneapolis: University of Minnesota Press.
- Meehl, P. E. (1986). Causes and effects of my disturbing little book. Journal of Personality Assessment, 50, 370–375.
- Meehl, P. E., & Rosen, A. (1955). Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores. Psychological Bulletin, 52, 194–216.
- Meyer, G. J., Finn, S. E., Eyde, L. D., Kay, G. G., Moreland, K. L., Dies, R. R., et al. (2001). Psychological testing and psychological assessment: A review of evidence and issues. American Psychologist, 56, 128–165.
- Mohr, D. C. (1995). Negative outcome in psychotherapy. Clinical Psychology, 2, 1–27.
- Monahan, J. (Ed.). (1980). Who is the client? Washington, DC: American Psychological Association.
- Pankratz, L., & Binder, L. M. (1997). Malingering on intellectual and neuropsychological measures. In R. Rogers (Ed.), Clinical assessment of malingering and deception (2nd ed., pp. 223–236). New York: Guilford Press.
- Rogers, R. (1997a). Current status of clinical methods. In R. Rogers (Ed.), Clinical assessment of malingering and deception (2nd ed., pp. 373–397). New York: Guilford Press.
- Rogers, R. (1997b). Structured interviews and dissimulation. In R. Rogers (Ed.), Clinical assessment of malingering and deception (2nd ed., pp. 301–327). New York: Guilford Press.
- Rogers, R., Gillis, J. R., Bagby, R. M., & Monteiro, E. (1991). Detection of malingering on the Structured Interview of Reported Symptoms (SIRSP): A study of coached and uncoached simulators. Psychological Assessment, 3, 673–677.
- Rogers, R., Gillis, J. R., Dickens, S. E., & Bagby, R. M. (1991). Standardized assessment of malingering: Validation of the Structure Inventory of Reported Symptoms. Psychological Assessment, 3, 89–96.
- Sandoval, J., Frisby, C. L., Geisinger, K. F., & Scheuneman, J. D. (Eds.). (1990). Test interpretation and diversity. Washington, DC: American Psychological Association.
- Sawyer, J. (1965). Measurement and prediction, clinical and statistical. Psychological Bulletin, 66, 178–200.
- Schretlen, D. J. (1997). Dissimulation on the Rorschach and other projective measures. In R. Rogers (Ed.), Clinical assessment of malingering and deception (2nd ed., pp. 208–222). New York: Guilford Press.
- Seligman, M. E. P., & Csikszentmihalyi, M. (2000). Positive psychology: An introduction. American Psychologist, 55, 5–14.
- Shapiro, D. L. (1991). Forensic psychological assessment. Boston: Allyn and Bacon.
- Smith, G. P. (1997). Assessment of malingering with self-report measures. In R. Rogers (Ed.), Clinical assessment of malingering and deception (2nd ed., pp. 351–370). New York: Guilford Press.
- Snyder, D. K. (2000). Computer-assisted judgment: Defining strengths and liabilities. Psychological Assessment, 12, 52–60.
- Storm, J., & Graham, J. R. (2000). Detection of coached malingering on the MMPI-2. Psychological Assessment, 12, 158–165.
- Stricker, G. (1997). Are science and practice commensurable? American Psychologist, 52, 442–448.
- Suzuki, L. A., Ponterotto, J. G., & Meller, P. J. (Eds.). (2000). The handbook of multicultural assessment (2nd ed.). New York: Wiley.
- Swets, J. A., Dawes, R. M., & Monahan, J. (2000). Psychological science can improve diagnostic decisions. Psychological Science in the Public Interest, 1, 1–26.
- Tombaugh, T. N. (1997). The Test of Memory Malingering (TOMM): Normative data from cognitively intact and cognitively impaired individuals. Psychological Assessment, 9, 260–268.
- Watkins, C. E., Campbell, V. L., Nieberding, R., & Hallmark, R. (1995). Contemporary practice of psychological assessment by clinical psychologists. Professional Psychology, 26, 54–60.
- Weiner, I. B. (1989). On competence and ethicality in psychodiagnostic assessment. Journal of Personality Assessment, 53, 827–831.
- Weiner, I. B. (1998). Principles of Rorschach interpretation. Mahwah, NJ: Erlbaum.
- Weiner, I. B. (2002). How to anticipate ethical and legal challenges in personality assessments. In J. N. Butcher (Ed.), Clinical personality assessment (2nd ed., pp. 126–134). New York: Oxford University Press.
- Wetter, M. W., & Corrigan, S. K. (1995). Providing information to clients about psychological tests: A survey of attorneys’ and law students’ attitudes. Professional Psychology, 26, 474–477.
- Williams, J. E., Satterwhite, R. C., & Saiz, J. L. (1998). The importance of psychological traits: A cross-cultural study. New York: Plenum Press.
- Youngjohn, J. R. (1995). Confirmed attorney coaching prior to neuropsychological evaluation. Assessment, 2, 279–284.