Computers have become an integral part of modern life. No longer are they mysterious, giant electronic machines stuck away at some remote university or government site, requiring a team of engineers with PhDs to operate. Computers are everywhere, doing tasks that were once considered sheer human drudgery (managing vast, unthinkable inventories with lightning speed), happily managing chores that no one could accomplish (like monitoring intricate internal engine functions), or depositing a letter to a friend all the way around the world in microseconds, a task that used to take months.
Computers have served in several capacities in the field of psychological assessment since their introduction almost a half century ago, although initially only in the processing of psychological test information. Over the past several decades, their uses in mental health care settings have broadened, and computers have become important and necessary aids to assessment. The benefits of computers to the field of psychology continue to expand as technology becomes more advanced, allowing for more sophisticated operations, including integrative test interpretation, which once was the sole domain of humans. How can an electronic and nonintuitive gadget perform a complex cognitive process such as psychological test interpretation (which requires extensive knowledge, experience, and a modicum of intuition)?
The theoretical rationale underlying computer-based test interpretation was provided in 1954 when Meehl published a monograph in which he debated the merits of actuarial or statistical (objective) decision-making methods versus more subjective or clinical strategies. Meehl’s analysis of the relative strengths of actuarial prediction over clinical judgment led to the conclusion that decisions based upon objectively applied interpretive rules were ultimately more valid than judgments based on subjective strategies. Subsequently, Dawes, Faust, and Meehl (1989) and Grove and Meehl (1996) have reaffirmed the finding that objective assessment procedures are equal or superior to subjective methods. More recently, in a meta-analysis of 136 studies, Grove, Zald, Lebow, Smith, and Nelson (2000) concluded that the advantage in accuracy for statistical prediction over clinical prediction was approximately 10%.
In spite of the common foundations and comparable rationales that actuarial assessment and computerized assessment share, they are not strictly the same. Computer-based test interpretation (CBTI) can be either clinical or actuarial in foundation. It is an actuarial task only if its interpretive output is determined strictly by statistical rules that have been demonstrated empirically to exist between the input and the output data. A computer-based system for describing or predicting events that are not actuarial in nature might base its interpretations on the work of a clinician (or even an astrologer) who hypothesizes relationships using theory, practical experience, or even lunar phases and astrology charts.
It is important in the field of psychological assessment that the validity of computerized assessment instruments be demonstrated if they are to be relied upon for making crucial dispositions or decisions that can affect people. In 1984 the Committee on Professional Standards of the American Psychological Association (APA) cautioned psychologists who used interpretive reports in business and school settings against using computer-derived narrative test summaries in the absence of adequate data to validate their accuracy.
Ways Computers Are Used in Clinical Assessment
In the history of psychological assessment, the various computer-based test applications evolved differently. The relatively more routine tasks were initially implemented, and the applications of more complex tasks, such as interpretation, took several decades to become available.
Scoring and Data Analysis
The earliest computer-based applications of psychological tests involved scoring and data processing in research. Almost as soon as large mainframe computers became available for general use in the 1950s, researchers began to use them to process test development information. In the early days, data were input for scoring by key entry, paper tape, or cards. Today optical readers or scanners are used widely but not exclusively. It is also common to find procedures in which the respondent enters his or her responses directly into the machine using a keyboard. Allard, Butler, Faust, and Shea (1995) found that computer scoring was more reliable than manual scoring of test responses.
Profiling and Charting of Test Results
In the 1950s, some commercial services for scoring psychological tests for both research and clinical purposes emerged. These early services typically provided summary scores for the test protocols, and in some cases, they provided a profile graph with the appropriate levels of the scale elevation designated. The technology of computer graphics of the time did not allow for complex visual displays or graphing a profile by connecting the dots, and the practitioner needed to connect the dots manually to complete the profile.
Listing of Possible Interpretations
As computer use became more widespread, its potential advantage to the process of profiling scores and assigning meaning to significantly elevated scores came to be recognized. A research group at Mayo Clinic in Rochester, Minnesota, developed a computer program that provided rudimentary interpretations for the Minnesota Multiphasic Personality Inventory (MMPI) results of patients being seen at the hospital (Rome et al., 1962). The interpretive program comprised 110 statements or descriptions that were based on empirical correlates for particular MMPI scale elevations. The program simply listed the most relevant statements for each client’s profile. This system was in use for many years to assess psychopathology of patients undergoing medical examinations at Mayo Clinic.
In 1963 Piotrowski completed a very elaborate computer program for Rorschach interpretation (Exner, 1987). The program was based on his own interpretive logic and included hundreds of parameters and rules. Because the program was too advanced for the computer technology available at that time, Piotrowski’s program never became very popular. However, it was a precursor of modern computer programs for calculating scores and indexes and generating interpretations of Rorschach data.
Evolution of More Complex Test Interpretation and Report Generation
It wasn’t long until others saw the broader potential in computer-based test interpretation. Fowler (1969) developed a computer program for the drug company Hoffman-La Roche Laboratories that not only interpreted the important scales of the MMPI but also combined the interpretive statements into a narrative report. Several other computer-based systems became available in the years that followed, for example, the Caldwell Report (Caldwell, 1996) and the Minnesota Report (Butcher, 1982).
Adapting the Administration of Test Items
Computer administration has been widely used as a means of obtaining response data from clients. This response format has many advantages over traditional manual processing methods, particularly the potential time savings, elimination of the possibility that respondents would make errors while filling out handwritten answer sheets, and elimination of the possibility that clinicians and technicians would make errors while hand-scoring items.
The flexibility of the computer offers the option of adapting the test to suit the needs and preferences of the test taker. The administration of test items in a paper-and-pencil inventory requires that the test taker respond to each and every question regardless of whether it applies. Psychologists have been interested in modifying the administration of test items to fit the respondent, that is, in tailoring a test administration to be analogous to an interview. For example, in an interview, if a question such as "Are you married?" is answered no, then all subsequent questions take this response into account and are branched away from seeking responses to items pertinent to being married. In other words, the items are administered in an adapted or tailored manner for the specific test taker. The comparability and validity of this method (known as computerized adaptive testing) have been explored in several studies (e.g., Butcher, Keller, & Bacon, 1985). Roper, Ben-Porath, and Butcher (1995) examined an adaptive version of the MMPI-2. Five hundred and seventy-one undergraduate psychology students were administered three versions of the MMPI-2: a booklet version, an adaptive computerized version, and a conventional computerized version. Each participant took the same format twice, took the booklet and adaptive computerized versions (in counterbalanced order), or took the conventional and adaptive computerized versions (again, in counterbalanced order). There were few statistically significant differences in the resulting mean scale scores between the booklet and adaptive computerized formats.
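The interview-style branching described above can be illustrated with a minimal Python sketch. It is only an illustration of skipping items that a prior answer has made irrelevant, not the adaptive procedure evaluated in the studies cited; the item keys, the item wording other than "Are you married?", and the `administer` function are all hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple, Optional


class Item(NamedTuple):
    key: str
    text: str
    # Prior answers that must match for this item to be shown (hypothetical scheme).
    requires: Optional[Dict[str, bool]] = None


def administer(items: List[Item], respond: Callable[[Item], bool]) -> Dict[str, bool]:
    """Present items in order, skipping any whose prerequisites are not met."""
    answers: Dict[str, bool] = {}
    for item in items:
        needed = item.requires or {}
        if all(answers.get(k) == v for k, v in needed.items()):
            answers[item.key] = respond(item)
    return answers


# Hypothetical item pool: the second item is asked only of married respondents.
ITEMS = [
    Item("married", "Are you married?"),
    Item("spouse_conflict", "Do you often argue with your spouse?", {"married": True}),
    Item("sleep", "Do you sleep well most nights?"),
]

# Scripted responses standing in for a real test taker.
scripted = {"married": False, "sleep": True}
result = administer(ITEMS, lambda item: scripted[item.key])
print(result)
```

Because the scripted respondent answers no to the first item, the marriage-related follow-up is never presented, which is the tailoring the paragraph describes.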
Decision Making by Computer
Available computer interpretation systems, even the most sophisticated report-generating programs, are essentially look up, list out programs—that is, they provide canned interpretations that have been stored in the computer to be called up when various test scores and indexes are obtained. The computer does not actually make decisions but simply follows instructions (often very complex and detailed ones) about the statements or paragraphs that are to be printed out. The use of computers to actually make decisions or simulate what the human brain does in making decisions—an activity that has been referred to as artificial intelligence—has not been fully accomplished in the assessment field. One program that comes closest to having the computer actually make the decisions is available in the Minnesota Personnel Screening Report (Butcher, 1995). In this system, the computer has been programmed with decision rules defining an array of test scores and decisions (e.g., manages stress well). The computer program determines the scores and indexes and then decides which of the summary variables are most appropriate for the range of scores obtained.
Butcher (1988) investigated the usefulness of this computer-based MMPI assessment strategy for screening in personnel settings. A group of 262 airline pilot applicants were evaluated by both expert clinicians and by computer-based decision rules. The overall level of adjustment of each applicant was rated by experts (using only an MMPI profile) on a Likert-type scale with three categories: adequate, problems possible, and problems likely. The computer-based decision rules were also used to make determinations about the applicants. Here, the categories of excellent, good, adequate, problems possible, and poor were used to classify the profiles. The results showed high agreement between the computer-based decisions and those made by clinicians in rating overall adjustment. Over 50% of individuals falling into the adequate category based on the computer-based rules were given ratings of adequate by the clinicians. There was agreement between the computer rules and clinician judgment on the possibility of problems being present in 26.7% of cases. Over 60% of individuals rated as poor by the computer rules were given problems likely ratings by the clinicians. This study indicated that there can be substantial agreement between clinicians and the computer when an objectively interpreted test is used. The study did not, however, provide information on the external validity of either approach because no criteria were available to allow for an assessment of the relative accuracy of either method.
Internet-Based Test Applications
Computer-based technological developments are advancing more rapidly than is the psychological technology to support psychological test usage on the Internet. The growth of the Internet and broadening commercial uses have increased the potential to administer, score, and interpret psychological tests online. Commercial test publishers have been receiving a great deal of pressure from test users to make more test-based services available on the Internet. The ethics of psychological test usage, standards of care, and the basic psychological test research have not kept up with the growth spurt of the Internet itself. Consequently, there are many unanswered questions as psychologists move into the twenty-first century with the almost limitless potential of test applications facing the field. Later in this research paper, we address a number of these issues.
Equivalence of Computer-Administered Tests and Traditional Methods
Several authorities have raised questions about the equivalence of computer-based assessment methods and traditional psychological testing procedures. Hofer and Green (1985), for example, pointed out that there are several conditions related to computerized test administration that could produce noncomparable results. Some people might be uncomfortable with computers and feel awkward dealing with them; this would make the task of taking tests on a computer different from standard testing procedures. Moreover, factors such as the type of equipment used and the nature of the test material (i.e., when item content deals with sensitive and personal information) might make respondents less willing (or more willing) to reveal their true feelings to a computer than to a human being. These situations might lead to atypical results for computerized assessment compared to a traditional format. Another possible disadvantage of computer assessment is that computer-generated interpretations may be excessively general in scope and not specific enough for practical use. Finally, there is a potential for computer-based results to be misused because they might be viewed as more scientific than they actually are, simply because they came out of a computer (Butcher, 1987). It is therefore important that the issues of measurement comparability and, of course, validity of the interpretation be addressed. The next section addresses the comparability of computer-administered tests and paper-and-pencil measures or other traditional methods of data collection.
Comparability of Psychiatric Screening by Computer and Clinical Interview
Research has shown that clients in mental health settings report feeling comfortable with providing personal information through computer assessment (e.g., Hile & Adkins, 1997). Moreover, research has shown that computerized assessment programs were generally accurate in being able to diagnose the presence of behavioral problems. Ross, Swinson, Larkin, and Doumani (1994) used the Computerized Diagnostic Interview Schedule (C-DIS) and a clinician-administered Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders–Third Edition–Revised (DSM-III-R; SCID) to evaluate 173 clients. They reported the congruence between the two instruments to be acceptable except for substance abuse disorders and antisocial personality disorder, in which the levels of agreement were poor. The C-DIS was able to rule out the possibility of comorbid disorders in the sample with approximately 90% accuracy.
Farrell, Camplair, and McCullough (1987) evaluated the capability of a computerized interview to identify the presence of target complaints in a clinical sample. Both a face-to-face, unstructured intake interview and the interview component of a computerized mental health information system, the Computerized Assessment System for Psychotherapy Evaluation and Research (CASPER), were administered to 103 adult clients seeking outpatient psychological treatment. Results showed relatively low agreement (mean r = .33) between target complaints as reported by clients on the computer and as identified by therapists in interviews. However, 9 of the 15 complaints identified in the computerized interview were found to be significantly associated with other self-report and therapist-generated measures of global functioning.
Comparability of Standard and Computer-Administered Questionnaires
The comparability of computer and standard administrations of questionnaires has been widely researched. Wilson, Genco, and Yager (1985) used a test-attitudes screening instrument as a representative of paper-and-pencil tests that are also administered by computer. Ninety-eight female college freshmen were administered the Test Attitude Battery (TAB) in both paper-and-pencil and computer-administered formats (with order of administration counterbalanced). The means and variances were found to be comparable for paper-and-pencil and computerized versions.
Holden and Hickman (1987) investigated computerized and paper-and-pencil versions of the Jenkins Activity Survey, a measure that assesses behaviors related to the Type A personality. Sixty male undergraduate students were assigned to one of the two administration formats. The stability of scale scores was comparable for both formats, as were mean scores, variances, reliabilities, and construct validities. Merten and Ruch (1996) examined the comparability of the German versions of the Eysenck Personality Questionnaire (EPQ-R) and the Carroll Rating Scale for Depression (CRS) by having people complete half of each instrument with a paper-and-pencil administration and the other half with computer administration (with order counterbalanced). They compared the results from the two formats to one another as well as to data from another sample, consisting of individuals who were administered only the paper-and-pencil version of the EPQ-R. As in the initial study, means and standard deviations were comparable across computerized and more traditional formats.
In a somewhat more complex and comprehensive evaluation of computer-based testing, Jemelka, Wiegand, Walker, and Trupin (1992) administered several computer-based measures to 100 incarcerated felons. The measures included brief mental health and personal history interviews, the group form of the MMPI, the Revised Beta IQ Examination, the Suicide Probability Scale, the Buss-Durkee Hostility Inventory, the Monroe Dyscontrol Scale, and the Veteran’s Alcohol Screening Test. From this initial sample, they developed algorithms from a CBTI system that were then used to assign to each participant rankings of potential for violence, substance abuse, suicide, and victimization. The algorithms were also used to describe and identify the presence of clinical diagnoses based on the DSM-III-R. Clinical interviewers then rated the felons on the same five dimensions. The researchers then tested a new sample of 109 participants with eight sections of the computer-based DIS and found the agreement between the CBTI ratings and the clinician ratings to be fair. In addition, there was also high agreement between CBTI- and clinician-diagnosed DSM-III-R disorders, with an overall concordance rate of 82%.
Most of the research concerning the comparability of computer-based and standard personality assessment measures has been with the MMPI or the MMPI-2. Several studies reported possible differences between paper-and-pencil and computerized testing formats (e.g., Lambert, Andrews, Rylee, & Skinner, 1987; Schuldberg, 1988; Watson, Juba, Anderson, & Manifold, 1990). Most of the studies suggest that the differences between administrative formats are few and generally of small magnitude, leading to between-forms correlations of .68–.94 (Watson et al., 1990). Moreover, some researchers have reported very high (Sukigara, 1996) or near-perfect (i.e., 92% to 97%) agreement in scores between computer and booklet administrations (Pinsoneault, 1996). Honaker, Harrell, and Buffaloe (1988) investigated the equivalency of a computer-based MMPI administration with the booklet version among 80 community volunteers. They found no significant differences in means or standard deviations between various computer formats for validity, clinical, and 27 additional scales. However, like a number of studies investigating the equivalency of computer and booklet forms of the MMPI, the power of their statistical analyses did not provide conclusive evidence regarding the equivalency of the paper-and-pencil and computerized administration format (Honaker et al., 1988).
The question of whether computer-administered and paper-and-pencil forms are equivalent was largely laid to rest by a comprehensive meta-analysis (Finger & Ones, 1999). Their analysis included 14 studies, all of which included computerized and standard formats of the MMPI or MMPI-2, that had been conducted between 1974 and 1996. They reported that the differences in T score means and standard deviations between test formats across the studies were negligible. Correlations between forms were consistently near 1.00. Based on these findings, the authors concluded that computer-administered inventories are comparable to booklet-administered forms.
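The three statistics that recur in these equivalence studies (differences in means, standard deviations, and between-form correlations) can be computed for a single sample as in the following minimal Python sketch. It is not the meta-analytic procedure used by Finger and Ones (1999); the function names and the T scores are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired scores from two formats."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def compare_formats(booklet: Sequence[float], computer: Sequence[float]) -> Dict[str, float]:
    """Summarize how closely two administration formats agree."""
    return {
        "mean_diff": mean(computer) - mean(booklet),
        "sd_booklet": stdev(booklet),
        "sd_computer": stdev(computer),
        "r": pearson_r(booklet, computer),
    }


# Hypothetical T scores for the same six respondents under each format.
booklet = [50, 55, 60, 65, 70, 75]
computer = [51, 54, 61, 66, 69, 76]

summary = compare_formats(booklet, computer)
print(summary)
```

A small mean difference, similar standard deviations, and a correlation close to 1.00 are the pattern that the studies above treat as evidence of equivalence.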
The equivalence of conventional computerized and computer-adapted test administrations was demonstrated in the study cited earlier by Roper et al. (1995). In this study, comparing conventional computerized to adaptive computerized administrations of the MMPI, there were no significant differences for either men or women. In terms of criterion-related validity, there were no significant differences between formats for the correlations between MMPI scores and criterion measures that included the Beck Depression Inventory, the Trait Anger and Trait Anxiety scales from the State-Trait Personality Inventory, and nine scales from the Symptoms Checklist-Revised.
Equivalence of Standard and Computer-Administered Neuropsychological Tests
Several investigators have studied computer-adapted versions of neuropsychological tests with somewhat mixed findings. Pellegrino, Hunt, Abate, and Farr (1987) compared a battery of 10 computerized tests of spatial abilities with their paper-and-pencil counterparts and found that computer-based measures of static spatial reasoning can supplement currently used paper-and-pencil procedures. Choca and Morris (1992) compared a computerized version of the Halstead Category Test to the standard version with a group of neurologically impaired persons and reported that the computer version was comparable to the original version.
However, some results have been found to be more mixed. French and Beaumont (1990) reported significantly lower scores on the computerized version than on the standard version of the Standard Progressive Matrices Test, indicating that these two measures cannot be used interchangeably. They concluded, however, that the poor resolution of available computer graphics might have accounted for the differences. With the advent of more sophisticated computer graphics, these problems are likely to be reduced in future studies. It should also be noted that more than a decade ago, French and Beaumont (1990) reported that research participants expressed a clear preference for the computer-based response format over the standard administration procedures for cognitive assessment instruments.
Equivalence of Computer-Based and Traditional Personnel Screening Methods
Several studies have evaluated computer assessment methods with traditional approaches in the field of personnel selection. Carretta (1989) examined the usefulness of the computerized Basic Attributes Battery (BAT) for selecting and classifying United States Air Force pilots. A total of 478 Air Force officer candidates completed a paper-and-pencil qualifying test and the BAT, and they were also judged based on undergraduate pilot training performance. The results demonstrated that the computer-based battery of tests was adequately assessing abilities and skills related to flight training performance, although the results obtained were variable.
In summary, research on the equivalence of computerized and standard administration has produced variable results. Standard and computerized versions of paper-and-pencil personality measures appear to be the most equivalent, and those involving more complex stimuli or highly different response or administration formats appear less equivalent. It is important for test users to ensure that a particular computer-based adaptation of a psychological test is equivalent before their results can be considered comparable to those of the original test (Hofer & Green, 1985).
Computer-Based Personality Narratives
Computer-based psychological interpretation systems usually provide a comprehensive interpretation of relevant test variables, along with scores, indexes, critical item responses, and so forth. The narrative report for a computer-based psychological test interpretation is often designed to read like a psychological report that has been prepared by a practitioner. However, psychological tests differ with respect to the amount of valid and reliable information available about them and consequently differ in terms of the time required to program the information into an effective interpretive system. Of course, the more research that is available about a particular instrument, the more likely it is that the interpretations will be accurate. Instruments that have been widely researched, such as the MMPI and MMPI-2 (which have a research base of more than 10,000 articles), will likely have a more defensible interpretive system than will a test that has little or no research base. Test users need to be aware that some commercially available test interpretation systems are published with minimal established validity research. Simply being available commercially by computer does not assure test validity.
Steps in the Development of a Narrative Report
In developing a computer-based narrative report, the system developer typically follows several steps:
- Develops a systematic strategy for storing and retrieving relevant test information. This initial phase of development sets out the rationale and procedure for incorporating the published research findings into a coherent theme.
- Designs a computer program that scores the relevant scales and indexes and presents the information in a consistent and familiar form. This step may involve development of a program that accurately plots test profiles.
- Writes a dictionary of appropriate and validated test behaviors or correlates that can serve as the narrative data base. The test index definitions stored into memory can vary in complexity, ranging from discrete behaviors (e.g., if Scale 1 receives a T score greater than 70, print the following: Reports many physical symptoms) to extensive descriptors (e.g., if Scale 2 receives a T score greater than 65, then print the following: This client has obtained a significant scale elevation on the depression scale. It is likely that he is reporting extensive mental health symptoms including depression, worry, low self-esteem, low energy, feelings of inadequacy, lacking in self-confidence, social withdrawal, and a range of physical complaints). The dictionary of stored test information can be quite extensive, particularly if the test on which it is based has a broad research base. For example, a comprehensive MMPI-2 based interpretive system would likely include hundreds of pages of stored behavioral correlates.
- Specifies the interpretive algorithms for combining test indexes and dictionary text. This component of the interpretive system is the engine for combining the test indexes to use in particular reports and locating the appropriate dictionary text relevant for the particular case.
- Organizes the narrative report in a logical and user-friendly format. Determines what information is available in the test being interpreted and organizes the information into a structure that maximizes the computer-generated hypotheses.
- Tests the system extensively before it is offered to the public. This may involve generating sample reports that test the system with a broad range of possible test scores and indexes.
- Eliminates internal contradictions within the system. This phase involves examining a broad range of reports on clients with known characteristics in order to modify the program to prevent contradictory or incorrect statements from appearing in the narrative.
- Revises the system periodically to take into account new research on the test instrument.
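The dictionary and interpretive-algorithm steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design of any commercial system: the `Rule` structure, the `build_narrative` function, and the fallback sentence are hypothetical, and the two dictionary entries simply reuse the cutoffs and wording given as examples in the list (the Scale 2 statement is shortened to its first sentence).

```python
from typing import Dict, List, NamedTuple


class Rule(NamedTuple):
    scale: str
    min_t: float    # statement applies when the T score exceeds this value
    statement: str  # stored text printed when the rule fires


# Hypothetical two-entry dictionary based on the examples in the text.
DICTIONARY: List[Rule] = [
    Rule("Scale 1", 70, "Reports many physical symptoms."),
    Rule(
        "Scale 2",
        65,
        "This client has obtained a significant scale elevation on the depression scale.",
    ),
]


def build_narrative(t_scores: Dict[str, float], rules: List[Rule] = DICTIONARY) -> str:
    """Look up every rule whose cutoff is exceeded and list out its statement."""
    statements = [r.statement for r in rules if t_scores.get(r.scale, 0) > r.min_t]
    if not statements:
        # Placeholder wording, not taken from any real report.
        return "No significant scale elevations were found."
    return " ".join(statements)


report = build_narrative({"Scale 1": 72, "Scale 2": 60})
print(report)
```

A full system would add the later steps in the list, such as checking combinations of fired rules for contradictory statements before the narrative is assembled.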
Responsibilities of Users of Computer-Based Reports
As Butcher (1987, 1995; Butcher et al., 1985) has discussed, there are definite responsibilities that users of computer-based psychological reports assume, and these responsibilities are especially important when the reports are used in forensic evaluations:
- It is important to ensure that appropriate custody of answer sheets and generated test materials be maintained (i.e., kept in a secure place). Practitioners should see to it that the client’s test materials are properly labeled and securely stored so that records can be identified if circumstances call for recovery at a later date—for example, in a court case.
- The practitioner should closely follow computer-based validity interpretations because clients in both clinical and forensic cases may have motivation to distort their answers in order to present a particular pattern in the evaluation.
- It is up to the practitioner to ensure that there is an appropriate match between the prototypal report generated by the computer and background and other test information available about a particular client. Does the narrative report match the test scores generated by the scoring program? Please refer to the note at the end of the sample computerized narrative report presented in the appendix to this research paper. It is customary for reports to contain language that stresses the importance of the practitioner making sure that the case matches the report.
- The practitioner must integrate congruent information from the client’s background and other sources into evaluation based on test results. Computer-based reports are by necessity general personality or clinical descriptions based on prototypes.
- It is the responsibility of the practitioner using computer-based test interpretations to account for any possible discrepancies between the report and other client data.
Illustration of a Computer-Based Narrative Report
Although the output of various interpretive systems can vary from one service to another or from one test to another, the Minnesota Report for the MMPI-2 offers a fairly representative example of what one might expect when using computerized interpretation services. The MMPI-2 responses for the case of Della B. were submitted to National Computer Systems, and the resulting report is presented in the appendix to this research paper.
Della, a 22-year-old woman, was evaluated by a forensic psychologist at the request of her attorney. She and her husband had been charged with the murder of their 16-month-old child. Della, who was 5 months pregnant at the time of the evaluation, had been living with her husband and daughter in a small apartment.
About 2 months before the death of her daughter, the parents were investigated by the county protection agency for possible child abuse or neglect after a neighbor had reported to the authorities that their apartment was a shambles and that the child appeared to be neglected. The neighbors reported that the couple kept the child in a small room in the apartment along with cages for the parents’ four rabbits, which were allowed to run free around the room most of the time. The parents also kept two Russian wolfhounds in their living room. The family periodically volunteered to take care of animals for the animal recovery shelter, and on two previous occasions the animals died mysterious deaths while in the family’s care. Although the house was found to be in shambles, the child protection worker did not believe that the child was endangered and recommended that the parents retain custody. The family apparently lived a very chaotic life. Della and her husband drank heavily almost every day and argued almost constantly. The day that their daughter died, the couple had been drinking and arguing loudly enough for the neighbors to hear. Della reported her daughter’s death through a 911 call indicating that the child had apparently suffocated when she became trapped between her bed and the wall. After a police investigation, however, both parents were charged with homicide because of the extensive bruises on the child’s body. During the pretrial investigation (and after Della’s second child was born), her husband confessed to killing his daughter to allow his wife to go free. He was sentenced to 18 years in prison. Although there was much evidence to indicate Della’s complicity in the killing, she was released from custody after serving a 5-month sentence for conspiracy and rendering false statements.
Validity Research on Computerized Narrative Reports
Interpretive reports generated by computer-based psychological assessment systems need to have demonstrated validity even if the instruments on which the interpretations are based are supported by research literature. Computerized outputs are typically one step removed from the test index-validity data relationships from the original test; therefore, it is important to demonstrate that the inferences included in the computerized report are reliable and valid in the settings where they are used. Some computer interpretation programs now in use also provide comprehensive personality assessment by combining test findings into narrative descriptions and conclusions. Butcher, Perry, and Atlis (2000) recently reviewed the extensive validity research for computer-based interpretation systems. Highlights from their evaluation are summarized in the following sections.
In discussing computer-based assessment, it is useful to subdivide computerized reports into two broad categories: descriptive summaries and consultative reports. Descriptive summaries (e.g., for the 16 Personality Factor Test or 16PF) are usually on a scale-by-scale basis without integration of the results into a narrative. Consultative reports (e.g., those for the MMPI-2 and DTREE, a computer-based DSM-IV diagnostic program) provide detailed analysis of the test data and emulate as closely as possible the interpretive strategies of a trained human consultant.
Narrative Reports in Personality Assessment
The validity of computerized reports has been extensively studied in both personality testing and psychiatric screening (computer-based diagnostic interviewing). Research aimed at exploring the accuracy of narrative reports has been conducted for several computerized personality tests, such as the Rorschach Inkblot Test (e.g., Harris, Niedner, Feldman, Fink, & Johnson, 1981; Prince & Guastello, 1990), the 16PF (e.g., Guastello & Rieke, 1990; O’Dell, 1972), the Marital Satisfaction Questionnaire (Hoover & Snyder, 1991) and the Millon Clinical Multiaxial Inventory (MCMI; Moreland & Onstad, 1987; Rogers, Salekin, & Sewell, 1999). Moreland (1987) surveyed results from the most widely studied computer-based personality assessment instrument, the MMPI. Evaluation of diagnostic interview screening by computer (e.g., the DIS) has also been reported (First, 1994).
Moreland (1987) provided an overview of studies that investigated the accuracy of computer-generated MMPI narrative reports. Some studies compared computer-generated narrative interpretations with evaluations provided by human interpreters. One methodological limitation of this type of study is that the clinician’s interpretation might not be valid and accurate (Moreland, 1987). For example, Labeck, Johnson, and Harris (1983) asked three clinicians (each with at least 12 years of clinical experience) to rate the quality and the accuracy of code-type interpretations generated by an automated MMPI program (the clinicians did not rate the fit of a narrative to a particular patient, however). Results indicated that the MMPI code-type, diagnostic, and overall profile interpretive statements were consistently rated by the expert judges as strong interpretations. The narratives provided by automated MMPI programs were judged to be substantially better than average when compared to the blind interpretations of similar profiles that were produced by the expert clinicians. The researchers, however, did not specify how they judged the quality of the blind interpretation and did not investigate the possibility that statements in the blind interpretation could have been so brief and general (especially when compared to a two-page narrative CBTI) that they could have artificially inflated the ratings of the CBTI reports. In spite of these limitations, this research design was considered useful in evaluating the overall congruence of computer-generated decision and interpretation rules.
Shores and Carstairs (1998) evaluated the effectiveness of the Minnesota Report in detecting faking. They found that the computer-based reports detected fake-bad profiles in 100% of the cases and detected fake-good profiles in 94% of the cases.
The primary way researchers have attempted to determine the accuracy of computer-based tests is through the use of raters (usually clinicians) who judge the accuracy of computer interpretations based on their knowledge of the client (Moreland, 1987). For example, a study by Butcher and colleagues (1998) explored the utility of computer-based MMPI-2 reports in Australia, France, Norway, and the United States. In all four countries, clinicians administered the MMPI-2 to their patients being seen for psychological evaluation or therapy; they used a booklet format in the language of each country. The tests were scored and interpreted by the Minnesota Report using the American norms for MMPI-2. The practitioner, familiar with the client, rated the information available in each narrative section as insufficient, some, adequate, more than adequate, or extensive. In each case, the clinicians also indicated the percentage of accurate descriptions of the patient and were asked to respond to open-ended questions regarding ways to improve the report. Relatively few raters found the reports inappropriate or inaccurate. In all four countries, the Validity Considerations, Symptomatic Patterns, and Interpersonal Relations sections of the Minnesota Report were found to be the most useful sections in providing detailed information about the patients, compared with the Diagnostic Considerations section. Over two thirds of the records were considered to be highly accurate, which indicated that clinicians judged 80–100% of the computer-generated narrative statements in them to be appropriate and relevant. Overall, in 87% of the reports, at least 60% of the computer-generated narrative statements were believed to be appropriate and relevant to understanding the client’s clinical picture.
Although such field studies are valuable in examining the potential usefulness of computer-based reports for various applications, there are limitations to their generalizability. Moreland concluded that this type of study has limitations, in part because estimates of interrater reliability are usually not practical. Raters usually are not asked to provide descriptions of how their judgments were made, and the appropriateness of their judgments was not verified with information from the patients themselves and from other sources (e.g., physicians or family members). Moreland (1985) suggested that in assessing the validity of computer-generated narrative reports, raters should evaluate individual interpretive statements because global accuracy ratings may limit the usefulness of ratings in developing the CBTI system.
Eyde, Kowal, and Fishburne (1991) followed Moreland’s recommendations in a study that investigated the comparative validity of the narrative outputs for several CBTI systems. They used case histories and self-report questionnaires as criteria against which narrative reports obtained from seven MMPI computer interpretation systems could be evaluated. Each of the clinicians rated six protocols. Some of the cases were assigned to all raters; they consisted of an African American patient and a Caucasian patient who were matched for a 7-2 (Psychasthenia-Depression) code-type and an African American soldier and a Caucasian soldier who had all clinical scales in the subclinical range (T < 70). The clinicians rated the relevance of each sentence presented in the narrative CBTI as well as the global accuracy of each report. Some CBTI systems studied showed a high degree of accuracy (The Minnesota Report was found to be most accurate of the seven). However, the overall results indicated that the validity of the narrative outputs varied, with the highest accuracy ratings being associated with narrative lengths in the short-to-medium range. The longer reports tended to include less accurate statements. For different CBTI systems, results for both sentence-by-sentence and global ratings were consistent, but they differed for the clinical and subclinical normal profiles. The subclinical normal cases had a high percentage (Mdn = 50%) of unratable sentences, and the 7-2 profiles had a low percentage (Mdn = 14%) of sentences that could not be rated. One explanation for such differences may come from the fact that the clinical cases were inpatients for whom more detailed case histories were available. Because the length of time between the preparation of the case histories and the administrations of the MMPI varied from case to case, it was not possible to control for changes that a patient might have experienced over time or as a result of treatment.
One possible limitation of the published accuracy-rating studies is that it is usually not possible to control for a phenomenon referred to as the P. T. Barnum effect (e.g., Meehl, 1956) or Aunt Fanny effect (e.g., Tallent, 1958), which suggests that a narrative report may contain high base-rate descriptions that apply to virtually anybody. One factor to consider is that personality variables, such as extraversion, introversion, and neuroticism (Furnham, 1989), as well as the extent of private self-consciousness (Davies, 1997), also have been found to be connected to individuals’ acceptance of Barnum feedback.
Research on the Barnum rating effect has shown that participants can usually detect the nature of the overly general feedback if asked the appropriate questions about it (Furnham & Schofield, 1987; Layne, 1979). However, this criticism might not be appropriate for clinical studies because this research has most often been demonstrated for situations involving acceptance of positive statements in self-ratings in normally functioning individuals. For example, research also has demonstrated that people typically are more accepting of favorable Barnum feedback than they are of unfavorable feedback (Dickson & Kelly, 1985; Furnham & Schofield, 1987; C. R. Snyder & Newburg, 1981), and people have been found to perceive favorable descriptions as more appropriate for themselves than for people in general (Baillargeon & Danis, 1984).
Dickson and Kelly (1985) suggested that test situations, such as the type of assessment instruments used, can be significant in eliciting acceptance of Barnum statements. However, Baillargeon and Danis (1984) found no interaction between the type of assessment device and the favorability of statements. Research has suggested that people are more likely to accept Barnum descriptions that are presented by persons of authority or expertise (Lees-Haley, Williams, & Brown, 1993). However, the relevance of this interpretation to studies of testing results has been debated.
Some researchers have made efforts to control for Barnum-type effects on narrative CBTIs by comparing the accuracy of ratings to a stereotypical client or an average subject and by using multireport-multirating intercorrelation matrices (Moreland, 1987) or by examining differences in perceived accuracy between bogus and real reports (Moreland & Onstad, 1987; O’Dell, 1972). Several studies have compared bogus with genuine reports and found them to be statistically different in judged accuracy. In one study, for example, Guastello, Guastello, and Craft (1989) asked college students to complete the Comprehensive Personality Profile Compatibility Questionnaire (CPPCQ). One group of students rated the real computerized test interpretation of the CPPCQ, and another group rated a bogus report. The difference between the accuracy ratings for the bogus and real profiles (57.9% and 74.5%, respectively) was statistically significant. In another study (Guastello & Rieke, 1990), undergraduate students enrolled in an industrial psychology class evaluated a real computer-generated Human Resources Development Report (HRDR) of the 16PF and a bogus report generated from the average 16PF profile of the entire class. Results indicated no statistically significant difference between the ratings for the real reports and the bogus reports (which had mean accuracy ratings of 71.3% and 71.1%, respectively). However, when the results were analyzed separately, four out of five sections of the real 16PF output had significantly higher accuracy ratings than did the bogus report. Contrary to these findings, Prince and Guastello (1990) found no statistically significant differences between descriptions of a bogus and real CBTI interpretations when they investigated a computerized version of the Exner Rorschach interpretation system.
Moreland and Onstad (1987) asked clinical psychologists to rate genuine MCMI computer-generated reports and randomly generated reports. The judges rated the accuracy of the reports based on their knowledge of the client as a whole as well as the global accuracy of each section of the report. Five out of seven sections of the report exceeded chance accuracy when considered one at a time. Axis I and Axis II sections demonstrated the highest incremental validity. There was no difference in accuracy between the real reports and the randomly selected reports for the Axis IV psychosocial stressors section. The overall pattern of findings indicated that computer reports based on the MCMI can exceed chance accuracy in diagnosing patients (Moreland & Onstad, 1987, 1989).
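The comparisons of real and bogus reports described above amount to a two-group test on accuracy ratings. The following is a minimal sketch of that logic, not a reconstruction of any cited analysis: the ratings are invented numbers, the function name `permutation_p_value` is hypothetical, and a permutation test is only one reasonable choice of statistic (the cited studies may have used others).

```python
import random
from statistics import mean

def permutation_p_value(real, bogus, n_resamples=5000, seed=0):
    """Two-sided permutation test for a difference in mean accuracy ratings."""
    rng = random.Random(seed)
    observed = mean(real) - mean(bogus)
    pooled = list(real) + list(bogus)
    n_real = len(real)
    extreme = 0
    for _ in range(n_resamples):
        # Reassign ratings to the two groups at random.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_real]) - mean(pooled[n_real:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (extreme + 1) / (n_resamples + 1)

# Hypothetical percent-accurate ratings from two groups of raters.
real_ratings = [78, 72, 80, 69, 75, 77, 74, 73]
bogus_ratings = [60, 55, 63, 58, 52, 61, 57, 59]

observed_diff, p_value = permutation_p_value(real_ratings, bogus_ratings)
print(f"mean difference = {observed_diff:.1f} points, p = {p_value:.4f}")
```

If real reports earn no higher ratings than bogus ones, shuffling the group labels should produce differences as large as the observed one fairly often, and the p value will be large. That outcome is the pattern a Barnum-type explanation would predict.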
Overall, research concerning computer-generated narrative reports for personality assessment has typically found that the interpretive statements contained in them are comparable to clinician-generated statements. Research also points to the importance of controlling for the degree of generality of the reports’ descriptions in order to reduce the confounding influence of the Barnum effect (Butcher et al., 2000).
Computer-based test batteries have also been used in making assessment decisions for cognitive evaluation and in neuropsychological evaluations. The 1960s marked the beginning of investigations into the applicability of computerized testing to this field (e.g., Knights & Watson, 1968). Because of the inclusion of complex visual stimuli and the requirement that participants perform motor response tasks, the development of computerized assessment of cognitive tasks has not proceeded as rapidly as that of paper-and-pencil personality measures. Therefore, computerized neuropsychological test interpretation was slower to develop procedures that are equal in accuracy to those achieved by human clinicians (Adams & Heaton, 1985, p. 790; see also Golden, 1987). Garb and Schramke (1996) reviewed and performed a meta-analysis of studies involving computer analyses for neuropsychological assessment, concluding that they were promising but that they needed improvement. Specifically, they pointed out that programs needed to be created that included such information as patient history and clinician observation in addition to the psychometric and demographic data that are more typically used in the prediction process for cognitive measures.
Russell (1995) concluded that computerized testing procedures were capable of aiding in the detection and location of brain damage accurately but not as precisely as clinical judgment. For example, the Right Hemisphere Dysfunction Test (RHDT) and Visual Perception Test (VPT) were used in one study (Sips, Catsman-Berrevoets, van Dongen, van der Werff, & Brook, 1994) in which these computerized measures were created for the purpose of assessing right-hemisphere dysfunction in children and were intended to have the same validity as the Line Orientation Test (LOT) and Facial Recognition Test (FRT) had for adults. Fourteen children with acquired cerebral lesions were administered all four tests. Findings indicated that the computerized RHDT and VPT together were sensitive (at a level of 89%) to right-hemisphere lesions, had relatively low specificity (40%), had high predictive value (72%), and accurately located the lesion in 71% of cases. Fray, Robbins, and Sahakian (1996) reviewed findings regarding a computerized assessment program, the Cambridge Neuropsychological Test Automated Batteries (CANTAB). Although specificity and sensitivity were not reported, the reviewers concluded that CANTAB could detect the effects of progressive, neurodegenerative disorders sometimes before other signs manifested themselves. They concluded that the CANTAB has been found successful in detecting early signs of Alzheimer’s, Parkinson’s, and Huntington’s diseases.
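Sensitivity, specificity, and predictive value, as reported for the RHDT and VPT, are all derived from a 2 x 2 table of test results against lesion status. The sketch below shows the arithmetic. The cell counts are hypothetical: they were chosen only because they are roughly consistent with the percentages reported above for 14 cases, and the study's actual counts are not given here. The sketch also assumes that "predictive value" refers to positive predictive value.

```python
def diagnostic_indices(tp, fn, tn, fp):
    """Return sensitivity, specificity, and positive predictive value."""
    sensitivity = tp / (tp + fn)   # lesion cases correctly flagged
    specificity = tn / (tn + fp)   # non-lesion cases correctly cleared
    ppv = tp / (tp + fp)           # flagged cases that truly have the lesion
    return sensitivity, specificity, ppv

# Hypothetical counts: 9 cases with a right-hemisphere lesion, 5 without.
sens, spec, ppv = diagnostic_indices(tp=8, fn=1, tn=2, fp=3)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}, PPV = {ppv:.1%}")
# prints: sensitivity = 88.9%, specificity = 40.0%, PPV = 72.7%
```

With samples this small, a single reclassified case shifts each index by many percentage points, which is one reason such figures are best read as preliminary.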
Evaluation of Computerized Structured Interviews
Research on computer-assisted psychiatric screening has largely involved the development of logic-tree decision models to assist the clinician in arriving at clinical diagnoses (Erdman, Klein, & Greist, 1985). Logic-tree systems are designed to establish the presence of symptoms specified in diagnostic criteria and to arrive at a particular diagnosis (First, 1994). For example, the DTREE is a recent program designed to guide the clinician through the diagnostic process (First, 1994) and provide the clinician with diagnostic consultation both during and after the assessment process. A narrative report is provided that includes likely diagnoses as well as an extensive narrative explaining the reasoning behind diagnostic decisions included. Research on the validity of logic-tree programs typically compares diagnostic decisions made by a computer and diagnostic decisions made by clinicians. In an initial evaluation, First et al. (1993) evaluated the use of DTREE in an inpatient setting by comparing case conclusions by expert clinicians with the results of DTREE output.
Psychiatric inpatients (N = 20) were evaluated by a consensus case conference and by their treating psychiatrist (five psychiatrists participated in the rating) who used DTREE software. Although the number of cases within each of the diagnostic categories was small, the results are informative. On the primary diagnosis, perfect agreement was reached between the DTREE and the consensus case conference in 75% of cases (N = 15). The agreement was likely to be inflated because some of the treating psychiatrists participated in both the DTREE evaluation and the consensus case conference. This preliminary analysis, however, suggested that DTREE might be useful in education and in evaluation of diagnostically challenging clients (First et al., 1993), although the amount of rigorous research on the system is limited.
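A raw agreement rate such as the 75% above does not correct for agreement expected by chance, so studies of this kind often also report a chance-corrected index such as Cohen's kappa. The sketch below computes both. The diagnostic labels and their distribution are invented for illustration (only the 15-of-20 agreement count mirrors the text), and the function names are hypothetical.

```python
from collections import Counter

def percent_agreement(a, b):
    """Proportion of cases on which the two sources give the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement between two sources rating the same cases."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Agreement expected if the two sources assigned labels independently.
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical primary diagnoses for 20 cases from two sources.
program = ["mood"] * 8 + ["psychotic"] * 6 + ["anxiety"] * 4 + ["other"] * 2
consensus = (["mood"] * 7 + ["anxiety"]          # 7 of 8 agree
             + ["psychotic"] * 5 + ["mood"]      # 5 of 6 agree
             + ["anxiety"] * 2 + ["other"] * 2   # 2 of 4 agree
             + ["other"] + ["psychotic"])        # 1 of 2 agree

agreement = percent_agreement(program, consensus)
kappa = cohens_kappa(program, consensus)
print(f"agreement = {agreement:.0%}, kappa = {kappa:.2f}")
# prints: agreement = 75%, kappa = 0.65
```

Kappa falls below the raw rate because some matches would occur even if the two sources labeled cases independently. It does not correct for the overlap in raters noted above.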
A second logic-tree program in use is a computerized version of the World Health Organization (WHO) Composite International Diagnostic Interview (CIDI-Auto). Peters and Andrews (1995) conducted an investigation of the validity of the CIDI-Auto in the DSM-III-R diagnoses of anxiety disorders, finding generally variable results ranging from low to high accuracy for the CIDI-Auto administered by computer. However, there was only modest overall agreement for the procedure. Ninety-eight patients were interviewed by the first clinician in a brief clinical intake interview prior to entering an anxiety disorders clinic. When the patients returned for a second session, a CIDI-Auto was administered and the client was interviewed by another clinician. The order in which CIDI-Auto was completed varied depending upon the availability of the computer and the second clinician. At the end of treatment, clinicians reached consensus about the diagnosis in each individual case (κ = .93). When such agreement could not be reached, diagnoses were not recorded as the LEAD standard against which CIDI-Auto results were evaluated. Peters and Andrews (1995) concluded that the overdiagnosis provided by the CIDI might have been caused by clinicians’ using stricter diagnostic rules in the application of duration criteria for symptoms.
In another study, 37 psychiatric inpatients completed a structured computerized interview assessing their psychiatric history (Carr, Ghosh, & Ancill, 1983). The computerized interview agreed with the case records and clinician interview on 90% of the information. Most patients (88%) considered the computer interview to be no more demanding than a traditional interview, and about one third of them reported that the computer interview was easier. Some patients felt that their responses to the computer were more accurate than those provided to interviewers. The computer program in this study elicited about 9.5% more information than did traditional interviews.
Psychiatric screening research has more frequently involved evaluating computer-administered versions of the DIS (Blouin, Perez, & Blouin, 1988; Erdman et al., 1992; Greist et al., 1987; Mathisen, Evans, & Meyers, 1987; Wyndowe, 1987). Research has shown that in general, patients tend to hold favorable attitudes toward computerized DIS systems, although diagnostic validity and reliability are questioned when such programs are used alone (First, 1994).
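The logic-tree approach discussed in this section can be pictured as a branching sequence of yes/no criteria that ends in an outcome and keeps a record of the path taken. The sketch below is a toy illustration only. It does not reproduce the DTREE or CIDI-Auto decision rules; the class `Question`, the function `traverse`, and every criterion and outcome label are placeholders invented for this example.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Question:
    """Internal node: one yes/no criterion with a branch for each answer."""
    criterion: str
    if_yes: Union["Question", str]
    if_no: Union["Question", str]

def traverse(node, answers):
    """Walk the tree; return the outcome label and the criteria path used."""
    path = []
    while isinstance(node, Question):
        present = answers[node.criterion]
        path.append((node.criterion, present))
        node = node.if_yes if present else node.if_no
    return node, path

# Placeholder tree: criteria and outcome labels are invented for illustration.
tree = Question(
    "criterion_A_present",
    if_yes=Question("duration_requirement_met",
                    if_yes="candidate diagnosis X",
                    if_no="subthreshold; re-evaluate later"),
    if_no="rule out X; continue to next tree",
)

outcome, path = traverse(tree, {"criterion_A_present": True,
                                "duration_requirement_met": False})
print(outcome)
for criterion, present in path:
    print(f"  {criterion}: {'yes' if present else 'no'}")
```

Recording the path is what would allow a program of this kind to print the reasoning behind a suggested diagnosis, which the clinician can then accept or override.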
Past Limitations and Unfulfilled Dreams
So far I have explored the development of computer-based assessment strategies for clinical decision making, described how narrative programs are developed, and examined their equivalence and accuracy or validity. In this section I provide a summary of limitations of computer-based assessment and indicate some directions that further studies will likely or should go.
Computer-based testing services have not maximally incorporated the flexibility and graphic capabilities in presenting test-based stimuli. Psychologists have not used to a great degree the extensive powers of the computer in presenting stimuli to test takers. Much could be learned from the computer game industry about presenting items in an interesting manner. With the power, graphic capability, and flexibility of the computer, it is possible to develop more sophisticated, real-world stimulus environments than are currently available in computer-administered methods. For example, the test taker might be presented with a virtual environment and be asked to respond appropriately to the circumstances presented.
It is likely that assessment will improve in quality and effectiveness as technology, particularly graphic displays and voice-activated systems, improves in quality. At the present time, the technology exists for computer-based assessment of some complex motor activities, but such systems are extremely expensive to develop and maintain. For example, airlines use complex flight simulators that mimic the flight environment extremely well. Similar procedures could be employed in the assessment of cognitive functioning; however, the psychotechnology is lacking for developing more sophisticated uses. The computer assessment field has not kept up with the electronic technology that allows developing test administration strategies along the lines of the virtual reality environment. A great deal more could be done in this area to provide more realistic and interesting stimulus situations to test takers. At present, stimulus presentation of personality test items simply follows the printed booklet form. A statement is printed on the screen and the client simply presses a yes or no key. Many response behaviors that are important to test interpretation are not incorporated in computer-based interpretation at present (e.g., stress-oriented speech patterns, facial expressions, or the behavior of the client during testing). Further advancements from the test development side need to come to fruition in order to take full advantage of the present and future computer technology.
Computer-based reports are not stand-alone clinical evaluations. Even after almost 40 years of development, most computer-based interpretive reports are still considered to be broad, generic descriptions rather than integrated, standalone psychological reports. Computer-generated reports should be considered as potentially valuable adjuncts to clinical judgment rather than stand-alone assessments that are used in lieu of an evaluation of a skilled clinician (Fowler, 1969). The reports are essentially listings of the most likely test interpretations for a particular set of test scores—an electronic dictionary of interpretive hypotheses that have been stored in the computer to be called out when those variables are obtained for a client.
Many people, however, would not consider this feature a limitation of computer-based systems; they actually prefer this more limited role to the development of finished, final-product reports that emerge from the computer. There has not been a clamoring in the field for computer-based finished-product psychological reports.
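The "electronic dictionary" character of these reports can be made concrete with a small lookup sketch. Everything in it is an assumption made for illustration: the library contents are placeholders rather than real interpretive statements, the two-point code is only one possible key, and the cutoff of T = 70 is a hypothetical parameter (actual systems use their own rules and cutoffs).

```python
# Hypothetical statement library keyed by score pattern (a two-point code).
STATEMENT_LIBRARY = {
    "2-7": ["<stored hypothesis 1 for this pattern>",
            "<stored hypothesis 2 for this pattern>"],
    "4-9": ["<stored hypothesis for this pattern>"],
}

def two_point_code(t_scores, cutoff=70):
    """Return the two highest elevated scales as a code string, or None."""
    elevated = sorted((item for item in t_scores.items() if item[1] >= cutoff),
                      key=lambda item: item[1], reverse=True)
    if len(elevated) < 2:
        return None
    # Order the two scale numbers so that "7-2" and "2-7" share one key.
    return "-".join(sorted(scale for scale, _ in elevated[:2]))

def lookup_hypotheses(t_scores):
    """Call out the stored statements for a profile; empty list if none match."""
    return STATEMENT_LIBRARY.get(two_point_code(t_scores), [])

clinical_profile = {"2": 78, "4": 55, "7": 82, "9": 48}
subclinical_profile = {"2": 60, "4": 55, "7": 62, "9": 48}

print(two_point_code(clinical_profile))          # prints: 2-7
print(lookup_hypotheses(clinical_profile))
print(lookup_hypotheses(subclinical_profile))    # prints: []
```

Because every client with the same score pattern retrieves the same stored statements, output of this kind is a set of hypotheses for the clinician to weigh rather than an individualized report.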
Computer-based assessment systems often fail to take into consideration client uniqueness. Matarazzo (1986) criticized computerized test interpretation systems because of their seeming failure to recognize the uniqueness of the test takers—that is, computer-based reports are often amorphous descriptions of clients that do not tap the uniqueness of the individual’s personality.
It is true that computer-based reports seem to read a lot alike when one sees a number of them for different patients in a particular setting. This sense of sameness results from two sources. First, computerized reports are the most general summaries for a particular test score pattern and do not contain much in the way of low-frequency and specifically tailored information. Second, it is natural for reports to contain similar language because patients in a particular setting are alike when it comes to describing their personality and symptoms. For example, patients in a chronic pain program tend to cluster into four or five MMPI-2 profile types representing a few scales: Hypochondriasis (Hs), Hysteria (Hy), Depression (D), and Psychasthenia (Pt; Keller & Butcher, 1991). Patients seen in an alcohol treatment setting tend to fall into about four clusters, usually showing Psychopathic Deviate (Pd), D, Pt, and Hypomania (Ma). Reports across different settings are more recognizably different. It should be noted that attempting to tailor test results to unique individual characteristics is a complex process and may not always increase their validity because it is then necessary to include low base rate or rare hypotheses into the statement library.
The use of computer-based reports in clinical practice might dilute responsibility in the psychological assessment. Matarazzo (1986) pointed out that the practice of having unsigned computer-based reports creates a problem: a failure of responsibility for the diagnostic evaluation. According to Matarazzo, no one feels directly accountable for the contents of the reports when they come from a computer. In most situations today, this is not considered a problem because computer-based narrative reports are clearly labeled professional-to-professional consultations. The practitioner chooses to (or not to) incorporate the information from the report into his or her own signed evaluation report. Computer-based reports are presented as likely relevant hypotheses and labeled as consultations; they are not sold as stand-alone assessment evaluations. In this way, computerized interpretation systems are analogous to electronic textbooks or reference works: They provide a convenient lookup service. They are not finished products.
Computer-based reporting services do not maximally use the vast powers of the computer in integrating test results from different sources. It is conceptually feasible to develop an integrated diagnostic report, one that incorporates such elements or components as
- Behavioral observations.
- Personal history.
- Personality data from an omnibus personality measure such as the MMPI-2.
- Intellectual-cognitive abilities such as those reported by the Wechsler scales or performance on a neuropsychological battery such as the Reitan Neuropsychological Battery.
- Life events.
- Current stressors.
- Substance use history.
Moreover, it would be possible (and some research supports its utility) to administer this battery adaptively (i.e., tailored to the individual client), reducing the amount of testing time by eliminating redundancy. However, although a fully integrated diagnostic system that incorporates different measures from different domains is conceptually possible, it is not a practical or feasible undertaking for a number of reasons. First, there are issues of copyright with which to contend. Tests are usually owned and controlled by different, often competing, commercial publishers. Obtaining cooperation between such groups to develop an integrated system is unlikely. Second, there is insufficient research information on integrated interpretation with present-day measures to guide their integration into a single report that is internally consistent.
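One way adaptive administration can eliminate redundancy is a "countdown" style stopping rule: stop presenting a scale's items once the remaining items can no longer change whether the score reaches a cutoff. The sketch below is an assumption about how such a rule might look, not a procedure specified in this section. The function name, the item key, and the cutoff are all hypothetical.

```python
def administer_countdown(keyed_answers, responses, cutoff):
    """Stop as soon as the scale outcome (elevated or not) is decided.

    keyed_answers: the scored direction (True/False) of each item.
    responses: the client's answers, in the same order and of equal length.
    Returns the outcome and the number of items actually administered.
    """
    total = len(keyed_answers)
    if len(responses) != total:
        raise ValueError("keyed_answers and responses must have equal length")
    score = 0
    for asked, (key, response) in enumerate(zip(keyed_answers, responses), start=1):
        if response == key:
            score += 1
        if score >= cutoff:
            return "elevated", asked          # cutoff already reached
        if score + (total - asked) < cutoff:
            return "not elevated", asked      # cutoff can no longer be reached
    return "undetermined", total              # only reached for an empty scale

# Hypothetical 10-item scale, all items keyed True, cutoff of 6 keyed answers.
key = [True] * 10
print(administer_countdown(key, [False] * 5 + [True] * 5, cutoff=6))
# prints: ('not elevated', 5)
print(administer_countdown(key, [True] * 10, cutoff=6))
# prints: ('elevated', 6)
```

The saving in items comes at a cost: a scale that is stopped early yields a classification rather than a full score, so any interpretation that depends on the exact elevation would still require complete administration.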
The idea of having the computer substitute for the psychologist’s integrative function has not been widely proclaimed as desirable and in fact has been lobbied against. Matarazzo (1986), for example, cautioned that computerized testing must be subjected to careful study in order to preserve the integrity of psychological assessment. Even though decision-making and interpretation procedures may be automated with computerized testing, personal factors must still be considered in some way. Research by Styles (1991) investigated the importance of a trained psychologist during computerized testing with children. Her study of Raven’s Progressive Matrices demonstrated the need for the psychologist to establish and maintain rapport and interest prior to, during, and after testing. These factors were found to have important effects on the reliability and validity of the test data, insofar as they affected test-taking attitudes, comprehension of test instructions, on-task behavior, and demeanor. Carson (1990) has also argued for the importance of sound clinicianship, both in the development of psychological test systems and in their use.
Tests should not be used for tasks beyond their capability. If a test has not been developed for or validated in a particular setting, computer-based applications of it in that setting are not warranted. Even though computer-based psychological tests have been validated in some settings, this does not guarantee their validity and appropriateness for all settings. In their discussion of the misuse of psychological tests, Wakefield and Underwager (1993) cautioned against the use of computerized test interpretations of the MCMI and MCMI-II, which were designed for clinical populations, in other settings, such as for forensic evaluations. The danger of misusing data applies to all psychological test formats, but the risk seems particularly high when one considers the convenience of computerized outputs; as Garb (1998) noted, some of the consumers of computer interpretation services are nonpsychologists who are unlikely to be familiar with the validation research on a particular instrument. It is important for scoring and interpretation services to provide computer-based test results only to qualified users.
Research evaluations of computer-based systems have often been slow to appear for some assessment methods. The problems with computer-based assessment research have been widely discussed (Butcher, 1987; Maddux & Johnson, 1998; Moreland, 1985). Moreland (1985), for example, concluded that the existing research on computer-based interpretation has been limited because of several methodological problems, including small sample sizes, inadequate external criterion measures to which one can compare the computer-based statements, lack of information regarding the reports’ base-rate accuracy, failure to assess the ratings’ reliability across time or across raters, failure to investigate the internal consistency of the reports’ interpretations, and issues pertaining to the report raters (e.g., lack of familiarity with the interpretive system employed, lack of expertise in the area of interest, and possible bias secondary to the theoretical orientation of the rater). D. K. Snyder, Widiger, and Hoover (1990) expressed concerns over computer-based interpretation systems, concluding that the literature lacks rigorously controlled experimental studies that examine methodological issues. They recommended specifically that future studies include representative samples of both computer-based test consumers and test respondents and use characteristics of each as moderator variables in analyzing reports’ generalizability.
In fairness to computer-based assessment, there has been more research into validity and accuracy for this approach than there has been for the validity of interpretation by human interpreters (that is, for clinical interpretation strategies). Extensive research on some computer-assisted assessments has shown that automated procedures can provide valid and accurate descriptions and predictions. Research on the accuracy of some computer-based systems (particularly those based on the MMPI and MMPI-2, which have been subjected to more scrutiny) has shown promising results. However, reliability and utility of computer-based interpretations vary as a function of the instruments and the settings included, as illustrated by Eyde et al. (1991) in their extensive study of the accuracy of computer-based reports.
Computer-based applications need to be evaluated carefully. Computer system developers have not always been sensitive to the requirement of validation of procedures. It is important for all computer-based systems to be evaluated to the extent that MMPI-based programs have been subjected to such evaluation (Butcher, 1987; Fowler, 1987; Moreland, 1985).
It should be kept in mind that just because a report comes from a computer, it is not necessarily valid. The caution required in assessing the utility of computer-based applications brings about a distinct need for specialized training in their evaluation. It is also apparent that instruction in the use (and avoidance of misuse) of computer-based systems is essential for all professionals who use them (Hofer & Green, 1985). There is also a need for further research focusing on the accuracy of the information contained in computer-based reports.
Offering Psychological Assessment Services Via The Internet
As noted earlier, the expansion of psychological assessment services through the Internet brings to the field special problems that have not been sufficiently dealt with by psychologists. In this section I address several important issues that need to be taken into consideration before making psychological tests available on the Internet.
The question of test security has several facets.
- One must assure that the test items are secure and not made available to the public. Most psychologists are aware that test items are considered protected items and should not be made public to prevent the test from being compromised. Making test items available to the general public would undermine the value of the test for making important decisions. The security of materials placed on the Internet is questionable. There have been numerous situations in which hackers have gotten into highly secure files of banks, the State Department, and so forth. It is important for test security to be assured before items are made available through the Internet.
- Some psychological tests are considered to require higher levels of expertise and training to interpret and are not made available to psychologists without clear qualifications to use them. Many psychological tests—particularly those involved in clinical assessment—require careful evaluation of user qualifications. Wide availability of tests on the Internet could result in access to the test for nonqualified test users.
- Most psychological tests are copyrighted and cannot be copied. Making test items available through the Internet increases the likelihood that copyright infringement will occur.
Of course, there are ways of controlling access to test materials in a manner similar to the way they are controlled in traditional clinical practice—that is, the tests would only be available to practitioners who would administer them in controlled office settings. The item responses could then be sent to the test scoring-interpreting service through the Internet for processing. The results of the testing could then be returned to the practitioner electronically in a coded manner that would not be accessible to nonauthorized persons.
Assurance That the Norms for the Test Are Appropriate for Internet Application
Most psychological tests are normed in a standard manner; that is, the normative population takes the test in a standard, carefully monitored test situation. Relatively few traditional psychological tests are administered through the Internet. (One exception to this was the Dutch-language version of the MMPI-2; see Sloore, Derksen, de Mey, & Hellenbosch, 1996.) Consequently, making tests available to clients through the Internet would represent a test administration environment very different from the one for which the test was developed.
Assurance That the Individual Taking the Test Has the Cooperative Response Set Present in the Normative Sample
Response sets for Internet administration versus standard administration have not been widely studied. It would be important to ensure that Internet administration would not produce results different from those of standard administration. As noted earlier, computer-administered versus booklet-administered tests have been widely studied. However, if Internet administration involves procedures that are different from those of typical computer administration, these conditions should also be evaluated.
The Internet Version of the Test Needs to Have Reliability and Validity Demonstrated
It is important to ensure that the scores for the test being administered through the Internet are equivalent to those on which the test was originally developed and that the correlates for the test scores apply equally well for the procedure when the test administration procedures are altered.
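As a rough illustration of the first part of this requirement, the sketch below compares scores from a standard administration with scores from an Internet administration for the same respondents, using a Pearson correlation and a standardized mean difference. The function names, the pooled-standard-deviation formula, and the six pairs of scores are assumptions made up for this example; they are not taken from any study cited here, and real equivalence research would also need to examine the external correlates of the scores.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def standardized_mean_difference(x, y):
    """Mean difference divided by a simple pooled standard deviation.

    Assumes both lists have the same number of scores.
    """
    pooled_sd = sqrt((stdev(x) ** 2 + stdev(y) ** 2) / 2)
    return (mean(x) - mean(y)) / pooled_sd


# Hypothetical scale scores for the same six respondents (illustrative only).
standard_scores = [48, 55, 61, 52, 70, 66]
internet_scores = [50, 54, 63, 51, 69, 68]

r = pearson_r(standard_scores, internet_scores)
d = standardized_mean_difference(standard_scores, internet_scores)
print(f"correlation between formats: {r:.2f}")
print(f"standardized mean difference: {d:.2f}")
```

A high correlation together with a mean difference near zero would be consistent with equivalent scores across the two formats, although a sample this small could not establish it.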
Psychological test distributors need to develop procedures to assure that the problems noted here do not occur. As previously noted, it is possible that although the tests are processed through the Internet, they could still be administered and controlled through individual clinicians—that is, it is possible that the problems described here could be resolved by limiting access to the test in much the same way that credit card numbers are currently protected. Practitioners who wish to process their test results through the Internet could administer the test to the client in their office and then enter the client’s responses into the computer from their own facility keyboard or scanner before dialing up the Internet server. In this manner, the clinician (who has been deemed a qualified test user and is eligible to purchase the test) can assume the responsibility for test security as well as determine which psychological tests meet the essential criteria for the test application involved.
The Acceptance and Ethics of Computer-Based Psychological Assessment
When computer-based assessment was in its infancy, there was a concern that ethical problems could result from handing over a professionally sensitive task like personality assessment to computers. Some authorities (e.g., Matarazzo, 1986) expressed concerns that individual clinicians might defer important clinical decisions to computers, thereby ignoring the client in the assessment process. Such reliance upon machines to provide clinical assessments could result in unethical and irresponsible judgments on the part of the practitioner. However, these arguments were answered by Fowler and Butcher (1986), who noted that psychologists use computer-based psychological reports not as a final polished report but as one source of information that is available to the practitioner who is responsible for decisions made about clients. Most authorities in the computer-based area, as well as several professional organizations that have provided practical guidelines for computer-based assessment, have supported the ethical use of computer-based psychological assessment. These guidelines include the Guidelines for Computer-Based Tests and Interpretations of the American Psychological Association (1986) and the Standards for Educational and Psychological Testing by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (1999).
How do present-day clinicians feel about the ethics of computerized assessment? The earlier concerns over computer-based test usage seem to have waned considerably with the growing familiarity with computerized assessment. For example, in a recent survey concerning computer-based test use (McMinn et al., 1999), most respondents thought that use of computer-based assessment was an ethical practice.
Computer-based psychological assessment has come far since it began to evolve over 40 years ago. As a group, assessment practitioners have accepted computerized testing. Many clinicians use some computer scoring, computer-based interpretation, or both. Most practitioners today consider computer-assisted test interpretation to be an ethical professional activity. Computers have been important to the field of applied psychology almost since they were introduced, and over the past several decades the application of computerized methods has broadened both in scope and in depth.
The merger of computer technology and psychological test interpretation has not, however, been a perfect relationship. Past efforts at computerized assessment have not gone far enough in making optimal use of the flexibility and power of computers for making complex decisions. At present, most interpretive systems largely perform a look up, list out function—a broad range of interpretations is stored in the computer for various test scores and indexes, and the computer simply lists out the stored information for appropriate scale score levels. Computers are not involved as much in decision making.
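The look-up, list-out function described above can be sketched in a few lines. This is only a minimal illustration, not the design of any actual interpretive system: the scale names, score ranges, and stored statements are hypothetical placeholders, and real systems draw on much larger, empirically derived statement libraries.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rule:
    scale: str      # hypothetical scale label
    low: int        # inclusive lower bound of the score range
    high: int       # inclusive upper bound of the score range
    statement: str  # stored interpretive text for that range


# Hypothetical statement library (names, cutoffs, and wording are illustrative).
LIBRARY = [
    Rule("SCALE_A", 0, 64, "Scale A: score within the typical range."),
    Rule("SCALE_A", 65, 120, "Scale A: score in the elevated range."),
    Rule("SCALE_B", 0, 64, "Scale B: score within the typical range."),
    Rule("SCALE_B", 65, 120, "Scale B: score in the elevated range."),
]


def list_out(scores: Dict[str, int], library: List[Rule] = LIBRARY) -> List[str]:
    """Look up each score and list out the stored statement for its range."""
    report = []
    for rule in library:
        score = scores.get(rule.scale)
        if score is not None and rule.low <= score <= rule.high:
            report.append(rule.statement)
    return report


for line in list_out({"SCALE_A": 72, "SCALE_B": 50}):
    print(line)
```

Nothing in the sketch weighs one scale against another or combines information across scales, which is the sense in which such a system lists stored text rather than making decisions.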
Computerized applications are limited to some extent by the available psychological expertise and psychotechnology. To date, computer-human interactions are confined to written material. Potentially valuable information, such as critical nonverbal cues (e.g., speech patterns, vocal tone, and facial expressions), is presently not incorporated in computer-based assessments. Furthermore, the response choices are usually provided to the test taker in a fixed format (e.g., true-false).
On the positive side, the earlier suggestion made by some researchers that computer-administered and traditional administration approaches were nonequivalent has not been supported by more recent findings. Research has supported the view that computer-administered tests are essentially equivalent to booklet-administered instruments.
In spite of what have been described as limitations and unfulfilled hopes, computer-based psychological assessment is an enormously successful endeavor. Research thus far appears to point to the conclusion that computer-generated reports should be viewed as valuable adjuncts to clinical judgment rather than as substitutes for skilled clinicians. Computer-based assessment has brought accountability and reliability into the assessment field. It is apparent that whatever else computerized assessment has done for the field of psychology, it clearly has focused attention upon objective and accurate assessment in the fields of clinical evaluation and diagnosis.
References
- Adams, K. M., & Heaton, R. K. (1985). Automated interpretation of the neuropsychological test data. Journal of Consulting and Clinical Psychology, 53, 790–802.
- Allard, G., Butler, J., Faust, D., & Shea, M. T. (1995). Errors in hand scoring objective personality tests: The case of the Personality Diagnostic Questionnaire—Revised (PDQ-R). Professional Psychology: Research and Practice, 26, 304–308.
- American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
- American Psychological Association. (1986). Guidelines for computer-based tests and interpretations. Washington, DC: American Psychological Association.
- American Psychological Association, Committee on Professional (1984). Casebook for providers of psychological services. American Psychologist, 39, 663–668.
- Baillargeon, J., & Danis, C. (1984). Barnum meets the computer: A critical test. Journal of Personality Assessment, 48, 415–419.
- Blouin, A. G., Perez, E. L., & Blouin, J. H. (1988). Computerized administration of the Diagnostic Interview Schedule. Psychiatry Research, 23, 335–344.
- Butcher, J. N. (1982). User’s guide for the MMPI-2 Minnesota Report: Adult clinical system. Minneapolis, MN: National Computer Systems.
- Butcher, J. N. (Ed.). (1987a). Computerized psychological assessment. New York: Basic Books.
- Butcher, J. N. (1987b). The use of computers in psychological assessment: An overview of practices and issues. In J. N. Butcher (Ed.), Computerized psychological assessment: A practitioner’s guide (pp. 3–15). New York: Basic Books.
- Butcher, J. N. (1988). Personality profile of airline pilot applicants. Unpublished manuscript, University of Minnesota, Department of Psychology, MMPI-2 Workshops.
- Butcher, J. N. (1995a). Clinical use of computer-based personality test reports. In J. N. Butcher (Ed.), Clinical personality assessment: Practical approaches (pp. 3–9). New York: Oxford University Press.
- Butcher, J. N. (1995b). User’s guide for the MMPI-2 Minnesota Report: Personnel system. Minneapolis, MN: National Computer Systems.
- Butcher, J. N. (1997). User’s guide for the MMPI-2 Minnesota Report: Forensic system. Minneapolis, MN: National Computer Systems.
- Butcher, J. N., Berah, E., Ellersten, B., Miach, P., Lim, J., Nezami, E., Pancheri, P., Derksen, J., & Almagor, M. (1998). Objective personality assessment: Computer-based MMPI-2 interpretation in international clinical settings. In C. Belar (Ed.), Comprehensive clinical psychology: Sociocultural and individual differences (pp. 277–312). New York: Elsevier.
- Butcher, J. N., Keller, L., & Bacon, S. (1985). Current developments and future directions in computerized personality assessment. Journal of Consulting and Clinical Psychology, 53, 803–815.
- Butcher, J. N., Perry, J. N., & Atlis, M. M. (2000). Validity and utility of computer-based test interpretation. Psychological Assessment, 12, 6–18.
- Caldwell, A. B. (1996). Forensic questions and answers on the MMPI/MMPI-2. Los Angeles, CA: Caldwell Reports.
- Camara, W. J., Nathan, J. S., & Puente, A. E. (2000). Psychological test usage: Implications in professional psychology. Professional Psychology: Research and Practice, 31, 141–154.
- Carr, A. C., Ghosh, A., & Ancill, R. J. (1983). Can a computer take a psychiatric history? Psychological Medicine, 13, 151–158.
- Carretta, T. R. (1989). USAF pilot selection and classification systems. Aviation, Space, and Environmental Medicine, 60, 46–49.
- Carson, R. C. (1990). Assessment: What role the assessor? Journal of Personality, 54, 435–445.
- Choca, J., & Morris, J. (1992). Administering the Category Test by computer: Equivalence of results. The Clinical Neurologist, 6, 9–15.
- Davies, M. F. (1997). Private self-consciousness and the acceptance of personality feedback: Confirmatory processing in the evaluation of general vs. specific self-information. Journal of Research in Personality, 31, 78–92.
- Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 24, 1668–1674.
- Dickson, D. H., & Kelly, I. W. (1985). The “Barnum effect” in personality assessment: A review of the literature. Psychological Reports, 57, 367–382.
- Downey, R. B., Sinnett, E. R., & Seeberger, W. (1998). The changing face of MMPI practice. Psychological Reports, 83(3, Pt 2), 1267–1272.
- Erdman, H. P., Klein, M. H., & Greist, J. H. (1985). Direct patient computer interviewing. Journal of Consulting and Clinical Psychology, 53, 760–773.
- Erdman, H. P., Klein, M. H., Greist, J. H., Skare, S. S., Husted, J. J., Robins, L. N., Helzer, J. E., Goldring, E., Hamburger, M., & Miller, J. P. (1992). A comparison of two computer-administered versions of the NIMH Diagnostic Interview Schedule. Journal of Psychiatric Research, 26, 85–95.
- Exner, J. E. (1987). Computer assistance in Rorschach interpretation. In J. N. Butcher (Ed.), Computerized psychological assessment: A practitioner’s guide (pp. 218–235). New York: Basic Books.
- Eyde, L., Kowal, D. M., & Fishburne, F. J. (1991). The validity of computer-based test interpretations of the MMPI. In T. B. Gutkin & S. L. Wise (Eds.), The computer and the decision-making process (pp. 75–123). Hillsdale, NJ: Erlbaum.
- Farrell, A. D., Camplair, P. S., & McCullough, L. (1987). Identification of target complaints by computer interview: Evaluation of the Computerized Assessment System for Psychotherapy Evaluation and Research. Journal of Consulting and Clinical Psychology, 55, 691–700.
- Finger, M. S., & Ones, D. S. (1999). Psychometric equivalence of the computer and booklet forms of the MMPI: A meta-analysis. Psychological Assessment, 11, 58–66.
- First, M. B. (1994). Computer-assisted assessment of DSM-III-R diagnosis. Psychiatric Annals, 24, 25–29.
- First, M. B., Opler, L. A., Hamilton, R. M., Linder, J., Linfield, L. S., Silver, J. M., Toshav, N. L., Kahn, D., Williams, J. B. W., & Spitzer, R. L. (1993). Evaluation in an inpatient setting of DTREE, a computer-assisted diagnostic assessment procedure. Comprehensive Psychiatry, 34, 171–175.
- Fowler, R. D. (1969). Automated interpretation of personality test data. In J. N. Butcher (Ed.), MMPI: Research developments and clinical applications (pp. 325–342). New York: McGraw-Hill.
- Fowler, R. D. (1987). Developing a computer-based test interpretation system. In J. N. Butcher (Ed.), Computerized psychological assessment (pp. 50–63). New York: Basic Books.
- Fowler, R. D., & Butcher, J. N. (1986). Critique of Matarazzo’s views on computerized testing: All sigma and no meaning. American Psychologist, 41, 94–96.
- Fray, P. J., Robbins, T. W., & Sahakian, B. J. (1996). Neuropsychological applications of CANTAB. International Journal of Geriatric Psychiatry, 11, 329–336.
- French, C. C., & Beaumont, J. G. (1990). A clinical study of the automated assessment of intelligence by the Mill Hill Vocabulary Test and the Standard Progressive Matrices Test. Journal of Clinical Psychology, 46, 129–140.
- Furnham, A. (1989). Personality and the acceptance of diagnostic feedback. Personality and Individual Differences, 10, 1121–1133.
- Furnham, A., & Schofield, S. (1987). Accepting personality test feedback: A review of the Barnum effect. Current Psychological Research and Reviews, 6, 162–178.
- Garb, H. N. (1998). Studying the clinician: Judgment research and psychological assessment. Washington, DC: American Psychological Association.
- Garb, H. N., & Schramke, C. J. (1996). Judgement research and neuropsychological assessment: A narrative review and meta-analyses. Psychological Bulletin, 120, 140–153.
- Golden, C. J. (1987). Computers in neuropsychology. In J. N. Butcher (Ed.), Computerized psychological assessment: A practitioner’s guide (pp. 344–355). New York: Basic Books.
- Greist, J. H., Klein, M. H., Erdman, H. P., Bires, J. K., Bass, S. M., Machtinger, P. E., & Kresge, D. G. (1987). Comparison of computer- and interviewer-administered versions of the Diagnostic Interview Schedule. Hospital and Community Psychiatry, 38, 1304–1310.
- Grove, W. M., & Meehl, P. E. (1996). Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithmic) prediction procedures: The clinical-statistical controversy. Psychology, Public Policy, and Law, 2, 293–323.
- Grove, W. M., Zald, D. H., Lebow, B., Smith, E., & Nelson, C. (2000). Clinical versus mechanical prediction: A meta-analysis. Psychological Assessment, 12, 19–30.
- Guastello, S. J., Guastello, D., & Craft, L. (1989). Assessment of the Barnum effect in computer-based test interpretations. The Journal of Psychology, 123, 477–484.
- Guastello, S. J., & Rieke, M. L. (1990). The Barnum effect and validity of computer-based test interpretations: The Human Resource Development Report. Psychological Assessment, 2, 186–190.
- Harris, W. G., Niedner, D., Feldman, C., Fink, A., & Johnson, J. N. (1981). An on-line interpretive Rorschach approach: Using Exner’s comprehensive system. Behavior Research Methods and Instrumentation, 13, 588–591.
- Hile, M. G., & Adkins, R. E. (1997). Do substance abuse and mental health clients prefer automated assessments? Behavior Research Methods, Instruments, and Computers, 29, 146–150.
- Hofer, P. J., & Green, B. F. (1985). The challenge of competence and creativity in computerized psychological testing. Journal of Consulting and Clinical Psychology, 53, 826–838.
- Holden, R. R., & Hickman, D. (1987). Computerized versus standard administration of the Jenkins Activity Survey (Form T). Journal of Human Stress, 13, 175–179.
- Honaker, L. M., Harrell, T. H., & Buffaloe, J. D. (1988). Equivalency of Microtest computer MMPI administration for standard and special scales. Computers in Human Behavior, 4, 323–337.
- Hoover, D. W., & Snyder, D. K. (1991). Validity of the computerized interpretive report for the Marital Satisfaction Inventory. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 3, 213–217.
- Jemelka, R. P., Wiegand, G. A., Walker, E. A., & Trupin, E. W. (1992). Computerized offender assessment: Validation study. Psychological Assessment, 4, 138–144.
- Keller, L. S., & Butcher, J. N. (1991). Use of the MMPI-2 with chronic pain patients. Minneapolis: University of Minnesota Press.
- Knights, R. M., & Watson, P. (1968). The use of computerized test profiles in neuropsychological assessment. Journal of Learning Disabilities, 1, 6–19.
- Labeck, L. J., Johnson, J. H., & Harris, W. G. (1983). Validity of a computerized on-line MMPI interpretive system. Journal of Clinical Psychology, 39, 412–416.
- Lambert, M. E., Andrews, R. H., Rylee, K., & Skinner, J. (1987). Equivalence of computerized and traditional MMPI administration with substance abusers. Computers in Human Behavior, 3, 139–143.
- Layne, C. (1979). The Barnum effect: Rationality versus gullibility? Journal of Consulting and Clinical Psychology, 47, 219–221.
- Lees-Haley, P. R., Williams, C. W., & Brown, R. S. (1993). The Barnum effect and personal injury litigation. American Journal of Forensic Psychology, 11, 21–28.
- Maddux, C. D., & Johnson, L. (1998). Computer assisted assessment. In H. B. Vance (Ed.), Psychological assessment in children (2nd ed., pp. 87–105). New York: Wiley.
- Matarazzo, J. D. (1986). Computerized clinical psychological interpretations: Unvalidated plus all mean and no sigma. American Psychologist, 41, 14–24.
- Mathisen, K. S., Evans, F. J., & Meyers, K. M. (1987). Evaluation of the computerized version of the Diagnostic Interview Schedule. Hospital and Community Psychiatry, 38, 1311–1315.
- McMinn, M. R., Buchanan, T., Ellens, B. M., & Ryan, M. (1999). Technology, professional practice, and ethics: Survey findings and implications. Professional Psychology: Research and Practice, 30, 165–172.
- Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. Minneapolis: University of Minnesota Press.
- Meehl, P. E. (1956). Wanted—A good cookbook. American Psychologist, 11, 263–272.
- Merten, T., & Ruch, W. (1996). A comparison of computerized and conventional administration of the German versions of the Eysenck Personality Questionnaire and the Carroll Rating Scale for depression. Personality and Individual Differences, 20, 281–291.
- Moreland, K. L. (1985). Validation of computer-based interpretations: Problems and prospects. Journal of Consulting and Clinical Psychology, 53, 816–825.
- Moreland, K. L. (1987). Computerized psychological assessment: What’s available. In J. N. Butcher (Ed.), Computerized psychological assessment: A practitioner’s guide (pp. 64–86). New York: Basic Books.
- Moreland, K. L., & Onstad, J. A. (1987). Validity of Millon’s computerized interpretation system of the MCMI: A controlled study. Journal of Consulting and Clinical Psychology, 55, 113–114.
- Moreland, K. L., & Onstad, J. A. (1989). Yes, our study could have been better: Reply to Cash, Mikulka, and Brown. Journal of Consulting and Clinical Psychology, 57, 313–314.
- O’Dell, J. W. (1972). P. T. Barnum explores the computer. Journal of Consulting and Clinical Psychology, 38, 270–273.
- Pellegrino, J. W., Hunt, E. B., Abate, R., & Farr, S. (1987). A computer-based test battery for the assessment of static and dynamic spatial reasoning abilities. Behavior Research Methods, Instruments, and Computers, 19, 231–236.
- Peters, L., & Andrews, G. (1995). Procedural validity of the computerized version of the Composite International Diagnostic Interview (CIDI-Auto) in the anxiety disorders. Psychological Medicine, 25, 1269–1280.
- Pinsoneault, T. B. (1996). Equivalency of computer-assisted and paper and pencil administered versions of the Minnesota Multiphasic Personality Inventory-2. Computers in Human Behavior, 12, 291–300.
- Prince, R. J., & Guastello, S. J. (1990). The Barnum effect in a computerized Rorschach interpretation system. Journal of Psychology, 124, 217–222.
- Rogers, R., Salekin, R. T., & Sewell, K. W. (1999). Validation of the Millon Clinical Multiaxial Inventory for Axis II disorders: Does it meet Daubert standard? Law and Human Behavior, 23, 425–443.
- Rome, H. P., Swenson, W. M., Mataya, P., McCarthy, C. E., Pearson, J. S., Keating, F. R., & Hathaway, S. R. (1962). Symposium on automation techniques in personality assessment. Proceedings of the Staff Meetings of the Mayo Clinic, 37, 61–82.
- Roper, B. L., Ben-Porath, Y. S., & Butcher, J. N. (1995). Comparability and validity of computerized adaptive testing with the MMPI-2. Journal of Personality Assessment, 65, 358–371.
- Ross, H. E., Swinson, R., Larkin, E. J., & Doumani, S. (1994). Diagnosing comorbidity in substance abusers: Computer assessment and clinical validation. Journal of Nervous and Mental Disease, 182, 556–563.
- Russell, E. W. (1995). The accuracy of automated and clinical detection of brain damage and lateralization in neuropsychology. Neuropsychology Review, 5, 1–68.
- Schuldberg, D. (1988). The MMPI is less sensitive to the automated testing format than it is to repeated testing: Item and scale effects. Computers in Human Behavior, 4, 285–298.
- Shores, A., & Carstairs, J. R. (1998). Accuracy of the Minnesota Report in identifying fake-good and fake-bad response sets. Clinical Neuropsychologist, 12, 101–106.
- Sips, H. J. W. A., Catsman-Berrevoets, C. E., van Dongen, H. R., van der Werff, P. J. J., & Brook, L. J. (1994). Measuring right-hemisphere dysfunction in children: Validity of two new computerized tests. Developmental Medicine and Child Neurology, 36, 57–63.
- Sloore, H., Derksen, J., de Mey, H., & Hellenbosch, G. (1996). The Flemish/Dutch version of the MMPI-2: Development and adaptation of the inventory for Belgium and the Netherlands. In J. N. Butcher (Ed.), International adaptations of the MMPI-2: Research and clinical applications (pp. 329–349). Minneapolis: University of Minnesota Press.
- Snyder, C. R., & Newburg, C. L. (1981). The Barnum effect in a group setting. Journal of Personality Assessment, 45, 622–629.
- Snyder, D. K., Widiger, T. A., & Hoover, D. W. (1990). Methodological considerations in validating computer-based test interpretations: Controlling for response bias. Psychological Assessment, 2, 470–477.
- Styles, I. (1991). Clinical assessment and computerized testing. International Journal of Man-Machine Studies, 35, 133–150.
- Sukigara, M. (1996). Equivalence between computer and booklet administrations of the new Japanese version of the MMPI. Educational and Psychological Measurement, 56, 570–584.
- Tallent, N. (1958). On individualizing the psychologist’s clinical evaluation. Journal of Clinical Psychology, 14, 243–245.
- Wakefield, H., & Underwager, R. (1993). Misuse of psychological tests in forensic settings: Some horrible examples. American Journal of Forensic Psychology, 11, 55–75.
- Watson, C. G., Juba, M., Anderson, P. E., & Manifold, V. (1990). What does the Keane et al. PTSD scale for the MMPI measure? Journal of Clinical Psychology, 46, 600–606.
- Wilson, F. R., Genco, K. T., & Yager, G. G. (1985). Assessing the equivalence of paper-and-pencil vs. computerized tests: Demonstration of a promising methodology. Computers in Human Behavior, 1, 265–275.
- Wyndowe, J. (1987). The microcomputerized Diagnostic Interview Schedule: Clinical use in an outpatient setting. Canadian Journal of Psychiatry, 32, 93–99.