Content Validity Research Paper

Validation is a process of disciplined inquiry. In the social sciences, it entails the systematic assembly of logical and empirical evidence to support the intended use of the test scores. Content validation is the process of obtaining such evidence in the form of expert judges’ evaluation of test content in relation to a defined domain of knowledge or universe of behavior. Suppose an examinee takes a test of foreign language proficiency and must translate sentences and identify the meaning of vocabulary words. The test user wants to draw inferences from performance on the test items at hand to a broader universe of behavior represented by these tasks. Practical constraints of testing time, examinee fatigue, examinee motivation, and cost of item development limit the number of items on the test. Thus, these items and response formats must represent a hypothetical pool of tasks that could have been presented covering the same content. Validity arguments for this inference hinge largely on whether a clear domain of language proficiency was defined, and how adequately this domain was sampled by the exercises.

Using expert judges to review the domain description and evaluate specific features of the items in relation to this domain, as well as collecting and summarizing their judgments, is known as content validation. Content validation evidence is useful in (a) assessment of educational achievement; (b) assessment of job knowledge or skills by employers; and (c) professional certification and licensure testing. It may also play a role in establishing construct validity of instruments for assessing affective traits.

1. History And Terminology

Attempts to define content validity were prominent in the measurement literature in the mid-twentieth century (Ebel 1956, Gulliksen 1950, Lennon 1956). From the 1950s through the 1980s, measurement experts (e.g., Cronbach 1971) identified three distinct approaches to validation: (a) content validation; (b) criterion-related validation, both predictive and concurrent; and (c) construct validation. Throughout this period, however, a few theorists continued to challenge whether content judgment studies fit within the definition of validation (e.g., Fitzpatrick 1983, Messick 1975), but this view was not dominant.

Originally, content validation was applied primarily in educational achievement testing. Clearly, the usefulness of test scores as indicators of student learning depended upon the linkage between the items presented on the test and the body of knowledge that those items sampled. The 1970s witnessed a growing trend for personnel psychologists to employ content validation to justify the use of test results in personnel selection and advancement decisions, despite some notable reservations (Ebel 1977, Fitzpatrick 1983, Guion 1977). This application grew as confidence waned in the adequacy and legal defensibility of other validation procedures for many jobs and employment settings. Licensing and certification testing also relied on content validation throughout this period (Shimberg 1982).

Approaching the twenty-first century, a more unitary view of validity prevailed (Messick 1989). One theme of this perspective is that proficiency in an academic subject or job performance domain should be viewed as a construct. Although not new, the idea gained mainstream acceptance. From this viewpoint, all validation efforts are designed to provide evidence of construct validity, but there are multiple aspects to the process of construct validation (and multiple types of evidence). Content is consistently the first aspect and involves determining the construct relevance, representativeness, and technical quality of the test through expert judgment (Messick 1995).

Other terms are associated with content validity but are not synonymous with it. ‘Face validity’ is the extent to which items appear relevant to a trait in the eyes of laymen or examinees who are not necessarily experts in the subject tested (Mosier 1947). ‘Instructional validity’ and ‘curricular validity’ were spawned in the mid-1970s during landmark litigation over the use of high-school competency examinations in Florida (see Yalow and Popham 1983). Curricular validity refers to congruence between item content on an achievement test and curricular objectives and textual materials. Instructional validity refers to congruence between item content and the instruction provided by teachers.

2. Design Of A Content Judgment Study

Content judgment studies typically are undertaken by test developers as part of their initial validation effort for a new test of knowledge or proficiency. Test users may also conduct local content validation prior to the decision to adopt an existing, published test. These studies may require replication over time to demonstrate that changes in curricula or job requirements have not undermined the meaning of the test scores. A content judgment study involves systematic collection of observations. Steps in the process may include:

(a) Specifying the questions to be addressed.

(b) Defining the domain of interest.

(c) Identifying the features of the exercises to be examined in the judgment process.

(d) Developing a structured framework for the review process.

(e) Selecting qualified experts for that domain.

(f) Instructing the experts in the review process.

(g) Collecting and summarizing the data.

2.1 Domain Definition

With achievement tests, domain definition typically is extracted from instructional objectives commonly used by those who teach the subjects; review of curricular materials and published textbooks also may play a central role in domain definition. Interviews or surveys of instructors have also been used in this process (see Yalow and Popham 1983). Test specifications (blueprints), used by test developers to maintain the desired proportional coverage of various content areas and cognitive performance levels, may also be used in domain definition. In personnel or licensure testing, domains of required or prerequisite knowledge may be defined through a detailed job description provided by experts, or through a job analysis based on detailed observations of individuals performing the job (Kane 1997); in licensure testing, however, the domain is typically restricted to skills and knowledge that are essential to public safety (Smith and Hambleton 1990). A ‘critical incidents’ technique may also be employed to capture important knowledge or skills that spell the difference between successful and failing job performance. Thorough domain definition also includes the format(s) in which individuals may be expected to display their knowledge: i.e., it covers both a stimulus and a response domain (Guion 1977).

2.2 The Judgment Process

The judgment process focuses upon at least three features of the test items: (a) relevance of item content to the domain; (b) balance of coverage, or the extent to which the collection of items adequately samples the content domain; and (c) technical quality of the items, response formats, and scoring procedures (Messick 1989, 1995).

Relevance of item content is assessed by evaluating the importance of the knowledge or skills required to answer the test item in relation to successful performance in the domain. Judges may be asked to rate frequency and criticality of the skill or knowledge tapped by the exercise. When the domain is complex, the study may involve having a sample of experts (a) evaluate the appropriateness of the objectives or list of behaviors that comprise the domain; and (b) match exercises to those objectives.

Representativeness (balance of coverage) is ascertained by having the judges review the entire test to identify areas of the domain that may be over- or under-represented. It is possible for all items on a test to be relevant when reviewed individually, yet for the coverage of the domain to be disproportionately limited to only a few objectives or skills. A mathematics test, for example, might stress computational skills but place little emphasis on problem-solving, even though both skills are stressed in the curriculum. The proportions of items matched to each objective or job behavior are examined to assess appropriateness of coverage. The format for examinee response to exercises may also be considered in evaluating representativeness. In judging the technical quality of exercises, experts attend to features of the question, response format, scoring rubrics or keys, and administration and scoring instructions that might interfere with examinees’ ability to display their knowledge of the correct response.
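
To make the balance-of-coverage check concrete, the following minimal sketch (in Python) tallies the objective to which the panel matched each item and compares the observed proportions with the intended blueprint weights. The objective labels, blueprint weights, and item counts are hypothetical and stand in for whatever a particular study defines.

# Minimal sketch of a balance-of-coverage check (hypothetical data and names).
# Each item is matched by the judges to one objective; we compare the observed
# proportion of items per objective against the blueprint's intended weights.
from collections import Counter

# Hypothetical: objective assigned to each of 20 items by the panel's majority.
item_objectives = ["computation"] * 12 + ["problem_solving"] * 5 + ["estimation"] * 3

# Hypothetical blueprint weights from the test specifications.
blueprint = {"computation": 0.40, "problem_solving": 0.40, "estimation": 0.20}

counts = Counter(item_objectives)
n_items = len(item_objectives)

for objective, intended in blueprint.items():
    observed = counts.get(objective, 0) / n_items
    print(f"{objective:15s} observed {observed:.2f} vs intended {intended:.2f}")

In this hypothetical case the output would show computation over-represented and problem solving under-represented relative to the blueprint, which is exactly the kind of imbalance the judges' review is meant to surface.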

2.3 Selecting And Training Judges

Judges’ qualifications and the overall representativeness of the panel of judges, in terms of demographic characteristics and educational and professional backgrounds, are important issues in the design of a credible content validation study. The instructions given to the judges and the training they receive before undertaking their review should also be planned and documented. Finally, in large-scale content validations, the level of agreement across the panel is useful information.
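
Where panel agreement is reported, a simple starting point is the proportion of judge pairs assigning each item to the same objective. The sketch below illustrates one way this might be computed, using hypothetical classifications from three judges; chance-corrected indices would use the same data layout.

# Minimal sketch of panel agreement on item-objective classifications
# (hypothetical data). Reports the proportion of agreeing judge pairs per item
# and the average over the test.
from itertools import combinations

# Hypothetical: each row is one item, each column one judge's classification.
classifications = [
    ["computation", "computation", "computation"],
    ["problem_solving", "computation", "problem_solving"],
    ["estimation", "estimation", "estimation"],
]

def pairwise_agreement(judgments):
    pairs = list(combinations(judgments, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

per_item = [pairwise_agreement(row) for row in classifications]
print("Per-item agreement:", [round(a, 2) for a in per_item])
print("Mean agreement over the test:", round(sum(per_item) / len(per_item), 2))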

3. Analyzing Data From Content Judgment Studies

Thorough content validation studies yield substantial amounts of data. Each item should have a record containing multiple judges’ ratings for each feature of the item that was reviewed. Marshalling these data for analyses that summarize results at the individual item level and for the total test requires forethought. At the simplest level, indices such as the percentage of items classified to objectives may be computed and reported. Judges’ mean ratings on frequency or criticality may be computed for each item and over the test as a whole. More sophisticated treatments of data from content validation studies have also been suggested. Crocker et al. (1989) described various quantitative indices used in content judgment studies, including measures of (a) overall fit between test and curriculum; (b) degree of fit for individual items to the content domain; (c) fit between test specifications and examinee performance; and (d) error estimation in judges’ ratings. In addition, new technical approaches to analyzing job analysis data are emerging (e.g., Kane 1997, Sireci and Geisinger 1995).
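
As a minimal illustration of the simplest indices described above, the sketch below computes judges' mean relevance ratings for each item, the mean over the whole test, and the percentage of items whose mean rating reaches a chosen relevance cutoff. The rating scale, the cutoff value, and the data are hypothetical assumptions for illustration only.

# Minimal sketch of item-level and test-level summaries from a content
# judgment study (hypothetical ratings). Judges rate each item's relevance on
# a 1-4 scale; items whose mean rating reaches the cutoff count as relevant.
import statistics

# Hypothetical: rows are items, columns are judges' relevance ratings (1-4).
ratings = [
    [4, 4, 3, 4],
    [2, 3, 2, 2],
    [4, 3, 4, 4],
    [1, 2, 1, 2],
]
CUTOFF = 3.0  # assumed threshold for treating an item as domain-relevant

item_means = [statistics.mean(item) for item in ratings]
test_mean = statistics.mean(item_means)
pct_relevant = 100 * sum(m >= CUTOFF for m in item_means) / len(item_means)

for i, m in enumerate(item_means, start=1):
    print(f"Item {i}: mean relevance {m:.2f}")
print(f"Test mean relevance: {test_mean:.2f}")
print(f"Percentage of items judged relevant: {pct_relevant:.0f}%")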

3.1 Evolving Standards For Content Validation

Standards for evaluating test development and use have been articulated since the 1950s by a series of joint committees of the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME). These standards have received global recognition. In the first published committee report, each recommendation was labeled as ESSENTIAL, VERY DESIRABLE, or DESIRABLE (APA 1954, p. 20). The primary recommendation for content validation was:

If a test performance is to be interpreted as a sample of performance in some universe of situations, the manual should indicate clearly what universe is represented and how adequate the sampling is. ESSENTIAL.

Additional recommendations included that descriptions be provided of the sources of items, the criteria for inclusion or exclusion of items, and the method of sampling items from the domain. A decade later, the Standards (APA 1966) specified that content validity information should include:

  1. The universe that the test items represented (ESSENTIAL).

  2. The adequacy of the procedure for selecting a sample of items from the universe (ESSENTIAL).

  3. The dates of publication of any textbooks from which item topics were selected (ESSENTIAL).

  4. The professional qualifications of the expert judges (VERY DESIRABLE).

  5. The directions that they received (VERY DESIRABLE).

  6. The extent of agreement among judges (DESIRABLE).

  7. Any classification or taxonomy used in item selection (DESIRABLE).

The next edition of the joint standards (APA 1974) generally upheld the previous standards for content validity but recognized increasing applications of content validation in personnel selection:

When a test is represented as having content validity for a job or class of jobs, the evidence of validity should include a complete description of job duties including relative frequency, importance, and skill level of such duties. ESSENTIAL. (p. 46)

In 1985, the joint Standards downplayed the notion of ‘content validity’ as a separate test quality but recognized the uniqueness of its form of evidence (AERA et al. 1985). In addition, these standards covered domain definition and content validation for licensure testing. The final edition published in the twentieth century (AERA et al. 1999) includes two standards for validity evidence from content judgment studies. First,

When the validation rests in part on the appropriateness of test content, the procedures followed in specifying and generating test content should be described and justified in reference to the construct the test is intended to measure or the domain it is intended to represent. If the definition of the content sampled incorporates criteria such as importance, frequency, or criticality, these criteria should be clearly explained and justified. (p. 18)

Second,

When a validation rests in part on the opinions or decisions of expert judges, observers, or raters, procedures for selecting such experts and for eliciting judgments or ratings should be fully described. The qualifications and experience of the judges should be presented. The description of the procedures should include any training and instructions provided, should indicate whether participants reached their decisions independently, and should report the level of agreement reached. If participants interacted with one another or exchanged information, the procedures through which they may have influenced one another should be set forth. (p. 19)

Shifts in wording and focus of criteria for evaluating content judgment studies reflect trends in the measurement community’s view of validity per se. Yet, given advances in the theory, quantitative models, and technology of modern psychometrics, it is remarkable that the underlying logic, methods, and criteria for content judgment studies have survived with so little change. Evidently the concept possesses enduring utility for a broad range of test developers and test users.

Bibliography:

  1. American Educational Research Association (AERA), American Psychological Association (APA) and National Council on Measurement in Education (NCME) 1985 Standards for Educational and Psychological Testing. American Psychological Association, Washington, DC
  2. American Educational Research Association (AERA), American Psychological Association (APA) and National Council on Measurement in Education (NCME) 1999 Standards for Educational and Psychological Testing. American Psychological Association, Washington, DC
  3. American Psychological Association (APA) 1954 Technical recommendations for psychological tests and diagnostic techniques. Psychological Bulletin 51(2), supplement
  4. American Psychological Association (APA) 1966 Standards for Educational and Psychological Tests and Manuals. American Psychological Association, Washington, DC
  5. American Psychological Association (APA), American Educational Research Association (AERA) and National Council on Measurement in Education (NCME) 1974 Standards for Educational and Psychological Tests. American Psychological Association, Washington, DC
  6. Crocker L M, Miller D, Franks E A 1989 Quantitative methods for assessing the fit between test and curriculum. Applied Measurement in Education 2: 179–94
  7. Cronbach L J 1971 Test validation. In: Thorndike R L (ed.) Educational Measurement, 2nd edn. American Council on Education, Washington, DC, pp. 443–507
  8. Ebel R L 1956 Obtaining and reporting evidence for content validity. Educational and Psychological Measurement 16: 269–82
  9. Ebel R L 1977 Comments on some problems of employment testing. Personnel Psychology 30: 55–63
  10. Fitzpatrick A R 1983 The meaning of content validity. Applied Psychological Measurement 7: 3–13
  11. Guion R M 1977 Content validity: The source of my discontent. Applied Psychological Measurement 1: 1–10
  12. Gulliksen H 1950 Intrinsic validity. American Psychologist 5: 511–17
  13. Kane M 1997 Model-based practice analysis and test specifications. Applied Measurement in Education 10: 5–18
  14. Lennon R T 1956 Assumptions underlying the use of content validity. Educational and Psychological Measurement 16: 294–304
  15. Messick S 1975 The standard problem: Meaning and values in measurement and evaluation. American Psychologist 30: 955–66
  16. Messick S 1989 Validity. In: Linn R L (ed.) Educational Measurement, 3rd edn. American Council on Education, Washington, DC, pp. 13–103
  17. Messick S 1994 The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher 23(2): 13–23
  18. Messick S 1995 Standards of validity and the validity of standards in performance assessment. Educational Measurement: Issues and Practice 14(5): 5–8
  19. Mosier C I 1947 A critical examination of the concept of face validity. Educational and Psychological Measurement 7: 191–206
  20. Shimberg B 1982 Licensing and certification. Encyclopedia of Educational Research, 5th edn. American Educational Research Association, Washington, DC, pp. 1084–92
  21. Sireci S G, Geisinger K F 1995 Using subject matter experts to assess content representation: An MDS analysis. Applied Psychological Measurement 19: 241–55
  22. Smith I L, Hambleton R K 1990 Content validity studies of licensing examinations. Educational Measurement: Issues and Practice 9(4): 11–14
  23. Yalow E S, Popham W J 1983 Content validity at the crossroads. Educational Researcher 12: 10–14