External Validity Research Paper

1. Definition And Challenges

External validity (along with statistical conclusion validity, internal validity, and construct validity) provides a framework to conduct and interpret empirical studies of causal relationships. External validity assesses ‘to what populations, settings, treatment variables, and measurement variables can [the effect] be generalized?’ (Campbell and Stanley 1966, p. 5). By definition, external validity is about rules for inference; they decrease uncertainty that a cause–effect relationship holds under particular conditions. External validity is not defined by the extent to which a cause–effect relationship generalizes across conditions, any more than internal validity is defined by low p values or large effect sizes. After all, when we know that a causal relationship does not generalize, the information is useful to refine theory or to prevent the useless expenditure of resources, just as internally-valid information on absence of an effect is useful for these reasons. Sometimes researchers apply the term ‘external validity’ to correlational studies, for example, validating a measure by applying it to populations other than those originally studied. Although this more general usage of external validity is consistent with recent reformulations (e.g., Cronbach 1982), this research paper will deal only with the external validity of causal relationships. Historically, researchers have faced three challenges in assessing external validity, which new developments have helped to address.

1.1 The Challenge Of Inductive Inference

External validity requires inductive inference, relying on extrapolation beyond the conditions and findings of a single study. Induction is never fully justified on logical grounds (Campbell and Stanley 1966). By contrast, internal validity is established through a more deductive framework whereby causal relationships can be both corroborated and falsified within a single study. Both internal and external validity depend on information that originates outside the single study, especially regarding judgements about what is already known. However, external validity is considerably more dependent on information from outside the study, and this poses a challenge: can external validity rise above the level of the educated guess?

1.2 Perceived Tradeoffs Between Internal And External Validity

Internal validity is compromised if research does not involve controls and the measurement of outcomes. But external validity can often be compromised by the very presence of design controls, by obtrusive measures, and by restricting the study to contexts where respondents are willing to receive any treatment to which they might be assigned (e.g., Barker 1968, Brunswik 1955). Researchers who are concerned with the artificiality of controlled studies have, over time, divided into those who set less priority on the study of causes, in general, and those who would maximize external validity by studying causal relationships in field settings. Some researchers put a higher priority on internal validity, arguing that there is little point in generalizing a causal conclusion about which there is legitimate doubt; moreover, even in field research the study context loses verisimilitude whenever controls are needed to help assess cause (Campbell and Stanley 1966, Cook and Campbell 1979). Others maintain that external validity deserves priority because the research should be optimally relevant to practice (Cronbach 1982). The challenge then arises: what is the best balance in a given study? Cannot both internal and external validity be optimized somehow?




1.3 The Challenge Of Complex Interactions

Causal relationships in real-world settings are complex, and statistical interactions of variables are assumed to be pervasive (e.g., Brunswik 1955, Cronbach 1982). This means that the strength of a causal relationship is assumed to vary with the population, setting, or time represented within any given study, and with the researcher’s choices about treatments and measurement of outcomes. Without a sensitive assessment of such interactions, true effects can be obscured or causal claims can be overgeneralized to a wider range of people, settings, times, treatments, or outcome constructs than is warranted. Unfortunately, the number of possible interactions is endless, posing problems to the analyst that are insuperable, at least in theory (Cook 1993). The challenge is: how do we cope with this complexity?

The practical task is to assess the most plausible interactions within a given research area. Interactions occur often between treatment, populations, measures and context, posing the most plausible threats to external validity (Campbell and Stanley 1966, Cook and Campbell 1979). Because external validity specifies the conditions under which an internally valid relationship can be reproduced, threats to external validity necessarily invoke internal validity.

The interaction of testing and treatment suggests that an effect size varies by the conditions of measurement.

The interaction of selection and treatment hypothesizes that an effect size varies by population studied.

The interaction of setting and treatment describes effects that vary by setting.

The interaction of history and treatment deals with the extent to which a causal relationship replicates across different times.

Other interactions between features of the study and treatment sometimes occur, but their plausibility depends on the individual study context and on what is generally known about these study features.
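As a rough illustration of what such an interaction means numerically, the sketch below compares the treatment effect observed in two populations. It is not taken from the sources cited here: the function names and the outcome scores are hypothetical, and a raw difference in means stands in for whatever effect-size measure a real study would use.

```python
from statistics import mean

def mean_difference(treated, control):
    """Raw treatment effect: mean outcome of treated minus mean of control."""
    return mean(treated) - mean(control)

def interaction_contrast(group_a, group_b):
    """Difference between the treatment effects seen in two populations.

    Each argument is a (treated_outcomes, control_outcomes) pair.
    A contrast far from zero suggests a selection-by-treatment interaction.
    """
    return mean_difference(*group_a) - mean_difference(*group_b)

# Hypothetical outcome scores, invented purely for illustration.
population_a = ([12, 14, 16, 18], [10, 11, 9, 10])  # effect: 15 - 10 = 5
population_b = ([11, 10, 12, 11], [10, 10, 11, 9])  # effect: 11 - 10 = 1

contrast = interaction_contrast(population_a, population_b)
print(contrast)  # effect in A minus effect in B
```

The same contrast could be computed across settings, times, or measurement conditions. In practice it would be accompanied by a standard error before any claim of interaction is made.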

2. External Validity Involves Inferences From Samples

2.1 What Do We Generalize About?

The statistician would say that researchers generalize from an observed sample to a population, usually of persons or other units being studied. However, external validity treats inferences about treatments, measures, and settings just as seriously as inferences about human population groups. It therefore assumes that studies have to sample from these domains, as well as from human populations (Cook 1993). A given study therefore represents not only a sample from a population of units, but also a sample from a universe of potential settings, a universe of potential treatments, and a universe of potential measures. The sampling frame becomes explicit whenever an investigator makes choices about whom to study, what treatment to administer, or which measures to base inferences on. Beyond the individual study, bodies of research (literature reviews, meta-analyses, and programs of related studies) are now understood to be samples from the universe of potential studies. The available sample of findings both limits and guides inferences about external validity.

The sampling frame for external validity considers not only categories, but constructs (Cronbach 1982). Construct validity is therefore intimately tied to external validity. This becomes clear when we note that a population of persons is akin to a category of persons and that specifying the treatment variables involved in a causal relationship requires identifying the entities, constructs, classes or categories to which generalization can be justified (Cook and Campbell 1979).

Methodologists have adopted the following notation from Cronbach (1982). Upper case represents the universe or domain, while lower case notation represents an element in a specific study.

U represents the population of units (persons, sites, communities, organizations, states, nations, or other settings and aggregates) about which a conclusion is sought. In contrast, u refers to the sample of units in a study.

T represents the plan and the set of realizations for a treatment, experimental manipulation, program, or other intervention, while t is the single realization of T for a given u. A naturally-occurring event can also be denoted t, such as an earthquake or economic recession, and T the universe of similar events.

O represents the admissible procedures and conditions for obtaining data (and the data themselves). In causal research, it is often used to refer to the possible effect, thus using O to represent the outcome construct and o the actual measure of effect.

S represents the context or setting, including the culture and times in which the study takes place. Some researchers use s to denote an individual study’s setting and large S to denote the accrual of such settings (Campbell and Stanley 1966). Others maintain that S is always confounded with U and T, and cannot be deliberately sampled in the same way as U, T, and O.

UTOS is the domain about which a research question is asked. Once U, T, O, and S are specified concretely, that constitutes a description of the causal question being researched. On the other hand, an individual utoS is the combination of u, t, and o that constitutes an instance on which data are collected. Thus, it is the operational or sampling level of a study; it is how the researcher proposes to answer the question about T as a cause, O as an effect, U as a population of persons, and S as the cultural and historical context in which the hypothesis is tested.

Within UTOS, replications are of three kinds: (a) exact replication using fresh samples of U, T, and O; (b) studies by other investigators trying to repeat the UTO procedures according to the best description available; and (c) studies by other investigators who choose different procedures to represent the same U, T, and O based on their own judgement about what constitutes instances of each. In all three cases, the interest lies in ascertaining whether a class of causal relationships was replicated. For example, the policy maker may want to know whether a treatment for injection drug users (IDUs) in New York City will produce a reduction in behaviors that lead to HIV transmission. Replication of the treatment across fresh samples of IDUs in New York and other northeastern cities increases confidence about treatment effectiveness.

At the level of UTO, the instances are joint distributions of units, treatments, and observations. However, some combinations are more common than others. For instance, in laboratory experiments in social psychology certain classes of treatment studied there (e.g., about factors affecting group performance) are confounded with a population almost exclusively composed of college sophomores. By studying categories within these joint distributions, called sub-UTOS, we may ask whether a causal relationship holds with different categories within UTOS. These lower-order categories often include characteristics of the populations, variations in the content or level of a treatment or dimensions of some set of measured responses. For example, one may study attitude changes in whites and blacks, or in cases that experience a treatment at high and moderate levels of intensity and duration. Sub-UTOS provide a key to external validity because patterns of variation within a study allow us to ascertain some of the conditions under which a causal relationship holds.

2.2 What Do We Generalize To?

Cronbach (1982) distinguished two basic types of external validity inferences: those concerning reproducibility of findings in UTOS, and those that go beyond UTOS to new domains termed *UTOS. The asterisk indicates that causal relationships in these domains have not yet been studied, or have not been studied jointly.

Inferences about UTOS are both retrospective and prospective, because they rely on an existing corpus of replications and a definition of constructs that incorporates the existing range of replications. The studies are already in existence. Generalization from utoS to UTOS is not entirely retrospective, since inferences sometimes need to be made about levels of variables that were not directly observed, for example, when fourth-grade and sixth-grade children are studied, but not those in fifth grade, or when four hours of therapy are given vs. eight hours. Moreover, a critical understanding of variation in UTOS always depends on construct validity and theory-building, which usually go beyond the observed utoS. Nevertheless, at some point causal hypotheses undergo a test that then extends external validity to new variations within UTOS, or expands UTOS.

By comparison, inferences about *UTOS are much more heavily prospective and even speculative, because by definition causal relationships in *UTOS have not been studied directly. This poses a problem because new knowledge often stimulates a desire to draw conclusions about causal relationships outside UTOS, whether the finding involves an important relationship in basic social science, or the effect of a social program or policy. Take the example of HIV prevention: results may impress policy makers who are looking for ways to combat HIV/AIDS in other populations. Will the treatment for IDUs employed in the 1980s, when modified for use (*T) with women at risk in family planning clinics (*U), in public housing communities of the twenty-first century (*S), also lead to reductions in the (very different) behaviors that put them at risk of HIV transmission (*O)? This requires asking whether the cause–effect relationship generalizes to *UTOS. The answer is not definitive until cause-probing studies of *UTOS are conducted.

2.3 Generalization As Decision Making Under Uncertainty

There is always some uncertainty about causation in the social sciences, since plausible alternative explanations may be discovered for even the best replicated cause–effect relationship. However, uncertainty increases about the conditions under which causal relationships hold, as investigators move from an observed set of utoS, to unobserved variations within a well-defined UTOS, to *UTOS (Cronbach 1982). Bayesian statisticians offer a useful way to cope with uncertainty about generalization: first incorporate information to assess the prior likelihood that a finding will generalize, assess the degree of uncertainty about the prior likelihood, and assess the likely consequences of a decision about generalization that is correct or incorrect (Cronbach 1982). Where the consequences are important (expenditure of public resources, or a new direction for theory), new information is needed. After a decision is taken, some additional information will become available: either a test of causal relationships or descriptive and experiential information that may help to further reduce uncertainty.
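The three steps just described can be sketched as a small expected-payoff calculation. This is only an illustrative reading of the Bayesian approach, not a procedure given by Cronbach: the function names, the payoff figures, and the idea of expressing uncertainty about the prior as a low-to-high range are all assumptions made for the example.

```python
def expected_payoff(prior, gain_if_generalizes, loss_if_not):
    """Expected payoff of acting as if the finding generalizes.

    prior: assumed probability that the causal relationship holds
           in the new conditions.
    """
    return prior * gain_if_generalizes - (1.0 - prior) * loss_if_not

def decision_is_stable(prior_low, prior_high, gain_if_generalizes, loss_if_not):
    """True if the sign of the expected payoff is the same at both ends
    of the plausible range for the prior."""
    low = expected_payoff(prior_low, gain_if_generalizes, loss_if_not)
    high = expected_payoff(prior_high, gain_if_generalizes, loss_if_not)
    return (low > 0) == (high > 0)

# Hypothetical figures: a gain of 100 units if the program works in the
# new population, a loss of 50 units if it does not.
GAIN, LOSS = 100.0, 50.0

# The prior is itself uncertain, somewhere between 0.2 and 0.6.
stable = decision_is_stable(0.2, 0.6, GAIN, LOSS)
print("decision stable across prior range:", stable)
```

When the decision flips within the plausible range of the prior, as it does with these invented numbers, the uncertainty matters for the choice, which is the situation in which new information is worth collecting.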

Generalization is more than educated guessing when it is systematic guessing. The next section outlines principles that reduce uncertainty in generalizing about UTOS, by planning studies or using the variation in existing studies; and in generalizing to *UTOS, through noncausal empirical studies, expert judgement, and theory-building.

3. Studies To Reduce Uncertainty About UTOS And *UTOS

Inferences about external validity can proceed only so far in the context of a single study. Many of the problems can be addressed more adequately through literature reviews and programs of studies, whether planned or unplanned. Multiple studies allow us to improve the quality of inductive inference about external validity. They also permit us to recognize potential tradeoffs between internal and external validity and to optimize both (where feasible). Finally, they improve our ability to disentangle the web of interactions among variables and so identify more of the conditions under which a causal relationship varies in magnitude and even sign.

3.1 Practical Limitations On Representative Statistical Sampling in utoS

Representative statistical sampling of utoS would be ideal, but it is not realistic for most causal studies in the social sciences. These operate under conditions that prohibit a sampling frame that could represent the universe of all possible versions of a cause (T) and effect (O). Concerning S, we cannot sample historical occasions at random in order to probe the temporal generality of a causal proposition, and expense usually prohibits choosing a random sample of all relevant social situations in order to implement a treatment therein. And to the degree that voluntary consent is required, we are prohibited from studying certain u in certain ways.

4. Alternative Methods And Principles

A pragmatic alternative approach is to use the less sophisticated methods and principles typically used for generalizing from operational representations of a cause or effect construct. Social scientists routinely generalize from t to T and from o to O without random sampling, and they seem to feel very confident about the inferences they draw because the constructs make logical sense. Can the principles and methods they use also apply to generalizing from u to U and from s to S, and even to *UTOS (Cook 1993)?

4.1 The Principle Of Proximal Similarity

Proximal similarity (Campbell 1988) refers to instances or samples of people, treatments, measures, and settings that look like the populations or categories to which one would like to generalize. A limited set of matching variables is needed that, on the basis of prior theory and other forms of consensus in the social science community, identify the most prototypical attributes of the entity to which generalization is sought. This theory of generalization depends on: (a) superficial correspondences rather than formal sampling, (b) the ability to specify prototypical attributes of a given UTOS (Berkowitz and Donnerstein 1982), and (c) the ability to purposively select research operations that represent the targets of inference at least on the sampled, prototypical variables and any correlates thereof (Cook 1993). Such matching is central to the process researchers now routinely use in research planning when they want to select, not just cause and effect operations, but also measures of control, moderating and mediating variables.

For generalization in UTOS, prototypical attributes need to be included in the existing or planned cause-probing studies (Cook 1993). For *UTOS, the research community uses prior information and best judgement to determine the plausibility or implausibility of generalization (Cronbach 1982). Qualitative and quantitative descriptive studies about *U may become available. For example, focus groups may tell us how Hispanics experience heart attack symptoms; a *T is identified, similar to T from existing studies, that is likely to increase Hispanics’ recognition of symptoms (Leviton et al. 1999).

In popular language, we might exemplify the principle of proximal similarity by the following: if you want to generalize about ducks and cannot randomly choose instances of ducks from the class of ducks, then you should at least select instances for study that look like our culturally shared image of ducks, that quack like ducks and that waddle like ducks. But so defined, ‘ducks’ could still be confused with small geese or juvenile swans. To deal with this we need either a theory of classification based on more valid characteristics that are not easily observable (e.g., DNA) or we need additional principles.

4.2 The Principle Of Discriminant Validity

The constructs in UTOS will be well-defined when cognate concepts can be identified and differentiated from them. So, one has to be explicit about how a duck differs from a goose (or liking from love, or how one school reform program differs from another). For generalization about UTOS one has to demonstrate empirically that the causal proposition one wants to defend holds for ducks when they are differentiated (discriminated) from swans. Both theory and measurement are needed to support the discrimination required.

To differentiate what can be generalized in *UTOS, boundary variables are helpful (Fromkin and Streufert 1976), defined as ‘critical differences between the research setting and criterion setting which could alter the existence or magnitude of the relationship between two or more variables’ (Bernardin and Villanova 1986, p. 43). For example, cause-probing studies of performance appraisal are usually conducted in laboratories, which are somewhat artificial compared to the business environment. To explore external validity, Bernardin and Villanova (1986) surveyed 125 personnel administrators about the most common characteristics of their performance appraisal systems, supplementing this information with other surveys and the available literature. Some laboratory studies were ruled out as too dissimilar to these modal instances, while others were tentatively ruled in.

4.3 The Principle Of Heterogeneous Irrelevancies

Across multiple measures, sites, or studies, much irrelevant variation can occur in UTOS. Thus, a treatment might be implemented in different ways; an outcome might be conceptualized and measured in different ways; the samples of respondents might differ in age, gender, or racial composition even though they all belong in the category to which one wants to generalize. This heterogeneity is usually considered an impediment, and it can certainly reduce statistical power. However, it also provides the opportunity to test whether a causal relationship holds across the irrelevancies actually sampled (Cook 1993, Kruglanski and Kroy 1976). It is therefore a test of whether the relationship is robust despite irrelevancy in how measurement occurs, a treatment is operationalized, a sample is drawn and settings are selected.

A deliberate strong test of external validity will introduce as much irrelevant variation in UTOS as possible. In this sense, the extent of generalization does, in fact, assist our inferences about external validity. For example, we know that various populations at risk of HIV/AIDS (U) respond well to messages delivered by people similar to themselves (T), although the venues vary tremendously (street, classroom, gay bar, homeless shelter), and the specific risk behaviors (O) vary as well. The risk behaviors form a larger class, as do the populations and treatments, and the findings replicate across cultures and decades (S). For generalizing to *UTOS, this robust type of finding is also helpful. Given the consistent findings, what is the prior likelihood that a new group at risk of HIV/AIDS (*U) would also change risk behaviors when exposed to an exemplar of T? It is certainly not zero.
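One common way to check whether an effect holds across heterogeneous studies is a homogeneity test from meta-analysis. The sketch below uses Cochran's Q with inverse-variance weights. This technique is brought in here for illustration and is not one the cited authors prescribe, and the effect sizes and variances are invented.

```python
def pooled_effect_and_q(effects, variances):
    """Return the inverse-variance pooled effect and Cochran's Q statistic.

    effects:   one effect-size estimate per study
    variances: the sampling variance of each estimate
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need matching lists with at least two studies")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    # Q sums the weighted squared deviations of each study from the pooled effect.
    q = sum(w * (d - pooled) ** 2 for w, d in zip(weights, effects))
    return pooled, q

# Hypothetical effect sizes from four studies that differ in venue,
# population, and outcome measure (the "irrelevancies").
effects = [0.30, 0.35, 0.25, 0.40]
variances = [0.01, 0.02, 0.01, 0.02]

pooled, q = pooled_effect_and_q(effects, variances)

# Under homogeneity, Q follows a chi-square distribution with k - 1 degrees
# of freedom; 7.815 is the 0.05 critical value for 3 degrees of freedom.
CRITICAL_CHI2_DF3 = 7.815
print(round(pooled, 3), round(q, 3), q > CRITICAL_CHI2_DF3)
```

With these invented numbers Q falls well below the critical value, so the studies give no sign that the effect varies across the sampled irrelevancies. Because Q has little power when studies are few, a non-significant result is weak evidence of robustness rather than proof.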

4.4 The Principle Of Causal Explanation

One of the major goals of science is to explain phenomena, including causal relationships. Identifying the how or why of a relationship can help those with the requisite knowledge to recreate whatever causal forces originally created the effect in circumstances that are quite different. For instance, knowing why TV violence causes violent behavior in children might help modify how violence is portrayed in the future. This is why causal explanation is held to be the major mechanism for moving from utoS to *UTOS, for generalizing to as yet unstudied circumstances (Cronbach 1982). Thus, we judge social and behavioral theories by their capacity to specify constructs that moderate or mediate a causal relationship. However, it is rare to find studies that have manipulated all of these causal-explanatory contenders or even a goodly portion of them. Some are usually measured, though, and so it becomes important to use measurement and data analysis to establish which components are most likely to be causally responsible for an effect and so are essential to any practical generalizations that will be made from the data.

4.5 The Principle Of Interpolation And Extrapolation

Few social or behavioral theories succeed in specifying the levels of a treatment that will usually bring about a desired effect. Parametric studies that vary many levels on the independent variable and then examine response curves are rare: instead, just two or three levels are manipulated, leaving us unclear about how the outcome variable would have behaved at points between the levels measured. In principle, such interpolation is easy. Dose-response studies, or other plans to vary the level of T, provide a means to assess this issue, as does meta-analysis when levels of T vary across studies. Moreover, if a similar effect size is obtained across two levels, it is often reasonable to assume it will also hold at non-sampled points between them. For generalizing about UTOS, it is desirable to sample a wide range of treatment levels and as many points in-between as are feasible. In the same way, we might sample a wide range of persons (e.g., on age or educational attainment), time periods (e.g., years after discovery of the AIDS epidemic), and levels of outcome variables that might otherwise be truncated.

Extrapolation of effects that are beyond the ranges available in studies is more problematic, because the shape of the curve or the nature of the effect itself may change at more extreme values on the independent variable. Thus, water turns to ice at zero degrees centigrade and to vapor at 100 degrees. Between these points, it is water. But few social science relationships are understood this well. We simply do not know where qualitative transformations come about in many social relations (for instance, how much life stress is needed to create clinically significant depression). Extrapolation to points closer at hand (UTOS) is generally less uncertain than extrapolation to more distant points (*UTOS).
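The asymmetry between interpolation and extrapolation can be made concrete with a minimal sketch. It assumes the effect changes linearly between adjacent observed treatment levels, which is a simplification that real response curves need not satisfy. The function name and the effect sizes for four and eight hours of therapy are hypothetical.

```python
def interpolate_effect(dose, observed):
    """Estimate the effect at an unobserved treatment level.

    observed: dict mapping each studied treatment level to its effect size.
    Linear interpolation is used inside the observed range; any dose
    outside that range raises ValueError rather than being extrapolated.
    """
    if len(observed) < 2:
        raise ValueError("need at least two observed treatment levels")
    points = sorted(observed.items())
    lowest, highest = points[0][0], points[-1][0]
    if not lowest <= dose <= highest:
        raise ValueError(
            f"dose {dose} lies outside the observed range {lowest}-{highest}"
        )
    # Find the pair of adjacent observed levels that brackets the dose.
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= dose <= x1:
            return y0 + (y1 - y0) * (dose - x0) / (x1 - x0)

# Hypothetical effect sizes at four and eight hours of therapy.
observed_effects = {4: 0.20, 8: 0.40}

six_hours = interpolate_effect(6, observed_effects)  # inside the range

try:
    interpolate_effect(12, observed_effects)  # beyond the range
    extrapolation_refused = False
except ValueError:
    extrapolation_refused = True

print(six_hours, extrapolation_refused)
```

Refusing to extrapolate is a deliberately conservative design choice. An analyst with a well-supported theory of the response curve might relax it, but nothing in the observed data alone would justify doing so.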

Bibliography:

  1. Barker R G 1968 Ecological Psychology: Concepts and Methods for Studying the Environment of Human Behavior. Stanford University Press, Stanford, CA
  2. Berkowitz L, Donnerstein E 1982 External validity is more than skin deep. American Psychologist 37: 245–57
  3. Bernardin H J, Villanova P 1986 Performance appraisal. In: Locke E A (ed.) Generalizing from Laboratory to Field Settings. Lexington Books, Lexington, MA
  4. Brunswik E 1955 Representative design and probabilistic theory in a functional psychology. Psychological Review 62: 193–217
  5. Campbell D T 1988 Methodology and Epistemology for Social Science: Selected Papers. University of Chicago Press, Chicago
  6. Campbell D T, Stanley J C 1966 Experimental and Quasi-experimental Designs for Research. Rand McNally, Chicago
  7. Cook T D 1993 A quasi-sampling theory of the generalization of causal relationships. In: Sechrest L B, Scott A G (eds.) Understanding Causes and Generalizing About Them. New Directions for Program Evaluation 57: 39–82
  8. Cook T D, Campbell D T 1979 Quasi-experimentation. Rand McNally, Chicago
  9. Cronbach L J 1982 Designing Evaluations of Educational and Social Programs. Jossey-Bass, San Francisco
  10. Fromkin H L, Streufert S 1976 Laboratory experimentation. In: Dunnette M D (ed.) Handbook of Industrial and Organizational Psychology. Rand McNally, Chicago, pp. 415–65
  11. Kruglanski A W, Kroy M 1976 Outcome validity in experimental research: A reconceptualization. Representative Research in Social Psychology 7: 166–76
  12. Leviton L C, Finnegan J R, Zapka J G 1999 Formative research methods to understand patient and provider responses to heart attack symptoms. Evaluation and Program Planning 22: 385–98

 
