Internal Validity Research Paper

Internal validity refers generally to the accuracy of inferences about whether one variable causes another. In the context of an experiment, internal validity is concerned with conclusions about whether (and to what degree) the independent variable, as manipulated, makes a difference in the dependent variable, as measured. This research paper addresses the meaning of internal validity, the practice of achieving it, and the challenges of understanding it.

1. Internal Validity Defined

The concept of internal validity apparently originated with Donald T. Campbell and was popularized in his and his collaborators’ work (Campbell 1957, Campbell and Stanley 1966, Cook and Campbell 1979). The term was coined as a counterpoint to ‘external validity,’ which deals with the generalizability of a finding to persons, settings, and times other than those examined in the research. Internal validity, in contrast, involves the accuracy of a causal inference pertaining to the particular persons, settings, and times examined in the research. As Cook and Campbell (1979) made clear, internal validity is also concerned only with the particular research operations used in a study, that is, with the independent variable as it was manipulated and the dependent variable as it was measured.

The term internal validity originated in the context of research methods designed to probe cause–effect relations, such as randomized experiments and quasi-experiments. This remains the typical usage. However, the term is sometimes applied to procedures that do not investigate causal relations, specifically to measurement instruments such as personality scales. In this alternative usage, the apparent intention is to refer to the structure of the scale and the interrelationship among scale items as ‘internal validity,’ thus differentiating these properties from the scale’s relationship to other measures and behaviors. Such a different and nontraditional use of the term may invite confusion. The remainder of this research paper focuses on internal validity as it applies to cause-probing research.

2. Differentiating Internal Validity From Other Forms Of Validity

Campbell (1986) noted that the term internal validity (even when restricted to cause-probing research) was often used in ways somewhat different from the original intended meaning. He proposed the infelicitous term ‘local molar validity’ as a substitute. This term has not been widely adopted but can help in differentiating internal validity from other forms of validity. Consider as an example a study conducted in a middle school to see whether an anger management program reduces students’ aggressive behaviors such as fighting on the playground. Internal validity is ‘local’ in the sense that it is concerned with the immediate context of the study, that is, the specific children and school that were observed. Attempts to generalize, whether to other middle schools, to high schools, or to adults, instead involve external validity. Internal validity is ‘molar’ in the sense that it is concerned with whole manipulations and with measures, whatever they are, rather than with pure theoretical abstracts. Attempts to draw conclusions about theoretical concepts, say about ‘affective regulation training’ and ‘aggression,’ rather than about the program as implemented and the raters’ observations of the number of fights on the playground, instead involve construct validity.

Some competing validity frameworks define internal validity in different ways. For instance, Cronbach (1982) defined internal validity as involving certain intended generalizations (for a summary and integration of this and other validity frameworks, see Mark 1986). Cronbach’s alternative definition of internal validity includes generalizations to the categories of persons, settings, and times and to the theoretical constructs that were the original, intended targets of the research conclusions. The original concept of internal validity, as developed by Campbell and associates, continues to predominate, however.

The greatest difficulty has been in differentiating internal validity from the construct validity of the cause, that is, the proper labeling of the independent variable in abstract terms. This difficulty is seen most clearly in terms of ‘threats’ to validity (see Sect. 3). Cook and Campbell (1979) presented four threats to internal validity that depend upon comparative processes involving the members of the treatment and control group. For instance, the threat of ‘resentful demoralization’ can occur when one group, say the control group, receives a perceptibly less desirable treatment. If control group members become resentful, this resentment, rather than the intended treatment, may cause differences between the groups. Even Cook and Campbell (1976, 1979, Campbell 1986) have wavered in their judgment about whether these are internal or construct validity threats. More generally, threats to both internal validity and construct validity of the cause involve confounds with the independent variable, suggesting that it may be difficult to differentiate the two types of validity.

One possible resolution is based on the counterfactual conception of cause (Reichardt and Mark 1998). From this perspective, a treatment effect can be defined as the difference between what happens when a (molar) treatment has been administered and what would have happened if the (molar) treatment had not been administered but everything else had been the same. The practical problem is that, absent time travel, everything cannot be the same between the treatment and control conditions except for the treatment and its effects. The same people may be compared at different times, or different people may be compared at the same time, but a researcher cannot compare the same people, with and without the treatment, at the same time. But if one allows the fiction of the ideal (but unattainable) counterfactual comparison that only time travel would make possible, the distinction between internal validity and construct validity of the cause can be clarified.

Threats to internal validity would not arise if the ideal comparison could be made. In contrast, a threat to the construct validity of the cause is a mislabeling of the cause that could arise even with the ideal comparison. For example, in studying an anger management program, it would be an internal validity problem if one compared pretest and post-test levels of aggression and if aggression changed simply because the children were older at the post-test. But this internal validity problem would disappear if the ideal counterfactual could actually be obtained, because the comparison would involve the same children at the same age, with and without exposure to the program. In contrast, it would be a construct validity problem if the program’s activities modified children’s affective regulation skills but the program were labeled as a ‘self-efficacy intervention.’ This problem would not be avoided by the ideal counterfactual. Nor would the ideal counterfactual alleviate other construct validity problems, such as various subject and experimenter artifacts. In the case of resentful demoralization, this threat would not occur if the ideal counterfactual were attainable. If a researcher could travel back in time, there would be no need to construct two groups of participants who could be aware of each other’s treatment. Resentful demoralization thus is a threat to internal validity.
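The contrast between the attainable pretest–post-test comparison and the ideal counterfactual comparison can be illustrated with a minimal sketch. All numbers below are hypothetical, and the sketch assumes, purely for illustration, that maturation and the program each shift every child's number of fights by a fixed amount. In real research the outcome without the program is never observed for the same children at the same time; here it is simply constructed.

```python
from statistics import fmean

# Hypothetical values, chosen only for illustration.
MATURATION = -1      # change in monthly fights from simply growing older
PROGRAM_EFFECT = -3  # change in monthly fights caused by the program

# Pretest fights per month for four hypothetical children.
pretest = [12, 9, 15, 10]

# Post-test outcomes for the same children at the same age,
# without and with the program (only one is observable in practice).
post_without = [y + MATURATION for y in pretest]
post_with = [y + PROGRAM_EFFECT for y in post_without]

# Attainable comparison: the same children at different times.
pre_post_estimate = fmean([w - p for w, p in zip(post_with, pretest)])

# Ideal counterfactual comparison: the same children at the same time.
ideal_estimate = fmean([w - wo for w, wo in zip(post_with, post_without)])

print(pre_post_estimate)  # -4.0, program effect mixed with maturation
print(ideal_estimate)     # -3.0, program effect alone
```

Under these assumptions the pretest–post-test difference overstates the program's effect by exactly the amount of maturation, whereas the ideal comparison does not. Mislabeling the program would leave both numbers unchanged, which is why it is a construct validity problem rather than an internal validity problem.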

3. Threats To Internal Validity

The literature on internal validity consists largely of detailed lists of validity ‘threats.’ Internal validity threats are generic categories of causal forces that may frequently obscure causal inferences. Take as an example, once again, a researcher’s efforts to determine whether an anger management program reduces aggressive behavior in a middle school. ‘History’ refers to the possibility that specific events, other than the intended treatment, may have occurred between the pretest and post-test observations and may obscure the true treatment effect. If the researcher observed the level of aggressive behavior on the playground before the anger management program, and again afterward, history would be a problem if a different, stricter teacher became playground monitor in the interim. ‘Maturation’ refers to the possibility that natural processes which occur over time within the study participants, such as growing older, hungrier, more fatigued, wiser, and the like, may create a false treatment effect or mask a real one. Less aggression may occur at the post-test simply because the children are older than at the pretest, for instance. ‘Attrition’ refers to the possible loss of participants in a study. For example, if children from troubled families are more likely to drop out of school or to move away in the middle of the school year, then attrition could cause a decrease in aggression from the pretest to the post-test. ‘Instrumentation’ arises as a validity threat when a change in a measuring instrument causes erroneous conclusions about the effects of an intervention. For instance, if observers’ standards shifted over time, such that later incidents had to be more violent to be rated as aggressive, this could cause the appearance of a treatment effect when in fact there is none.

‘Selection’ refers to the possibility that post-test differences between a treatment group and a control group may be due to initial differences between the groups rather than to a treatment effect. Selection problems might occur if a researcher attempted to assess the effectiveness of an anger management program by comparing the level of playground aggression in two middle schools, one of which had implemented the program. In addition, more complex internal validity problems can occur, whereby some threat operates only in one group or operates more powerfully in one group than in another. For instance, ‘selection by maturation’ indicates that participants in the treatment condition are maturing at a different rate than those in the control condition. See Cook and Campbell (1979) for additional discussion of internal validity threats, including the threats of testing and regression to the mean.

4. Achieving Internal Validity

Most discussions about how to achieve internal validity focus on research design. Randomized experiments are generally recommended, because random assignment eliminates systematic selection bias and allows traditional statistics to estimate and account for purely random selection differences. Randomized experiments also rule out most other internal validity threats, if sound research procedures are used and there is no differential attrition (but see Sect. 5.1). If random assignment is either impractical or unethical, the common recommendation is to enhance internal validity by using a strong quasi-experiment. Quasi-experiments are approximations to experiments but lack random assignment. More generally, the process of ruling out internal validity threats can be seen as a special instance of the logic of pattern matching. In addition, especially in terms of the threats of selection and selection by maturation, the choice of proper statistical analyses can influence internal validity.
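The way random assignment removes systematic selection bias can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any actual study. The true effect, the distribution of baseline aggression, and the self-selection rule (only children with below-average aggression enter the program) are all hypothetical.

```python
import random
from statistics import fmean

TRUE_EFFECT = -2.0  # hypothetical program effect on fights per month

def simulate(assign, n=20000, seed=0):
    """Return the treatment-minus-control difference in mean post-test scores."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(10.0, 3.0)  # hypothetical baseline aggression
        in_program = assign(baseline, rng)
        effect = TRUE_EFFECT if in_program else 0.0
        posttest = baseline + effect + rng.gauss(0.0, 1.0)
        (treated if in_program else control).append(posttest)
    return fmean(treated) - fmean(control)

def self_selected(baseline, rng):
    # Assumed selection rule: less aggressive children enter the program.
    return baseline < 10.0

def randomized(baseline, rng):
    # Random assignment: a fair coin flip, unrelated to baseline aggression.
    return rng.random() < 0.5

naive_estimate = simulate(self_selected)
randomized_estimate = simulate(randomized)

print(f"self-selected groups: {naive_estimate:.2f}")
print(f"randomized groups:    {randomized_estimate:.2f}")
```

With self-selection, the group difference combines the program effect with the initial difference between the groups, so it greatly exaggerates the program's benefit. With random assignment, the groups differ at baseline only by chance, and the estimate falls close to the assumed true effect.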

5. Common Misconceptions About Internal Validity

Several misconceptions exist regarding internal validity. Three relatively common ones are discussed in this final section.

5.1 Misconception #1: Successful Random Assignment Guarantees Internal Validity

Although random assignment of participants (or other units) to treatment conditions can greatly enhance the likelihood of internal validity, problems can still occur. Most widely recognized is that differential attrition may occur, with more (or different kinds of) participants dropping out of one group than another. As noted earlier, Cook and Campbell also suggested that threats such as resentful demoralization may apply in a randomized experiment. Even if these problems do not occur, internal validity threats can arise in a randomized experiment if proper research procedures are not followed. An experimenter might, for instance, have one rater observe aggression in the treatment group and another rater observe in the control group. This would create an instrumentation threat. As another example, researchers sometimes randomly assign individuals to conditions but then have all members of a group participate together. For example, in a study with mood as the independent variable, following random assignment all members of the positive mood condition may be sent to one room to watch a funny movie, while all members of the control condition see a ‘neutral’ movie in another room. This can allow the independent variable to become confounded with any of a number of other factors, such as the characteristics of the experimenter conducting each session, creating a selection by history threat. To minimize internal validity threats, random assignment must be combined with careful methodology.

5.2 Misconception #2: Presence Of A Validity Threat = Internal Invalidity

Despite clear statements by Campbell to the contrary (e.g., Campbell 1969), discussions of internal validity often make it sound as though the theoretical existence of a validity threat necessarily equates with weak internal validity. Some writings seem to suggest, for example, that, because history and several other validity threats can apply to a simple pretest–post-test comparison with one group, the findings from such a study are necessarily invalid. Such thinking is incorrect for at least three reasons. First, a given threat may not operate in a specific case, whether or not it is ruled out by the design. Maturation does not always cause changes in every pretest–post-test study, for instance. Second, conclusions about the causal impact of a treatment on some outcome may be accurate, even when a validity threat is operating, if the threat’s effect is too small to invalidate the conclusion drawn. In a pretest–post-test design, for example, maturation may occur but be small enough not to obscure a reasonably accurate conclusion about the treatment effect. Third, even if a threat is not trivial in size, in some cases it may be possible to estimate the magnitude of the threat and adjust for it. This is the basic logic of efforts, for instance, to model selection bias (see Reichardt 2000 for an elaboration of this logic). Still, assessing the plausibility and magnitude of validity threats in a nonexperimental context remains an imprecise art. The development of empirically supported theories of the conditions under which various validity threats operate, and with what magnitude, would be an important future development.
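The third point, estimating a threat's magnitude and adjusting for it, amounts to simple arithmetic once an estimate of the threat is available. The sketch below assumes that the threat adds to the treatment effect rather than interacting with it, and that an independent estimate of maturation exists, for instance from comparable untreated children. The function name and all numbers are hypothetical.

```python
def adjusted_effect(observed_change, threat_estimate):
    """Remove an independently estimated threat from an observed change.

    Assumes the threat and the treatment effect simply add together.
    """
    return observed_change - threat_estimate

# Hypothetical pretest-to-post-test change in fights per month.
observed_change = -3.0
# Hypothetical change attributable to maturation alone.
maturation_estimate = -0.5

print(adjusted_effect(observed_change, maturation_estimate))  # -2.5
```

The adjusted figure is only as trustworthy as the estimate of the threat and the additivity assumption behind it, which is one reason such adjustments remain an imprecise art.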

5.3 Misconception #3: An Emphasis On Internal Validity = A ‘Black Box’ Experiment

Some critics contend that giving priority to internal validity implies a lack of interest in mediating processes. In applied social research, for example, the claim has been made that those who give priority to internal validity commonly estimate the effect of an intervention without trying to peer into the ‘black box’ and learn about underlying processes. In fact, the methods associated with strong internal validity, such as the randomized experiment, do enable researchers to conduct black box experiments. However, they do not require such experiments. Experiments (and other methods used in the service of internal validity) can be integrated with other methods used to study mediational models. Moreover, methods that maximize internal validity are also widely used in investigations designed specifically to test hypotheses about underlying causal mechanisms.


