Sample Nonequivalent Group Designs Research Paper.

In nonequivalent group designs, comparisons are drawn between two or more groups of study participants with each group receiving a different treatment. The purpose, as in all quasi-experiments, is to assess the relative effectiveness of the different treatments. What distinguishes a nonequivalent group design from other designs is the manner in which the study participants are assigned to the different treatments. In a randomized experiment, participants are assigned to treatments at random. In a regression-discontinuity design, participants are assigned to treatments based on their value on an ordered quantitative variable. In a nonequivalent group design, participants are assigned to treatments in a nonrandom, nonquantitatively ordered manner. For example, a nonequivalent group design arises when participants self-select into the different treatments, when an administrator assigns participants to different treatments, or when different treatments are implemented in different sites. In analyzing data from nonequivalent group designs, researchers must be concerned with biases that can arise because of the nonrandom assignment. The present discussion describes the most widely used statistical analyses and design features for addressing these biases.

## 1. Threats To Internal Validity

Because the participants are not assigned to treatments at random, the treatment groups in a nonequivalent group design will differ, to one degree or another, even before the treatments are administered. Such differences among the treatment groups are called selection differences. In the nonequivalent group design, selection differences are a threat to internal validity because they can bias the estimate of the treatment effect. That is, selection differences can cause post-treatment differences between the treatment groups even if there are no treatment effects.

In addition to selection differences, other threats to internal validity can also arise in nonequivalent group designs. For example, Cook and Campbell (1979) list several additional threats to internal validity, including compensatory equalization of treatments, compensatory rivalry, resentful demoralization, diffusion or imitation of the treatment, and local history. Threats to validity that involve recognition that different participants receive different treatments, such as resentful demoralization, are likely to be more severe in randomized experiments than in nonequivalent group designs (Fetterman 1982, Lam et al. 1994). On the other hand, threats such as local history will often be more severe in nonequivalent group designs than in randomized experiments. Although a variety of threats to internal validity can arise, the remainder of the present discussion will be concerned only with taking account of selection differences because it is that threat that most distinguishes nonequivalent group comparisons from other types of comparisons.

## 2. The Prototypic Nonequivalent Group Design

In the prototypic nonequivalent group design, only two treatments are compared: an experimental and a comparison treatment. In addition, the participants in each of these two treatment conditions are assessed at exactly two points in time, once before and once after the treatments are implemented. Differences between the two treatment groups on the post-treatment measure are then used to estimate the effects of the treatments and the pretreatment measure is used to take account of the effects of selection differences. If there were no pretreatment measure, the results would usually be uninterpretable because there would be no way to take account of the effects of selection differences.

Various statistical procedures have been proposed for taking account of the effects of selection differences, with the assistance of the pretreatment scores. The four most commonly used methods are described below. In general, all four of these procedures are most credible when the pretreatment measure is operationally identical to the post-treatment measure.

Each of the four analysis strategies imposes different assumptions about the nature of the selection differences. If the assumptions of a given procedure are correct, the resulting estimates of the treatment effects will be unbiased by selection differences. However, if the assumptions are incorrect, the estimates of treatment effects are likely to remain biased, perhaps severely so. Therefore, researchers must be diligent in choosing an analysis strategy whose underlying assumptions fit the circumstances at hand. Unfortunately, researchers will seldom, if ever, have enough information to know which set of assumptions is correct. As a consequence, researchers are advised to conduct multiple analyses that impose a range of plausible assumptions about the nature of selection differences. Even then, researchers will typically need to be cautious in interpreting the results from prototypic nonequivalent group designs.

### 2.1 Change-Score Analysis

In change-score analysis, the estimate of the treatment effect is the mean difference between the experimental and comparison groups in the amount of change that takes place from the pretreatment measure to the post-treatment measure. For this change from pretreatment to post-treatment to be meaningful, the pretreatment and post-treatment measures must be operationally identical. A change-score analysis can be implemented by calculating a pretreatment to post-treatment difference (i.e., a change score) for each individual and then performing a test of the mean difference between the treatment groups on the change scores. The same result can also be obtained using the pretreatment and post-treatment scores in a repeated-measures analysis of variance.
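The calculation just described can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed procedure: the function name, the data, and the choice of a Welch-type t statistic for the test of the mean difference are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, variance

def change_score_analysis(pre_e, post_e, pre_c, post_c):
    """Return (effect estimate, Welch t statistic) based on change scores."""
    # One change score per participant: post-treatment minus pretreatment.
    change_e = [post - pre for pre, post in zip(pre_e, post_e)]
    change_c = [post - pre for pre, post in zip(pre_c, post_c)]
    # Effect estimate: difference between the groups in mean change.
    effect = mean(change_e) - mean(change_c)
    # Standard error without assuming equal variances (Welch form).
    se = sqrt(variance(change_e) / len(change_e)
              + variance(change_c) / len(change_c))
    return effect, effect / se

# Hypothetical scores; the comparison group starts out higher.
pre_exp, post_exp = [10, 12, 14, 16], [15, 18, 18, 21]
pre_cmp, post_cmp = [20, 22, 24, 26], [22, 25, 25, 28]

effect, t_stat = change_score_analysis(pre_exp, post_exp, pre_cmp, post_cmp)
print("estimated effect:", effect, "t:", round(t_stat, 2))
```

In these made-up data the experimental group gains 5 points on average and the comparison group gains 2, so the estimate is 3 even though the comparison group scores higher at both time points.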

The underlying assumption in the change-score analysis is that the mean change from pretreatment to post-treatment will be the same in the two treatment groups, in the absence of a treatment effect. This assumption may or may not be correct. Sometimes treatment groups change at different rates so the gap between them either increases or decreases over time, even in the absence of a treatment effect. Such an outcome is called a selection-by-maturation interaction. For example, gaps in income often increase over time because the rich tend to get relatively richer. Patterns such as this can be accommodated in some cases by transforming the pretreatment and post-treatment measures to have equal variances (Judd and Kenny 1981). In other circumstances, treatment groups change so that the gap between them decreases over time. For example, the gap would tend to decrease if individuals in one or both groups were selected for their extreme scores on the pretreatment measure and subsequent performances regressed back toward the mean. This is called a regression artifact (Campbell and Kenny 1999).

Whether alternative explanations can account for the observed results sometimes depends on the pattern of the outcomes (Cook and Campbell 1979). For example, when the experimental group starts out below the comparison group but ends up higher, the pattern of outcomes is called a crossover interaction. Such a pattern seldom can be plausibly explained as due to either a selection-by-maturation interaction or regression toward the mean. However, patterns that can be interpreted credibly are not always easy to produce in practice.

### 2.2 Matching Or Blocking

In matching, each participant from the comparison group is paired with a participant from the experimental group so that they have the same or similar pretreatment scores. Participants who cannot be matched are discarded from the analysis. The effect of the treatment is then estimated by comparing the post-treatment scores in each matched pair (e.g., via a matched-pair t test). In blocking, the participants are matched in groups (or blocks) based on their pretreatment scores, and then comparisons are drawn between the post-treatment scores within each of the blocks (e.g., via a ‘randomized’ blocks analysis of variance).
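As a rough sketch of the matching procedure, the Python below pairs participants on a single pretreatment score and averages the within-pair differences in post-treatment scores. The greedy nearest-neighbor rule, the `caliper` tolerance, and the data are assumptions chosen for illustration; other matching rules are equally compatible with the description above.

```python
from statistics import mean

def match_pairs(pre_e, pre_c, caliper):
    """Greedy one-to-one matching on the pretreatment score.

    Returns (experimental index, comparison index) pairs whose pretreatment
    scores differ by at most `caliper`; everyone else is left unmatched.
    """
    available = set(range(len(pre_c)))
    pairs = []
    for i, score in enumerate(pre_e):
        if not available:
            break
        # Closest remaining comparison participant (ties go to the lower index).
        j = min(available, key=lambda k: (abs(pre_c[k] - score), k))
        if abs(pre_c[j] - score) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs

# Hypothetical data.
pre_exp, post_exp = [10, 12, 15, 30], [18, 19, 23, 40]
pre_cmp, post_cmp = [11, 12, 16, 20, 22], [14, 16, 19, 22, 25]

pairs = match_pairs(pre_exp, pre_cmp, caliper=1)
# Effect estimate: mean within-pair difference in post-treatment scores.
pair_diffs = [post_exp[i] - post_cmp[j] for i, j in pairs]
matched_effect = mean(pair_diffs)
print(pairs, round(matched_effect, 2))
```

The experimental participant with a pretreatment score of 30 and the comparison participants scoring 20 and 22 have no partner within the tolerance, so they are discarded, as the text describes. The within-pair differences could then be submitted to a matched-pair t test.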

The degree to which matching or blocking removes bias due to selection differences depends on several factors. First, bias reduction depends on how similar the participants are on the pretreatment measures within each pair or block (Cochran and Rubin 1973). As the number of pretreatment measures increases, usually it is harder to form pairs or blocks in which the participants are similar on all the measures. This problem can be addressed by matching or blocking participants on the basis of propensity scores, which are single-variable aggregates of multiple pretreatment measures (Rosenbaum 1995).

Second, bias will remain to the degree the pretreatment scores contain measurement error. Under some assumptions, measurement error can be taken into account by matching or blocking participants using pretreatment scores that are corrected for their degree of unreliability.
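The text does not say which correction is meant. One familiar option, assumed here purely for illustration, is to replace each observed pretreatment score with an estimated true score that is shrunk toward the mean of the participant's own treatment group in proportion to the reliability of the measure. The function name and the reliability value are hypothetical.

```python
from statistics import mean

def adjust_for_unreliability(scores, reliability):
    """Shrink one group's observed scores toward that group's mean.

    `reliability` is the assumed reliability of the pretreatment measure
    (between 0 and 1). A value of 1 leaves the scores unchanged.
    """
    group_mean = mean(scores)
    return [group_mean + reliability * (x - group_mean) for x in scores]

# Hypothetical pretreatment scores for one treatment group.
observed = [10, 20, 30]
adjusted = adjust_for_unreliability(observed, reliability=0.5)
print(adjusted)
```

The adjustment would be applied separately within each treatment group, and the adjusted scores would then be used in place of the observed ones when forming pairs or blocks.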

Third, bias reduction depends on the degree to which the pretreatment measures tap all the factors that influence the post-treatment scores and on which the treatment groups differ. For example, to the extent (a) the participants within each pair or block are not matched on, say, motivation, (b) the treatment groups differ in motivation, and (c) motivation influences the post-treatment scores, then matching or blocking will fail to control for the effects of all selection differences. Unfortunately, there is seldom any way to be sure that all the relevant selection differences have been assessed and included in the analysis. Often the best, though still fallible, way to assure that most of the relevant factors are taken into account is to use pretreatment measures that are operationally identical to the post-treatment measures.

### 2.3 Analysis Of Covariance

Analysis of covariance (ANCOVA) is analogous to matching and blocking. The difference is that ANCOVA matches participants mathematically rather than by physically forming pairs or blocks of similar participants (Reichardt 1979). By matching mathematically, ANCOVA avoids the problem of inexact matches that can arise when using pairs or blocks, but does so at the expense of having to impose assumptions about the shape of the regression surface between the pretreatment and post-treatment scores.

An analysis of covariance is accomplished by regressing the post-treatment scores on to both pretreatment measures and a dummy variable that indicates membership in the different treatment groups. The estimate of the treatment effect is the regression coefficient for the group-membership dummy variable. Although not as problematic for ANCOVA as for matching or blocking, having a large number of pretreatment measures can make the analysis difficult to implement, and this difficulty might again be reduced by using propensity scores.
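For the simplest case of a single pretreatment measure and a common regression slope in both groups, the coefficient for the group-membership dummy variable equals the difference in post-treatment means minus the pooled within-group slope times the difference in pretreatment means. The sketch below uses that closed form. The function name and data are hypothetical, and the linear, equal-slope regression surface is an assumption of the example.

```python
from statistics import mean

def ancova_effect(pre_e, post_e, pre_c, post_c):
    """Return (adjusted treatment effect, pooled within-group slope)."""
    def within_group_sums(pre, post):
        m_pre, m_post = mean(pre), mean(post)
        s_xy = sum((x - m_pre) * (y - m_post) for x, y in zip(pre, post))
        s_xx = sum((x - m_pre) ** 2 for x in pre)
        return s_xy, s_xx

    sxy_e, sxx_e = within_group_sums(pre_e, post_e)
    sxy_c, sxx_c = within_group_sums(pre_c, post_c)
    # Common slope of post-treatment on pretreatment, pooled within groups.
    slope = (sxy_e + sxy_c) / (sxx_e + sxx_c)
    # Post-treatment difference adjusted for the pretreatment difference.
    effect = (mean(post_e) - mean(post_c)) - slope * (mean(pre_e) - mean(pre_c))
    return effect, slope

# Hypothetical data generated as post = 2 + 0.5 * pre + 4 * treated.
pre_exp, post_exp = [10, 12, 14, 16], [11, 12, 13, 14]
pre_cmp, post_cmp = [20, 22, 24, 26], [12, 13, 14, 15]

effect, slope = ancova_effect(pre_exp, post_exp, pre_cmp, post_cmp)
print(effect, slope)
```

In these data the experimental group scores lower than the comparison group after treatment, yet the adjusted estimate recovers the positive effect built into the data because the experimental group started far lower on the pretreatment measure.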

Like matching or blocking, ANCOVA is susceptible to biases due to measurement error in the pretreatment scores. Analytic adjustments can be made in ANCOVA to address unreliability in the pretreatment scores (Sorbom 1978). ANCOVA is also just as susceptible as matching or blocking to biases due to omitting pretreatment measures that influence the post-treatment scores and on which the treatment groups differ. Unless the pretreatment measures capture all of the selection differences that influence outcomes, the results of ANCOVA, like matching or blocking, are likely to remain biased.

### 2.4 Selection Modeling

In selection modeling, the post-treatment scores are regressed on to both pretreatment measures and a ‘selection’ variable that models the assignment of participants into the treatment groups. The estimate of the treatment effect is the regression coefficient for the selection variable in this regression equation (which will be called the ‘outcome’ regression equation).

The selection variable that appears in the outcome regression equation can be constructed in a variety of ways depending on the circumstances (Barnow et al. 1980). One way to construct the selection variable is to set it equal to the predicted scores from what will be called a ‘selection’ regression equation wherein a group-membership dummy variable is regressed linearly on to both the pretreatment measures that were used in the outcome regression equation and additional pretreatment measures that were not included in the outcome regression equation. The underlying assumption is that the unique pretreatment measures that are added to the selection regression equation are related to group assignment but do not themselves influence the post-treatment measures.
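The linear version of this procedure amounts to a two-stage least-squares calculation, which the following sketch spells out with a small hand-written regression routine. All names and data are hypothetical. The data are constructed so that an unmeasured factor `u` affects both group membership and the outcome, while the additional pretreatment measure `extra` affects group membership only, which is the assumption the method relies on.

```python
def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    k = len(X[0])
    # Normal equations (X'X) b = X'y held as an augmented matrix.
    A = [[sum(row[r] * row[c] for row in X) for c in range(k)]
         + [sum(row[r] * yi for row, yi in zip(X, y))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][k] / A[r][r] for r in range(k)]

def selection_model_effect(post, pre, extra, group):
    # Selection equation: group dummy on the pretreatment measure plus
    # the additional measure that is left out of the outcome equation.
    a0, a_pre, a_extra = ols([pre, extra], group)
    selection = [a0 + a_pre * p + a_extra * z for p, z in zip(pre, extra)]
    # Outcome equation: the coefficient of the selection variable
    # is the estimate of the treatment effect.
    return ols([pre, selection], post)[2]

# Hypothetical data with a true treatment effect of 5.
pre, extra, group, post = [], [], [], []
for p in (10, 20):
    for z in (0, 1):
        for u in (-1, 0, 1):           # unmeasured selection factor
            d = 1 if z + u >= 1 else 0
            pre.append(p)
            extra.append(z)
            group.append(d)
            post.append(2 + 3 * p + 5 * d + 4 * u)

effect = selection_model_effect(post, pre, extra, group)
naive = ols([pre, group], post)[2]     # dummy-variable regression, for contrast
print(round(effect, 3), round(naive, 3))
```

Because the unmeasured factor is related to group membership, the ordinary dummy-variable regression overstates the effect in these data, whereas the selection-variable estimate recovers the value used to generate them. That recovery depends entirely on `extra` being unrelated to the outcome except through group membership.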

An alternative way to create the selection variable is to set it equal to the predicted scores when a group-membership dummy variable is regressed onto the pretreatment measures using probit or logistic regression. Using a probit or logistic, rather than a linear, regression produces a selection variable that is a nonlinear function of the pretreatment measures. The underlying assumption in this analysis is that the relationship between the pretreatment and post-treatment scores would be linear, in the absence of a treatment effect.

## 3. More Elaborate Nonequivalent Group Designs

A variety of design features are described below that can be added to the prototypic nonequivalent group design to improve the interpretability of the results (Shadish and Cook 1999). The logic is to create a design where the pattern of results that would arise from a treatment effect is different from the pattern of results that would arise from selection differences (Reichardt 2000).

### 3.1 Multiple Outcome Measures

The credibility of inferences from nonequivalent group comparisons can sometimes be increased by adding another outcome measure, often called a nonequivalent dependent variable (Cook and Campbell 1979). Causal inference will be enhanced if selection differences have the same effects on both the original and additional outcome measures, but the effect of the treatment differs on the two measures (or vice versa). For example, in a study of a program to aid the homeless (Braucht et al. 1995), the primary intent of the treatment was to decrease substance abuse and only secondarily to improve other outcomes such as family relationships. In contrast, selection differences, such as differences in motivation to change one’s lifestyle, should have had just as large an effect on family relationships as on substance abuse. That a large outcome difference appeared on measures of substance abuse but not on measures of family relationships increased the researchers’ confidence that the positive results on substance abuse were due to the treatment, rather than to selection differences.

### 3.2 Multiple Comparison Groups

The credibility of inferences from nonequivalent group designs sometimes can be increased by using two comparison groups in the design rather than just one. For example, a design with two comparison groups allows the experimental group to be compared to each separately. This could strengthen confidence in a causal inference if the treatment were known, say, to have an effect in the same direction in both comparisons while the effects of selection differences were known to operate in opposite directions. In such a case, similar results in the two comparisons suggest that the treatment, rather than selection differences, is responsible for the outcome.

Alternatively, additional comparison groups might be included in the design so that different comparison groups could be used to satisfy the assumptions of different statistical analyses, or multiple comparison groups could be used to assess the plausibility of the assumptions of a given statistical analysis. For example, a change-score analysis could benefit greatly if, relative to the experimental group, one comparison group is initially inferior and the other initially superior on the pretreatment assessment. The most interpretable pattern of results would be where the mean changes from pretreatment to post-treatment were the same in the two comparison groups but different in the experimental group. Such an outcome would rule out both a selection-by-maturation interaction and regression toward the mean more credibly than would be possible with only a single comparison group.

### 3.3 Multiple Time Points

It can be useful to collect both pre- and post-treatment measures at additional time points. Pretreatment measures at additional time points can be used to improve the prediction of growth trajectories in the absence of the treatment. For example, suppose pretreatment measures are collected at both times 1 and 2, different treatment protocols are introduced at time 3, and post-treatment measures are collected at time 4. Then growth trajectories estimated using the time 1 and 2 measures can help distinguish between treatment effects and selection-by-maturation interactions. With even more pretreatment measures collected over time, researchers could also assess the plausibility of regression toward the mean. In either case, growth trajectories could be estimated using either mean level analyses such as change-score models or individual level analyses such as latent growth curve models (McArdle 1988, Muthen and Curran 1997) or hierarchical linear models (Bryk and Raudenbush 1992).
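A very simple mean-level version of this idea is sketched below. It assumes, for the sake of the example, that each group's mean would continue along the straight line through its time 1 and time 2 means if no treatment were introduced. The function names and data are hypothetical, and latent growth or hierarchical models would replace this projection in a fuller analysis.

```python
from statistics import mean

def projected_mean(mean_t1, mean_t2, t1, t2, t_target):
    """Extend the straight line through two pretreatment means to t_target."""
    rate = (mean_t2 - mean_t1) / (t2 - t1)
    return mean_t2 + rate * (t_target - t2)

def trajectory_adjusted_effect(exp_scores, cmp_scores, times):
    """Each *_scores dict maps a time to a list of scores; times = (t1, t2, t_post)."""
    t1, t2, t_post = times

    def departure(scores):
        expected = projected_mean(mean(scores[t1]), mean(scores[t2]), t1, t2, t_post)
        return mean(scores[t_post]) - expected

    # How far each group departs from its own projected trajectory.
    return departure(exp_scores) - departure(cmp_scores)

# Hypothetical data: the comparison group grows faster before treatment.
exp_scores = {1: [10, 12], 2: [12, 14], 4: [22, 24]}
cmp_scores = {1: [20, 22], 2: [25, 27], 4: [35, 37]}

effect = trajectory_adjusted_effect(exp_scores, cmp_scores, times=(1, 2, 4))
# Plain change-score estimate from time 2 to time 4, for contrast.
naive_change = ((mean(exp_scores[4]) - mean(exp_scores[2]))
                - (mean(cmp_scores[4]) - mean(cmp_scores[2])))
print(effect, naive_change)
```

In these made-up data both groups gain 10 points between times 2 and 4, so a plain change-score analysis would report no effect. Taking the pretreatment growth rates into account shows the experimental group ending well above its projected trajectory while the comparison group ends exactly on its own.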

Alternatively, pretreatment measures at times 1 and 2 could be analyzed as if they were a pretreatment and a hypothetical post-treatment measure. Because the different treatments were not introduced until later, such ‘dry-run’ analyses should produce null results except to the extent they are biased by selection differences or other threats to validity. In this way, the results of such analyses could be used to choose or modify statistical procedures so they are less likely to be biased by selection differences when used to analyze the actual pretreatment and post-treatment data (e.g., Wortman et al. 1978).
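A dry-run check of this kind might look like the following sketch, which applies a change-score estimate to two pretreatment waves. The function name and data are hypothetical, and the same check could be run with any of the other analysis strategies.

```python
from statistics import mean

def change_score_effect(pre_e, post_e, pre_c, post_c):
    """Difference between the groups in mean change."""
    gain_e = mean(post_e) - mean(pre_e)
    gain_c = mean(post_c) - mean(pre_c)
    return gain_e - gain_c

# Hypothetical scores from two waves collected before any treatment.
time1_exp, time2_exp = [10, 12], [12, 14]
time1_cmp, time2_cmp = [20, 22], [25, 27]

# Treat time 2 as if it were a post-treatment measure.
dry_run = change_score_effect(time1_exp, time2_exp, time1_cmp, time2_cmp)
print(dry_run)
```

No treatment had been introduced by time 2, so an estimate that departs noticeably from zero, as it does here, suggests the groups were already changing at different rates and that a change-score analysis of the real post-treatment data would be biased in the same direction.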

In a similar vein, post-treatment observations at additional time points sometimes can be used to distinguish between treatment effects and selection differences. For example, imagine that the effect of the treatment would increase over time while the effect of selection differences would remain constant or decrease over time. In such a case, a pattern of observed differences that increase over time would contribute to one’s confidence that these differences are due to the treatment. For example, the conclusion that smoking causes cancer was strengthened when it was found that the rate of lung cancer in women smokers rose 20–30 years after smoking became prevalent in women, while the lung cancer rate in nonsmoking women remained relatively constant. The alternative explanation that genetic differences between smokers and nonsmokers were responsible for the observed association between cancer and smoking (because genes might produce both a liking for cigarettes and greater susceptibility to lung cancer) could not explain such an outcome, because genetic differences would be expected to be relatively stable over that ‘short’ a time period.

### 3.4 Variations In Amounts Or Types Of Treatments

Adding observations at additional time points or adding multiple comparison groups provides the opportunity to introduce repetitions of the same treatment, different amounts of the same treatment, or different types of treatments. For example, suppose the treatment could be introduced, removed, and then reintroduced to the experimental group and that observations could be collected at all the time points before and after these interventions for both the experimental and comparison groups. Further, suppose the effect of the treatment is short lived (so the treatment’s effects are not present when the treatment is removed) and that the effects of selection differences are constant over time. Then a pattern of outcome differences that varies over time in accord with the varying interventions could be strong evidence in favor of a treatment effect.

Another possibility is the switching replication design (Cook and Campbell 1979) where the treatment is administered to the experimental group at one point in time and then administered to the comparison group at a later point in time, with observations being collected on each group initially and after each of the treatment administrations. The premise of this design is that the difference between the groups in the effect of the treatment varies over the time periods because the administration of the treatment varies, but the effect of selection differences remains the same because the composition of the groups remains the same.

Another potential source of complex patterns with which to distinguish treatment from selection effects is a design in which groups of participants receive varying amounts of the treatment. For example, one set of the participants in the experimental group might receive the standard dose of a treatment while other sets of participants receive either half or twice the standard dose. The underlying premise of such designs is that different treatment doses produce systematically different outcomes (sometimes called the dose-response curve for the treatment), while the effects of selection differences would follow a different pattern.

### 3.5 The Benefits Of Disaggregation

The preceding discussion describes potential benefits of additional outcome measures, comparison groups, time points, and treatments. Incorporating these design features often requires collecting additional data, but sometimes can be accomplished by disaggregating the available data. For example, sometimes it is possible to add a nonequivalent dependent variable by disaggregating the data from an outcome variable that consists of a battery of measures. Or a comparison group sometimes can be added by dividing what was initially a single comparison group into two groups. Or comparisons between different dosage levels of the treatment might be obtained by partitioning the experimental group according to the amount of treatment received. Even additional time points sometimes can be added to a design by partitioning data that initially were aggregated, say, at yearly time intervals, into data aggregated only at monthly intervals.

## 4. Minimizing Selection Differences

Researchers should try to implement designs and analyses that are most capable of taking account of selection differences in the specific situations at hand. Applying a complex statistical procedure to data from a prototypic nonequivalent group design may be appealing in theory, but a much simpler statistical analysis combined with a more elaborate design will often be more credible in practice.

However, regardless of the simplicity or elaborateness of the research design and statistical analysis, the credibility of a nonequivalent group comparison tends to increase when selection differences are made small and the treatment effects are made large. Random assignment tends to make selection differences small. In nonequivalent group designs, selection differences can often be minimized by using an assignment procedure that is as close to random as possible. For example, Langer and Rodin (1976) assessed the effects of a treatment by drawing comparisons across the clients on different floors of a nursing home. In this case, the assignment of clients to treatment conditions appeared to be quasi-random in that there were no obvious mechanisms by which different types of clients would be assigned systematically to different floors.

Often selection differences are smaller when the experimental and comparison groups are composed of individuals recruited from the same organization or locale (called internal controls) than from different organizations or locales (called external controls). A special case of internal controls arises when the experimental and comparison groups are made up of earlier and later cohorts (Cook and Campbell 1979). For example, the effect of a treatment that is given to college sophomores in a given year could be assessed by drawing a comparison with the performance of sophomores at that college from the prior year. Such a design is worth considering when the cohort of sophomores from the prior year is more similar to the treatment group than would be a contemporaneous comparison group of freshmen or juniors from the same college or of sophomores from a different college.

## 5. Summary

Primarily because of biases due to selection differences and because of the difficulty in taking these biases into account, nonequivalent group designs generally produce less credible results than randomized experiments. Nonetheless, nonequivalent group designs are used widely because they can be far easier to implement than randomized experiments. Researchers should try to take account of the biasing effects of selection differences by the judicious combination of analysis strategies and design features. It is also advisable to minimize selection differences a priori so as to minimize one’s reliance on either statistical procedures or design elaborations.

**Bibliography:**

- Barnow B S, Cain G C, Goldberger A S 1980 Issues in the analysis of selectivity bias. In: Stromsdorfer E W, Farkas G (eds.) Evaluation Studies Review Annual. Sage, Newbury Park, CA, Vol. 5
- Braucht G N, Reichardt C S, Geissler L J, Bormann C A, Kwiatkowski C F, Kirby Jr. M W 1995 Effective services for homeless substance abusers. Journal of Addictive Diseases 14: 87–109
- Bryk A S, Raudenbush S W 1992 Hierarchical Linear Models: Applications and Data Analysis Methods. Sage, Newbury Park, CA
- Campbell D T, Kenny D A 1999 A Primer on Regression Artifacts. Guilford Press, New York
- Cochran W G, Rubin D B 1973 Controlling bias in observational studies. Sankhya (Series A) 35: 417–46
- Cook T D, Campbell D T 1979 Quasi-experimentation: Design and Analysis Issues for Field Settings. Rand McNally, Chicago
- Fetterman D M 1982 Ibsen’s baths: Reactivity and insensitivity. Educational Evaluation and Policy Analysis 4: 261–79
- Judd C M, Kenny D A 1981 Estimating the Effects of Social Interventions. Cambridge University Press, New York
- Lam J A, Hartwell S W, Jekel J F 1994 ‘I prayed real hard, so I know I’ll get in’: Living with randomization. In: Conrad K J (ed.) Critically Evaluating the Role of Experiments. New Directions for Program Evaluation, No. 63. Jossey-Bass, San Francisco
- Langer E J, Rodin J 1976 The effects of choice and enhanced personal responsibility for the aged: A field experiment in an institutional setting. Journal of Personality and Social Psychology 34: 191–98
- McArdle J J 1988 Dynamic but structural equation modeling of repeated measures data. In: Nesselroade J R, Cattell R B (eds.) The Handbook of Multivariate Experimental Psychology, 2nd edn. Plenum, New York
- Muthen B O, Curran P J 1997 General longitudinal modeling of individual differences in experimental designs: A latent variable framework for analysis and power estimation. Psychological Methods 2: 371–402
- Reichardt C S 1979 The statistical analysis of data from nonequivalent group designs. In: Cook T D, Campbell D T (eds.) Quasi-experimentation: Design and Analysis Issues for Field Settings. Rand McNally, Chicago
- Reichardt C S 2000 A typology of strategies for ruling out threats to validity. In: Bickman L (ed.) Research Design: Donald Campbell’s Legacy. Sage, Thousand Oaks, CA, Vol. 2
- Rosenbaum P R 1995 Observational Studies. Springer-Verlag, New York
- Shadish W R, Cook T D 1999 Comment–Design rules: More steps toward a complete theory of quasi-experimentation. Statistical Science 14: 294–300
- Sorbom D 1978 An alternative methodology for analysis of covariance. Psychometrika 43: 381–96
- Wortman P M, Reichardt C S, St. Pierre R G 1978 The first year of the educational voucher demonstration: A secondary analysis of student achievement test scores. Evaluation Quarterly 2: 193–214