Observational Studies Research Paper


1. Introduction

1.1 Definition

Cochran (1965) defined an observational study as a comparison of treated and control groups in which

the objective is to elucidate cause-and-effect relationships [ … in which it] is not feasible to use controlled experimentation, in the sense of being able to impose the procedures or treatments whose effects it is desired to discover, or to assign subjects at random to different procedures.

Treatments, policies and interventions that are expected to beneﬁt recipients can be studied in experiments that randomly allocate experimental subjects to treatment groups, so comparable subjects receive competing treatments (Boruch 1997, Piantadosi 1997, Friedman et al. 1998). Observational studies are common, nonetheless, in most ﬁelds that study the effects of treatments on people, because harmful or unwanted treatments cannot be imposed on human subjects. When treatments are not randomized, subjects assigned to competing treatments may not be comparable, so differences in outcomes may or may not be effects caused by the treatments (Rubin 1974).

1.2 Examples

Two examples of observational studies are described. Later sections refer to these examples.

1.2.1 Long-Term Psychological Effects Of The Death Of A Close Relative. Lehman et al. (1987) attempted to estimate the long-term psychological effects of the sudden death of a spouse or a child in a car crash. Starting with 80 bereaved spouses and parents, they matched 80 controls drawn from 7581 individuals who came to renew a driver's license, matching for gender, age, family income before the crash, education level, and number and ages of children. Contrasting their findings with the views of Bowlby and Freud, they concluded:

Contrary to what some early writers have suggested about the duration of the major symptoms of bereavement … both spouses and parents in our study showed clear evidence of depression and lack of resolution at the time of the interview, which was ﬁve to seven years after the loss occurred.

1.2.2 Effects On Children Of Occupational Exposures To Lead. Morton et al. (1982) asked whether children were harmed by lead brought home in the clothes and hair of parents who were exposed to lead at work. They matched children whose parents worked in a battery factory to unexposed control children of the same age and neighborhood, comparing the level of lead found in the children’s blood, ﬁnding elevated levels of lead in exposed children.

In addition, they compared exposed children whose parents had varied levels of exposure to lead at the factory, ﬁnding that parents who had higher exposures on the job in turn had children with more lead in their blood. Finally, they compared exposed children whose parents had varied hygiene upon leaving the factory at the end of the day, ﬁnding that poor hygiene of the parent predicted higher levels of lead in the blood of the child.

1.3 Central Issues

Without random assignment of treatments, treated and control groups may not have been comparable prior to treatment, so differing outcomes may reﬂect either biases from these pretreatment differences or else effects actually caused by the treatment. The difficulty in distinguishing treatment effects from biases is the key consequence of the absence of randomization.

A variable measured prior to treatment is not affected by the treatment and is called a covariate. A variable measured after treatment may have been affected by the treatment and is called an outcome. An analysis that does not carefully distinguish covariates and outcomes can introduce biases into the analysis where none existed previously.

A pretreatment difference between treated and control groups is called an overt bias if it is accurately measured in the data at hand, and it is called a hidden bias if it is not measured. For instance, if treated subjects are observed to be somewhat older than controls, and if age is recorded, then this is an overt bias. If treated subjects consumed more illegal narcotics than controls, and if accurate records of this are not available, then this is a hidden bias. Typically, overt biases are immediately visible in the data at hand, while hidden biases are a matter of concerned speculation and investigation.

In most observational studies, an effort is made to remove overt biases using one or more analytical methods, such as matched sampling, stratiﬁcation, or model-based adjustments such as covariance adjustment. Adjustments are discussed in Sect. 2. Once this is accomplished, attention turns to addressing possible hidden biases, including efforts to detect hidden biases and to study the sensitivity of conclusions to biases of plausible magnitude. Hidden biases are discussed in Sects. 3–5.

2. Adjusting For Overt Biases

2.1 Matched Sampling

2.1.1 Selecting From A Reservoir Of Potential Controls. Matching is the most direct and intuitive method of adjustment, in which an effort is made to compare each treated individual to one or more comparable controls. Matched sampling is most common when a small treated group is available together with a large reservoir of potential controls (Rubin 1973a). For instance, in the study of bereavement by Lehman et al. (1987), there were 80 bereaved spouses and parents and 7581 potential controls, from whom 80 matched controls were selected.

The structure of the bereavement study is typical: computerized administrative records aid in identifying and matching treated and control subjects, but additional information is required from matched subjects for research purposes; in this case, psychiatric outcomes are required. Often it is not practical to obtain the needed research data from all controls in the reservoir and instead a sample of controls is used. Matched sampling selects controls who appear comparable to treated subjects.

2.1.2 Goals Of Matching. Matching attempts to do three things: (a) produce matched pairs or sets that appear similar in terms of observed covariates, (b) produce treated and control groups with similar distributions of observed covariates, (c) remove that part of the bias in estimated treatment effects that is due to overt biases or imbalances in observed covariates. For instance, with pairs matched for age, goal (a) would be accomplished if each treated subject were matched to a control of the same age, whereas goal (b) would be accomplished if the entire treated group had an age distribution that closely resembled the age distribution in the matched control group, for instance, the same mean age, the same quartiles of the age distribution, and so on.

It is possible to demonstrate (Rosenbaum and Rubin 1983) that goal (a) suffices but is not necessary for goal (b), and that goal (b) suffices for goal (c). Pairs that are individually matched for age will balance age, but it is also possible to balance age with matched pairs that are not individually matched for age, and any matching that balances the distribution of age removes the part of the bias in estimated treatment effects due to imbalances in age. This is fortunate, because it is difficult to match individually on many covariates at once, but it is not so difficult to balance many covariates at once. For instance, with 20 binary covariates, there are 2²⁰, or about a million, types of individuals, so even with thousands of potential controls, it will often be difficult to find a control who matches a treated subject on all 20 covariates. Nonetheless, balancing 20 covariates is often quite feasible. As a result, modern matching algorithms focus on balancing covariates, while comparability within pairs plays a secondary role. A tool in balanced matching is the propensity score.
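The distinction between individually matched pairs, goal (a), and balanced groups, goal (b), can be sketched in a few lines of Python. The function names and the toy ages below are illustrative assumptions, not taken from the studies cited, and the standardized difference in means is used here only as one common balance diagnostic.

```python
from statistics import mean, variance


def n_covariate_patterns(n_binary):
    """Number of distinct types of individual defined by n binary covariates."""
    return 2 ** n_binary


def standardized_difference(treated, control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical ages: no pair is matched exactly (30 vs 35, 50 vs 45),
# yet the two groups have the same mean age.
treated_ages = [30, 40, 50]
control_ages = [35, 40, 45]

print(n_covariate_patterns(20))
print(standardized_difference(treated_ages, control_ages))
```

A full balance check would compare more than means (quartiles, for instance, as described in Sect. 2.1.2); the sketch shows only that groups can agree on a summary of a covariate without agreeing pair by pair.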

2.1.3 Propensity Score. The propensity score is the conditional probability of receiving the treatment rather than the control given the observed covariates (Rosenbaum and Rubin 1983). Note carefully that the propensity score is defined in terms of the observed covariates even if there may be hidden biases due to unobserved covariates. In the simplest randomized experiment, treatment or control is assigned by the flip of a fair coin, and the propensity score equals 1/2 for all subjects no matter what covariates are observed. In an observational study, subjects with certain observed characteristics may be more likely to receive either treatment or control, so the propensity score varies with these observed characteristics.

Matching on one variable, the propensity score, tends to balance all of the observed covariates. Matching on the propensity score together with a few other observed covariates—say, matching on the propensity score together with age and gender—also tends to balance all of the observed covariates. If it suffices to adjust for the observed covariates—that is, if there is no hidden bias due to unobserved covariates—then it also suffices to adjust for the propensity score alone. These results are Theorems 1 through 4 of Rosenbaum and Rubin (1983).

In practice, the propensity score is unknown and must be estimated. For instance, one might estimate the propensity score using logit regression (Cox and Snell 1989) of assigned treatment on observed covariates, perhaps including interactions, quadratics and transformations of the covariates. Matching on a single variable, such as the estimated propensity score, is often practical whereas matching on many covariates is not; yet matching on the propensity score balances many covariates.
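As a rough illustration of estimating the propensity score by logit regression, the following sketch fits a logistic model by plain gradient ascent using only the Python standard library. The data, the function names (fit_logit, propensity), the step size and the iteration count are all assumptions made for the example; in practice one would normally use an established statistical package.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logit(X, z, steps=5000, lr=0.1):
    """Fit Pr(z = 1 | x) = sigmoid(b0 + b . x) by gradient ascent
    on the mean log-likelihood. Returns [b0, b1, ..., bp]."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for xi, zi in zip(X, z):
            fitted = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            resid = zi - fitted
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity(beta, x):
    """Estimated probability of treatment for covariate vector x."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


# Hypothetical data: one standardized covariate; treatment (1) is more
# common at higher values, but the two groups overlap.
X = [[-1.5], [-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5], [2.0]]
z = [0, 0, 1, 0, 1, 0, 1, 1]

beta = fit_logit(X, z)
scores = [propensity(beta, x) for x in X]
```

Treated and control subjects could then be matched or stratified on the single number in scores rather than on every covariate separately.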

2.1.4 Structure Of Matched Sets. Most commonly, each treated subject is matched to exactly one control, but other matching structures may yield either greater bias reduction or estimates with smaller standard errors or both. In particular, if the reservoir of potential controls is large, and if obtaining data from controls is not prohibitively expensive, then the standard errors of estimated treatment effects can be substantially reduced by matching each treated subject to several controls (Smith 1997).

It is not necessary to match every treated subject to the same ﬁxed number of controls. In terms of bias reduction, the optimal form is a full matching in which each matched set may have one treated subject and several controls or else one control and several treated subjects (Rosenbaum 1995). In this way, when a certain pattern of observed covariates is typical of controls, then a treated subject with this pattern will have several controls, but if instead the pattern is typical of treated subjects, several treated subjects may share one control. In a simulation study, full matching was much better than matching with a ﬁxed number of controls.

2.1.5 Matching Algorithms. Matching algorithms often express the difference in covariate values between a treated subject and a potential control in terms of a distance. One then matches a treated subject to a control who is close in terms of this distance. In the reservoir of potential controls, the one control who is closest to the ﬁrst treated subject may also be the closest to the second treated subject, and some rule or algorithm is needed to assign controls to treated subjects. Optimal matching minimizes the total distance within matched sets by solving a minimum cost ﬂow problem. The use of optimal matching in observational studies is illustrated in Rosenbaum (1995) and an implementation in the statistical package SAS is discussed by Bergstralh et al. (1996).
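The difference between a simple "nearest available" rule and optimal matching can be seen in a toy case. The sketch below is not the minimum cost flow algorithm mentioned above: it finds the optimal pair matching by brute-force enumeration, which is practical only for a handful of subjects. The distance (absolute difference in a single score) and the numbers are illustrative assumptions.

```python
from itertools import permutations


def greedy_match(treated, controls):
    """Match each treated subject, in order, to the nearest unused control."""
    unused = list(range(len(controls)))
    pairs = []
    for i, t in enumerate(treated):
        j = min(unused, key=lambda c: abs(t - controls[c]))
        unused.remove(j)
        pairs.append((i, j))
    return pairs


def optimal_match(treated, controls):
    """Try every assignment of distinct controls to treated subjects and
    keep the one with the smallest total distance (tiny problems only)."""
    best = min(
        permutations(range(len(controls)), len(treated)),
        key=lambda perm: sum(abs(t - controls[j]) for t, j in zip(treated, perm)),
    )
    return list(enumerate(best))


def total_distance(pairs, treated, controls):
    return sum(abs(treated[i] - controls[j]) for i, j in pairs)


# Hypothetical scores: control 0 is the closest control to both treated subjects.
treated = [0.50, 0.60]
controls = [0.58, 0.40, 0.90]

greedy = greedy_match(treated, controls)
optimal = optimal_match(treated, controls)
```

Here the greedy rule gives the shared control to the first treated subject and leaves the second with a poor match, while the optimal matching accepts a slightly worse match for the first subject to obtain a much better total.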

2.2 Stratification

An alternative to matching is stratiﬁcation, in which subjects are grouped into strata so that, within each stratum, treated and control subjects have similar distributions of observed covariates. For instance, one might divide subjects into strata based on age, and then compare treated and control subjects of about the same age.

Cochran (1968) showed that ﬁve strata formed from a single continuous covariate can remove about 90 percent of the bias in that covariate. It is difficult to form strata that are homogeneous in many covariates for the same reason that it is difficult to match exactly on many covariates; see Sect. 2.1.2. As a result, strata are formed using the propensity score that balance many covariates, so within a stratum, subjects may vary in terms of age, gender, race, and wealth, but the distribution of these variables would be similar for treated and control subjects in the same stratum (Rosenbaum and Rubin 1983).
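A minimal sketch of stratification on a score, assuming five equal-sized strata and a size-weighted average of within-stratum differences in mean outcome. The data are constructed for illustration so that the outcome rises with the score and the true treatment effect is 2; none of the numbers come from the studies cited.

```python
from statistics import mean


def stratified_effect(subjects, n_strata=5):
    """subjects: list of (score, treated, outcome) tuples.
    Sort by score, cut into n_strata equal-sized strata, and average the
    within-stratum treated-minus-control differences, weighted by size."""
    ordered = sorted(subjects, key=lambda s: s[0])
    n = len(ordered)
    weighted_sum, total_weight = 0.0, 0
    for k in range(n_strata):
        stratum = ordered[k * n // n_strata:(k + 1) * n // n_strata]
        t = [y for _, z, y in stratum if z == 1]
        c = [y for _, z, y in stratum if z == 0]
        if not t or not c:
            continue  # no within-stratum comparison is possible
        weighted_sum += len(stratum) * (mean(t) - mean(c))
        total_weight += len(stratum)
    return weighted_sum / total_weight


# Hypothetical data: treated subjects are concentrated at high scores.
subjects = []
for k in range(5):
    score = 0.1 + 0.2 * k
    subjects += [(score, 1, 10 * score + 2.0)] * (k + 1)
    subjects += [(score, 0, 10 * score)] * (5 - k)

naive = (mean(y for _, z, y in subjects if z == 1)
         - mean(y for _, z, y in subjects if z == 0))
adjusted = stratified_effect(subjects)
```

In this artificial case the unadjusted difference is inflated by the imbalance in scores, whereas the stratified estimate recovers the effect built into the data. With a continuous score, strata would not be perfectly homogeneous and some residual bias would remain.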

The optimal stratiﬁcation—that is, the stratiﬁcation that makes treated and control subjects as similar as possible within strata—is a full matching as described in Sect. 2.1.4. An optimal full matching, hence also an optimal stratiﬁcation, can be determined using network optimization, as discussed in Sect. 2.1.5. See Rosenbaum (1995) for proof and speciﬁcs.

2.3 Model-Based Adjustments

Matched sampling and stratiﬁcation compare treated subjects directly to actual controls who appear comparable in terms of observed covariates. For instance, as noted in Sect. 1.2.1, Lehman et al. (1987) compared bereaved individuals to matched controls who were comparable in terms of age, gender, income, education, and family structure. In contrast, model-based adjustments, such as covariance adjustment, use data on treated and control subjects without regard to their comparability, relying on a model, such as a linear regression model, to predict how subjects would have responded under treatments they did not receive.

If the model used for model-based adjustment is precisely correct, then model-based adjustments may be more efficient than matching and stratification, producing estimates with smaller standard errors. On the other hand, if the model is substantially incorrect, model-based adjustments may not only fail to remove overt biases, they may even increase them, whereas matching and stratification are fairly consistent at reducing overt biases (Rubin 1973b, 1979).

Alas, in practice, one can often detect substantial errors in a model, but rarely can have much conﬁdence that the model is precisely correct, so there is rarely a clear choice between efficiency of model-based adjustments when the model is true and the robustness of matching and stratiﬁcation when it is not. This has led to the use of combined techniques.

In simulation studies, Rubin (1973b, 1979) compared matching, covariance adjustment, and combinations of these techniques, concluding that the use of covariance adjustment within matched pairs was superior to either method alone, being both robust and efficient.
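One simple form of covariance adjustment within matched pairs regresses the within-pair difference in outcomes on the within-pair difference in a covariate and reads the treatment effect from the intercept. The sketch below assumes a single covariate and a linear model; the function name and the data are hypothetical.

```python
from statistics import mean


def adjusted_pair_effect(pairs):
    """pairs: list of ((x_treated, y_treated), (x_control, y_control)).
    Least squares regression of outcome differences on covariate
    differences; the intercept is the adjusted effect estimate."""
    dx = [t[0] - c[0] for t, c in pairs]
    dy = [t[1] - c[1] for t, c in pairs]
    mx, my = mean(dx), mean(dy)
    sxx = sum((d - mx) ** 2 for d in dx)
    slope = sum((a - mx) * (b - my) for a, b in zip(dx, dy)) / sxx if sxx else 0.0
    return my - slope * mx


# Hypothetical pairs matched closely, but not exactly, on age x.
# Outcomes follow y = 3 * x + 2 * treated, so the true effect is 2.
pairs = [
    ((31, 95), (30, 90)),
    ((42, 128), (40, 120)),
    ((50, 152), (51, 153)),
]

raw = mean(t[1] - c[1] for t, c in pairs)   # ignores residual age differences
adjusted = adjusted_pair_effect(pairs)
```

The raw mean of the pair differences absorbs the leftover age imbalance within pairs; the regression on covariate differences removes it, provided the linear model is adequate.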

3. Detecting Hidden Bias

3.1 Elaborate Theories

In an observational study, the investigator takes active steps to detect hidden biases—to collect data recording visible traces of unobserved pretreatment differences if they exist. In this, the investigator is aided by an ‘elaborate theory,’ deﬁned by Sir Ronald Fisher in the following discussion from Cochran (1965, Sect. 6).

About 20 years ago, when asked in a meeting what can be done in observational studies to clarify the step from association to causation, Sir Ronald Fisher replied: ‘Make your theories elaborate.’ The reply puzzled me at ﬁrst, since by Occam’s razor, the advice usually given is to make theories as simple as is consistent with known data. What Sir Ronald meant, as subsequent discussion showed, was that when constructing a causal hypothesis one should envisage as many different consequences of its truth as possible, and plan observational studies to discover whether each of these consequences is found to hold.

Consider the study of lead exposures by Morton et al. (1982), discussed in Sect. 1.2.2. Their elaborate theory made three predictions: (a) higher lead levels in the blood of exposed children than in matched control children, (b) higher lead levels in exposed children whose parents had higher exposure on the job, and (c) higher lead levels in exposed children whose parents practiced poorer hygiene upon leaving the factory. Since each of these predictions was consistent with observed data, to attribute the observed associations to hidden bias rather than an actual effect of lead exposure, one would need to postulate biases that could produce all three associations. Additionally, these observations are consistent with other studies of lead exposure. The use of elaborate theories and further examples are discussed in Campbell and Stanley (1963), Cook and Campbell (1979), Campbell (1988), Rosenbaum (1995, Sects. 5–8), and Sobel (1995).

3.2 Two Control Groups

The simplest and perhaps the most common study design to detect hidden bias uses two control groups. Since neither control group received the treatment, systematic differences between the control groups cannot be effects of the treatment and must instead be some form of hidden bias. One seeks two control groups that would differ in their outcomes if a speciﬁc unobserved bias were present. In this way, hidden biases should produce a difference between the two control groups, whereas an actual treatment effect should produce relatively similar results in the two control groups and a different result in the treated group.

This is the principle of ‘control by systematic variation,’ which is due in general to Bitterman (1965), and is applied to observational studies by Campbell (1969) and Rosenbaum (1995a, Sect. 7). If one cannot ensure comparability of treated and control groups with respect to a particular variable, say by matching, because that variable cannot be observed, then one seeks two control groups that are known to differ markedly on that unobserved variable, that is, to systematically vary the unobserved variable. Control by systematic variation is an instance of ‘strong inference’ in the sense discussed by Platt (1964).

In an interesting study that used two control groups, Zabin et al. (1989) investigated the effects of having an abortion on the education, psychological status, and subsequent pregnancies of black teenage women in Baltimore. In the US, there is currently active debate about whether abortions should be legal, with concern expressed about both the fetus and the mother. The effects—harmful or beneﬁcial—of abortions on young women are therefore of interest, but a moment’s thought reveals some serious difficulties in devising an empirical study. A woman may have an abortion only if she is pregnant, in which case, if she does not have an abortion, then with good health she will have a child. An advocate who believed abortions should be legal might wish to compare a teenager having an abortion to a teenager having a child. An advocate who believed abortions should be illegal would probably object to this comparison, saying that it compares two people who are far from comparable, because one has a child and the other does not. If having an abortion is anywhere near as harmful as having an unwanted child, this second advocate might say, then abortions are harmful indeed. The second advocate might say that if abortion was illegal, young women would be more careful about avoiding unwanted pregnancies, so it is more appropriate to compare a young woman who had an abortion to a young woman who was not pregnant. In addition to these ambiguities about what is meant by ‘the effect of abortion,’ such a study would face powerful selection effects. The decision by a young pregnant woman to have an abortion or have a child may reﬂect many characteristics, such as the degree to which she is determined to complete high school. Sexually active teenagers who do not use adequate birth control may tend to differ in many ways from those who do not become pregnant.

Zabin et al. (1989) used two control groups derived from 360 young black women who came for pregnancy tests at two clinics in Baltimore. The ‘treated’ group consisted of women who were pregnant and had an abortion. One control group consisted of women who were pregnant and had a child. The second control group consisted of women whose pregnancy test revealed they were not pregnant. Because the second control group consisted of women who suspected they might be pregnant, it is likely to be less biased than a control group of women who simply were not pregnant. The second control group did not have to decide whether to have an abortion or a child, so it is not subject to the self-selection effect that formed the treated group and the ﬁrst control group. If one were viewing nothing but a selection effect based on the decision to have an abortion or a child, then the second control group should be found between the treated group and the ﬁrst control group. Many outcomes were studied with varied results. As one illustration, consider ‘negative educational change’ two years after the pregnancy test. In the abortion group, 17.8 percent had a negative educational change, whereas in the child bearing group it was 37.3 percent and in the negative pregnancy test group it was 37.4 percent. Despite the ambiguities in this study, it would be hard to argue that this pattern of results supports a claim that abortions cause negative educational change at two years. The two control groups together provide more information about hidden biases than either group provides alone, in part because the control groups systematically vary certain important unobserved differences.

4. Appraising Sensitivity To Hidden Bias

The analytical adjustments discussed in Sect. 2 can often remove overt biases accurately recorded in the data at hand, but there is typically a legitimate concern that treated and control groups differed prior to treatment in ways that were not recorded. A sensitivity analysis asks how such hidden biases might alter the conclusions of the study.

The ﬁrst formal method of sensitivity analysis was developed to aid in appraising the effects of cigarette smoking on human health. Responding to objections that smoking might not cause lung cancer, but rather that there might be a genetic predisposition both to smoke and to develop lung cancer, Cornﬁeld et al. (1959, p. 40) wrote:

… if cigarette smokers have nine times the risk of nonsmokers for developing lung cancer, and this is not because cigarette smoke is a causal agent, but only because cigarette smokers produce hormone X, then the proportion of hormone X-producers among cigarette smokers must be at least nine times greater than among nonsmokers.

This statement was an important conceptual advance. It is, of course, commonly understood that association does not imply causation—that any observed association can be explained by a hypothetical argument, perhaps an implausible hypothetical argument, postulating an unobserved variable. Although this common understanding is always correct, it contributes little to our appraisal of an observational study, because it makes no reference to what was actually observed in this empirical investigation, and no reference to scientiﬁc knowledge about the phenomenon under study. A sensitivity analysis replaces the statement—‘association does not imply causation’— by a speciﬁc statement about the magnitude of hidden bias that would need to be present to explain the associations actually observed. Strong associations in large studies can only be explained by large, perhaps implausible, biases. Aspects of the method of Cornﬁeld et al. (1959) are discussed by Greenhouse (1982). Although important as a conceptual advance, this method of sensitivity analysis is limited in that it ignores sampling error produced by having a ﬁnite sample rather than the population as a whole, and it is applicable only to binary outcomes.
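The arithmetic behind the statement of Cornfield et al. (1959) can be made concrete. The sketch below assumes a binary unobserved factor and treats the inequality as a necessary condition only; the function names and the prevalences are illustrative assumptions.

```python
def min_prevalence_in_exposed(risk_ratio, prevalence_unexposed):
    """Smallest prevalence of the unobserved factor among the exposed that
    is compatible with the factor fully explaining the observed risk ratio."""
    return risk_ratio * prevalence_unexposed


def bound_is_attainable(risk_ratio, prevalence_unexposed):
    """A prevalence cannot exceed 1, so some combinations are impossible."""
    return min_prevalence_in_exposed(risk_ratio, prevalence_unexposed) <= 1.0


# With a ninefold risk, a factor carried by 5% of nonsmokers would have to be
# carried by at least 45% of smokers.
print(min_prevalence_in_exposed(9, 0.05))
# A factor carried by 20% of nonsmokers could not account for a ninefold risk.
print(bound_is_attainable(9, 0.20))
```

The calculation says nothing about sampling error, which is the limitation noted above.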

A related method addresses sampling error and may be used with outcome measures of all kinds. Observing that in a randomized experiment, subjects in the same stratum or matched set have the same probability of receiving the treatment rather than the control, the sensitivity analysis introduces a sensitivity parameter Γ which measures the degree of departure from random assignment of treatments. Speciﬁcally, two subjects who look the same in terms of observed covariates, and so are in the same matched set or stratum, may differ in their odds of receiving the treatment by at most a factor of Γ. In a randomized experiment, Γ = 1. If Γ = 2 in an observational study, then if two people look the same in terms of observed covariates, one might be twice as likely to receive the treatment as the other because they differ in terms of a covariate not observed. For each value of Γ, it is possible to place bounds on conventional statistical inferences—e.g., for Γ = 2, perhaps, the true P-value is unknown, but must be between 0.001 and 0.021. Similar bounds are available for point and interval estimates. One then asks: How large must Γ be before the conclusions of the study are qualitatively altered? If the conclusions change for Γ = 1.2 then the study is very sensitive to hidden bias—very small departures from a randomized experiment can explain the observed association between treatment and outcome— but if the conclusions do not change for Γ = 9, then only quite dramatic hidden biases can explain the observed association. A nontechnical introduction to this method of sensitivity analysis is given in Rosenbaum (1991), and a detailed development in Rosenbaum (1995, Sect. 4).
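For matched pairs analysed with a sign test, the bounds on the P-value can be sketched directly. Under the model just described, within a pair the chance that a given subject is the treated one lies between 1/(1 + Γ) and Γ/(1 + Γ), so the one-sided P-value is bracketed by two binomial tail probabilities. The function names and the counts (20 pairs with differing outcomes, 17 of them with the higher outcome in the treated subject) are hypothetical.

```python
from math import comb


def binom_upper_tail(n, k, p):
    """Pr(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def sign_test_bounds(n_pairs, n_treated_higher, gamma):
    """Lower and upper bounds on the one-sided sign-test P-value when
    matched subjects may differ in their odds of treatment by at most gamma.
    n_pairs counts only pairs whose two outcomes differ."""
    p_lo = 1 / (1 + gamma)
    p_hi = gamma / (1 + gamma)
    return (binom_upper_tail(n_pairs, n_treated_higher, p_lo),
            binom_upper_tail(n_pairs, n_treated_higher, p_hi))


for gamma in (1, 2, 3, 4):
    low, high = sign_test_bounds(20, 17, gamma)
    print(f"Gamma = {gamma}: P-value between {low:.4g} and {high:.4g}")
```

At Γ = 1 the two bounds coincide with the usual randomization P-value; as Γ grows the interval widens, and the value of Γ at which the upper bound crosses the chosen significance level summarizes the sensitivity of the study.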

Methods of sensitivity analysis are discussed by Angrist et al. (1996), Copas and Li (1997), Cornﬁeld et al. (1959), Gastwirth (1992), Greenhouse (1982), Lin et al. (1998), Manski (1990, 1995), Marcus (1997), and Rosenbaum (1995).

5. Reducing Sensitivity To Hidden Bias

5.1 The Goal: Sharper Effects, Smaller Biases

Given the choice, one would prefer conclusions insensitive to hidden biases, so that there is less ambiguity about the effects caused by the treatment. In general, an observational study will be less sensitive to biases of plausible size if the treatment effect size is made clearer, sharper or larger, and if the magnitude of bias that is plausible is made smaller. A diverse collection of strategies used in observational studies may be viewed as efforts to bring about this situation, that is, to observe larger, sharper treatment effects while rendering large biases less plausible. If successful, these strategies would reduce sensitivity to hidden bias.

This section discusses three such strategies, namely choice of circumstances, instrumental variables, and coherent hypotheses.

5.2 Choice Of Circumstances

In the earliest stages of planning an observational study, the investigator chooses the circumstances in which the study will be conducted. These choices often affect both the size of the treatment effect and the magnitude of hidden bias that is plausible. As a result, these choices affect the sensitivity of the study to hidden bias. Some illustrations of this use of ‘choice’ will be brieﬂy mentioned. See Rosenbaum (1999b) for detailed discussion of these cases and others with examples, and see Meyer (1995) and Angrist and Krueger (1998) for related discussion.

The investigator examines a broad research hypothesis, one that makes predictions about innumerable circumstances, but examines that hypothesis in particular circumstances where dramatic effects are anticipated, free of distortions from other factors that affect these same outcomes. The investigator seeks a situation in which there is a genuine control group completely shielded from the treatment, and a treated group that received the treatment at an intense dose. One prefers a treatment that is imposed suddenly, not gradually, and haphazardly, not in direct response to characteristics of the individuals under study.

For instance, in the study in Sect. 1.2.1 by Lehman et al. (1987), the effects of bereavement were studied following sudden deaths of close relatives from car crashes. Moreover, employing standardized criteria, they used only those car crashes for which the victim’s car was not responsible, reasoning that alcohol or drug abuse or certain forms of psychopathology might increase the risk of a car crash and also be associated with unfavorable psychological outcomes in the family.

As another example, Angrist and Lavy (1999) studied the effects of class size on academic achievement in carefully chosen circumstances. In Israel, a rule dating back to Maimonides requires classrooms with no more than 40 students, so a class with 41 students is divided into two classes of about 20 students. Whereas in the United States, a class of size 40 is likely to be found in a different economic environment from a class of size 20, in Israel a comparatively haphazard event, the enrollment of an additional child, often separates these two situations.
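The rule can be written as a short function of enrollment, which makes the discontinuity at 41 students explicit. This is a sketch under the assumption that a cohort is split into the fewest classes that respect the cap and that the classes are of equal size; the function name is illustrative.

```python
def maimonides_class_size(enrollment, cap=40):
    """Average class size when a cohort is split into the fewest classes
    that keep every class at or below the cap."""
    n_classes = (enrollment - 1) // cap + 1
    return enrollment / n_classes


for enrollment in (39, 40, 41, 80, 81):
    print(enrollment, maimonides_class_size(enrollment))
```

One additional child, moving enrollment from 40 to 41, roughly halves the predicted class size, which is the haphazard variation the study exploits.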

Still another example is Card’s (1990) use of the Mariel Boatlift to study the effects of immigration on labor markets.

5.3 Instrumental Variables

In some circumstances, the treatment assignment itself is subject to hidden biases that are plausibly quite large, but there is another variable, called an instrument, which has no effect on the outcome, but is associated with the treatment. The hope, sometimes realistic, is that the instrument is free of hidden bias, or at least that it is affected by hidden biases that are much smaller. In an instrumental variable analysis, the magnitude of the treatment effect is modelled in terms of the treatment received, but the assignment variable is the instrument, not the treatment.

A nice example with a clear theoretical development is given by Angrist et al. (1996). They consider aspects of the health effects of military service, but of course, people who serve in the military differ in many ways from those who do not. Their instrument is the draft lottery in use in the United States between 1970 and 1973. Many people who were drafted did not serve, and many who served were not drafted, so the draft is not at all the same as the treatment under study, namely military service. However, the draft was close to random, so it is not plausible that it is affected by large hidden biases. The instrumental variable analysis models the treatment effect in terms of actual service, but uses the draft as the ‘assignment variable’ in a sense made precise in Angrist et al. (1996) and in the discussion published with that paper. See also Moffitt’s (1996, p. 463) discussion of his ‘prosaic economic example.’
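With a binary instrument and a binary treatment, the simplest instrumental variable estimate is a ratio: the effect of the instrument on the outcome divided by the effect of the instrument on the treatment received. The sketch below uses invented records, not the draft lottery data, and its interpretation as a causal effect depends on the assumptions spelled out in Angrist et al. (1996).

```python
from statistics import mean


def wald_estimate(records):
    """records: list of (instrument, treatment_received, outcome) with
    instrument and treatment coded 0 or 1. Returns the ratio of the
    instrument's effect on the outcome to its effect on the treatment."""
    y1 = mean([y for z, d, y in records if z == 1])
    y0 = mean([y for z, d, y in records if z == 0])
    d1 = mean([d for z, d, y in records if z == 1])
    d0 = mean([d for z, d, y in records if z == 0])
    return (y1 - y0) / (d1 - d0)


# Hypothetical records: some with instrument = 1 do not take the treatment,
# and some with instrument = 0 do. Outcomes are 10 + 2 * treatment_received.
records = [
    (1, 1, 12), (1, 1, 12), (1, 1, 12), (1, 0, 10),
    (0, 1, 12), (0, 0, 10), (0, 0, 10), (0, 0, 10),
]

estimate = wald_estimate(records)
```

The comparison is made between groups defined by the instrument, which is close to random, and the ratio rescales that comparison to the units of treatment actually received.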

It is possible to conduct an exact sensitivity analysis for hidden bias affecting the instrument; see Rosenbaum (1999a). An example presented there concerns the Card and Krueger (1994) study of the effects of increasing the minimum wage in New Jersey and not in neighboring Pennsylvania, where the treatment effect is modelled in terms of the change in wages and the state is used as an instrument.

5.4 Coherent Hypotheses

A coherent hypothesis makes many highly speciﬁc predictions about the effect of a treatment on several outcome variables, perhaps in several treatment groups, perhaps involving one or more dose-response relationships. Hill (1965) claimed coherence as an important criterion in appraising evidence that a treatment is the cause of the outcomes with which it is associated.

Coherence is a distinct concept. Unlike sensitivity analyses, coherence asks what the pattern of associations implies about hidden biases, not the degree of sensitivity of a single association. Efforts to detect hidden biases, say using systematic variation with two control groups, are efforts to collect data so that a treatment effect would reveal one pattern of associations while the most plausible hidden biases would reveal a different pattern. Unlike efforts to detect hidden biases, coherence concerns a complex pattern of associations anticipated by a causal theory, without specific reference to the patterns that plausible biases might produce. The everyday metaphor of fitting like a glove (a causal theory that fits data like a glove) is sometimes helpful in distinguishing coherence from other techniques. A hand and a glove are very similar but very unusual shapes (their size, the number of digits, the pattern of the digit lengths, the single opposed digit, etc.), and these features are not mere repetitions of one another: that there are five digits does not entail that one digit will oppose the other four, or that the third digit will be longest, etc. Coherence is related to ‘pattern matching’ as discussed by Campbell (1966), ‘triangulation’ as discussed by Webb (1966), and ‘multiple substantive posttests’ as discussed by Shadish and Cook (1999).

Informally, one feels that if numerous predictions of a causal theory are simultaneously conﬁrmed, it should be more difficult to attribute this to hidden bias—the study should be less sensitive to bias. This is sometimes true, sometimes false, but when it is true, the reduction in sensitivity to bias can, in principle, be substantial. See Rosenbaum (1997) and the references there for discussion. The pattern of responses that does most to reduce sensitivity to bias entails weak associations among multiple outcomes each of which is strongly associated with the treatment (Rosenbaum 1997 Sect. 4.3.4).
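One informal way to see why weak associations among the outcomes help is a simple piece of arithmetic. This is a sketch under simplifying assumptions, not the derivation in Rosenbaum (1997). Suppose k outcomes each have unit variance, the same standardized treatment effect d, and a common pairwise correlation rho. The sum of the outcomes then has effect k·d and variance k + k(k − 1)·rho, so its standardized effect grows as rho falls. The function name and its parameters are hypothetical and chosen only for illustration.

```python
import math


def combined_effect_size(d: float, k: int, rho: float) -> float:
    """Standardized treatment effect of the sum of k outcomes.

    Illustrative assumptions (not taken from the source): every outcome
    has unit variance, the same standardized effect d, and the same
    pairwise correlation rho with every other outcome.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Var(sum) = k unit variances + k(k-1) covariance terms equal to rho.
    variance_of_sum = k + k * (k - 1) * rho
    if variance_of_sum <= 0:
        raise ValueError("rho is too negative for a valid correlation matrix")
    # The effect on the sum is k*d; divide by the standard deviation of the sum.
    return k * d / math.sqrt(variance_of_sum)


if __name__ == "__main__":
    # Four outcomes, each with standardized effect 0.5.
    for rho in (0.0, 0.3, 1.0):
        print(rho, round(combined_effect_size(0.5, 4, rho), 3))
```

With four outcomes and d = 0.5, uncorrelated outcomes give a combined standardized effect of 1.0, while perfectly correlated outcomes give 0.5, no better than a single outcome. A larger standardized effect is, other things being equal, harder to explain away by a hidden bias of fixed magnitude, which is the intuition behind the pattern described above.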

Bibliography:

Angrist J D, Imbens G W, Rubin D B 1996 Identiﬁcation of causal effects using instrumental variables (with discussion). Journal of the American Statistical Association 91: 444–69
Angrist J D, Krueger A B 1998 Empirical strategies in labor economics. Working paper 401, Industrial Relations Section, Princeton University. To appear in the Handbook of Labor Economics
Angrist J D, Lavy V 1999 Using Maimonides’ rule to estimate the effect of class size on scholastic achievement. Quarterly Journal of Economics 114: 533–75
Bergstralh E J, Kosanke J L, Jacobsen S J 1996 Software for optimal matching in observational studies. Epidemiology 7: 331–2
Bitterman M 1965 Phyletic differences in learning. American Psychologist 20: 396–410
Boruch R E 1997 Randomized Experiments for Planning and Evaluation: A practical guide. Sage Publications, Thousand Oaks, CA
Campbell D T 1966 Pattern matching as an essential in distal knowing. In: Hammond K R (ed.) The Psychology of Egon Brunswik. Holt, Rinehart and Winston, New York, pp. 81–106
Campbell D T 1969 Prospective: artifact and control. In: Rosenthal R L, Rosnow R L (eds.) Artifact in Behavioral Research. Academic Press, New York
Campbell D T 1988 Methodology and Epistemology for Social Science: Selected Papers. University of Chicago Press, Chicago, pp. 315–33
Campbell D T, Stanley J C 1963 Experimental and Quasi-experimental Designs for Research. Rand McNally, Chicago
Card D 1990 The impact of the Mariel Boatlift on the Miami labor market. Industrial and Labor Relations Review 43: 245–57
Card D, Krueger A B 1994 Minimum wages and employment: A case study of the fast-food industry in New Jersey and Pennsylvania. American Economic Review 84: 772–93
Cochran W G 1965 The planning of observational studies of human populations (with discussion). Journal of the Royal Statistical Society, Series A. 128: 134–55
Cochran W G 1968 Effectiveness of adjustment by subclassiﬁcation in removing bias in observational studies. Biometrics 24: 295–313
Cook T D, Campbell D T 1979 Quasi-experimentation. Rand McNally College Publisher, Chicago
Copas J B, Li H G 1997 Inference for non-random samples (with discussion). Journal of the Royal Statistical Society, Series B. 59: 55–96
Cornfield J, Haenszel W, Hammond E C, Lilienfeld A M, Shimkin M B, Wynder E L 1959 Smoking and lung cancer: recent evidence and a discussion of some questions. Journal of the National Cancer Institute 22: 173–203
Cox D R, Snell E J 1989 Analysis of Binary Data. Chapman and Hall, London
Firth D, Payne C, Payne J 1999 Efficacy of programmes for the unemployed: Discrete time modelling of duration data from a matched comparison study. Journal of the Royal Statistical Society, Series A. 162: 111–20
Friedman L M, Furberg C D, DeMets D L 1998 Fundamentals of Clinical Trials. Springer, New York
Gastwirth J L 1992 Methods for assessing the sensitivity of statistical comparisons used in Title VII cases to omitted variables. Jurimetrics 33: 19–34
Greenhouse S W 1982 Jerome Cornﬁeld’s contributions to epidemiology. Biometrics Suppl. 38: 33–45
Hill A B 1965 Environment and disease: Association or causation? Proceedings of the Royal Society of Medicine 58: 295–300
Lehman D, Wortman C, Williams A 1987 Long-term effects of losing a spouse or a child in a motor vehicle crash. Journal of Personality and Social Psychology 52: 218–31
Lin D Y, Psaty B M, Kronmal R A 1998 Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics 54: 948–63
Manski C F 1990 Nonparametric bounds on treatment effects. American Economic Review 80: 319–23
Manski C 1995 Identiﬁcation Problems in the Social Sciences. Harvard University Press, Cambridge, MA
Marcus S 1997 Using omitted variable bias to assess uncertainty in the estimation of an AIDS education treatment effect. Journal of Educational and Behavioral Statistics 22: 193–202
Meyer B D 1995 Natural and quasi-experiments in economics. Journal of Business and Economic Statistics 13: 151–61
Meyer M M, Fienberg S E (eds.) 1992 Assessing Evaluation Studies: The Case of Bilingual Education Strategies. National Academy Press, Washington, DC
Moffitt R A 1996 Comment on ‘Identification of causal effects using instrumental variables’ by Angrist, Imbens and Rubin. Journal of the American Statistical Association 91: 462–5
Morton D E, Saah A J, Silberg S L, Owens W, Roberts M, Saah M 1982 Lead absorption in children of employees in a lead-related industry. American Journal of Epidemiology 115: 549–55
Piantadosi S 1997 Clinical Trials: A Methodologic Perspective. Wiley, New York
Platt J R 1964 Strong inference. Science 146: 347–53
Rosenbaum P R 1991 Discussing hidden bias in observational studies. Annals of Internal Medicine 115: 901–5
Rosenbaum P R 1995a Observational Studies. Springer, New York
Rosenbaum P R 1997 Signed rank statistics for coherent predictions. Biometrics 53: 556
Rosenbaum P R 1999a Using combined quantile averages in matched observational studies. Applied Statistics 48: 63–78
Rosenbaum P R 1999b Choice as an alternative to control in observational studies (with discussion). Statistical Science 14: 259–78
Rosenbaum P R, Rubin D B 1983 The central role of the propensity score in observational studies for causal effects. Biometrika 70: 41–55
Rubin D B 1973a Matching to remove bias in observational studies. Biometrics 29: 159–83. Correction: 1974 Biometrics 30: 728
Rubin D B 1973b Use of matched sampling and regression adjustment to remove bias in observational studies. Biometrics 29: 185–203
Rubin D B 1974 Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66: 688–701
Rubin D B 1979 Using multivariate matched sampling and regression adjustment to control bias in observational studies. Journal of the American Statistical Association 74: 318–28
Shadish W R, Cook T D 1999 Design rules: More steps toward a complete theory of quasi-experimentation. Statistical Science 14: 294–300
Smith H L 1997 Matching with multiple controls to estimate treatment effects in observational studies. Sociological Methodology 27: 325–53
Sobel M 1995 Causal inference in the social and behavioral sciences. In: Arminger G, Clogg C C, Sobel M E (eds.) Handbook of Statistical Modeling for the Social and Behavioral Sciences. Plenum Press, New York, pp. 1–38
Webb E J 1966 Unconventionality, triangulation and inference. In: Proceedings of the Invitational Conference on Testing Problems. Educational Testing Service, Princeton, NJ, pp. 34–43
Zabin L S, Hirsch M B, Emerson M R 1989 When urban adolescents choose abortion: effects on education, psychological status, and subsequent pregnancy. Family Planning Perspectives 21: 248–55