An important goal of empirical research is to support causal assertions about the effects of explanatory variables on outcome variables. Extraneous variables that influence the outcome represent potential confounds whose effects must be eliminated before causal assertions regarding the explanatory variable can be made. In its broad sense, the term ‘control’ refers to a researcher’s systematic attempts to eliminate the confounding effects of extraneous variables associated with sampling, measurement, experimental manipulation, randomization, and statistical adjustment. The term ‘control variable’ is used most commonly in the narrow context of statistical adjustment of potential confounds.

The current presentation of ‘control variables’ is guided by three broad tenets. (a) Strategies of statistical control and their interpretation depend on an understanding of the nature of the relationship between extraneous variables and the explanatory and outcome variables. The term control variable is best understood within this broader explanatory framework. (b) The nature of potential confounds, and therefore the need for their statistical control, depends on the specific research design. The need for statistical control increases as the inherent controls imposed by the research design diminish. (c) The potential confounding variable may be observed or unobserved. Although most methodological and empirical literature on statistical control focuses on the control of observed variables, this research paper is organized around prototypical methods of controlling for both observed and unobserved sources of extraneous variability. This approach highlights the importance of a causal framework for understanding statistical control and also pinpoints the most common cause of the failure of observed variables methods, i.e., the presence of unobserved sources of confound.

## 1. Controlling For Observed Variables

Several methods of control are available when it can be assumed that the potential confounding variables are measured, including regression, matching, and propensity score methods. Approaches based on regression (Pedhazur and Schmelkin 1991) or analysis of covariance (ANCOVA; Huitema 1980) are the most commonly used methods of statistical control. The purpose and interpretation of regression adjustment depend on the specific research design employed.

### 1.1 Regression And ANCOVA Based Approaches

Regression of a dependent variable Y on an independent variable X provides an estimate of the causal effect of X on Y under the critical assumption that all omitted variables C that influence Y are uncorrelated with X. If an omitted extraneous variable C that influences Y is also correlated with the independent variable X, then the regression parameter is biased as a measure of the direct effect of X on Y. In this case, the effect of the explanatory variable X is confounded with the extraneous variable C. The true direct effect of X on Y (a in Fig. 1) may be obtained by measuring C and including it in the model. The key assumption made in regression based approaches is that all extraneous variables are measured without error and included in the analysis.
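
The omitted-variable argument above can be illustrated with a small simulation. This is only a sketch under assumed values: the effect sizes (0.5, 0.8, 1.0), the sample size, and the helper names `cov`, `slope_simple`, and `slope_adjusted` are illustrative choices, not quantities from the text.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def slope_simple(x, y):
    """OLS slope from regressing y on x alone (C omitted)."""
    return cov(x, y) / cov(x, x)

def slope_adjusted(x, c, y):
    """OLS coefficient of x when y is regressed on both x and c."""
    sxx, scc, sxc = cov(x, x), cov(c, c), cov(x, c)
    return (scc * cov(x, y) - sxc * cov(c, y)) / (sxx * scc - sxc ** 2)

rng = random.Random(0)
n = 20000
true_a = 0.5                                    # assumed direct effect of X on Y
c = [rng.gauss(0, 1) for _ in range(n)]         # extraneous variable C
x = [0.8 * ci + rng.gauss(0, 1) for ci in c]    # X is correlated with C
y = [true_a * xi + 1.0 * ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]

naive = slope_simple(x, y)          # absorbs part of C's effect on Y
adjusted = slope_adjusted(x, c, y)  # C is measured and included
print(f"naive: {naive:.3f}  adjusted: {adjusted:.3f}  true: {true_a}")
```

Under these assumptions the naive slope is pulled away from the assumed direct effect by roughly the effect of C on Y times Cov(X, C)/Var(X), while the adjusted coefficient should land close to it.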

1.1.1 Regression Adjustment In Randomized Experiments. Randomized experiments (Cook and Campbell 1979) offer the most robust method for making causal inferences and imply two powerful means of control: (a) the independent or explanatory variable is controlled by direct manipulation, and (b) all other sources of extraneous variability are controlled by randomization or chance. Random assignment, if successful, makes experimental and control groups probabilistically equal with respect to all potential confounding variables and hence minimizes bias due to initial differences among treatment conditions. As a result, extraneous variables are expected to be uncorrelated with the manipulated variable or the treatment assignment, and their exclusion does not bias the estimates of treatment effects. On the other hand, including such variables in the analysis reduces the unexplained within-group variability in the outcome, thus increasing the power for detecting treatment effects. Pretest measures of the dependent variable are frequently used for this purpose. It is important to note that randomization may fail to make the experimental conditions equal with respect to various confounders. In addition, noncompliance and attrition may render various treatment conditions noncomparable.
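
The precision gain from covariate adjustment in a randomized design can be sketched by repeated simulation. The treatment effect (0.3), the pretest coefficient (0.8), the noise level, and the helper names are assumptions made for illustration only.

```python
import random
import statistics

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def coef_of_first(x, c, y):
    """OLS coefficient of x when y is regressed on x and c (with intercept)."""
    sxx, scc, sxc = cov(x, x), cov(c, c), cov(x, c)
    return (scc * cov(x, y) - sxc * cov(c, y)) / (sxx * scc - sxc ** 2)

rng = random.Random(1)
true_effect = 0.3                  # assumed treatment effect
n, replications = 100, 300
unadjusted, adjusted = [], []
for _ in range(replications):
    t = [1.0] * (n // 2) + [0.0] * (n // 2)
    rng.shuffle(t)                                  # random assignment
    pretest = [rng.gauss(0, 1) for _ in range(n)]   # independent of t by design
    y = [true_effect * ti + 0.8 * pi + rng.gauss(0, 0.6)
         for ti, pi in zip(t, pretest)]
    unadjusted.append(cov(t, y) / cov(t, t))        # difference in group means
    adjusted.append(coef_of_first(t, pretest, y))   # pretest-adjusted estimate

for name, est in (("unadjusted", unadjusted), ("adjusted", adjusted)):
    print(f"{name}: mean={statistics.fmean(est):.3f} sd={statistics.stdev(est):.3f}")
```

Both estimators should center near the assumed effect, since randomization keeps the pretest uncorrelated with assignment; the adjusted estimates should simply vary less across replications.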

1.1.2 Regression Adjustment In Quasi-Experimental And Nonexperimental Designs. In quasi-experimental designs, the independent variable is manipulated in the absence of random assignment. In these designs, subjects may be nonrandomly assigned to, or may self-select into, different conditions. For example, in a study investigating the effects of remedial education on achievement, students who are expected to benefit the most may be assigned by the researcher to the treatment condition. As a result, the groups may not necessarily be equal at baseline with respect to various observed and unobserved characteristics that may influence the outcome of the treatment. In nonexperimental settings, the researcher may wish to infer causation from the observed association between a hypothesized independent variable and an outcome, e.g., the effect of smoking on lung cancer.

The purpose of regression adjustment in non-experimental designs is to make the experimental conditions statistically equal with respect to various observed characteristics that may also influence the outcome. The success of regression based approaches in quasi-experimental and observational research depends on the plausibility of the assumption that all potential confounding variables are measured without error and included in the analysis. This assumption may not be tenable in most quasi-experimental settings where self-selection or assignment to treatment may be based on unmeasured expectations about outcome under treatment and control conditions.

The observed characteristics that are typically controlled include the pretest measure of the outcome variable and background characteristics such as gender, age, and socioeconomic status. The control variable frequently serves as a proxy for some other unmeasured theoretical variable of interest. This method of controlling for theoretical variables is a common practice in applied settings and leads to a number of additional problems. Proxy variables, unlike the measured indicators of a construct, contain both random error and unspecified sources of systematic error. The presence of random measurement error in regressors leads to biased estimates of the regression parameters. On the other hand, the presence of systematic variability other than the hypothesized construct introduces new and unspecified sources of bias. As a result, the use of background characteristics as proxies in evaluation research may result in contradictory findings regarding the effects of the explanatory variable in question, depending on the specific set of covariates included in the analysis.
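
The effect of random measurement error can be made concrete in the simplest textbook case, shown here as an illustration rather than as a result derived in this paper. Assume a single regressor, $Y = \beta C + \varepsilon$, and an observed proxy $C^{*} = C + e$, where the error $e$ has variance $\sigma_e^2$ and is independent of $C$ and $\varepsilon$ (all symbols are introduced here for the example). Regressing $Y$ on $C^{*}$ then gives, in large samples,

$$
\hat{\beta}^{*} \;\longrightarrow\; \beta \,\frac{\sigma_C^{2}}{\sigma_C^{2} + \sigma_e^{2}} \;=\; \beta\,\rho ,
$$

where $\rho$ is the reliability of the proxy. Because $\rho < 1$ whenever $\sigma_e^2 > 0$, the coefficient is attenuated toward zero. When such a proxy is used as a control variable, the adjustment for the underlying construct is correspondingly incomplete, so part of the confounding remains in the estimated effect of the explanatory variable.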

1.1.3 Control Variable: Validity And Differential Treatment Effectiveness. The use of control variables for statistical adjustment is motivated primarily by a desire to increase the internal validity of the study. An alternative way of eliminating confounding due to extraneous variables is to include only those individuals at a specific level of the confounding variable. For example, if ethnicity and gender are related to the treatment assignment and to outcome, the researcher may choose to include only white males in the study. Such control by exclusion limits the generalizability of the findings to the population actually included in the study. In contrast, the regression based approach allows generalization across all levels of the controlled covariate present in the sample if the treatment effects are the same within each level of the covariate. In other words, the possibility of differential treatment effects across levels of the covariate must be ruled out before the results can be generalized. For example, a specific intervention for treating childhood aggression may be more effective at higher levels of initial aggression. In this case, the initial level of aggression could be included in the analysis along with an interaction term with the manipulated variable to test whether the magnitude of the treatment effect depends on the level of the observed covariate. When the interaction between the independent variable and a covariate is significant, the covariate is said to moderate the effect of the independent variable on the outcome. In this situation, the arbitrary distinction between an explanatory and a control variable begins to blur. The researcher must now explore the mechanism of differential treatment effectiveness and ascribe proper causal status to both the covariate and the explanatory variable.
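
The interaction test described above can be sketched for the aggression example. With a binary treatment indicator, fitting the model with a treatment-by-covariate interaction gives the same coefficients as fitting a separate line within each group, so the interaction estimate is simply the difference between the two slopes. The data-generating values below (a main effect of -0.5 and an interaction of -0.4) and the helper `fit_line` are assumptions for illustration.

```python
import random

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(2)
rows = []
for _ in range(4000):
    aggression = rng.gauss(0, 1)           # centered baseline covariate
    treated = rng.random() < 0.5           # random assignment
    # Assumed model: treatment lowers the outcome more at higher baseline levels.
    effect = (-0.5 - 0.4 * aggression) if treated else 0.0
    outcome = 0.7 * aggression + effect + rng.gauss(0, 1)
    rows.append((treated, aggression, outcome))

control = [(c, y) for t, c, y in rows if not t]
treat = [(c, y) for t, c, y in rows if t]
i0, s0 = fit_line(*zip(*control))
i1, s1 = fit_line(*zip(*treat))

main_effect = i1 - i0    # treatment effect at covariate value 0
interaction = s1 - s0    # change in treatment effect per unit of the covariate
print(f"main effect: {main_effect:.3f}  interaction: {interaction:.3f}")
```

A clearly nonzero interaction estimate is the signal that the covariate moderates the treatment effect; a formal test would also need a standard error, which this sketch omits.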

### 1.2 Propensity Score

In contrast to the regression approaches, the propensity score approach explicitly models the treatment selection or assignment process using the observed background characteristics. The propensity score P(T|C) is the probability that an individual with a given set of background characteristics C will be assigned to a particular treatment group (Rosenbaum 1995). Conditional on the propensity score, treatment assignment is independent of the observed characteristics C; if C contains all the variables that also affect the outcome, then assignment is in effect random with respect to the outcome. The propensity score captures the collective influence of all observed covariates on treatment assignment. In other words, by controlling for the propensity score, it is possible to control for all observed characteristics that determine treatment assignment. This can be achieved by including the propensity score in the regression equation (Fig. 2) or by matching on the propensity score. The propensity score may be estimated using logit or probit regression by regressing observed treatment assignment on the set of covariates and their interactions.
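
A minimal sketch of the propensity score idea follows, assuming a single observed covariate and a logit model fitted by Newton-Raphson. Stratifying on the estimated score is used here as a simple stand-in for matching; the coefficients, the number of strata, and the function names are illustrative assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(c, t, steps=25):
    """Fit P(T=1|C) = sigmoid(b0 + b1*C) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for ci, ti in zip(c, t):
            p = sigmoid(b0 + b1 * ci)
            w = p * (1.0 - p)
            g0 += ti - p                  # gradient of the log-likelihood
            g1 += (ti - p) * ci
            h00 += w                      # information matrix entries
            h01 += w * ci
            h11 += w * ci * ci
        det = h00 * h11 - h01 ** 2
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def mean(v):
    return sum(v) / len(v)

rng = random.Random(3)
n, true_effect = 5000, 1.0                # assumed treatment effect
c = [rng.gauss(0, 1) for _ in range(n)]
t = [1 if rng.random() < sigmoid(0.8 * ci) else 0 for ci in c]   # selection on C
y = [true_effect * ti + 1.0 * ci + rng.gauss(0, 1) for ti, ci in zip(t, c)]

naive = mean([yi for yi, ti in zip(y, t) if ti]) - mean([yi for yi, ti in zip(y, t) if not ti])

b0, b1 = fit_logistic(c, t)
score = [sigmoid(b0 + b1 * ci) for ci in c]        # estimated propensity scores

# Compare treated and control units within five strata of the estimated score.
order = sorted(range(n), key=lambda i: score[i])
k, stratified = 5, 0.0
for s in range(k):
    idx = order[s * n // k:(s + 1) * n // k]
    treated_y = [y[i] for i in idx if t[i] == 1]
    control_y = [y[i] for i in idx if t[i] == 0]
    stratified += (mean(treated_y) - mean(control_y)) * len(idx) / n

print(f"naive: {naive:.3f}  stratified: {stratified:.3f}  true: {true_effect}")
```

With only five strata some bias from within-stratum differences in C is expected to remain, so the stratified estimate should move most of the way toward the assumed effect rather than match it exactly. The sketch also depends on C being the only source of selection, which is the assumption the next section questions.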

## 2. Controlling For Unobserved Variables

Both the regression and propensity score methods make a strong assumption that all potential sources of confound are measured and included in the analysis. It is likely that selection into or exposure to a treatment depends on a subject’s expectations regarding the outcome. For example, an individual’s implicit beliefs regarding expected outcomes in the treatment and control groups may influence their decision to participate in a particular treatment as well as the eventual outcome. Even in randomized experiments, treatment compliance and attrition may be related to such unobserved variables. In this situation, observed background characteristics may not contain relevant information regarding a subject’s decision process. As a result, methods of control based on observed variables are likely to produce biased estimates of treatment effects. Rosenbaum (1995) provides methods of assessing the sensitivity of the results to potential hidden biases in the context of observational studies. The problem of isolating true causal effects from observed association in the presence of latent confounds can be thought of as an identification problem. If one is willing to make certain assumptions about the nature of treatment effects under various conditions, it is possible to express treatment effects using inequalities or bounds (Manski 1995).
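
To show what such bounds can look like, consider the simplest worst-case version, given here only as an illustration under stated assumptions and not as a summary of the cited work. Assume the outcome is bounded, $0 \le Y \le 1$, write $p = P(T=1)$, and let $Y(1)$ and $Y(0)$ denote the outcomes a subject would have under treatment and control (notation introduced for this example). Since the unobserved outcomes can lie anywhere in $[0,1]$,

$$
\begin{aligned}
p\,E[Y \mid T=1] \;&\le\; E[Y(1)] \;\le\; p\,E[Y \mid T=1] + (1-p),\\
(1-p)\,E[Y \mid T=0] \;&\le\; E[Y(0)] \;\le\; (1-p)\,E[Y \mid T=0] + p .
\end{aligned}
$$

Subtracting the second line from the first gives bounds on the average treatment effect:

$$
p\,E[Y \mid T=1] - (1-p)\,E[Y \mid T=0] - p \;\le\; E[Y(1)] - E[Y(0)] \;\le\; p\,E[Y \mid T=1] - (1-p)\,E[Y \mid T=0] + (1-p).
$$

The interval always has width 1, so without further assumptions the data alone cannot determine the sign of the effect; additional assumptions about the treatment process are what narrow the bounds.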

Alternatively, the identification issue can be resolved by measuring additional variables that are causally related to the explanatory and outcome variables but are assumed to be independent of the unobserved confounds. In other words, the presence of such variables allows the true causal effects of an independent variable to be isolated from the unobserved sources of confounds. Such variables can help identify the true causal effects under two circumstances: (a) if the effects of these variables on the outcome are entirely mediated by the independent variable or (b) if the variable mediates the effect of the independent variable on the outcome.

### 2.1 Instrumental Variable Method

The instrumental variable approach for controlling unobserved sources of variability is the mirror opposite of the propensity score method for controlling observed variables (Angrist et al. 1996, Winship and Morgan 1999). Unlike an observed control variable, an instrumental variable is assumed not to have any direct effect on the outcome. Instead, the instrumental variable is thought to influence only the selection into the treatment condition. In other words, the effect of the instrumental variable on the dependent measure is entirely mediated via its effect on treatment assignment (b in Fig. 3). This condition is also known as the exclusion restriction. If the independent variable were regressed on the instrumental variable, the residual would contain all unobserved sources of variability that determine treatment assignment and also influence the outcome variable (represented by the correlation r in Fig. 3). As a result, the existence of an instrumental variable identifies or isolates the average direct effect (a in Fig. 3) of the treatment on the outcome independent of the unobserved sources of variability. The success of this strategy rests on the reasonableness of the assumption of exclusion restriction.
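
The logic of the instrumental variable can be sketched with a simulation in which an unobserved variable U drives both treatment and outcome. The estimator used below is the simple ratio Cov(Z, Y)/Cov(Z, X) for a single instrument Z; the coefficients and variable names are assumptions for illustration, and the instrument is constructed to satisfy the exclusion restriction by design.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

rng = random.Random(4)
n = 20000
true_a = 0.5                                  # assumed effect of X on Y
u = [rng.gauss(0, 1) for _ in range(n)]       # unobserved confound
z = [rng.gauss(0, 1) for _ in range(n)]       # instrument, independent of U
x = [0.7 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
# Z does not appear in the outcome equation: this is the exclusion restriction.
y = [true_a * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

ols = cov(x, y) / cov(x, x)    # confounded by U
iv = cov(z, y) / cov(z, x)     # uses only the variation in X induced by Z
print(f"OLS: {ols:.3f}  IV: {iv:.3f}  true: {true_a}")
```

If Z also affected Y directly, the ratio would no longer recover the assumed effect, which is why the plausibility of the exclusion restriction carries the whole argument.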

### 2.2 Mediator Variable Method

The effects of unobserved confounds may be isolated in a radically different fashion by modeling the causal mechanism by which the treatment influences the outcome (Pearl 2000). In this approach, the goal is to find an observed variable that mediates the effect of the independent variable on the outcome but is unrelated to the unobserved common cause (Fig. 4). For example, in the past cigarette companies have claimed that the relationship between smoking and lung cancer is spurious: a result of a gene (common cause) that leads to craving for tobacco and also causes lung cancer. Even if such a common cause in fact existed, it would still be possible to isolate the independent effect of smoking on lung cancer by identifying a variable that mediates the effects of smoking on cancer but is not influenced by the hypothesized common cause. For example, tar in the lungs is one such variable: it is known to cause cancer and accumulates in the lungs as a result of smoking, passive smoke inhalation, and environmental pollution. It is reasonable to argue that the level of tar in the lungs is not caused by the common gene, especially among passive smokers. The existence of such a mediator variable can statistically identify the effect of the independent variable on the dependent variable (a×b in Fig. 4) even in the presence of unobserved sources of confounding. It is to be noted that the instrumental and mediator variables are not control variables themselves. Instead, the use of these variables identifies true treatment effects by isolating the confounding effects of other unobserved sources of variability.
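
The mediator argument can be sketched in a linear simulation of the smoking example. Here a is estimated by regressing the mediator on the independent variable, b by the coefficient of the mediator when the outcome is regressed on both the mediator and the independent variable, and the effect by their product. The path values (a = 0.6, b = 0.5), the variable names, and the assumption that the effect runs entirely through the mediator are illustrative choices.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

def coef_of_first(m, x, y):
    """OLS coefficient of m when y is regressed on m and x (with intercept)."""
    smm, sxx, smx = cov(m, m), cov(x, x), cov(m, x)
    return (sxx * cov(m, y) - smx * cov(x, y)) / (smm * sxx - smx ** 2)

rng = random.Random(5)
n = 20000
a, b = 0.6, 0.5                                    # assumed path coefficients
gene = [rng.gauss(0, 1) for _ in range(n)]         # unobserved common cause
smoking = [g + rng.gauss(0, 1) for g in gene]
tar = [a * s + rng.gauss(0, 1) for s in smoking]   # mediator, not caused by gene
cancer = [b * t + 1.0 * g + rng.gauss(0, 1) for t, g in zip(tar, gene)]

naive = cov(smoking, cancer) / cov(smoking, smoking)   # confounded by gene
a_hat = cov(smoking, tar) / cov(smoking, smoking)      # smoking -> tar
b_hat = coef_of_first(tar, smoking, cancer)            # tar -> cancer, given smoking
mediated = a_hat * b_hat
print(f"naive: {naive:.3f}  a*b: {mediated:.3f}  true: {a * b:.2f}")
```

Holding smoking fixed when estimating b is what blocks the path from tar back through smoking to the unobserved gene; the product then recovers the assumed effect even though the gene is never measured.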

**Bibliography:**

- Angrist J D, Imbens G W, Rubin D B 1996 Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91: 444–72
- Cook T D, Campbell D T 1979 Quasi-Experimentation: Design & Analysis Issues for Field Settings. Rand McNally College Pub. Co, Chicago
- Huitema B E 1980 The Analysis of Covariance and Alternatives. Wiley, New York
- Manski C F 1995 Identification Problems in the Social Sciences. Harvard University Press, Cambridge, MA
- Pearl J 2000 Causality: Models, Reasoning and Inference. Cambridge University Press, Cambridge, UK
- Pedhazur E J, Schmelkin L P 1991 Measurement, Design and Analysis: An Integrated Approach. Lawrence Erlbaum Associates, Hillsdale, NJ
- Rosenbaum P R 1995 Observational Studies. Springer-Verlag, New York
- Winship C, Morgan S L 1999 The estimation of causal effects from observational data. Annual Review of Sociology 25: 659–707