Sample Control Variable Research Paper.
An important goal of empirical research is to support causal assertions about the effects of explanatory variables on outcome variables. Extraneous variables that influence the outcome represent potential confounds whose effects must be eliminated before causal assertions regarding the explanatory variable can be made. In its broad sense, the term ‘control’ refers to a researcher’s systematic attempts to eliminate the confounding effects of extraneous variables associated with sampling, measurement, experimental manipulation, randomization, and statistical adjustment. The term ‘control variable’ is used most commonly in the narrow context of statistical adjustment of potential confounds.
The current presentation of ‘control variables’ is guided by three broad tenets. (a) Strategies of statistical control and their interpretation depend on an understanding of the nature of the relationship between extraneous variables and the explanatory and outcome variables. The term control variable is best understood within this broader explanatory framework. (b) The nature of potential confounds, and therefore the need for their statistical control, depends on the specific research design. The need for statistical control increases as the inherent controls imposed by the research design diminish. (c) The potential confounding variable may be observed or unobserved. Although most methodological and empirical literature on statistical control focuses on the control of observed variables, this research paper is organized around prototypical methods of controlling for both observed and unobserved sources of extraneous variability. This approach highlights the importance of a causal framework for understanding statistical control and also pinpoints the most common cause of the failure of observed variables methods, i.e., the presence of unobserved sources of confounding.
1. Controlling For Observed Variables
Several methods of control are available when it can be assumed that the potential confounding variables are measured, including regression, matching, and propensity score methods. Regression (Pedhazur and Pedhazur 1991) or analysis of covariance (ANCOVA; Huitema 1980) based approaches are the most commonly used methods of statistical control. The purpose and interpretation of regression adjustment depend on the specific research design employed.
1.1 Regression And ANCOVA Based Approaches
Regression of a dependent variable Y on an independent variable X provides an estimate of the causal effect of X on Y under the critical assumption that all omitted variables C that influence Y are uncorrelated with X. If an omitted extraneous variable C that influences Y is also correlated with the independent variable X, then the regression parameter is biased as a measure of direct effect of X on Y. In this case, the effect of explanatory variable X is confounded with the extraneous variable C. The true direct effect of X on Y (a in Fig. 1) may be obtained by measuring and including C in the model. The key assumption made in regression based approaches is that all extraneous variables are measured without error and included in the analysis.
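The omitted-variable argument above can be illustrated with a small simulation. This is a minimal sketch under assumed values: the coefficients (a = 0.5, a path of 0.8 from C to X, a path of 1.0 from C to Y), the sample size, and the helper names `cov`, `slope_simple`, and `slope_adjusted` are all hypothetical choices made for illustration and do not come from the original text.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

def slope_simple(x, y):
    """OLS slope from regressing y on x alone (C omitted)."""
    return cov(x, y) / cov(x, x)

def slope_adjusted(x, c, y):
    """OLS coefficient on x when y is regressed on both x and c."""
    sxx, scc, sxc = cov(x, x), cov(c, c), cov(x, c)
    sxy, scy = cov(x, y), cov(c, y)
    return (scc * sxy - sxc * scy) / (sxx * scc - sxc ** 2)

# Assumed data-generating model: C influences both X and Y.
rng = random.Random(1)
n, a = 20000, 0.5
c = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * ci + rng.gauss(0, 1) for ci in c]
y = [a * xi + 1.0 * ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]

naive = slope_simple(x, y)          # confounded by the omitted C
adjusted = slope_adjusted(x, c, y)  # C measured and included
print(f"true a = {a}, naive = {naive:.3f}, adjusted = {adjusted:.3f}")
```

Under these assumptions the naive slope should land well above 0.5, while the adjusted coefficient should recover a value close to it.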
1.1.1 Regression Adjustment In Randomized Experiments. Randomized experiments (Cook and Campbell 1979) offer the most robust method for making causal inferences and imply two powerful means of control: (a) the independent or explanatory variable is controlled by direct manipulation, and (b) all other sources of extraneous variability are controlled by randomization or chance. Random assignment, if successful, makes experimental and control groups probabilistically equal with respect to all potential confounding variables and hence minimizes bias due to initial differences among treatment conditions. As a result, extraneous variables are expected to be uncorrelated with the manipulated variable or the treatment assignment, and their exclusion does not bias the estimates of treatment effects. On the other hand, including such variables in the analysis reduces the unexplained within-group variability in the outcome, thus increasing the power for detecting treatment effects. Pretest measures of the dependent variable are frequently used for this purpose. It is important to note that randomization may fail to make the experimental conditions equal with respect to various confounders. In addition, noncompliance and attrition may render various treatment conditions noncomparable.
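The precision gain from adjusting for a pretest in a randomized experiment can be sketched by repeating a simulated experiment many times. Everything numeric here is an assumption for illustration (a true effect of 0.3, a pretest path of 0.9, 200 subjects per experiment, 300 replications), and the function names are hypothetical.

```python
import random
import statistics

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

def effect_unadjusted(t, y):
    """Slope of y on binary t, which equals the difference in group means."""
    return cov(t, y) / cov(t, t)

def effect_adjusted(t, pre, y):
    """Coefficient on t when y is regressed on t and the pretest."""
    stt, spp, stp = cov(t, t), cov(pre, pre), cov(t, pre)
    sty, spy = cov(t, y), cov(pre, y)
    return (spp * sty - stp * spy) / (stt * spp - stp ** 2)

def one_experiment(rng, n=200, effect=0.3):
    """Randomize half the sample to treatment and estimate the effect twice."""
    t = [1] * (n // 2) + [0] * (n // 2)
    rng.shuffle(t)  # random assignment
    pre = [rng.gauss(0, 1) for _ in range(n)]
    y = [effect * ti + 0.9 * pi + rng.gauss(0, 0.5) for ti, pi in zip(t, pre)]
    return effect_unadjusted(t, y), effect_adjusted(t, pre, y)

rng = random.Random(7)
results = [one_experiment(rng) for _ in range(300)]
unadj = [r[0] for r in results]
adj = [r[1] for r in results]

mean_unadj, sd_unadj = statistics.mean(unadj), statistics.stdev(unadj)
mean_adj, sd_adj = statistics.mean(adj), statistics.stdev(adj)
print(f"unadjusted: mean = {mean_unadj:.3f}, sd = {sd_unadj:.3f}")
print(f"adjusted:   mean = {mean_adj:.3f}, sd = {sd_adj:.3f}")
```

Both estimators should center near the assumed effect, which reflects the point that omitting the covariate does not bias a randomized comparison. The adjusted estimates should vary less from replication to replication, which is the source of the added power.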
1.1.2 Regression Adjustment In Quasi-Experimental And Nonexperimental Designs. In quasi-experimental designs, the independent variable is manipulated in the absence of random assignment. In these designs, subjects may be nonrandomly assigned to, or may self-select into, different conditions. For example, in a study investigating the effects of remedial education on achievement, students who are expected to benefit the most may be assigned by the researcher to the treatment condition. As a result, the groups may not necessarily be equal at baseline with respect to various observed and unobserved characteristics that may influence the outcome of the treatment. In nonexperimental settings, the researcher may wish to infer causation from the observed association between a hypothesized independent variable and an outcome, e.g., the effect of smoking on lung cancer.
The purpose of regression adjustment in non-experimental designs is to make the experimental conditions statistically equal with respect to various observed characteristics that may also influence the outcome. The success of regression based approaches in quasi-experimental and observational research depends on the plausibility of the assumption that all potential confounding variables are measured without error and included in the analysis. This assumption may not be tenable in most quasi-experimental settings where self-selection or assignment to treatment may be based on unmeasured expectations about outcome under treatment and control conditions.
The observed characteristics that are typically controlled include the pretest measure of the outcome variable and background characteristics such as gender, age, and socioeconomic status. The control variable frequently serves as a proxy for some other unmeasured theoretical variable of interest. This method of controlling for theoretical variables is a common practice in applied settings and leads to a number of additional problems. Proxy variables, unlike the measured indicators of a construct, contain both random error and unspecified sources of systematic error. The presence of random measurement error in regressors leads to biased estimates of the regression parameters. On the other hand, the presence of systematic variability other than the hypothesized construct introduces new and unspecified sources of bias. As a result, the use of background characteristics as proxies in evaluation research may result in contradictory findings regarding the effects of the explanatory variable in question, depending on the specific set of covariates included in the analysis.
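The consequence of random measurement error in a control variable can be sketched as follows. The setup is assumed for illustration only: a true effect of 0.5, a confounder C, and a hypothetical proxy equal to C plus independent noise of equal variance (a reliability of about 0.5). Systematic error in the proxy is not modeled here.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

def slope_adjusted(x, c, y):
    """OLS coefficient on x when y is regressed on both x and c."""
    sxx, scc, sxc = cov(x, x), cov(c, c), cov(x, c)
    sxy, scy = cov(x, y), cov(c, y)
    return (scc * sxy - sxc * scy) / (sxx * scc - sxc ** 2)

rng = random.Random(3)
n, a = 20000, 0.5
c = [rng.gauss(0, 1) for _ in range(n)]        # true confounder
proxy = [ci + rng.gauss(0, 1) for ci in c]     # error-laden proxy for C
x = [0.8 * ci + rng.gauss(0, 1) for ci in c]
y = [a * xi + 1.0 * ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]

naive = cov(x, y) / cov(x, x)                  # no adjustment at all
adj_proxy = slope_adjusted(x, proxy, y)        # adjust for the noisy proxy
adj_true = slope_adjusted(x, c, y)             # adjust for C itself
print(f"naive = {naive:.3f}, proxy-adjusted = {adj_proxy:.3f}, "
      f"C-adjusted = {adj_true:.3f}, true a = {a}")
```

In this sketch the proxy-adjusted estimate should fall between the naive and the fully adjusted values, so controlling for an unreliable proxy removes only part of the confounding.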
1.1.3 Control Variable: Validity And Differential Treatment Effectiveness. The use of control variables for statistical adjustment is motivated primarily by a desire to increase the internal validity of the study. An alternative way of eliminating confounding due to extraneous variables is to include only those individuals at a specific level of the confounding variable. For example, if ethnicity and gender are related to the treatment assignment and to outcome, the researcher may choose to include only white males in the study. Such control by exclusion limits the generalizability of the findings to the population actually included in the study. In contrast, the regression based approach allows generalization across all levels of the controlled covariate present in the sample if the treatment effects are the same within each level of the covariate. In other words, the possibility of differential treatment effects across levels of the covariate must be ruled out before the results can be generalized. For example, a specific intervention for treating childhood aggression may be more effective at higher levels of initial aggression. In this case, the initial level of aggression could be included in the analysis along with an interaction term with the manipulated variable to test if the magnitude of the treatment effect depends on the level of the observed covariate. When the interaction between the independent variable and a covariate is significant, the covariate is said to moderate the effect of the independent variable on the outcome. In this situation, the arbitrary distinction between an explanatory and a control variable begins to blur. The researcher must now explore the mechanism of differential treatment effectiveness and ascribe proper causal status to both the covariate and the explanatory variable.
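The moderation test described above can be sketched with a binary treatment. For a binary treatment, the interaction coefficient in a model with treatment, covariate, and their product equals the difference between the within-group slopes, so two simple regressions are enough. The effect sizes (a main effect of -0.5 and an interaction of -0.4 on a hypothetical post-treatment aggression score) and all names are assumptions for illustration.

```python
import random

def fit_line(x, y):
    """Return (intercept, slope) of the OLS line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((p - mx) ** 2 for p in x)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(11)
n = 10000
baseline = [rng.gauss(0, 1) for _ in range(n)]   # initial aggression (standardized)
treat = [rng.randint(0, 1) for _ in range(n)]    # randomized treatment indicator
# Assumed model: treatment lowers aggression, more so at high baseline levels.
outcome = [1.0 * c - 0.5 * t - 0.4 * t * c + rng.gauss(0, 1)
           for c, t in zip(baseline, treat)]

fits = {}
for g in (0, 1):
    xs = [c for c, t in zip(baseline, treat) if t == g]
    ys = [y for y, t in zip(outcome, treat) if t == g]
    fits[g] = fit_line(xs, ys)

main_effect = fits[1][0] - fits[0][0]   # treatment effect at baseline = 0
interaction = fits[1][1] - fits[0][1]   # change in effect per unit of baseline

def effect_at(level):
    """Estimated treatment effect at a given level of the covariate."""
    return main_effect + interaction * level

print(f"main effect = {main_effect:.3f}, interaction = {interaction:.3f}")
print(f"effect at -1 SD = {effect_at(-1.0):.3f}, at +1 SD = {effect_at(1.0):.3f}")
```

A nonzero interaction means a single treatment effect cannot be reported for the whole sample. The sketch stops at estimation and does not include a significance test for the interaction.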
1.2 Propensity Score
In contrast to the regression approaches, the propensity score approach explicitly models the treatment assignment process using the observed background characteristics. The propensity score P(T|C) is the probability that an individual with a given set of background characteristics C will be assigned to a particular treatment group (Rosenbaum 1995). If C also has an effect on the outcome variable, then conditional on the propensity score, the treatment assignment will be random with respect to the outcome. The propensity score captures the collective influence of all observed covariates on treatment assignment. In other words, by controlling for the propensity score, it is possible to control for all observed characteristics that determine treatment assignment. This can be achieved by including the propensity score in the regression equation (Fig. 2) or by matching on the propensity score. The propensity score may be estimated using logit or probit regression by regressing observed treatment assignment on the set of covariates and their interactions.
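A propensity score analysis can be sketched in two steps: estimate P(T|C) with a logit model, then compare treated and control subjects within groups that have similar scores. The sketch below makes several assumptions that are not in the original text: two covariates with no interaction term, a hand-written gradient-ascent logit fit (`fit_logit` is a hypothetical helper, not a library function), and stratification on score quintiles as a coarse stand-in for matching.

```python
import math
import random

def fit_logit(rows, t, steps=60, lr=2.0):
    """Logit coefficients (intercept first) by gradient ascent on the mean log-likelihood."""
    w = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for r, ti in zip(rows, t):
            z = w[0] + sum(wj * rj for wj, rj in zip(w[1:], r))
            err = ti - 1.0 / (1.0 + math.exp(-z))
            grad[0] += err
            for j, rj in enumerate(r):
                grad[j + 1] += err * rj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

def propensity(w, r):
    """Estimated probability of treatment for covariate row r."""
    z = w[0] + sum(wj * rj for wj, rj in zip(w[1:], r))
    return 1.0 / (1.0 + math.exp(-z))

def mean(v):
    return sum(v) / len(v)

# Assumed model: both covariates drive assignment and outcome; true effect is 1.0.
rng = random.Random(5)
n = 8000
rows = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
t = [1 if rng.random() < 1 / (1 + math.exp(-(0.8 * c1 - 0.5 * c2))) else 0
     for c1, c2 in rows]
y = [1.0 * ti + 1.0 * c1 + 0.5 * c2 + rng.gauss(0, 1)
     for ti, (c1, c2) in zip(t, rows)]

naive = (mean([yi for yi, ti in zip(y, t) if ti == 1])
         - mean([yi for yi, ti in zip(y, t) if ti == 0]))

scores = [propensity(fit_w, r) for fit_w in [fit_logit(rows, t)] for r in rows]
order = sorted(range(n), key=lambda i: scores[i])
k, total, used = 5, 0.0, 0
for s in range(k):
    idx = order[s * n // k:(s + 1) * n // k]      # one propensity quintile
    y1 = [y[i] for i in idx if t[i] == 1]
    y0 = [y[i] for i in idx if t[i] == 0]
    if y1 and y0:                                 # need both groups in a stratum
        total += len(idx) * (mean(y1) - mean(y0))
        used += len(idx)
stratified = total / used
print(f"naive = {naive:.3f}, stratified on propensity = {stratified:.3f}")
```

Five strata typically remove most, but not all, of the bias from the observed covariates, so the stratified estimate should be much closer to the assumed effect of 1.0 than the naive difference without matching it exactly.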
2. Controlling For Unobserved Variables
Both the regression and propensity score methods make a strong assumption that all potential sources of confound are measured and included in the analysis. It is likely that selection into or exposure to a treatment depends on a subject’s expectations regarding the outcome. For example, an individual’s implicit beliefs regarding expected outcomes in the treatment and control groups may influence their decision to participate in a particular treatment as well as the eventual outcome. Even in randomized experiments, treatment compliance and attrition may be related to such unobserved variables. In this situation, observed background characteristics may not contain relevant information regarding a subject’s decision process. As a result, methods of control based on observed variables are likely to produce biased estimates of treatment effects. Rosenbaum (1995) provides methods of assessing the sensitivity of the results to potential hidden biases in the context of observational studies. The problem of isolating true causal effects from observed association in the presence of latent confounds can be thought of as an identification problem. If one is willing to make certain assumptions about the nature of treatment effects under various conditions, it is possible to express treatment effects using inequalities or bounds (Manski 1995).
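As one illustration of the bounding idea, the sketch below computes worst-case bounds on an average treatment effect for a binary outcome, assuming only that each unobserved potential outcome lies between 0 and 1. The specific form of the bounds, the function name, and the counts are assumptions chosen for illustration; the original text does not state which bounds it has in mind.

```python
def worst_case_ate_bounds(n_treated, events_treated, n_control, events_control):
    """Bounds on E[Y(1)] - E[Y(0)] for a binary outcome with no further assumptions."""
    n = n_treated + n_control
    p1, p0 = n_treated / n, n_control / n          # share treated / untreated
    m1 = events_treated / n_treated                # observed mean among treated
    m0 = events_control / n_control                # observed mean among controls
    # Unobserved potential outcomes are replaced by their extremes, 0 and 1.
    y1_low, y1_high = m1 * p1, m1 * p1 + p0
    y0_low, y0_high = m0 * p0, m0 * p0 + p1
    return y1_low - y0_high, y1_high - y0_low

# Hypothetical counts: 30 of 40 treated and 24 of 60 controls show the outcome.
low, high = worst_case_ate_bounds(40, 30, 60, 24)
print(f"ATE lies in [{low:.2f}, {high:.2f}]")   # about [-0.34, 0.66]
```

Without further assumptions the interval always has width 1 and therefore includes zero, which is why additional assumptions are needed to narrow it.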
Alternatively, the identification issue can be resolved by measuring additional variables that are causally related to the explanatory and outcome variables but are assumed to be independent of the unobserved confounds. In other words, the presence of such variables allows the true causal effects of an independent variable to be isolated from the unobserved sources of confounds. Such variables can help identify the true causal effects under two circumstances: (a) if the effects of these variables on the outcome are entirely mediated by the independent variable or (b) if the variable mediates the effect of the independent variable on the outcome.
2.1 Instrumental Variable Method
The instrumental variable approach for controlling unobserved sources of variability is the mirror opposite of the propensity score method for controlling observed variables (Angrist et al. 1996, Winship and Morgan 1999). Unlike an observed control variable, an instrumental variable is assumed not to have any direct effect on the outcome. Instead, the instrumental variable is thought to influence only the selection into the treatment condition. In other words, the effect of the instrumental variable on the dependent measure is entirely mediated via its effect on treatment assignment (b in Fig. 3). This condition is known as the exclusion restriction. If the independent variable were regressed on the instrumental variable, the residual would contain all unobserved sources of variability that determine treatment assignment and also influence the outcome variable (represented by the correlation r in Fig. 3). As a result, the existence of an instrumental variable identifies or isolates the average direct effect (a in Fig. 3) of the treatment on the outcome independent of the unobserved sources of variability. The success of this strategy rests on the reasonableness of the exclusion restriction.
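The logic of the instrumental variable method can be sketched with a two-stage regression on simulated data. All values are assumptions for illustration: a true effect a = 0.5, an instrument path b = 0.7, and an unobserved variable U that influences both treatment and outcome. The instrument Z is generated so that it satisfies the exclusion restriction by construction.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

def fit_line(x, y):
    """Return (intercept, slope) of the OLS line of y on x."""
    slope = cov(x, y) / cov(x, x)
    return sum(y) / len(y) - slope * sum(x) / len(x), slope

rng = random.Random(2)
n, a, b = 20000, 0.5, 0.7
u = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder
z = [rng.gauss(0, 1) for _ in range(n)]   # instrument, independent of u
x = [b * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [a * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]  # z has no direct path to y

_, ols = fit_line(x, y)                   # biased: x is correlated with u

# Stage 1: regress x on z; the fitted values carry only instrument-driven variation.
i1, s1 = fit_line(z, x)
x_hat = [i1 + s1 * zi for zi in z]
# Stage 2: regress y on the fitted values.
_, iv = fit_line(x_hat, y)
print(f"true a = {a}, OLS = {ols:.3f}, IV = {iv:.3f}")
```

With a single instrument, the second-stage slope equals the ratio cov(Z, Y) / cov(Z, X). The estimate is only as credible as the exclusion restriction, which the simulation imposes but which cannot be verified from observed data alone.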
2.2 Mediator Variable Method
The effects of unobserved confounds may be isolated in a radically different fashion by modeling the causal mechanism by which the treatment influences the outcome (Pearl 2000). In this approach, the goal is to find an observed variable that mediates the effect of the independent variable on the outcome but is unrelated to the unobserved common cause (Fig. 4). For example, in the past cigarette companies have claimed that the relationship between smoking and lung cancer is spurious: a result of a gene (common cause) that leads to craving for tobacco and also causes lung cancer. Even if such a common cause in fact existed, it would still be possible to isolate the independent effect of smoking on lung cancer by identifying a variable that mediates the effects of smoking on cancer but is not influenced by the hypothesized common cause. For example, tar in the lungs is one such variable: it is known to cause cancer, and it accumulates in the lungs as a result of smoking, passive smoke inhalation, and environmental pollution. It is reasonable to argue that the level of tar in the lungs is not caused by the common gene, especially among passive smokers. The existence of such a mediator variable can statistically identify the effect of the independent variable on the dependent variable (a×b in Fig. 4) even in the presence of unobserved sources of confounding. It is to be noted that the instrumental and mediator variables are not control variables themselves. Instead, the use of these variables identifies true treatment effects by isolating the confounding effects of other unobserved sources of variability.
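The mediator strategy can be sketched in a linear simulation in which the path a (treatment to mediator) and the path b (mediator to outcome, holding treatment fixed) are estimated separately and then multiplied. The values are assumptions for illustration (a = 0.6, b = 0.5, so a×b = 0.3), and the unobserved common cause U is generated so that it affects the mediator only through the treatment.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

def slope_simple(x, y):
    """OLS slope from regressing y on x alone."""
    return cov(x, y) / cov(x, x)

def slope_adjusted(x, c, y):
    """OLS coefficient on x when y is regressed on both x and c."""
    sxx, scc, sxc = cov(x, x), cov(c, c), cov(x, c)
    sxy, scy = cov(x, y), cov(c, y)
    return (scc * sxy - sxc * scy) / (sxx * scc - sxc ** 2)

rng = random.Random(4)
n, a, b = 20000, 0.6, 0.5
u = [rng.gauss(0, 1) for _ in range(n)]                  # unobserved common cause
x = [ui + rng.gauss(0, 1) for ui in u]                   # treatment (e.g., smoking)
m = [a * xi + rng.gauss(0, 1) for xi in x]               # mediator (e.g., tar), not caused by u
y = [b * mi + 1.0 * ui + rng.gauss(0, 1) for mi, ui in zip(m, u)]

naive = slope_simple(x, y)            # confounded by u
a_hat = slope_simple(x, m)            # x -> m is unconfounded
b_hat = slope_adjusted(m, x, y)       # m -> y, holding x fixed
effect = a_hat * b_hat
print(f"true a*b = {a * b:.2f}, naive = {naive:.3f}, mediator estimate = {effect:.3f}")
```

Holding the treatment fixed in the second regression blocks the only route by which U could distort the mediator-outcome relationship in this assumed model. The sketch also assumes the entire effect of treatment flows through the mediator.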
Bibliography:
- Angrist J D, Imbens G W, Rubin D B 1996 Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91: 444–72
- Cook T D, Campbell D T 1979 Quasi-Experimentation: Design & Analysis Issues for Field Settings. Rand McNally College Pub. Co, Chicago
- Huitema B E 1980 The Analysis of Covariance and Alternatives. Wiley, New York
- Manski C F 1995 Identification Problems in the Social Sciences. Harvard University Press, Cambridge, MA
- Pearl J 2000 Causality: Models, Reasoning and Inference. Cambridge University Press, Cambridge, UK
- Pedhazur E J, Pedhazur S L 1991 Measurement, Design and Analysis: An Integrated Approach. Lawrence Erlbaum Associates, NJ
- Rosenbaum P R 1995 Observational Studies. Springer-Verlag, New York
- Winship C, Morgan S L 1999 The estimation of causal effects from observational data. Annual Review of Sociology 25: 659–707