Program Evaluation Research Paper

Evaluations are, in a broad sense, concerned with the effectiveness of programs. While commonsense evaluation has a very long history, evaluation research that relies on scientific methods is a young discipline, although it has grown considerably in recent years. Program evaluation has made important contributions to many social domains, for example, education, health, economics, and the environment.

This research paper introduces program evaluation by addressing three questions: What is program evaluation? What standards can be set to enhance the quality and fairness of evaluation? How should a program evaluation be conducted?

1. What Is Program Evaluation?

1.1 Definition, Distinction From Other Activities, Users, And Evaluators

Evaluation, broadly construed, has a very long history. For example, in ancient Rome, tax policies were altered in response to observed fluctuations in revenues, and many thousands of years ago, hunters developed complicated procedures for fishing and trapping game. In recent years, commonsense program evaluation has evolved into evaluation research, which relies on scientific research methods. This paper focuses on the latter type of evaluation (for introductory texts on evaluation research, see, e.g., Berk and Rossi 1998, Fink 1995, Pawson and Tilley 1997, Rossi and Freeman 1993). Throughout the paper, the terms evaluation, program evaluation, and evaluation research are used synonymously. The first large-scale national evaluations of social programs in the USA had their origins in the War on Poverty, and methods for monitoring programs are still evolving rapidly. Evaluation has proven so useful that most federal and state agencies require evaluations of new programs.

Programs, sometimes also called projects or interventions, are the objects of evaluation. Program evaluation derives from the idea that social programs should have demonstrable merits and explicit aims by which success or failure can be judged empirically; for example, literacy programs for high-school students should lead to measurable improvements in their reading skills. Introductory texts on evaluation describe programs as systematic activities that are provided on a continuing basis to achieve preplanned purposes, such as a national program to train teachers to work in rural and underserved areas or a campaign to vaccinate all children in a school district (e.g., Fink 1995, JCSEE 1994).

Evaluation systematically investigates the characteristics and merits of programs. Its purpose is to provide information on the effectiveness of programs so as to optimize outcomes, quality, and efficiency. Program evaluations apply many of the same methods that social researchers rely on to gather reliable and valid evidence. These include formulating the study questions, designing the evaluation, setting standards to assess the program’s impact, monitoring how the program functions on an ongoing basis, collecting and analyzing data, assessing program effectiveness relative to its costs, and reporting results. For examples of evaluation in various fields of clinical and applied psychology, see Bickman et al. (1995), Gueron and Pauly (1991), Hope and Foster (1992), and Spiel (2001).

While a few authors (e.g., Scriven 1991) take the view that theories are a luxury for the evaluator, since they are not even essential for explanations, most researchers have called for theory to inform every aspect of evaluation design and analysis (e.g., Pawson and Tilley 1997, Berk and Rossi 1998). On this view, program evaluation capitalizes on existing theory and empirical generalizations from the social sciences and is therefore, in the end, a form of applied research.

Because it relies on the scientific method, program evaluation differs from, for example, newspaper reporting and social commentary. It differs from the social sciences in its goals and audience.

Evaluation goals are defined by the client, that is, the individual, group, or organization that commissions the evaluator(s), for example, policymakers. However, programs and clients often do not have clear and consistent goals that can be evaluated for effectiveness. To meet evaluation standards, therefore, program evaluation should be conducted by professional evaluators who are experts in the social sciences and command a broad repertoire of concepts and methods, but who also have expertise in project and staff management and at least basic knowledge of the domain being evaluated.

While the impetus for evaluation comes from policy, the information that evaluations produce is used by stakeholders, that is, all individuals or groups that may be involved in or affected by a program evaluation. Fink (1995) distinguishes at least five groups: (a) federal, state, and local government, for example, a city or the National Institute of Education; (b) program providers, for example, healthcare services or the curriculum department in a school; (c) policymakers, for example, a state subcommittee on human services; (d) program funders, for example, the Office of Education; and (e) social researchers, for example, in government, public agencies, and universities. Program evaluation is therefore also an interdisciplinary task in which evaluators and stakeholders work together (see Cousins and Earl 1992).

1.2 Key Concepts And Assignment Criteria

Evaluation can and should be done in all phases of a program. Baseline data collected before the start of the program are used to describe the current situation, for example, the need for an intervention. Baseline data are also used to monitor and explain changes. For example, if emergency medical services attempt to reorganize their service to care better for children, they may collect baseline data to answer questions such as: In what cases did the emergency medical services fail to meet the needs of critically ill children? How do these needs differ from those of adults? How well does the medical staff currently perform?

Before a program starts, a prospective evaluation can be used to assess how feasible the program is to implement, its likely effectiveness, and its impact, that is, the scope of its effects.

In formative evaluation, interim data are collected after the start of a program but before its conclusion. It is the purpose of formative evaluation to describe the progress of the program and, if necessary, to modify and optimize the program design.

A process evaluation is concerned with the extent to which planned activities are carried out and is therefore nearly always useful. For example, a process evaluation can accompany the implementation of a community-oriented curriculum in medical schools.

Outcome evaluation deals with the question of whether programs achieve their goals. However, as mentioned above, goals are often stated vaguely and broadly, which makes program effectiveness difficult to evaluate. In practice, the concept of effectiveness must always address the question ‘compared to what?’ It is often difficult to distinguish program effects from chance variation and from other forces affecting the outcome. Berk and Rossi (1998) distinguish between marginal effectiveness, relative effectiveness, and cost-effectiveness. Marginal effectiveness concerns the dosage of an intervention. For example, to test the assumption that a better student–teacher ratio improves student performance, class size must be reduced. In this case, two or more different student–teacher ratios are the doses of the intervention, and their outcomes are compared.
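
A minimal Python sketch of how marginal effectiveness might be summarized: outcomes are grouped by dose (here, class size), and the change in mean outcome between successive doses is reported. The function name marginal_effects and all scores are hypothetical, invented for illustration; a real evaluation would also have to separate these differences from chance variation.

```python
from statistics import mean

# Hypothetical test scores grouped by class size (the "dose" of the intervention).
scores_by_class_size = {
    30: [61, 58, 64, 60],
    25: [63, 66, 62, 65],
    20: [68, 70, 66, 69],
}


def marginal_effects(groups):
    """Return (from_dose, to_dose, change in mean outcome) for successive doses."""
    # Larger classes first, so each step moves to a smaller class (a stronger dose).
    doses = sorted(groups, reverse=True)
    means = {dose: mean(groups[dose]) for dose in doses}
    return [(a, b, means[b] - means[a]) for a, b in zip(doses, doses[1:])]


for from_size, to_size, change in marginal_effects(scores_by_class_size):
    print(f"{from_size} -> {to_size} students: mean change {change:+.2f}")
```

Reporting the change per step, rather than only the overall difference between the largest and smallest classes, shows whether each further reduction in class size still pays off.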

Relative effectiveness is evaluated by contrasting two or more programs, or a program and the absence of the program. For example, lecture-based learning can be contrasted with problem-based learning with respect to their effects on students’ practical performance. Cost-effectiveness relates a program’s effects to its costs, expressed in monetary units. Evaluation is usually concerned with the costs of programs and their relationship to effects and benefits. For example, in the evaluation of a physical education program, one can ask how much money can be expected to be saved through reduced use of health services.
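
One common way to express cost-effectiveness when two programs are compared is an incremental cost-effectiveness ratio: the extra cost per additional unit of effect. The sketch below assumes this ratio (it is not prescribed by the sources cited here); the class ProgramResult, the function incremental_cost_effectiveness, and all costs and effects are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProgramResult:
    name: str
    cost: float    # total cost in monetary units
    effect: float  # mean outcome in natural units, e.g. test points


def incremental_cost_effectiveness(new: ProgramResult, old: ProgramResult) -> float:
    """Extra cost per additional unit of effect when replacing `old` with `new`."""
    delta_effect = new.effect - old.effect
    if delta_effect == 0:
        raise ValueError("equal effects: the ratio is undefined")
    return (new.cost - old.cost) / delta_effect


# Hypothetical figures for the lecture-based vs. problem-based comparison.
lecture = ProgramResult("lecture-based", cost=10_000, effect=12.0)
problem_based = ProgramResult("problem-based", cost=16_000, effect=15.0)
print(incremental_cost_effectiveness(problem_based, lecture))  # 2000.0 per extra point
```

A negative ratio can mean either that the new program is cheaper or that it is less effective, so the ratio should always be read together with the raw differences in cost and effect.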

However, the entire impact of a program, for example, the extent of its influence in other settings, is very difficult, perhaps impossible, to assess. Summative evaluations are therefore applied, that is, historical reviews of programs that are performed after the programs have been in operation for some period of time.

2. What Standards Can Be Set To Enhance Quality And Fairness Of Evaluation?

Because of the increasing number of evaluations and the high impact that decisions based on evaluation results may have, there has been an ongoing discussion since the early 1970s about guidelines for effective evaluations. In the absence of any clear definition of what constitutes a reasonable program evaluation, the Joint Committee on Standards for Educational Evaluation (JCSEE 1994), in cooperation with evaluation and research specialists, compiled knowledge about program evaluation and established a systematic process for developing, testing, and publishing evaluation standards. The Joint Committee began its work in 1975. In 1989 its process for developing standards was adopted by the American National Standards Institute (ANSI), and the standards became available worldwide.

Originally, the standards were developed to guide the design, implementation, and assessment of evaluations of educational programs, projects, and materials. However, the second edition of The Program Evaluation Standards, published in 1994, includes new illustrations showing how the standards apply in other settings, such as medicine, nursing, the military, business, law, government, and social service agencies (JCSEE 1994). The standards are intended both for people who commission and conduct evaluations and for people who use their results.

The Joint Committee defined an evaluation standard as a principle mutually agreed on by people engaged in the professional practice of evaluation that, if met, will enhance the quality and fairness of an evaluation (JCSEE 1994). The standards provide advice on how to judge the adequacy of evaluation activities, but they do not present specific criteria for such judgments. The reason is that there is no such thing as a routine evaluation, and the standards encourage the use of a variety of evaluation methods.

The 30 standards are organized around the four important attributes of sound and fair program evaluation: utility, feasibility, propriety, and accuracy (JCSEE 1994).

Utility Standards guide evaluations so that they will be informative, timely, and influential. They require evaluators to keep the stakeholders’ needs in mind in every phase of the evaluation. Seven standards are included in this category, for example, Stakeholder Identification, Evaluator Credibility, Report Clarity, and Evaluation Impact.

Feasibility Standards call for evaluations to be realistic, prudent, diplomatic, and economical. They recognize that evaluations usually are conducted in natural settings. The three standards included in this category are Practical Procedures, Political Viability, and Cost Effectiveness.

Propriety Standards are intended to protect the rights of individuals affected by an evaluation. They urge evaluators to act lawfully, scrupulously, and ethically. Eight standards are included in this category, for example, Service Orientation, Formal Agreements, Disclosure of Findings, and Fiscal Responsibility.

Accuracy Standards are intended to ensure that an evaluation will reveal and transmit accurate information about the program’s merits. They determine whether an evaluation has produced sound information. The 12 accuracy standards include Program Documentation, Valid Information, Reliable Information, Justified Conclusions, and Metaevaluation.

Validity refers to the degree to which a piece of information or a measure assesses what it claims to assess, while reliability refers to the degree to which information is free from ‘measurement error’, that is, to its exactness. Meta-evaluation is the evaluation of an evaluation.
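
Reliability can be estimated in several ways; one widely used internal-consistency coefficient, not named in the text above, is Cronbach’s alpha, α = k/(k − 1) · (1 − Σ item variances / variance of total scores), for a scale of k items. The sketch below is a minimal illustration under that assumption; the function cronbach_alpha and the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Internal-consistency reliability for respondents x items score data."""
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("at least two items are needed")
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of five respondents to a four-item scale.
responses = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(responses), 2))
```

Sample variances are used throughout; because the same n − 1 divisor appears in the numerator and denominator, population variances would give the same result. Alpha addresses only consistency among items and says nothing about validity.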

For each standard the Joint Committee summarized guidelines for application and common errors. In addition, one or more illustrations of the standard’s application are given (JCSEE 1994).

However, the Joint Committee acknowledges that not all standards are equally applicable in every evaluation. Professional evaluators must identify those that are most applicable in a particular context.

3. How To Conduct A Program Evaluation?

Program evaluation typically consists of a chronological sequence of activities or tasks. The main tasks are (see, e.g., Fink 1995, JCSEE 1994): posing questions about the program and defining evaluation goals, setting standards of effectiveness, designing the evaluation, collecting and analyzing information, and reporting the results. In addition, an evaluation plan must be worked out that specifies each of the tasks to be accomplished and the personnel, time, and resources needed.

3.1 Posing Evaluation Questions And Setting Standards Of Effectiveness

Evaluation questions are formulated on the basis of the program’s goals and objectives. While goals are often relatively general, e.g., to improve the organization of an institution, objectives refer to the specific purposes of a program, for example, to produce a guidebook on how to review the literature. Evaluation questions may also concern the magnitude, duration, and distribution of effects.

Because program evaluations should provide convincing evidence about the effectiveness of a program, specific effectiveness criteria are set. Since these standards of effectiveness determine the evaluation design, they should be set before any evaluation activity begins. Standards can be derived from the literature, from experts, from normative data, and from statistical analyses (see, e.g., Donabedian 1982).

Consider, for example, the program goal of teaching students how to review the literature (see Fink 1995). The evaluation question is then whether students have learned to do so. The standard of effectiveness can be set, for example, (a) as 90 percent of students learning to review the literature or (b) as a statistically significant difference in learning between students in School A, who participate in a new program, and students in School B, who serve as controls. The choice between standard (a) and standard (b) can have consequences for the evaluation design, the selection of participants, and the statistical analysis of the data.
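
The two standards can lead to different verdicts on the same data. The sketch below assumes a simple yes/no outcome per student (learned or not) and, for standard (b), a two-sided two-proportion z-test with a 5 percent significance level; the test choice, the function names, and the counts are all hypothetical illustrations, not part of Fink’s example.

```python
import math


def meets_standard_a(learned: int, total: int, threshold: float = 0.90) -> bool:
    """Standard (a): at least `threshold` of students learned the skill."""
    return learned / total >= threshold


def two_proportion_z_test(x_a: int, n_a: int, x_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two proportions (normal approximation)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("all or no students learned in both schools; test undefined")
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


def meets_standard_b(x_a: int, n_a: int, x_b: int, n_b: int, alpha: float = 0.05) -> bool:
    """Standard (b): School A (program) does significantly better than School B (control)."""
    z, p_value = two_proportion_z_test(x_a, n_a, x_b, n_b)
    return z > 0 and p_value < alpha


# Hypothetical results: 85 of 100 students learned in School A, 60 of 100 in School B.
print(meets_standard_a(85, 100))           # False: 85% is below 90%
print(meets_standard_b(85, 100, 60, 100))  # True: a significant difference
```

Standard (a) needs only the program group, whereas standard (b) requires a comparable control school and a sample large enough to detect the difference, which is exactly why the choice shapes the design.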

3.2 Designing The Evaluation

An evaluation design is a structure created to produce an unbiased appraisal of a program’s benefits. The choice of evaluation design depends on the evaluation questions and the standards of effectiveness, but also on the resources available and the degree of precision needed. Given the variety of research designs, there is no single ‘best way’ to proceed (for an overview, see, e.g., Cook and Campbell 1979, von Eye and Spiel 1996).

Essential criteria for evaluation designs are internal and external validity. Internal validity is attained when the evaluator can determine that a finding is due to the program and not to some other factor or bias. Randomized experiments are the method of choice for meeting this criterion, for example, when participants are randomly assigned to program A, to program B, or to no program. External validity concerns the generalizability of findings, for example, to participants of the program in other places and at other times.
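
Random assignment to program A, program B, or no program can be sketched as follows. This is a minimal illustration assuming simple randomization with balanced group sizes; the function randomize, the arm labels, and the participant IDs are hypothetical.

```python
import random
from collections import Counter


def randomize(participants, arms=("program A", "program B", "no program"), seed=None):
    """Randomly assign each participant to one arm, keeping group sizes balanced."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible and auditable
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Deal the shuffled list out in turn, so group sizes differ by at most one.
    return {person: arms[i % len(arms)] for i, person in enumerate(shuffled)}


students = [f"student_{i:02d}" for i in range(30)]
allocation = randomize(students, seed=2024)
print(Counter(allocation.values()))
```

Because chance alone decides who receives which program, pre-existing differences between participants are, on average, spread evenly across the groups; this is what supports internal validity. It does not by itself establish external validity.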

3.3 Collecting Information

A wide range of methods is available for collecting information, for example, questionnaires, performance tests, various kinds of interviews, record reviews, and observations (see, e.g., Kosecoff and Fink 1982, Rosenbaum 1995). When selecting methods of data collection, the evaluator must consider the evaluation questions, the available technical and financial resources, and the time needed to develop a new instrument. A ‘good’ measure is, in common terms, one that is likely to measure accurately what it is supposed to measure (Berk and Rossi 1998). In methodological terms, a good measure is reliable and valid.

Usually, more than one measure is used to obtain the relevant information. For example, when the health-related behavior of 14-year-old students is the variable of interest, possible methods of data collection include surveys of students, surveys of parents and teachers, observations of students, and reviews of health and medical records (Fink 1995).

3.4 Analyzing And Reporting Data

Program evaluators, like social scientists, employ statistical methods to analyze and summarize data. However, evaluators have to keep in mind that statistical significance does not always imply practical significance or educational meaningfulness. Standards of effectiveness that have been set before the evaluation (see above) should ensure that the data analyses also provide evidence about the program’s practical effectiveness (see, e.g., Fink 1995).
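
One common way to gauge practical significance alongside a significance test is an effect size such as Cohen’s d, the difference in means divided by the pooled standard deviation; it is not named in the text above and is used here as an assumption. Cohen’s rough benchmarks of 0.2, 0.5, and 0.8 for small, medium, and large effects are only a fallback when no substantive standard has been set. The function cohens_d and the scores are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / math.sqrt(pooled_var)


# Hypothetical post-test scores for a program group and a control group.
program_group = [72, 75, 71, 78, 74, 76]
control_group = [70, 73, 69, 75, 72, 71]
print(round(cohens_d(program_group, control_group), 2))
```

Unlike a p-value, d does not shrink as the sample grows, so a very large evaluation can yield a highly significant but practically negligible effect; comparing d with a threshold fixed in advance guards against over-reading such results.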

To combine results of evaluations that address the same questions or goals, meta-analyses can be conducted. A meta-analysis is an analysis of evaluation data from several studies. It is a statistical procedure which attempts to provide estimates of program impact by combining the results from a number of distinct evaluations to increase statistical power and the generalizability of results (see, e.g., Rosenthal 1991).
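
A minimal sketch of one standard pooling technique, the fixed-effect inverse-variance model, in which each evaluation’s effect size is weighted by the inverse of its squared standard error. This is not necessarily the procedure of the sources cited here; when studies differ substantially in their settings or programs, a random-effects model is usually preferred. The function fixed_effect_pool and the study data are hypothetical.

```python
import math


def fixed_effect_pool(effects, std_errors):
    """Inverse-variance weighted mean effect and its standard error (fixed-effect model)."""
    weights = [1 / se ** 2 for se in std_errors]  # more precise studies get more weight
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return pooled, pooled_se


# Hypothetical effect sizes (e.g. Cohen's d) and standard errors from three evaluations.
effects = [0.30, 0.45, 0.20]
std_errors = [0.15, 0.20, 0.10]
pooled, pooled_se = fixed_effect_pool(effects, std_errors)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect {pooled:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

The pooled standard error is always smaller than that of any single study, which is the gain in statistical power that motivates combining evaluations.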

Reporting results is the last task for the evaluator. The evaluation report answers the evaluation questions, describes how the answers were obtained, and translates the findings into conclusions and recommendations about the program.

4. Concluding Remarks

The demands for sound program evaluation continue to grow, to the point that some authors (Pawson and Tilley 1997) fear it may become mandatory to evaluate everything. It is therefore essential that quality control procedures and standards be introduced for all facets and tasks of evaluation and that the evaluation itself be evaluated against pertinent standards. However, there is also a risk of oversimplifying by merely working through a checklist of standards. There is no fixed recipe for running a successful evaluation.

Bibliography:

Berk R A, Rossi P H 1998 Thinking about Program Evaluation. Sage, Thousand Oaks, CA
Bickman L, Guthrie A R, Foster M, Lambert E W, Summerfelt W T, Breda C, Heflinger C 1995 Managed Care in Mental Health: The Fort Bragg Experiment. Plenum, New York
Cook T, Campbell D 1979 Quasi-experimentation. McGraw-Hill, New York
Cousins J B, Earl L M 1992 The case for participatory evaluation. Educational Evaluation and Policy Analysis 14: 397–418
Donabedian A 1982 The Definition of Quality and Approaches to its Assessment. Health Administration Press, Ann Arbor, MI
Fink A 1995 Evaluation for Education & Psychology. Sage, Thousand Oaks, CA
Gueron J, Pauly E 1991 From Welfare to Work. Sage, New York
Hope T, Foster J 1992 Conflicting forces: Changing the dynamics of crime and community on ‘problem’ estates. British Journal of Criminology 32: 488–504
Joint Committee on Standards for Educational Evaluation (JCSEE) 1994 The Program Evaluation Standards, 2nd edn. Sage, Thousand Oaks, CA
Kosecoff J, Fink A 1982 Evaluation Basics. Sage, Newbury Park, CA
Pawson R, Tilley N 1997 Realistic Evaluation. Sage, London
Rosenbaum P R 1995 Observational Studies. Springer-Verlag, New York
Rosenthal R 1991 Meta-analysis: A review. Psychosomatic Medicine 53: 247–71
Rossi P H, Freeman H E 1993 Evaluation. A systematic approach, 5th edn. Sage, Newbury Park, CA
Scriven M 1991 Evaluation Thesaurus, 4th edn. Sage, Newbury Park, CA
Spiel C (ed.) 2001 Evaluation universitärer Lehre—zwischen Qualitätsmanagement und Selbstzweck [Evaluating University Teaching—Between Quality Management and End in Itself]. Waxmann, Münster, Germany
von Eye A, Spiel C 1996 Research methodology: Human development. In: Tuijman A C (ed.) International Encyclopedia of Adult Education and Training. Concepts, Theories, and Methods, 2nd edn. Sect. 1. Pergamon Press, New York