Program Evaluation Research Paper


Evaluations are, in a broad sense, concerned with the effectiveness of programs. While commonsense evaluation has a very long history, evaluation research, which relies on scientific methods, is a young discipline that has grown massively in recent years. Program evaluation has made important contributions to various social domains; for example, to education, health, economics, and the environment.



This research paper gives an introduction to program evaluation by dealing with the following three questions: What is program evaluation? What standards can be set to enhance the quality and fairness of evaluation? How is a program evaluation conducted?

1. What Is Program Evaluation?

1.1 Definition, Distinction From Other Activities, Users, And Evaluators

Evaluation, broadly construed, has a very long history. For example, in ancient Rome, tax policies were altered in response to observed fluctuations in revenues, and many thousands of years ago, hunters developed complicated procedures for fishing and trapping game. In recent years, commonsense program evaluation has evolved into evaluation research, which relies on scientific research methods. This paper focuses on the latter type of evaluation (for introductory texts on evaluation research see, e.g., Berk and Rossi 1998, Fink 1995, Pawson and Tilley 1997, Rossi and Freeman 1993). In this research paper, the terms evaluation, program evaluation, and evaluation research are used synonymously. The first large-scale national evaluations of social programs in the USA had their origins in the War on Poverty, and the way of monitoring programs is still evolving rapidly. Evaluation has proven its usefulness to the extent that most federal and state agencies require evaluations of new programs.

Programs, sometimes also called projects or interventions, are the objects of evaluation. Program evaluation derives from the idea that social programs should have demonstrable merits and explicit aims by which success or failure may be empirically judged; for example, literacy programs for high-school students should lead to measurable improvements in their reading skills. Introductory texts on evaluation describe programs as systematic activities that are provided on a continuing basis to achieve preplanned purposes; for example, a national program to train teachers to work in rural and underserved areas, or a campaign to vaccinate all children in a school district (e.g., Fink 1995, JCSEE 1994).

Evaluation systematically investigates characteristics and merits of programs. It is the purpose of evaluation to provide information on the effectiveness of programs so as to optimize outcomes, quality, and efficiency. Program evaluations are conducted by applying many of the same methods that social researchers rely on to gather reliable and valid evidence. These include formulating the study questions, designing the evaluation, setting standards to assess the program impact, ongoing monitoring of the program functioning, collecting and analyzing data, assessing program effectiveness relative to the costs, and reporting results. For examples of evaluation in various fields of clinical and applied psychology see Bickman et al. (1995), Gueron and Pauly (1991), Hope and Foster (1992), and Spiel (2001).

While a few authors (e.g., Scriven 1991) take the view that theories are a luxury for the evaluator, since they are not even essential for explanations, the majority of researchers call for injections of theory into every aspect of evaluation design and analysis (e.g., Pawson and Tilley 1997, Berk and Rossi 1998). This implies that program evaluation capitalizes on existing theory and empirical generalizations from the social sciences, and is, therefore, applied research after all.

By relying on the scientific method, program evaluation differs from, for example, newspaper reporting and social commentary. It differs from the social sciences in its goals and audience.

Evaluation goals are defined by the client; that is, the individual, group, or organization that commissions the evaluator(s), for example, policymakers. However, programs and clients often do not have clear and consistent goals that can be evaluated for effectiveness. Therefore, to meet evaluation standards, program evaluation should be conducted by professional evaluators who are experts in the social sciences and use a broad repertoire of concepts and methods, but who also have expertise in project and staff management and at least basic knowledge of the evaluated domain.

While impetus for evaluation rests in policy, information from evaluations is used by the stakeholders, all individuals or groups that may be involved in or affected by a program evaluation. There are at least five groups (Fink 1995): (a) federal, state, and local government, for example, a city, the National Institute of Education; (b) program providers, for example, the healthcare services, the curriculum department in a school; (c) policymakers, for example, a subcommittee on human services of the state; (d) program funders, for example, the Office of Education; (e) social researchers, for example, in government, public agencies, and universities. Therefore, program evaluation is also an interdisciplinary task where evaluators and stakeholders work together (see Cousins and Earl 1992).

1.2 Key Concepts And Assignment Criteria

Evaluation can and should be done in all phases of a program. Baseline data collected before the start of the program are used to describe the current situation; for example, the need for an intervention. In addition, baseline data are used to monitor and explain changes. For example, if the medical emergency services attempt to reorganize their service to better care for children, they may collect baseline data to answer questions such as: In what cases did the emergency medical services fail to meet the needs of critically ill children? How do these needs differ from those of adults? What is the current performance of medical staff?

Before the start of the program, a prospective evaluation can be employed to determine the program’s potential for realization, its effectiveness, and its impact; that is, the scope of its effects.

In formative evaluation, interim data are collected after the start of a program but before its conclusion. It is the purpose of formative evaluation to describe the progress of the program and, if necessary, to modify and optimize the program design.

A process evaluation is concerned with the extent to which planned activities are executed and, therefore, is nearly always useful. For example, process evaluation can accompany implementation of a community oriented curriculum in medical schools.

Outcome evaluation deals with the question of whether programs achieve their goals. However, as mentioned before, goals are often stated vaguely and broadly and, therefore, program effectiveness is difficult to evaluate. In practice, the concept of effectiveness must always address the issue ‘compared to what?’ Often it is difficult to distinguish program effects from chance variation and from other forces affecting the outcome. Berk and Rossi (1998) distinguish between marginal effectiveness, relative effectiveness, and cost effectiveness. Marginal effectiveness concerns the dosage of an intervention measure. For example, to test the assumption that a better student–teacher ratio would improve student performance, the class size has to be reduced. In this case, two or more different student–teacher ratios are the doses of intervention and their outcomes are compared.

Relative effectiveness is evaluated by contrasting two or more programs, or a program and the absence of the program. For example, lecture-based learning can be contrasted with problem-based learning in their effects on students’ practical performance. Cost-effectiveness considers effectiveness in monetary units. Evaluation usually is concerned with the costs of programs and their relationship to effects and benefits. For example, in the evaluation of a physical education program, one can ask how much gain in monetary units can be expected from the reduced use of health services.
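The relationship between program costs and monetary benefits can be illustrated with a minimal Python sketch. It computes a benefit-cost ratio and a net benefit for the physical education example. The function names and all figures are hypothetical and serve only as an illustration; they are not taken from any actual evaluation.

```python
def benefit_cost_ratio(program_cost, monetary_benefit):
    """Monetary benefit obtained per unit of money spent on the program."""
    if program_cost <= 0:
        raise ValueError("program_cost must be positive")
    return monetary_benefit / program_cost


def net_benefit(program_cost, monetary_benefit):
    """Monetary benefit that remains after subtracting the program cost."""
    return monetary_benefit - program_cost


# Hypothetical figures: yearly cost of a physical education program and
# the estimated yearly savings from reduced use of health services.
cost = 50_000.0
savings = 65_000.0

print(f"benefit-cost ratio: {benefit_cost_ratio(cost, savings):.2f}")
print(f"net benefit: {net_benefit(cost, savings):.0f}")
```

A ratio above 1 (or a positive net benefit) would indicate that the estimated savings exceed the costs, provided that the monetary estimates themselves are trustworthy.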

However, the entire impact of a program, for example, the extent of its influence in other settings, is very difficult, perhaps impossible, to assess. Therefore, summative evaluations are applied, that is, historical reviews of programs that are performed after the programs have been in operation for some period of time.

2. What Standards Can Be Set To Enhance Quality And Fairness Of Evaluation?

Because of the increasing number of evaluations and the high impact that decisions based on results of evaluations may have, there has been an ongoing discussion about guidelines for effective evaluations since the early 1970s. In the absence of any clear definition of what constitutes a reasonable program evaluation, the Joint Committee on Standards for Educational Evaluation (JCSEE 1994), in cooperation with evaluation and research specialists, compiled knowledge about program evaluation and established a systematic process for developing, testing, and publishing evaluation standards. The Joint Committee began its work in 1975. In 1989 the process for developing standards was adopted by the American National Standards Institute (ANSI), and became available worldwide.

Originally, the standards were developed to guide the design, implementation, and assessment of evaluations of educational programs, projects, and materials. However, in the new Program Evaluation Standards, published in 1994, new illustrations were included that feature applications of the standards in such other settings as medicine, nursing, the military, business, law, government, and social service agencies (JCSEE 1994). The standards are published for people who commission and conduct evaluations as well as for people who use the results of evaluations.

The Joint Committee defined an evaluation standard as a principle mutually agreed on by people engaged in the professional practice of evaluation that, if met, will enhance the quality and fairness of an evaluation (JCSEE 1994). The standards provide advice on how to judge the adequacy of evaluation activities, but they do not present specific criteria for such judgments. The reason is that there is no such thing as a routine evaluation and the standards encourage the use of a variety of evaluation methods.

The 30 standards are organized around the four important attributes of sound and fair program evaluation: utility, feasibility, propriety, and accuracy (JCSEE 1994).

Utility Standards guide evaluations so that they will be informative, timely, and influential. They require evaluators to keep the stakeholders’ needs in mind in every phase of the evaluation. Seven standards are included in this category; for example, Stakeholder Identification, Evaluator Credibility, Report Clarity, and Evaluation Impact.

Feasibility Standards call for evaluations to be realistic, prudent, diplomatic, and economical. They recognize that evaluations usually are conducted in natural settings. The three standards included in this category are Practical Procedures, Political Viability, and Cost Effectiveness.

Propriety Standards are intended to facilitate protection of the rights of individuals affected by an evaluation. They urge evaluators to act lawfully, scrupulously, and ethically. In this category eight standards are included; for example, Service Orientation, Formal Agreements, Disclosure of Findings, and Fiscal Responsibility.

Accuracy Standards are intended to ensure that an evaluation will reveal and transmit accurate information about the program’s merits. They determine whether an evaluation has produced sound information. The 12 accuracy standards include Program Documentation, Valid Information, Reliable Information, Justified Conclusions, and Metaevaluation.

Validity refers to the degree to which a piece of information or a measure assesses what it claims to assess, while reliability refers to the degree to which information is free from ‘measurement error’; that is, the exactness of the information. Meta-evaluation is the evaluation of an evaluation.

For each standard the Joint Committee summarized guidelines for application and common errors. In addition, one or more illustrations of the standard’s application are given (JCSEE 1994).

However, the Joint Committee acknowledges that standards are not all equally applicable in every evaluation. Professional evaluators must identify those that are most applicable in a particular context.

3. How To Conduct A Program Evaluation?

Program evaluation typically consists of a chronological sequence of activities or tasks. The main tasks or activities are (see, e.g., Fink 1995, JCSEE 1994): posing questions about the program and defining evaluation goals, setting standards of effectiveness, designing the evaluation, collecting and analyzing information, and reporting the results. In addition, an evaluation plan must be worked out that specifies each of the tasks to be accomplished and the personnel, time, and resources needed.

3.1 Posing Evaluation Questions And Setting Standards Of Effectiveness

Based on the program’s goals and objectives, evaluation questions are formulated. While goals are often relatively general, e.g., to improve the organization of an institution, objectives refer to the specific purposes of a program; for example, to produce a guidebook about how to review the literature. Evaluation questions may also concern the magnitude, duration, and distribution of effects.

Because program evaluations should provide convincing evidence concerning the effectiveness of a program, specific effectiveness criteria are set. Because these standards of effectiveness determine the evaluation design, they should be set prior to any evaluation activity. Information about standards comes from the literature, from experts, normative data, and statistical analyses (see, e.g., Donabedian 1982).

Consider, for example, the program goal to teach students how to review the literature (see Fink 1995). The evaluation question is then whether students learned to do this. The standard of effectiveness can be set, for example, (a) that 90 percent of students learn to review the literature or (b) that a statistically significant difference in learning is observed between students in Schools A and B, with students in School A participating in a new program and students in School B serving as controls. The decision for either standard (a) or (b) can have consequences for the evaluation design, the selection of participants, and the statistical analyses of the data.
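The two kinds of standard can be sketched in Python as follows. Standard (a) is checked as a simple proportion against a threshold; standard (b) is illustrated with a two-sided two-proportion z-test, which is only one of several tests an evaluator might choose and is an assumption of this sketch. The function names and the counts are hypothetical.

```python
import math


def meets_proportion_standard(passed, total, threshold=0.90):
    """Standard (a): at least `threshold` of the students reach the goal."""
    return passed / total >= threshold


def two_proportion_z_test(pass_a, n_a, pass_b, n_b):
    """Standard (b): two-sided z-test for a difference in pass rates."""
    p_a = pass_a / n_a
    p_b = pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pass rates are all 0 or all 1; test undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: School A (new program) and School B (controls).
print(meets_proportion_standard(85, 100))
z, p = two_proportion_z_test(85, 100, 70, 100)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these invented counts, School A falls short of the 90 percent criterion even though its pass rate differs significantly from that of School B at the conventional 0.05 level, which shows how the choice of standard can change the verdict on the same data.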

3.2 Designing The Evaluation

An evaluation design is a structure created to produce an unbiased appraisal of a program’s benefits. The decision for an evaluation design depends on the evaluation questions and the standards of effectiveness, but also on the resources available and on the degree of precision needed. Given the variety of research designs there is no single ‘best way’ to proceed (for an overview see, e.g., Cook and Campbell 1979, von Eye and Spiel 1996).

Essential criteria for evaluation designs are internal and external validity. Internal validity is attained when the evaluator can decide whether a finding is due to the program and cannot be caused by some other factors or biases. Randomized experiments are the method of choice to reach this criterion; for example, when participants are randomly assigned either to program A, to program B, or to no program. External validity concerns the generalizability of findings; for example, to participants of the program in other places and at other times.
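Random assignment to program A, program B, or no program can be sketched in Python as below. This is a minimal illustration, assuming simple (unstratified) randomization with groups of nearly equal size; the function name, the participant identifiers, and the seed are hypothetical.

```python
import random


def randomize(participants, conditions, seed=None):
    """Shuffle participants and deal them out to the conditions in turn."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    assignment = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        condition = conditions[index % len(conditions)]
        assignment[condition].append(person)
    return assignment


# Hypothetical participant identifiers 0..29.
groups = randomize(range(30), ["program A", "program B", "no program"], seed=1)
for condition, members in groups.items():
    print(condition, len(members))
```

Because chance alone decides who ends up in which group, systematic differences between the groups before the program starts become unlikely, which is what supports the causal interpretation of later outcome differences.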

3.3 Collecting Information

For collecting information an arsenal of methods is available; for example, questionnaires, performance tests, various kinds of interviews, record reviews, observations, etc. (see, e.g., Kosecoff and Fink 1982, Rosenbaum 1995). When selecting the methods of data collection, the evaluator must consider the evaluation questions, the available technical and financial resources, and the time needed to develop a new instrument. A ‘good’ measure is, in common terms, one that is likely to measure accurately what it is supposed to measure (Berk and Rossi 1998). In methodological terms, a good measure is reliable and valid.

In most cases, more than one measure is used to obtain the relevant information. For example, when health-related behavior of 14-year-old students is the variable of interest, possible methods of data collection include surveys of students, surveys of parents and teachers, observations of students, and reviews of health and medical records (Fink 1995).

3.4 Analyzing And Reporting Data

Program evaluators and social scientists employ statistical methods to analyze and summarize data. However, evaluators have to keep in mind that statistical significance does not always imply practical significance or educational meaningfulness. Standards of effectiveness that have been set in advance of the evaluation (see above) should guarantee that analyses of the data also provide evidence about the practical effectiveness of the program (see, e.g., Fink 1995).
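One common way to look at practical rather than statistical significance is a standardized effect size. The following Python sketch computes Cohen's d with a pooled standard deviation; the choice of this particular index is an assumption of the sketch, not something prescribed by the sources cited here, and the scores are invented.

```python
import math
import statistics


def cohens_d(treated, control):
    """Standardized mean difference using the pooled standard deviation."""
    n_t, n_c = len(treated), len(control)
    if n_t < 2 or n_c < 2:
        raise ValueError("each group needs at least two observations")
    var_t = statistics.variance(treated)  # sample variance (n - 1)
    var_c = statistics.variance(control)
    pooled_sd = math.sqrt(
        ((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)
    )
    if pooled_sd == 0:
        raise ValueError("no variation in the data; effect size undefined")
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


# Invented test scores for program participants and controls.
program_scores = [2, 4, 6]
control_scores = [1, 3, 5]
print(f"Cohen's d = {cohens_d(program_scores, control_scores):.2f}")
```

Unlike a p-value, this index does not grow with sample size, so it can be compared against a standard of effectiveness that was fixed before the evaluation began.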

To combine results of evaluations that address the same questions or goals, meta-analyses can be conducted. A meta-analysis is an analysis of evaluation data from several studies. It is a statistical procedure which attempts to provide estimates of program impact by combining the results from a number of distinct evaluations to increase statistical power and the generalizability of results (see, e.g., Rosenthal 1991).
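How combining studies increases precision can be sketched with a fixed-effect, inverse-variance weighted average, shown here in Python. This is one standard pooling model among several and is chosen here as an assumption for illustration; the effect sizes and variances are invented.

```python
import math


def fixed_effect_meta(effects, variances):
    """Pool study effects, weighting each by the inverse of its variance."""
    if len(effects) != len(variances) or not effects:
        raise ValueError("need one variance per effect, and at least one study")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return pooled, standard_error


# Invented effect sizes and sampling variances from three evaluations.
effects = [0.30, 0.10, 0.25]
variances = [0.04, 0.02, 0.05]
pooled, se = fixed_effect_meta(effects, variances)
print(f"pooled effect = {pooled:.3f}, standard error = {se:.3f}")
```

The pooled standard error is smaller than that of any single study in the example, which is the sense in which a meta-analysis increases statistical power. The fixed-effect model assumes that all studies estimate the same underlying effect; when programs or settings differ substantially, a random-effects model may be more appropriate.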

Reporting results is the last task for the evaluator. The evaluation report answers the evaluation questions, describes how the answers were obtained, and translates the findings into conclusions and recommendations about the program.

4. Concluding Remarks

The demand for sound program evaluation continues to grow to such an extent that some authors (Pawson and Tilley 1997) fear it may become mandatory that everything be evaluated. Therefore, it is essential that quality control procedures and standards be introduced for all facets and tasks of evaluation and that the evaluation itself be evaluated against pertinent standards. However, there is also the risk of oversimplifying and only using a checklist of standards. There is no fixed recipe for running a successful evaluation.


  1. Berk R A, Rossi P H 1998 Thinking about Program Evaluation. Sage, Thousand Oaks, CA
  2. Bickman L, Guthrie A R, Foster M, Lambert E W, Summerfelt W T, Breda C, Heflinger C 1995 Managed Care in Mental Health: The Fort Bragg Experiment. Plenum, New York
  3. Cook T, Campbell D 1979 Quasi-experimentation. McGraw-Hill, New York
  4. Cousins J B, Earl L M 1992 The case for participatory evaluation. Educational Evaluation and Policy Analysis 14: 397–418
  5. Donabedian A 1982 The Definition of Quality and Approaches to its Assessment. Health Administration Press, Ann Arbor, MI
  6. Fink A 1995 Evaluation for Education & Psychology. Sage, Thousand Oaks, CA
  7. Gueron J, Pauly E 1991 From Welfare to Work. Sage, New York
  8. Hope T, Foster J 1992 Conflicting forces: Changing the dynamics of crime and community on ‘problem’ estates. British Journal of Criminology 32: 488–504
  9. Joint Committee on Standards for Educational Evaluation (JCSEE) 1994 The Program Evaluation Standards, 2nd edn. Sage, Thousand Oaks, CA
  10. Kosecoff J, Fink A 1982 Evaluation Basics. Sage, Newbury Park, CA
  11. Pawson R, Tilley N 1997 Realistic Evaluation. Sage, London
  12. Rosenbaum P R 1995 Observational Studies. Springer-Verlag, New York
  13. Rosenthal R 1991 Meta-analysis: A review. Psychosomatic Medicine 53: 247–71
  14. Rossi P H, Freeman H E 1993 Evaluation. A systematic approach, 5th edn. Sage, Newbury Park, CA
  15. Scriven M 1991 Evaluation Thesaurus, 4th edn. Sage, Newbury Park, CA
  16. Spiel C (ed.) 2001 Evaluation universitärer Lehre—zwischen Qualitätsmanagement und Selbstzweck [Evaluating University Teaching—Between Quality Management and End in Itself]. Waxmann, Münster, Germany
  17. von Eye A, Spiel C 1996 Research methodology: Human development. In: Tuijnman A C (ed.) International Encyclopedia of Adult Education and Training. Concepts, Theories, and Methods, 2nd edn. Sect. 1. Pergamon Press, New York