Theory Of Screening
The primary purpose of screening for cancer is to reduce mortality. Besides the effect on the length of life, screening also has other important consequences, including a burden on economic resources and implications for the quality of life. Screening usually implies an increase in health expenditure. The effects on the quality of life of screened subjects can be both positive and negative.
Cancers Suitable For Screening
Cancer is characterized both by an insidious onset and by an improved outcome when it is detected early. For a cancer to be suitable for screening, therefore, the natural history of the disease should include a phase without symptoms during which the cancer can be detected by a screening test earlier than by ordinary clinical diagnosis after the subject has experienced symptoms. The outcome of treatment following diagnosis during this detectable, preclinical phase (DPCP) (Cole and Morrison, 1978) should also be better than following clinical detection. Sometimes the screening program may reduce morbidity or improve the quality of life. For example, a mammography program may increase the number of women undergoing surgery because of overdiagnosis; early diagnosis increases the duration of sickness, but mammography enables breast-conserving operations, which improves the quality of life and produces less morbidity than the treatment of more advanced disease.
The objective of screening is to reduce the burden of disease. The impact of disease covers both death and morbidity, that is, detriment of the disease for the patient while alive, including reduced well-being and loss of functional capacity. The prognosis should be better following early treatment of screen-detected disease than of clinically detected disease. If a disease can be successfully treated after it manifests clinically, there is no need for screening. Screening should not be applied for untreatable diseases. Cancer is always a potentially lethal disease and the primary goal of treatment is saving the patient’s life. Thus, a reduction of mortality is the most important indicator of the effectiveness of screening.
The disease should be common enough to justify the efforts involved in mounting a screening program. Disease in the preclinical phase is the target of detection. The frequency of preclinical disease in the population to be screened depends on the incidence of clinical disease and on the length of the detectable preclinical phase. Screening for disease is a continuous process. The prevalence of disease at initial screening may be substantially different from the prevalence at subsequent screens. This variation in yield is not determined only by the length of the preclinical phase, that is, by the basic biological properties of the disease; it is also a consequence of the length of the intervals between screening rounds, that is, of the screening regimen.
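The dependence of preclinical prevalence on incidence, sojourn time, and screening interval can be illustrated with the standard steady-state approximation that prevalence is roughly incidence multiplied by duration. The following is a minimal sketch rather than a method taken from the literature cited here: the function name and all figures are hypothetical, and it assumes constant incidence, a fixed sojourn time, and a previous screen that detected every preclinical case.

```python
from typing import Optional


def preclinical_pool(incidence_per_100k: float,
                     sojourn_years: float,
                     years_since_last_screen: Optional[float] = None) -> float:
    """Approximate number of detectable preclinical cases per 100 000 people.

    Illustrative assumptions: constant incidence, a fixed sojourn time,
    and a previous screen (if any) that found every preclinical case.
    """
    if years_since_last_screen is None:
        # Initial screen: the whole steady-state pool is present.
        return incidence_per_100k * sojourn_years
    # Later screen: only cases that entered the preclinical phase since the
    # previous round, and have not yet surfaced clinically, remain.
    return incidence_per_100k * min(years_since_last_screen, sojourn_years)


# Hypothetical figures: 100 new cases per 100 000 per year, 3-year sojourn time.
initial_pool = preclinical_pool(100.0, 3.0)
pool_after_2_years = preclinical_pool(100.0, 3.0, years_since_last_screen=2.0)
print(initial_pool, pool_after_2_years)
```

Under these assumptions the initial screen meets a pool of 300 cases per 100 000, whereas a screen repeated after 2 years meets only 200, although the biology of the disease is unchanged.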
The screening test must be capable of identifying disease in the preclinical phase. For cancer, the earliest stages of the preclinical phase are thought to occur in a single cell, and they remain beyond identification. The detectable preclinical phase starts when the disease becomes detectable by the test and ends when the cancer would surface clinically. The length of the DPCP is called the sojourn time (Day and Walter, 1984). The sojourn time is not constant, but varies from case to case. It can be described with a theoretical distribution. Factors influencing the sojourn time include (1) the biological characteristics of the disease (growth and progression rate) influencing the time from start of detectability to diagnosis at symptomatic stage, (2) the screening test and the cut-off value or criteria for positivity, used for identifying early disease (detectability threshold), and (3) behavioral factors and health care affecting the end of DPCP, or the time of diagnostic confirmation of the disease.
The occurrence of disease in a population is measured in terms of incidence or mortality rates. Stomach cancer is still common in middle-aged populations in many countries, but the risk is rapidly decreasing, which contrasts with a substantial increase for cancer of the breast, another disease occurring at a relatively early age in many countries. In contrast, the incidence of prostate cancer has been very low under 65 years of age, and the increase in incidence is mainly attributable to diagnostic activity. Therefore, the number of life-years that can potentially be saved by a (successful) screening program is small.
If screening postpones death, screening increases the prevalence of disease, that is, the number of people living who have been diagnosed with it at some time in the past. Prevalence may also increase as a consequence of earlier diagnosis or overtreatment, so the prevalence of disease and the survival of cancer patients are not appropriate indicators of the effect of screening.
Effects of screening on the quality of life and cost are in principle considered in decisions on whether or not to screen. In practice, however, public health policies related to cancer screening are invariably initiated, run, and evaluated by their effect on mortality. In fact, there is no agreement on how to apply criteria other than mortality to policy decisions about screening. Such a decision would require evidence on the magnitude of effect on these criteria, and agreement on how to weigh the benefits and harms in different dimensions of death, quality of life, and cost.
Benefits And Harms Of Screening
Independently of effectiveness, screening has adverse effects. The effectiveness of screening is related to the degree to which the objectives are met. As indicated above, the purpose of screening is to reduce disease burden and the main goal is the reduction of deaths from the disease. If the treatment of screen-detected disease has fewer or less serious adverse effects than the treatment of clinical disease, then the iatrogenic morbidity for the patient is decreased. If late-stage disease is avoided, highly debilitating effects can be reduced. Costs can be saved if treatment and follow-up of early disease requires fewer resources than clinically detected cancer. A correct negative test also has a beneficial effect in terms of reassurance for those without disease.
Because screening requires preclinical diagnosis of disease, the period of morbidity is prolonged by the interval between diagnosis at screening and the hypothetical time when clinical diagnosis would have occurred if the patient had not been screened. This interval is called the lead time.
Screen-positive cases are confirmed by the standard clinical diagnostic methods. Many screen-detected cases are borderline abnormalities, some of which would progress to clinical disease, while some would not progress even if untreated. Cervical cancer screening results in detection of early intraepithelial neoplasia (premalignant lesions or in situ carcinomas), not all of which would progress to (fatal) cancer, even without treatment (IARC, 2005). Occult intraductal carcinomas of the breast (IARC, 2002), occult papillary carcinomas of the thyroid gland (Furihata and Maruchi, 1969), or occult prostatic cancer (Hugosson et al., 2000) may fulfill the histological criteria for malignancy, but would remain indolent clinically. Any screening program will disclose such abnormalities, which are indistinguishable from a case that would progress into clinical disease during the person’s lifetime, if not subjected to early treatment. Therefore, one of the adverse effects of screening is overdiagnosis, that is, detection of indolent disease and its unnecessary treatment (overtreatment), which results in anxiety and morbidity that would be avoided without screening and which is unnecessary for achieving the goals of screening.
A false-negative screening test (a person with disease that is not detected by the test) provides undue reassurance and may result in delayed diagnosis and worse outcome of treatment. In such cases, the effect of screening is disadvantageous.
Screening tests are applied to a population without recognized disease. In addition to the abnormal or borderline diagnoses, therefore, there will be false-positive screening results (a person without disease has a positive test), and these can cause anxiety and morbidity. The test itself may carry a risk. For example, screening for breast cancer is based on mammography, which involves a small radiation dose. The small risk of breast cancer induced by irradiating a large population should be compared with the benefits of mammography.
Validity Of Screening
The validity of screening indicates the process or performance of screening and consists of two components: sensitivity and specificity. Sensitivity is an indicator of the extent to which preclinical disease is identified and specificity describes the extent to which healthy individuals are so identified. Predictive values are derived from those indicators and they describe the performance from the point of view of the person screened (the screenee).
The purpose of the screening test is to distinguish a subset of the population with a high probability of having unrecognized disease from the rest of the population, with average or low risk. Sensitivity is the proportion of persons who have a positive test among those with the disease in the DPCP (Table 1). Sensitivity is a basic performance measure because it indicates the proportion of early disease that is identified by screening. Specificity is the proportion of persons with a negative test among all those screened who are disease-free. Specificity is a basic measure of the disadvantages of a test, since poor specificity results in high financial costs and adverse effects due to false-positive tests. Both the sensitivity and the specificity of a screening test are process indicators. A screening program based on a valid test may nevertheless fail in its objective of reducing mortality in the screened population.
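The two definitions above can be written out directly from the four cells of a cross-classification of test result against true disease status. The sketch below is illustrative only; the class name and the counts are hypothetical and do not come from any screening program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningTable:
    """Cross-classification of test result by true disease status."""
    true_pos: int   # disease in the DPCP, test positive
    false_neg: int  # disease in the DPCP, test negative
    false_pos: int  # disease-free, test positive
    true_neg: int   # disease-free, test negative

    def sensitivity(self) -> float:
        # Proportion with a positive test among those with disease in the DPCP.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Proportion with a negative test among those who are disease-free.
        return self.true_neg / (self.true_neg + self.false_pos)


# Hypothetical screen of 10 000 people, 50 of whom have preclinical disease.
table = ScreeningTable(true_pos=40, false_neg=10, false_pos=199, true_neg=9751)
print(f"sensitivity = {table.sensitivity():.2f}")
print(f"specificity = {table.specificity():.2f}")
```

In this hypothetical example a specificity of 0.98 still produces about five false-positive tests for every true positive, because disease-free persons vastly outnumber those with preclinical disease.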
As with clinical diagnostic procedures, the screening test is not always unambiguously positive or negative, and classification depends on the subjective judgment of the individual who is interpreting the test. The test result is often quantitative or semi-quantitative rather than dichotomous (positive or negative). Serum prostate-specific antigen (PSA) concentration is measured on a continuous scale and the results of the Pap smear for cervical cancer were originally given on a 5-point scale of increasing degree of suspected malignancy (I, normal; II, benign infection; III, suspicious lesion; IV, probably malignant; V, malignant). The repeatability and validity of any classification are imperfect, because of both subjective interpretation and other sources of variation in the screening test. Because of this ambiguity, the cut-off point on the scale classifying the population in terms of screening positives and negatives can be selected at varying levels. The selection of the cut-off point crucially affects the test performance. Definition of a cut-off level influences sensitivity and specificity in opposite, counterbalancing directions: a gain in sensitivity is inevitably accompanied by a loss in specificity.
Selection of a particular cut-off point will fix a particular combination of specificity and sensitivity. Several approaches have been proposed to select the cut-off point to distinguish best the high-risk group from the average-risk population. The simplest approach is to accept that cut-off which minimizes the total proportion of misclassification, which is equivalent to maximizing the sum of sensitivity and specificity. However, these two components of validity have different implications, which cannot be directly compared. Sensitivity is mainly related to the objective of screening and specificity to the adverse effects. When the total number of misclassified cases is minimized, sensitivity and specificity are considered of equal importance. This may be problematic, because it implies that false negatives and false positives have a similar impact. The importance of sensitivity relative to specificity implies a weighted sum of misclassification as the basis to find a correct cut-off point. However, there are no objective weightings for sensitivity and specificity. Selection of a particular combination for validity components, that is, sensitivity and specificity, always involves value judgment.
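The role of the value judgment can be made concrete with a small sketch. It assumes a hypothetical quantitative marker for which four candidate cut-off points have known sensitivity and specificity; the figures, the `CutOff` type, and the `best_cut_off` function are invented for illustration and do not describe any real test. With equal weights the rule maximizes the sum of sensitivity and specificity; any other weight encodes a judgment about the relative importance of false negatives and false positives.

```python
from typing import List, NamedTuple


class CutOff(NamedTuple):
    threshold: float    # marker value at or above which the test is positive
    sensitivity: float
    specificity: float


def best_cut_off(candidates: List[CutOff],
                 sensitivity_weight: float = 0.5) -> CutOff:
    """Return the candidate maximizing a weighted sum of sensitivity and specificity.

    A weight of 0.5 treats the two components as equally important; the
    weight itself is a value judgment, not an objective quantity.
    """
    if not 0.0 <= sensitivity_weight <= 1.0:
        raise ValueError("sensitivity_weight must lie between 0 and 1")

    def score(c: CutOff) -> float:
        return (sensitivity_weight * c.sensitivity
                + (1.0 - sensitivity_weight) * c.specificity)

    return max(candidates, key=score)


# Hypothetical trade-off: raising the threshold loses sensitivity, gains specificity.
CANDIDATES = [
    CutOff(2.0, 0.95, 0.70),
    CutOff(3.0, 0.90, 0.80),
    CutOff(4.0, 0.80, 0.92),
    CutOff(6.0, 0.60, 0.97),
]

print(best_cut_off(CANDIDATES).threshold)                          # equal weights
print(best_cut_off(CANDIDATES, sensitivity_weight=0.8).threshold)  # favors yield
```

With equal weights the sketch selects the threshold of 4.0; when sensitivity is weighted at 0.8 it selects 2.0, so the chosen cut-off follows from the weighting rather than from the data alone.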
This does not mean that any combination of specificity and sensitivity is acceptable. It depends on the test and the disease to be screened: for Pap smear some false positives are regarded as acceptable. This is because the yield is of primary importance and confirmation of the diagnosis is regarded, not necessarily correctly, as relatively reliable, noninvasive, and inexpensive. False-positive diagnoses pose a problem in breast cancer screening, because some degree of abnormality is common in the breast, but confirmation of a positive test is rather expensive. Low specificity is problematic because of the potentially high cost of diagnostic confirmation and exceeding the capacity of clinical diagnostic services with screen-positive cases.
Episode validity describes the ability of the screening episode to detect disease in the DPCP and to identify those who are healthy. Attempts to confirm the diagnosis after a positive screening test may fail to identify the disease and a (true-positive) case may thus be labeled as a false-positive screening test. For example, the PSA test has a high sensitivity (Stenman et al., 1994), but biopsy may fail to identify the malignant lesion. Therefore, many cases that are in the DPCP will not be diagnosed during the screening episode. The difference between test sensitivity and episode sensitivity is obvious for a screening test that is independent of the biopsy-based confirmation process. Screening for cervical cancer is based on exfoliated cells and the lesion where the malignant cells originated may not be detected at colposcopic biopsy. Even a biopsy for a breast cancer seen on screening mammography may fail to contain the malignant tissue.
Program validity is a public health indicator. Program sensitivity is related to the yield of the screening program, the detection of disease in DPCP in the target population. The program specificity indicates the correct identification of subjects free of the disease in the target population.
Program validity depends on the screening test, confirmation of the test, attendance, the screening interval, and the success of referral for diagnostic confirmation of screen-positive cases.
Predictive Values Of A Screening Test
Estimates of sensitivity are derived from the screening program itself. The most immediate indication of sensitivity is the yield, or cases detected at screen. The rate of detection is insufficient because it does not as such consider the total burden that stems from the cancers detected both at screen and clinically between screens when screening is repeated. The relationship between these interval cancers and screen-detected cancers is not recommended as a measure of sensitivity because the cases that would not surface clinically if the population were not screened (overdiagnosis) cause a bias. Cancers diagnosed in nonattenders (those invited who did not attend) at screening and in an independent control population contribute to estimating and understanding the screening validity. Methods are available that allow unbiased and comparable estimation of test, episode, and program sensitivity (IARC, 2005; Hakama et al., 2007).
For the screenee, it is important to know the consequences of the result of a screening test and episode (including also the subsequent examinations). These can be described by the predictive values of a test and episode (Table 2). The positive predictive value (PPV) is the proportion of persons who do have unrecognized (preclinical) disease among those who have a positive test or episode. The negative predictive value (NPV) is the proportion of those who are free from the target condition among those with a negative episode. The predictive values depend on the validity of the test and the episode and on the prevalence of the disease in the DPCP (Table 3). High predictive values require valid screening and diagnostic tests. Particularly for a rare disease, PPV is usually low and the majority of positive screening tests will then occur among those who do not have the disease. In contrast, if the prevalence is low, the NPV is high, that is, a negative test gives a very high probability of absence of the disease. Many of those who attend a screening program are seeking reassurance that they do not have the disease. In practice, this is the most frequent benefit of screening for rare diseases, and it emphasizes the importance of high specificity.
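The dependence of the predictive values on prevalence follows from Bayes' theorem and can be sketched as below. The sensitivity, specificity, and prevalence figures are hypothetical and chosen only to show the pattern described above.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Probability of preclinical disease given a positive result."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def negative_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Probability of being free of the target condition given a negative result."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical test: sensitivity 0.90, specificity 0.95.
for prevalence in (0.005, 0.05, 0.20):
    ppv = positive_predictive_value(0.90, 0.95, prevalence)
    npv = negative_predictive_value(0.90, 0.95, prevalence)
    print(f"prevalence {prevalence:.3f}: PPV {ppv:.3f}, NPV {npv:.4f}")
```

At a prevalence of 0.5% fewer than one in ten positive results in this sketch corresponds to disease, while a negative result leaves a probability of disease below one in a thousand.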
Evaluating The Effect Of Screening
An effective program shows an impact in the process indicators, that is, intermediate end points in screening. Such proxy measures reflect the performance of the program, but are only indirect indicators of the ultimate goal, which is mortality reduction. To be effective, mammographic screening for breast cancer must provide sufficient coverage of the target population, identify preclinical breast cancer reliably, and lead to the detection of early cancers that are more curable than those diagnosed without screening. An evaluation of the effect of a screening program based on process indicators alone is inadequate: these are necessary but not sufficient requirements for effectiveness, because an ineffective program may still produce favorable changes in process indicators.
The screening test detects disease in the DPCP and the yield depends on the prevalence of unrecognized disease. Prevalence depends on the length of DPCP. For many cancers, the length of DPCP correlates with the prognosis: fast-growing cancers with a short DPCP have a poor prognosis. Screening detects a disproportionate number of slow-growing cancers compared with normal clinical practice, especially when pursuing a high sensitivity. Therefore, screen-detected disease tends to have a more favorable survival than clinically detected disease, because the cancers are selected to be more slow-growing than clinically detected ones. The bias introduced by this selection is called length bias (Feinleib and Zelen, 1969). Length bias cannot be directly estimated and adjustment for it is cumbersome. Therefore, study designs and measures of effect that are free from length bias should be used to assess the effectiveness of screening. They are based on the total target population. Randomized screening trials evaluating mortality outcome are free from bias caused by overdiagnosis, length bias, and lead time.
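The selection mechanism behind length bias can be illustrated with a deliberately simple sketch. It assumes two hypothetical tumor types that arise equally often but differ in the length of the DPCP, a steady state in which the preclinical pool of each type is proportional to incidence multiplied by sojourn time, and a test that is equally sensitive to both types. None of the figures comes from the source.

```python
from typing import Dict


def share_at_screen(incidence: Dict[str, float],
                    sojourn_years: Dict[str, float]) -> Dict[str, float]:
    """Share of each tumor type among preclinical cases present at a single screen.

    Assumes a steady state (pool = incidence x sojourn time) and equal
    test sensitivity for every type.
    """
    pool = {kind: incidence[kind] * sojourn_years[kind] for kind in incidence}
    total = sum(pool.values())
    return {kind: size / total for kind, size in pool.items()}


# Hypothetical: both types arise equally often, but slow tumors stay
# detectable four times longer than fast ones.
incidence = {"fast": 50.0, "slow": 50.0}
sojourn_years = {"fast": 1.0, "slow": 4.0}

shares = share_at_screen(incidence, sojourn_years)
print(shares)
```

Although slow-growing tumors make up half of all cases in this example, they make up 80% of the cases available for detection at a screen, which is enough to make screen-detected cases appear to have a better prognosis even if screening changes no outcome.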
Lead time (Hutchison and Shapiro, 1968) is the amount of time by which the diagnosis of disease is brought forward compared with the absence of screening. By definition, an effective screening program gives some lead time, since earlier diagnosis is a requirement for achieving the goals of screening. The maximum lead time is equivalent to the length of the DPCP, or sojourn time. Therefore, even if screening does not postpone death, survival from the time of diagnosis is longer for a screen-detected case than for a clinically detected case. Comparison of survival between screen-detected and symptom-detected patients is therefore biased unless it can be corrected for lead time. Such corrections remain crude at best, and survival is not a valid indicator of the effectiveness of screening.
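The arithmetic of lead time bias can be shown for a single hypothetical patient whose age at death is the same with and without screening. The ages below are invented for illustration.

```python
def survival_from_diagnosis(age_at_diagnosis: float, age_at_death: float) -> float:
    """Years lived after diagnosis."""
    return age_at_death - age_at_diagnosis


# Hypothetical patient: screening advances diagnosis but does not postpone death.
AGE_AT_SCREEN_DIAGNOSIS = 60.0
AGE_AT_CLINICAL_DIAGNOSIS = 63.0   # when symptoms would have led to diagnosis
AGE_AT_DEATH = 68.0                # identical in both scenarios

lead_time = AGE_AT_CLINICAL_DIAGNOSIS - AGE_AT_SCREEN_DIAGNOSIS
survival_screened = survival_from_diagnosis(AGE_AT_SCREEN_DIAGNOSIS, AGE_AT_DEATH)
survival_unscreened = survival_from_diagnosis(AGE_AT_CLINICAL_DIAGNOSIS, AGE_AT_DEATH)

print(f"lead time: {lead_time} years")
print(f"survival if screen-detected: {survival_screened} years")
print(f"survival if clinically detected: {survival_unscreened} years")
```

Survival appears 3 years longer for the screen-detected case, yet the whole difference is lead time: the patient dies at the same age in both scenarios.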
Length bias and lead time are theoretical concepts of key importance in screening. Empirical assessment of length bias and lead time can provide valuable information on the natural history of the disease and the effects of screening (Day et al., 1984). These sources of bias, in addition to overdiagnosis, make indicators such as survival unsuitable for evaluating the effectiveness of screening programs.
Evaluation of the effectiveness of screening should therefore be made in terms of the outcome, which for cancer is mortality. However, screening programs also affect morbidity and, more broadly, quality of life. Such effects should also be taken into account in screening decisions although they should not be mixed with process indicators of the program. For example, fertility may be maintained after treatment for a screen-detected precursor lesion of cervical cancer, and breast-conserving surgery with less cosmetic and functional impairment may be used for screen-detected breast cancer compared with clinically detected cancer. Such procedures may reduce physical invalidity and adverse mental effects, and thereby improve the quality of life. However, the process measures, for example, the number of conisations of the cervix or breast-conserving surgery or their proportion of all surgical procedures, are invalid indicators of effect, because of potential overdiagnosis. Overdiagnosis increases the proportion of cases with favorable features because healthy individuals are misclassified as cases of disease.
A randomized preventive trial with mortality as the end point is the optimal and often the only valid means of evaluating the effectiveness of a screening program. Cohort and case-control studies are often used as a substitute for trials. Most evidence on the effectiveness of screening programs stems from comparisons of time trends and geographical differences between populations subjected to screening of different intensity. These nonexperimental approaches remain crude and insensitive, however, and do not provide a solid basis for decision making.
Several biases may distort comparability between attenders and nonattenders at screening. The most obvious is the ‘healthy screenee’ effect: an individual must be well enough to attend if he or she is to participate. Furthermore, a patient with, for example, a previous diagnosis or who is already under close medical surveillance due to a related disorder has no reason to attend a program aimed at detection of that disease. Elimination of bias in a nonexperimental design is likely to remain incomplete. In randomized trials, previously diagnosed cancer is an exclusion criterion, that is, prevalent cases are excluded and randomization with analysis based on the intention-to-screen principle guarantees comparability.
The randomized screening effectiveness trials with mortality as end point provide a solid basis for a routine screening program that is run as a public health policy. Such mass screening programs should be evaluated and monitored. Intervention studies without a control group (also called demonstration projects or single-arm trials) and other nonexperimental designs (cohort and case-control studies) have been proposed for this type of evaluation, but each approach has inherent biases. A randomized approach with controls must be considered the gold standard, to be adopted if possible.
All routine screening programs are introduced gradually. Extension from the initial stage requires time and planning and involves more facilities. In several countries, use of Pap smears developed from very infrequent to become part of normal gynecological practice or public health service within about 10 years. It follows that a screening program can be introduced as a public health policy in an experimental design, with comparison of screened and unscreened groups formed by random allocation. Under such circumstances, provision of screening can be limited to a randomly allocated sample of the population, instead of a self-selected or haphazardly selected fraction of the population. As long as the resources for the program are only adequate for a proportion of the population, it is ethically acceptable to randomize, because screening is not withheld from anybody. There is, a priori, an equal chance for everybody in the target population to receive the potential benefit of the program and to avoid any adverse effects of the program. In this context, the equipoise (lack of firm evidence for or against an intervention), which is an ethical requirement for conducting a randomized trial, gradually disappears as evidence is accrued within the program. For those planning public health services, this will provide the most reliable basis for accepting or withholding new activities within the services.
Organizing A Screening Program
Screening is a chain of activities that starts from defining the target population and extends to the treatment and follow-up of screen-detected patients. A screening program consists of several elements that are linked together.
Different cancer screening programs consist of different components. In general, they can be outlined as eight distinct steps, divided into four components.
Population component:
- Definition of target population.
- Identification of individuals.
- Measures to achieve sufficient coverage and attendance, such as a personal letter of invitation.
Test execution component:
- Test facilities for collection and analysis of the screen material.
- Organized quality control program for both obtaining screen material and its analysis.
Clinical component:
- Adequate facilities for diagnosis, treatment, and follow-up of patients with screen-detected disease.
Coordination component:
- A referral system linking the screenee, laboratory (providing information about normal screening tests), and clinical facility (responsible for diagnostic examinations following an abnormal screening test and management of screen-detected abnormalities).
- Monitoring, quality control, and evaluation of the program: availability of incidence and mortality rates for the entire target population, and separately for both attenders and non-attenders.
Routine screening can be divided into opportunistic (spontaneous, unorganized) and organized screening (mass screening, screening program). The major differences are related to the level of organization and planning, systematic nature, and scope of activity. The components described previously are characteristics of an organized screening program. Most of them are not found in opportunistic screening.
Spontaneous screening frequently focuses on high sensitivity at the cost of low specificity. This is due to several factors including economic incentives (fee-for-service) and risk-averse behavioral models (avoiding neglect, fear of litigation). However, more emphasis on specificity would be consistent with the primary ethical responsibilities of the physician under the Hippocratic oath: First, do no harm. For a high-technology screening program, such as mammography for breast cancer, low specificity results in high cost and frequent adverse effects. Furthermore, in countries with limited resources for the follow-up of screen-positive cases, screening competes for scarce resources that could be used for the treatment of overt disease with worse outcomes.
Major organizational considerations of any screening program are the age range to be covered and the screening interval. For example, in Western populations with similar risk of disease and available resources, cervical cancer screening policies range from annual testing from the start of sexual activity to a smear every 5 years from age 30 to age 55. Hence, the cumulative number of tests over a lifetime varies from 6 to more than 60, that is, 10-fold.
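The count of lifetime tests under the two policies can be checked with a few lines of arithmetic. The conservative schedule (every 5 years from age 30 to 55) is taken from the text; for the intensive schedule the text gives no ages, so annual testing from age 20 to age 80 is assumed here purely for illustration.

```python
def lifetime_tests(first_age: int, last_age: int, interval_years: int) -> int:
    """Number of tests when screening at first_age and then every
    interval_years up to and including last_age."""
    if interval_years <= 0 or last_age < first_age:
        raise ValueError("need a positive interval and last_age >= first_age")
    return (last_age - first_age) // interval_years + 1


conservative = lifetime_tests(30, 55, 5)   # schedule stated in the text
intensive = lifetime_tests(20, 80, 1)      # assumed ages, annual testing

print(conservative, intensive, round(intensive / conservative, 1))
```

Under these assumptions the intensive policy requires 61 tests against 6, roughly the 10-fold difference noted above.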
In general, only organized screening programs can be evaluated. An organized program with individual invitations can prevent excessively frequent screening and overuse of the service, and is therefore less expensive. If screening appears ineffective, an organized screening program is also easier to close than a spontaneous activity. Therefore, organized screening programs should be recommended over opportunistic screening.
Cancer Screening By Primary Site
The following sections on cervical, breast, and colorectal cancer screening describe cancer screening with proven effectiveness.
Cervical Cancer Screening
Cervical cancer is the second most common cancer among women worldwide, with 493 000 new cases in 2002 and 273 000 deaths. The great majority of the burden is in developing countries, with highest incidence rates in Africa and Latin America.
Cervical cancer is thought to develop gradually, through a progression of a series of precursor lesions from mild abnormality (atypia) into more aberrant lesions (dysplasia) and eventually malignant changes (initially in situ, then microinvasive and finally invasive carcinoma). Human papillomavirus (HPV) infection is a common early event. Most infections are cleared spontaneously within 6–12 months or less. Early precursor lesions seem to be a rare consequence of persistent infection with oncogenic HPV types. High-grade neoplasia occurs rarely without persistent HPV infection and viral load also predicts the probability of progression. Regression of lesions occurs commonly at early stages and the rate of progression is likely to increase during the process, with accumulation of abnormalities. The duration of the detectable, preclinical phase has been estimated at as long as 12–16 years.
Screening With Cervical Smears
Screening for cervical cancer is based on detecting and treating unrecognized disease, primarily premalignant lesions, to prevent their progression into invasive carcinoma. Traditional techniques are based on cytological sampling of cells from the cervix. The classical smear is based on cytological assessment of exfoliated cervical cells from the transformation zone, where the squamous epithelium changes into columnar epithelium. The sample is collected from the vaginal part of the cervix using a spatula and from the endocervix with a brush or swab. Sampled cells are fixed on a glass slide for evaluation under a microscope.
There are several classifications for cytological and histopathological abnormalities of the cervix. The present Bethesda 2001 system consists of two classes of benign atypical findings (atypia thought to occur as reactive change related to infection; atypical squamous cells of undetermined significance [ASC-US], or a result that cannot exclude a higher-grade lesion, ASC-H) and two classes of more aberrant changes (low- and high-grade squamous intraepithelial lesion, LSIL and HSIL, respectively). The diagnostic classification of preinvasive changes is based on cervical intraepithelial neoplasia (originally CIN1–3, corresponding to mild, moderate, and severe dysplasia [including carcinoma in situ], later modified by combining moderate dysplasia, severe dysplasia, and carcinoma in situ into high-grade CIN).
Diagnostic assessment requires colposcopic examination (microscopic visualization with 6- to 40-fold magnification) for assessment of morphological features of the cervix. Histologic assessment is based on colposcopy-directed punch or cone biopsy.
The incidence of cervical cancer is very low in the first year following a negative smear and increases gradually, returning to baseline at about 10 years. At 5 years, the risk is still 50% lower than in unscreened women. The risk of cancer decreases further with the number of consecutive negative tests.
Effectiveness Of Screening
The objective of cervical cancer screening is to reduce both cervical cancer incidence and mortality. A successful screening program detects early, preinvasive lesions during the preclinical detectable phase and is able to reduce deaths by preventing the occurrence of invasive cancer.
No randomized trials were conducted to evaluate the mortality effects of cervical cancer screening at the time when it was introduced. There is, however, evidence for substantial incidence and mortality reduction from nonrandomized studies conducted in several countries when screening was introduced. In such studies, a screened population is identified at the individual level. The screened and unscreened women may however not be comparable in terms of disease risk, which can induce selection bias. Several such studies have been summarized by IARC (2005). In British Columbia, Canada, a screening program was introduced in 1949. During 1958–1966, the incidence of cervical cancer among 310 000 women screened at least once was well below the rates preceding the screening project (standardized incidence ratio (SIR) 0.16, 13 cases), while 230 000 unscreened women showed no decrease (SIR 1.08, 67 cases). In Finland, a population-based cervical cancer screening program was started in 1963 and evaluation of more than 400 000 women covered showed effectiveness of 60% at 10 years in terms of incidence reduction. A Norwegian study with 46 000 women invited for screening showed approximately 20% lower cervical cancer incidence and mortality among participants than among the reference population.
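The standardized incidence ratio used in the British Columbia comparison is the number of observed cases divided by the number expected from reference rates. The sketch below shows the calculation and back-calculates the expected counts implied by the figures quoted above. These expected counts are not reported in the text; they are derived from rounded SIRs and are approximate.

```python
def standardized_incidence_ratio(observed: float, expected: float) -> float:
    """SIR: observed cases divided by cases expected from reference rates."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def implied_expected(observed: float, sir: float) -> float:
    """Expected count implied by an observed count and a reported SIR."""
    if sir <= 0:
        raise ValueError("SIR must be positive")
    return observed / sir


# Figures quoted in the text for British Columbia, 1958-1966.
expected_screened = implied_expected(observed=13, sir=0.16)
expected_unscreened = implied_expected(observed=67, sir=1.08)

print(f"screened: 13 observed vs about {expected_screened:.0f} expected")
print(f"unscreened: 67 observed vs about {expected_unscreened:.0f} expected")
```

Read this way, an SIR of 0.16 means that about 81 cases would have been expected among the screened women at pre-screening rates, against 13 observed.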
Several case-control studies have been carried out in Denmark, Latin America, Canada, South Africa, and other countries to evaluate the efficacy of cervical cancer screening. Most have shown odds ratios in the range of 0.3–0.4 for invasive cervical cancer associated with ever versus never having had a screening test, but the results are prone to selection bias (IARC, 2005).
In ecological analyses without individual data, introduction of screening has been associated with a reduction in cervical cancer incidence and mortality. Major differences in the timing of introduction and the extent of cervical cancer screening between Nordic countries with similar baseline risk have allowed assessment of the effect of screening on cervical cancer incidence and mortality. Adoption of screening as a public health policy has been followed by a sharp reduction in cervical cancer occurrence. Comparison between counties in Denmark with and without cervical cancer screening showed that both incidence of and mortality from the disease were lower by a third in the areas with organized mass screening. Conversely, in an area where organized screening had been discontinued, increased incidence of invasive cancer was found. An analysis of 15 European countries was also consistent with a 30–50% reduction in cervical cancer incidence related to organized cervical cancer screening. Similar reductions in cancer incidence and/or mortality have been reported after launching screening programs in England and Wales, the United States, and Australia.
The recommended interval between screens has ranged from 1 to 5 years. Most screening programs start from 18–30 years of age and are discontinued after age 60–70 years. The most intensive screening protocols with an early start and frequent testing involve 10 times as many tests over a woman’s lifetime as the most conservative approaches. Organized programs with large, population-based target groups tend to be least intensive, but nevertheless able to produce better results than less organized efforts, with ambitious screening regimens but very incomplete coverage of the target population and a weaker link among the program components (testing, diagnosis, and treatment). In some programs, the frequency of screening has been modified according to the screening result, either starting at annual screening, for example, and increasing the interval after negative results, or conversely, offering initially a longer, 3–5-year interval that is shortened if there is any abnormality.
Adverse Effects Of Screening
Overdiagnosis of preinvasive lesions (i.e., detection and treatment of changes that would not have progressed into malignancy) appears common in cervical cancer screening, as only a small proportion of preinvasive lesions would develop into a cancer, even if left untreated. The cumulative risk of an abnormal screening test is relatively high compared with lifetime risk of cancer in the absence of screening (10–15% or higher versus approximately 3%). Treatment has several adverse effects. Excisional treatments are associated with pregnancy complications, including preterm delivery and low birth weight. Hysterectomy, quite obviously, leads to loss of fertility.
Other Screening Tests
Direct visualization of the cervix has been evaluated as a screening method that does not require sophisticated technology or highly trained personnel. Unaided visual inspection (also known as downstaging) can identify bleeding, erosion, and hypertrophy. Low-level magnification (2–4×) has not been shown to improve performance of visual inspection. Use of acetic acid in visual inspection results in white staining of possible neoplastic changes and improves test sensitivity. Lugol’s solution (or Schiller’s iodine test) stains neoplastic epithelium yellow and it appears to have similar or better performance than acetic acid. Visual inspection with iodine solution or acetic acid offers an effective and affordable screening approach. With trained personnel and quality control, a 25% reduction in cervical cancer mortality can be achieved (Sankaranarayanan et al., 2007).
Commercially available HPV tests are based on nucleic acid hybridization and are able to identify more than 10 different HPV types. No trials have been completed that have compared effectiveness of HPV testing with cytological smears. Yet, preliminary findings indicate that HPV screening is likely to be at least as effective as screening based on smears, but is also likely to have more adverse effects including lower specificity. Etiology-based screening may label cervical cancer as a sexually transmitted disease, despite common nonsexual transmission of the virus. This may reduce the acceptability of screening and reduce participation. One hazard is the stigmatization of the woman carrying the virus, regardless of the route of transmission (e.g., infidelity of the woman’s partner), especially in cultures where the norms guard female sexuality more strictly than that of men.
In summary, the effectiveness of cytological smears in cervical cancer screening has never been established with current, methodologically stringent criteria. However, there is extensive and consistent evidence showing that a well-organized screening program will reduce both the incidence of and mortality from invasive carcinoma.
In the future, HPV vaccination has the potential to influence profoundly the conditions in which screening operates, and possibly to reduce the demand for cervical cancer screening by lowering the risk of the disease. This may take at least one generation to achieve.
Breast Cancer Screening
Breast cancer is the most common cancer among women and it accounts for a fifth of all cancers among women worldwide. Approximately 1 150 000 new cases occurred in the world in 2002. The number of breast cancer deaths in 2002 was estimated at 410 000. Increasing incidence rates have been reported in most populations. The highest incidence rates have been reported in high-income countries, in North America, Europe, as well as in Australia and New Zealand. Yet, in terms of numbers of cases, the burden of breast cancer is comparable in rich and poor countries of the world.
In a synthesis of seven reports on breast cancer as an autopsy finding, the median prevalence of invasive breast cancer at autopsy was 1.3%. The mean sojourn time has been estimated at 2–8 years. Sojourn times tend to be longer for older ages and may depend on the histological type of breast cancer.
In screening, the primary target lesion is early invasive cancer, but ductal carcinoma in situ is also detected with a frequency about 10–20% that of invasive cancer.
Mammography is X-ray imaging of the breast with a single or two views read by one or two radiologists. The screen-positive finding is a lesion suspicious for breast cancer, appearing typically as an irregular, starlike lesion or clustered microcalcifications. Two views are likely to increase detection rates by approximately 20%, with most benefit for detection of small cancers among women with radiologically dense breasts. In some screening programs, two views are used only at first screening, with only one view (mediolateral oblique) subsequently. Similarly, double reading appears to increase both the recall rate and detection of breast cancer by some 10%. Currently, digital mammography is replacing film technology.
Screen-positive findings result in the recall of 2–15% of women for additional mammographic examinations, with biopsy in approximately half of these women. Detection rates vary between populations, from 3 to 11 per 1000, corresponding to PPVs of 30–70% for biopsied women. Sensitivity has been estimated at 60–70% in most randomized trials and close to that also in service screening. Specificity (the proportion of negative screens among women without breast cancer) has been reported at 95–98% in most studies, with lower figures in the initial (prevalence) screen than in subsequent screens.
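The performance measures used in this paragraph all derive from a 2x2 table of screening result against true disease status. The following is a minimal sketch with hypothetical counts chosen only to fall within the ranges quoted above; the class name, field names, and numbers are assumptions made for illustration and are not taken from any study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningTable:
    """2x2 table of screening result against true disease status."""
    true_pos: int   # screen positive, cancer present
    false_pos: int  # screen positive, no cancer
    false_neg: int  # screen negative, cancer present
    true_neg: int   # screen negative, no cancer

    def sensitivity(self) -> float:
        # Proportion of women with cancer who screen positive.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Proportion of women without cancer who screen negative.
        return self.true_neg / (self.true_neg + self.false_pos)

    def ppv(self) -> float:
        # Proportion of screen-positive women who have cancer.
        return self.true_pos / (self.true_pos + self.false_pos)

    def detection_rate_per_1000(self) -> float:
        total = self.true_pos + self.false_pos + self.false_neg + self.true_neg
        return 1000 * self.true_pos / total


# Hypothetical counts for 10 000 screened women (illustration only).
table = ScreeningTable(true_pos=50, false_pos=400, false_neg=25, true_neg=9525)
```

With these counts, sensitivity is about 67%, specificity about 96%, and the detection rate 5 per 1000. The PPV here (about 11%) is calculated over all screen-positive women, so it is lower than a PPV calculated only among women who go on to biopsy.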
Effectiveness Of Screening
Twelve randomized trials have evaluated mortality reduction in mammography screening (Table 4). In the age range 50–69, the trials have shown relatively consistent mortality reduction in the range of 20–35%. Excluding the Canadian trial with women 50–59 years, which showed no benefit, estimates of the number of women needing to be screened (NNS) to prevent one death from breast cancer have ranged from 1000 to 10 000 women at 10 years.
The Health Insurance Plan trial in New York was the first randomized screening trial. It had approximately 60 000 women randomized pair-wise. The subjects were aged 40–64 years at entry. Four screening rounds with 1-year intervals were performed in the intervention arm using two-view mammography and clinical breast examination (CBE). Causes of death were evaluated by a committee unaware of the allocation. Breast cancer mortality was reduced in the screening arm by 20% (5.5 vs. 6.9 per 100 000). In the analysis with the longest follow-up, the mortality difference persisted after 18 years of follow-up.
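The 20% figure follows directly from the two mortality rates quoted above: the relative reduction is one minus the ratio of the rate in the screening arm to the rate in the control arm. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def relative_mortality_reduction(rate_screened: float, rate_control: float) -> float:
    """Proportional mortality reduction, computed as 1 - rate ratio."""
    if rate_control <= 0:
        raise ValueError("rate_control must be positive")
    return 1.0 - rate_screened / rate_control


# Rates quoted in the text for the Health Insurance Plan trial.
hip_reduction = relative_mortality_reduction(5.5, 6.9)  # about 0.20
```

Both rates must share the same denominator and follow-up period for the ratio to be meaningful.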
Several cluster-randomized screening trials have been carried out in Sweden. In the Kopparberg trial with three screening rounds, an 18% reduction in breast cancer mortality was reported after 20 years of follow-up. The Östergötland study showed 13% lower breast cancer mortality in the screening arm after four screening rounds and 17 years of follow-up. The Malmö trial with 22 000 women aged 45–69 years reported an 18% reduction in breast cancer mortality for the screening group after 19 years. In the Stockholm trial, nearly 40 000 women aged 40–64 years were enrolled and breast cancer mortality was 12% lower in the screening group after 15 years of observation.
Two mammography screening trials have been conducted in Canada. Both were started simultaneously, one enrolling volunteers aged 40–49 years (Canadian National Breast Screening Study 1, CNBSS1; 50 000 women) and the other volunteers aged 50–59 years (CNBSS2; 40 000 women). Intervention was annual two-view mammography and CBE for 5 years in both trials, with no intervention for the control arm in the younger group and CBE only in the older cohort. No reduction in breast cancer mortality was found in either age group in analyses covering 13 years of follow-up.
The Edinburgh trial used cluster randomization based on general practices, with more than 44 000 women aged 45–64 years at entry. The screening interval was 12 or 24 months and covered four screening rounds. After 13 years of follow-up, breast cancer mortality was 19% lower in the screening arm.
The randomized trials have been criticized for methodological weaknesses (Olsen and Gøtzsche, 2001), especially for incorrect randomization and postrandomization exclusions leading to lack of comparability between the trial arms. In a systematic review excluding studies with possible shortcomings, only two trials were finally evaluated and no benefit was demonstrated (Olsen and Gøtzsche, 2001). It was also argued that breast cancer mortality is not a valid end point for screening trials. However, these criticisms have been rebutted by several investigators and international working groups. The dismissal of studies based on mechanistic evaluation of technical criteria of questionable relevance has been considered inappropriate.
National breast cancer screening programs are ongoing in several European countries. They are organized either regionally or nationally and use guidelines and quality assurance systems for both radiology and pathology. Common features are target groups at ages 50–69 and 2-year intervals. In several Northern European countries, participation around 80% has been achieved with recall rates of 1–8%. PPVs have been in the range of 5–10% and the detection rate of invasive cancer generally has been 4–10 per 1000 in the initial screen and 2–5 per 1000 in subsequent rounds (IARC, 2002). In the evaluation of effectiveness, nonrandomized approaches have been used (with the exception of Finland, where screening was introduced using a randomized design). In such studies, definition of controls and estimation of expected risk of death are problematic. Extrapolations of time trends and comparisons between geographical areas have been used in the evaluation of effectiveness. Yet, comparability between screened and nonscreened groups in such studies is questionable. Also, exclusion criteria are not as strict as in randomized trials and exclusion of prevalent cancers at baseline has not always been possible.
In Finland, the female population was divided into a screening group and a control group based on birth year. The women were aged 50–59 at entry and 89 000 women born in even-numbered years were invited to mammography screening every second year, while 68 000 born in odd-numbered years were not invited but served as a reference group. Participation was 85%. Refined mortality was used, that is, breast cancers diagnosed before the start of the screening program were excluded. Overall, a 24% reduction in breast cancer mortality was reported at 5 years, which did not reach statistical significance.
In the Netherlands, a statistically significant reduction in breast cancer mortality was reported following introduction of mammography screening. The target age group was 50–69 years and participation reached 80%. Compared to rates before screening, the reduction in mortality was 19% at 11 years of follow-up (85.3 vs. 105.2 per 100 000). No similar decrease was found for women in older age groups, suggesting that the reduction was attributable to screening and not to improvement in treatment.
In an evaluation of mammography screening in Sweden, mortality from breast cancers diagnosed during the screening period was evaluated in seven counties between 1978 and 1990. A statistically significant 32% reduction in refined breast cancer mortality was observed following introduction of screening in counties with 10 years of screening and an 18% reduction in areas with shorter screening periods. A more recent assessment showed a 26% mortality reduction after 11 years among 109 000 women with early screening, compared with areas with later introduction of screening.
In England and Wales, breast cancer mortality decreased by 21% after introduction of mammography screening for women aged 50–69 years compared to that expected in the absence of screening (predicted from underlying trend). The estimated reduction in breast cancer mortality gained by screening was 6%, while the rest was attributed to improvements in treatment.
A Danish study showed a significant 25% reduction in nonrefined breast cancer mortality within 10 years after introduction of screening in Copenhagen for the age group 50–69 years compared with earlier rates and control areas.
No substantial effect on breast cancer mortality at population level was shown in an early analysis of a screening program in Florence, Italy. During the first 9 years of the program, breast cancer mortality declined by merely 3%, which was attributed to the relatively low coverage of the program (60%).
In a comparison of Swedish counties that introduced mass screening with mammography in 1986–87 with those starting after 1992, only a nonsignificant 10% mortality reduction from breast cancer was found at 10 years among women aged 50–69 at entry. Among women aged 70–74 years, the mortality reduction was even less (6%).
The results from nonrandomized evaluation are consistent with a mortality reduction obtained in screening trials and suggest that mortality reduction is achievable when mammography screening is applied as public health policy, though it may be somewhat less than the average effect of approximately 25% seen in randomized trials. Further, improvement in quality of life can be gained by early diagnosis, allowing a wider range of treatment options, with the possibility of avoiding radical surgery (and possibly adjuvant chemotherapy).
The extent of overdiagnosis and subsequent unnecessary treatment of lesions that would not have progressed may not be as large for breast cancer screening as for several other cancer types. The estimates of overdiagnosis have been in the range of 3–5%.
Preinvasive cancer (in situ carcinoma) is detected at screening with a frequency of approximately 1 per 1000 screens and is commonly treated surgically, even though not all cases would progress to invasive cancer. Mammography causes a small radiation dose (1–2 mGy) to the breast, which can be expected to increase breast cancer risk. The excess risk is, however, likely to remain very small (in the range of 1–3% or less increase in the relative risk).
Other Screening Tests
Digital mammography has been adopted recently. It appears to yield a higher detection rate than conventional film mammography, but correspondingly, specificity seems lower. It remains unclear if the higher detection rate also increases overdiagnosis. No studies have evaluated the effect of digital mammography on breast cancer mortality.
Studies of magnetic resonance imaging among high-risk groups have suggested higher sensitivity compared with mammography, but no randomized trials have compared its effect on mortality with mammography. An advantage is avoidance of exposure to ionizing radiation. Yet, it is also more expensive and time-consuming.
CBE consists of inspection and palpation of the breasts by a health professional to identify lumps or other lesions suspicious for cancer. No randomized trials have evaluated the effectiveness of CBE alone, but it was included in the intervention arm of the Health Insurance Plan (HIP), Canadian, and Edinburgh trials. It may increase the sensitivity of screening if combined as an ancillary test in a mammography screening program.
Breast self-examination (BSE) has been evaluated as a resource-sparing option for early detection of breast cancer. A randomized trial in Shanghai showed no reduction in breast cancer mortality following instruction in BSE. A similar conclusion was reached also in a trial carried out in the former Soviet Union.
Several randomized trials have shown that mammography screening can reduce mortality from breast cancer and evidence from studies evaluating service screening indicates a similar or slightly smaller effect at population level. The age group with most benefit is 50–69 years. The methodological limitations of the studies are not severe enough to justify ignoring their results.
Colorectal Cancer Screening
Colorectal cancer ranks as the second most common cause of cancer death. There were more than 1 million new cases in 2002, with more than 500 000 deaths.
The majority of colorectal carcinomas are thought to arise from benign precursor lesions (adenomas). Adenomas (particularly those with a diameter of 1 cm or more, or with dysplasia) and early carcinoma make up the principal target of screening. The duration of the detectable preclinical phase has been estimated at 2–6 years.
Several screening methods are available for colorectal cancer screening, including fecal occult blood testing (FOBT) and endoscopic examination (sigmoidoscopy or colonoscopy) as well as radiographic examination (double-contrast barium enema).
FOBT is based on detection of hemoglobin in stools using guaiac-impregnated patches, where an oxidative reaction (pseudoperoxidase activity) results in a color change that is detectable on inspection. The most commonly used test, Hemoccult II, is not specific to human blood. Other tests that detect human hemoglobin immunologically are also available, but they are more expensive. Rehydration (adding water to the specimen) can be used to increase the detection rate, but this also leads to more false-positive results. For screening, two specimens are usually obtained on each of 3 consecutive days. Dietary restrictions (avoiding red meat, vitamin C, and nonsteroidal anti-inflammatory drugs) and combination of tests may increase specificity, but can also reduce acceptability.
The detection rate of carcinoma with FOBT has been 0.2–0.5% for biennial screening. Sensitivity is considerably lower for detection of polyps, which do not bleed as frequently as cancers. The PPV has been 10% for cancer and 25–50% for adenoma.
Effectiveness Of FOBT
Two-year screening intervals have been most widely used and most studies have targeted the age groups 45–75 years. Three randomized trials evaluating incidence and mortality have been reported (Table 5). They show a consistent 6–18% reduction in mortality with biennial screening. A systematic review showed a 16% (95% confidence interval [CI] 7–23%) reduction in mortality. The NNS to prevent one colorectal cancer death was estimated as less than 1200 at 10 years, given two-thirds participation.
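The NNS is the reciprocal of the absolute risk reduction, which in turn depends on the baseline risk of death in the absence of screening and on the relative mortality reduction. The sketch below illustrates the relationship; the baseline 10-year risk of 0.6% is a hypothetical value chosen for illustration, not a figure from the trials or the systematic review, and the function name is likewise an assumption.

```python
def number_needed_to_screen(baseline_risk: float, relative_reduction: float) -> float:
    """NNS = 1 / absolute risk reduction.

    baseline_risk: cumulative risk of death from the disease without
        screening over the follow-up period (as a proportion).
    relative_reduction: proportional mortality reduction with screening.
    """
    absolute_reduction = baseline_risk * relative_reduction
    if absolute_reduction <= 0:
        raise ValueError("absolute risk reduction must be positive")
    return 1.0 / absolute_reduction


# Hypothetical 10-year baseline risk (0.6%) combined with the 16%
# relative reduction reported in the systematic review.
nns = number_needed_to_screen(baseline_risk=0.006, relative_reduction=0.16)
```

Under these assumed inputs the NNS is a little over 1000. Because the NNS scales inversely with baseline risk, the same relative reduction yields a much larger NNS in a low-risk population.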
In the Nottingham (UK) trial, more than 150 000 subjects aged 45–74 years were recruited between 1981 and 1991 and randomized by household. No reduction in colorectal cancer incidence was shown, but mortality was 19% lower in the screening arm after a median 12 years of follow-up.
In the Danish Funen study, approximately 62 000 subjects aged 45–75 were enrolled in 1985. Randomization was performed in blocks of 14 persons (with spouses always in the same arm). The mortality reduction was 11% after a mean follow-up of 14 years. No decrease in colorectal cancer incidence was observed.
In the Minnesota (United States) trial, 46 551 volunteers aged 50–80 years were recruited between 1975 and 1977. After a mean follow-up of 15 years, a mortality reduction of 33% was shown in the annual screening group and 21% in the biennial screening group, compared with the control arm. The incidence of colorectal cancer was also approximately 20% lower in the screened groups.
Provision of colorectal cancer screening for the population has been tested in several countries. In France, a 16% mortality reduction was reported in a population of 90 000 offered screening, compared with control districts in the same administrative area. The incidence of colorectal cancer was similar in both populations during 11 years of follow-up.
Finland was the first country to launch a population-based FOBT screening program, in 2004. It was introduced using individual randomization, but the effects on colorectal cancer incidence and mortality have not been evaluated yet.
Adverse Effects Of Screening
FOBT is safe, but a positive result requires further diagnostic examinations such as endoscopy, which cause inconvenience, rare complications (e.g., perforation), and costs. Some overdiagnosis is likely, because not all precursor lesions would advance to cancer. Yet, the morbidity related to the removal of polyps is very moderate.
Other Screening Tests
Flexible sigmoidoscopy covers approximately 60 cm of the distal colon, where roughly half of all colorectal cancers occur. Any adenomas detected can be removed during the procedure. Compliance with sigmoidoscopy as a screening investigation has only been 50% or less. The detection rate is higher than with FOBT, suggesting higher sensitivity. Three case-control studies and a cohort study have suggested lower mortality from colorectal carcinoma following sigmoidoscopy, as well as lower incidence. These studies do not provide evidence as strong as that from randomized intervention trials, because selection bias and other systematic errors may affect the results. Therefore, the mortality reduction achievable with sigmoidoscopy remains unclear. A population-based randomized trial is ongoing in Norway with 20 000 subjects aged 50–64, comparing one sigmoidoscopy with no intervention. It is expected to provide important new information.
Screening colonoscopy has the advantage of covering the entire colon, but the procedure is expensive, causes substantial discomfort, and has the potential for complications such as perforation (reported in 1–2 patients per 10 000). No trials have evaluated the effectiveness of screening colonoscopy.
Recently, fecal DNA analysis has been introduced as a new option for colorectal cancer screening, but no studies assessing its effectiveness have been conducted.
In summary, FOBT has been shown to decrease mortality from colorectal cancer in several randomized trials. It appears to be an underutilized opportunity for cancer control. Other screening modalities are also available, but there is currently no solid evidence for their effectiveness.
Screening For Other Cancers
Some evidence for effectiveness is available for oral cancer and liver cancer screening, but it is not as well established as for cervical, breast, and colorectal cancers.
Oral cancer is among the most common cancers in some areas of the world, largely due to the habit of tobacco chewing. Globally, more than 270 000 cases are detected annually, primarily in developing countries. A recent cluster-randomized trial of visual inspection for oral cancer demonstrated a 20% reduction in mortality among more than 190 000 subjects (Sankaranarayanan et al., 2005).
Liver cancer is the sixth most common cancer in the world, with more than 600 000 new cases in 2002. In terms of cancer deaths it ranks third, with nearly 600 000 deaths annually. Serum alpha-fetoprotein (AFP) and ultrasound have been used as a screening test for hepatocellular cancer. Two randomized trials have been carried out in China, both among chronic carriers of hepatitis B virus, who are at high risk of liver cancer. The smaller study found a nonsignificant 20% reduction associated with 6-monthly AFP tests among 5500 men in Qidong county (Zhang et al., 2004). Another trial involved 18 000 people and showed a one-third mortality reduction at 5 years with 6-monthly AFP and ultrasonography (Chen et al., 2003).
The following section, ‘Lung Cancer Screening,’ describes cancer screening with evidence against effectiveness.
Lung Cancer Screening
Lung cancer is the most common cancer in many countries. Mortality rates are very similar to incidence due to its very poor prognosis. In 2002, the global number of cases was 1.35 million and there were 1.18 million deaths.
Natural History, Diagnosis, And Treatment
The target lesion for lung cancer screening is early, resectable (stage 1) carcinoma. Diagnostic examinations may include high-resolution computerized tomography (CT), positron-emission tomography (PET), transthoracic needle biopsy, and thoracotomy. A conclusive diagnosis of early lung cancer is based on biopsy.
Screening With Chest X-Rays, With Or Without Sputum Cytology
The plain chest X-ray was evaluated as a screening test in several randomized screening trials in the 1960s and 1970s. Commonly, it was combined with cytological assessment of exfoliative cells that are most commonly detected in squamous cell carcinoma.
Small nodules are easily missed in chest X-rays and sensitivity is low, with specificity above 90%. Detection rates have been 0.1–0.8% and positive predictive value 40–60%. False-negative results in sputum cytology are common among patients with lung cancer; its sensitivity is also regarded as inferior to that of chest X-rays.
Effectiveness Of Screening
Four randomized trials of lung cancer screening with chest X-rays have been conducted, but only one compared chest X-ray screening against no intervention (until the end of the 3-year study period, Table 6). One compared chest X-ray and sputum examination offered within the trial against a recommendation to have such tests. The other two trials compared chest X-ray alone with chest X-ray plus sputum cytology. The interval between rounds ranged between 4 months and 1 year. The detection rate of lung cancer at baseline examination has been 0.1–0.8%. In a meta-analysis, lung cancer mortality was increased by 11% with the more intensive screening.
Adverse Effects Of Screening
A false-positive test leads to invasive diagnostic procedures. Overdiagnosis has been estimated as 15% based on long-term follow-up in one of the trials.
Other Screening Methods
In spiral low-dose CT, the screen-positive finding is a noncalcified nodule, usually at least 1 cm in diameter. For smaller lesions, follow-up examinations may be needed to define whether the nodule is growing.
No randomized trials have been done, so the effect on mortality has not been established. Adverse effects due to false-positive results appear common but the extent of overdiagnosis has not been established. In the United States, the National Lung Screening Trial is comparing annual chest radiography with annual low-dose CT among 50 000 smokers.
To conclude, chest X-rays have been shown not to reduce mortality from lung cancer. Opportunities provided by novel radiological technology have been eagerly advocated for screening, but currently there is no evidence on the effectiveness of screening based on spiral CT.
Screening For Other Cancers
Neuroblastoma is a tumor of the sympathetic nervous system in children, with an overall annual incidence rate of approximately 1 per 100 000 under the age of 15 years. Peak incidence is during the first year of life, and occurrence decreases after that. The natural course is highly variable. Early-stage disease, occurring mainly in young children, has a very favorable prognosis, while diagnosis at an advanced stage (and usually above 1 year of age) is associated with poorer survival. There is a subgroup of tumors with the potential to disappear spontaneously or mature into a benign tumor (ganglioneuroma), even in the presence of metastasis. Hence there is obvious potential for overdiagnosis. Screening is possible using urine tests for metabolites of the neuronal transmitters, catecholamines (homovanillic acid and vanillylmandelic acid), which are secreted by most (60–80%) tumors. Studies in Germany, Canada, and Japan have compared screened and unscreened cohorts to evaluate the effects of screening. Screening has led to a two- to sixfold increase in the recorded incidence of neuroblastoma, and an increase at young ages has not been counterbalanced by a reduction at older ages. No reduction in mortality or in the occurrence of advanced disease has been demonstrated. Screening for neuroblastoma is therefore not recommended, even though no randomized trials have been conducted. Screening was recently discontinued in Japan.
The following section, ‘Prostate Cancer Screening,’ together with the discussion of other cancers, describes cancer screening without sufficient evidence for or against effectiveness.
Prostate Cancer Screening
The recorded incidence of prostate cancer has increased rapidly in the past 10–15 years in most industrialized countries and it is currently the most common cancer among men in several countries. Globally, 679 000 prostate cancers were diagnosed in 2002, with 221 000 deaths from it.
The target lesion in prostate cancer screening is early invasive prostate cancer. The natural course of prostate cancer is highly variable, ranging from indolent to highly aggressive. Premalignant lesions such as prostatic intraepithelial neoplasia (PIN) exist, but are not strongly predictive of prostate cancer and are not considered as an indication for treatment. Latent prostate cancer is a common autopsy finding. It has been detected in more than 10% of men dying before the age of 50 years, and much more frequently in older men. The common occurrence of indolent prostate cancer is a clear indication for potential overdiagnosis. The mean lead time in prostate cancer has been estimated as 6–12 years.
Prostate Cancer Screening Based On PSA
PSA is a serine protease secreted by the prostate and it is usually found in low concentrations in serum, with levels increased by prostate diseases such as benign prostatic hyperplasia, prostatitis, or prostate cancer.
The specificity of PSA has been estimated as approximately 90%. Detection rates have ranged from less than 2% to above 5%, depending on the cut-off and population. In serum bank studies, where no screening has been offered, but baseline PSA levels have been used to predict subsequent incidence of prostate cancer, sensitivity has been estimated as 67–86% at 4–6 years.
Effectiveness Of PSA Screening
A mortality analysis has been published from only one, relatively small, randomized trial. A study carried out in Quebec, Canada, randomized 46 500 men aged 45–80 years in 1988, with two-thirds allocated to the screening arm (Labrie et al., 2004). Compliance with screening was only 24%. The screening interval was 1 year, with a PSA cut-off of 3 ng/ml. By the end of 1999, the mean length of follow-up was 10 years among unscreened men and 7 years in the screened group. When analyzed by trial arm (intention-to-screen analysis), no reduction in prostate cancer mortality was seen.
Two large randomized trials are being carried out, one in Europe and the other in the United States. The European Randomized Study of Screening for Prostate Cancer (ERSPC) has eight centers in the Netherlands, Finland, Sweden, Italy, Belgium, Spain, Switzerland, and France. It has recruited a total of more than 200 000 men aged 50–74 years. The first mortality analysis is planned for 2010.
In the United States, the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial recruited 76 705 men aged 55–74 years in the prostate screening component between 1993 and 2001. Both serum PSA and digital rectal examination are used as screening tests. No mortality results are available yet.
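Low compliance matters for an analysis by trial arm because any effect among the men actually screened is diluted by the majority who were invited but never attended. The sketch below illustrates this under simplifying assumptions: non-attenders are assumed to have the same mortality as the control arm, nobody in the control arm is assumed to be screened, and the 30% mortality reduction among screened men is a purely hypothetical value, not a trial result. The function names are illustrative.

```python
def split_arms(total: int, screening_fraction: float) -> tuple[int, int]:
    """Return (screening arm size, control arm size)."""
    screening = round(total * screening_fraction)
    return screening, total - screening


def diluted_rate_ratio(compliance: float, rate_ratio_if_screened: float) -> float:
    """Mortality rate ratio expected when arms are compared as randomized.

    Assumes non-attenders keep the control-arm mortality rate and that
    nobody in the control arm is screened.
    """
    return compliance * rate_ratio_if_screened + (1.0 - compliance) * 1.0


# Arm sizes and compliance follow the figures quoted in the text.
screening_arm, control_arm = split_arms(46_500, 2 / 3)
attenders = round(screening_arm * 0.24)

# HYPOTHETICAL effect: a 30% mortality reduction among men actually screened.
observed = diluted_rate_ratio(compliance=0.24, rate_ratio_if_screened=0.70)
print(screening_arm, control_arm, attenders)
print(f"Expected intention-to-screen rate ratio: {observed:.3f}")
```

With only about a quarter of the screening arm attending, even a sizeable effect among attenders would appear as a rate ratio close to 1 in the comparison by arm, which a trial of this size would have difficulty distinguishing from no effect.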
Four case-control studies have evaluated prostate cancer screening, but they have not given consistent results. At any rate, the evidence from such nonrandomized studies should be seen as tentative.
Several ecological studies and time series analyses have been published, correlating the frequency of PSA testing (or the incidence of prostate cancer as a surrogate for PSA testing) with prostate cancer mortality. The results have been inconsistent. Given the shortcomings inherent in these approaches, the findings cannot support valid conclusions.
Adverse Effects Of Screening
Overdiagnosis is potentially a major problem in prostate cancer screening. It has been estimated that 30–45% of the cancers detected by screening would not have been diagnosed in the absence of screening during the man’s expected lifetime. Overdiagnosis leads to unnecessary treatment of prostate cancer, which has several major adverse effects, including high rates of erectile dysfunction and urinary incontinence with surgery, as well as urinary incontinence and irritation of the rectum or bladder (chronic radiation cystitis and proctitis) with radiotherapy.
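The scale of the problem can be illustrated with simple arithmetic that combines the overdiagnosis estimate above with the detection rates quoted earlier for PSA screening. This is a rough sketch only: it takes 2% and 5% as illustrative bounds for the detection rate and applies the 30–45% overdiagnosis range directly to them, which assumes the two quantities are independent. The function name is hypothetical.

```python
def overdiagnosed_cases(n_screened: int, detection_rate: float,
                        overdiagnosis_fraction: float) -> float:
    """Screen-detected cancers that would not have surfaced clinically."""
    detected = n_screened * detection_rate
    return detected * overdiagnosis_fraction


# Bounds taken from the ranges quoted in the text; pairing the lowest with
# the lowest and the highest with the highest is an ASSUMPTION.
low = overdiagnosed_cases(1000, detection_rate=0.02, overdiagnosis_fraction=0.30)
high = overdiagnosed_cases(1000, detection_rate=0.05, overdiagnosis_fraction=0.45)
print(f"Overdiagnosed cancers per 1000 men screened: {low:.1f} to {high:.1f}")
```

On these inputs, roughly 6 to 22 of every 1000 men screened would be given a cancer diagnosis, and possibly treatment, that they would never have needed.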
Other Screening Tests
The effect of digital rectal examination (DRE) as a screening test on death from prostate cancer has not been evaluated in randomized studies. Five case-control studies have yielded inconsistent results, which is thought to be due to the low sensitivity of DRE for the detection of early disease.
In summary, no effect on mortality from prostate cancer has been established for screening with serum PSA or other means. Screening should be limited to randomized trials. Randomized trials are ongoing and should provide important evidence.
Screening For Other Cancers
Ovarian cancer is among the 10 most common cancers among women. In 2002, 204 000 new cases were diagnosed globally and there were 124 000 deaths from ovarian cancer.
The natural history of ovarian carcinoma is not well understood. It is unclear how commonly cancers develop from benign or borderline lesions to malignant disease, relative to carcinoma arising de novo. Similarly, the duration of the detectable preclinical phase remains unknown. Screening tests include transvaginal or transabdominal ultrasound for imaging and serum CA-125 as a biochemical marker. There is no evidence on the effectiveness of ovarian cancer screening in terms of mortality reduction. Preliminary results obtained from nonrandomized studies are not encouraging: the sensitivity is low and false-positive findings are common.
Cutaneous melanoma incidence has been increasing rapidly in most industrialized countries and it now ranks among the 10 most common cancers in several European countries. There are approximately 160 000 new cases diagnosed annually in the world. Some of the increase may be due to more active case finding and changes in diagnostic criteria. Mortality has not shown a similar increase. Survival is favorable in the early stages. A substantial proportion of melanomas (approximately a fifth) arise from atypical moles (naevi). Visual inspection can be used to identify early melanomas (or premalignant lesions). Diagnostic assessment requires a skin biopsy. No randomized trials have been conducted to evaluate the effect of screening on melanoma mortality.
Gastric carcinoma is the third most common cancer in the world with 930 000 cases in 2002. With 700 000 deaths annually, it is the second most common cause of cancer death globally. Fluoroscopic imaging (photofluorography) and endoscopy have been used in screening for stomach cancer. As no randomized trials have been reported, there is not sufficient evidence to allow sound evaluation of effectiveness or to make recommendations concerning screening.
Cancer Screening Guidelines
Several international and national organizations have given recommendations for cancer screening as public health policy (Table 7). They have been based on a variety of approaches from expert opinion and consensus development conferences to more objective methods of evidence synthesis. There is some degree of consistency between the guidelines, but also differences. For some organizations, the rationale for evaluation has been strictly based on evidence for effectiveness, with the goal of assessing whether there is sufficient high-quality research to justify screening. Other organizations, notably the American Cancer Society, have adopted a more ideological approach, with a bias in favor of screening (a low threshold for advocating screening). Similarly, medical specialty societies tend to adopt screening recommendations relatively eagerly (not included in the table).
The role of the organization or the task of the working group also affects the outcome, so that groups with more responsibility for planning health-care services tend to apply more stringent evaluation criteria than those without such responsibility. Countries with publicly financed health-care systems tend to be more conservative than countries with fee-for-service systems.
In summary, establishing the benefits of screening usually requires evidence of a significant reduction in mortality from large randomized trials. Such a knowledge base exists for cervical, breast, and colorectal cancer (Table 8). The evidence is more limited for oral and liver cancer. Screening tests exist for numerous primary sites of cancer, but either their effectiveness has not been adequately evaluated, or a lack of effectiveness has been demonstrated. Even after randomized trials have shown efficacy (typically in specialist centers with volunteer subjects), pilot studies are needed to demonstrate feasibility before mass screening is introduced. After an organized screening program has been implemented, continuous evaluation is required to ensure that the benefits are maintained. Ideally, this is achieved by introducing a mass screening program with a randomized design, that is, comparing subjects randomly allocated to early entry with those covered only later. This approach is particularly important when the effect demonstrated in trials of efficacy is small. It is also highly recommended when an established technique is being replaced with a new one.
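The randomized introduction of a program described above can be sketched as a simple allocation step. The example below is one possible way to do it, not a description of any actual program: the allocation units (here labeled as areas), the equal split, and the function name are all assumptions made for illustration.

```python
import random


def allocate_entry(units: list[str], seed: int = 0) -> dict[str, str]:
    """Randomly assign half of the units to early entry, the rest to late."""
    rng = random.Random(seed)  # fixed seed so the allocation can be audited
    shuffled = units[:]        # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {unit: ("early" if index < half else "late")
            for index, unit in enumerate(shuffled)}


# HYPOTHETICAL allocation units; in practice these might be municipalities
# or birth cohorts.
areas = [f"area-{number:02d}" for number in range(1, 11)]
allocation = allocate_entry(areas, seed=2024)
for area in areas:
    print(area, allocation[area])
```

Because the late-entry units differ from the early-entry units only by chance, mortality in the two groups can be compared during the rollout period in the same way as in a trial, while the whole population is still covered once the program is complete.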
References
- Alexander FE, Anderson TJ, Brown HK, et al. (1999) Fourteen years of follow-up from the Edinburgh randomised trial of breast-cancer screening. Lancet 353: 1903–1908.
- Andersson I and Janzon L (1997) Reduced breast cancer mortality in women under age 50: updated results from the Malmo mammographic screening program. Journal of the National Cancer Institute Monograph 22: 63–67.
- Bjurstam N, Björneld L, Warwick J, et al. (2003) The Gothenburg breast screening trial. Cancer 97: 2387–2396.
- Chen JG, Parkin DM, Chen QG, et al. (2003) Screening for liver cancer: Results of a randomised controlled trial in Qidong, China. Journal of Medical Screening 10: 204–209.
- Cole P and Morrison AS (1978) Basic issues in cancer screening. In: Miller AB (ed.) Screening in Cancer, UICC Technical Report Series. vol. 40, pp. 7–39. Geneva, Switzerland: UICC.
- Day NE and Walter SD (1984) Simplified models for screening: Estimation procedures from mass screening programmes. Biometrics 40: 1–14.
- Day NE, Walter SD, and Collette B (1984) Statistical models of disease natural history: Their use in the evaluation of screening programmes. In: Prorok PC and Miller AB (eds.) Screening for Cancer I – General Principles on Evaluation of Screening for Cancer and Screening for Lung, Bladder and Oral Cancer, UICC Technical Report Series. vol. 78, pp. 55–70. Geneva, Switzerland: UICC.
- Feinleib M and Zelen M (1969) Some pitfalls in the evaluation of screening programs. Archives of Environmental Health 19: 412–415.
- Furihata R and Maruchi N (1969) Epidemiological studies on thyroid cancer in Nagano prefecture, Japan. In: Hedinger CE (ed.) Thyroid Cancer, UICC Monograph Series. vol. 12, p. 79. Berlin: Springer-Verlag.
- Hakama M, Auvinen A, Day NE, and Miller AB (2007) Sensitivity in cancer screening. Journal of Medical Screening 14: 174–177.
- Hugosson J, Aus G, Becker C, et al. (2000) Would prostate cancer detected by screening with prostate specific antigen develop into clinical cancer if left undiagnosed. British Journal of Urology 85: 1978–1984.
- Hutchison GB and Shapiro S (1968) Lead time gained by diagnostic screening for breast cancer. Journal of the National Cancer Institute 41: 665–681.
- IARC (2002) Breast Cancer Screening. IARC Handbooks of Cancer Prevention vol. 7. Lyon, France: IARC Press.
- IARC (2005) Cervix Cancer Screening. IARC Handbooks of Cancer Prevention vol. 10. Lyon, France: IARC Press.
- Kronborg O, Jorgensen OD, Fenger C, and Rasmussen M (2004) Randomized study of biennial screening with a faecal occult blood test: results after nine screening rounds. Scandinavian Journal of Gastroenterology 39: 846–851.
- Kubik A, Parkin DM, Khlat M, Erban J, Polak J, and Adamec M (1990) Lack of benefit from semi-annual screening for cancer of the lung: follow-up report of a randomized controlled trial on a population of high-risk males in Czechoslovakia. International Journal of Cancer 45: 26–33.
- Labrie F, Candas B, Cusan L, et al. (2004) Screening decreases prostate cancer mortality: 11-year follow-up of the 1988 Quebec prospective randomized controlled trial. Prostate 59: 311–318.
- Levin ML, Tockman MS, Frost JK, and Ball WC (1982) Lung cancer mortality in males screened by chest X-ray and cytologic sputum examination: a preliminary report. Recent Results in Cancer Research 82: 138–146.
- Mandel JS, Church TR, Ederer F, and Bond JH (1999) Colorectal cancer mortality: effectiveness of biennial screening for fecal occult blood. Journal of the National Cancer Institute 91: 434–437.
- Marcus PM, Bergstralh EJ, Fagerstrom RM, et al. (2000) Lung cancer mortality in the Mayo Lung Project: impact of extended follow-up. Journal of the National Cancer Institute 92: 1308–1316.
- Melamed MR, Flehinger BJ, Zaman MB, et al. (1984) Screening for early lung cancer. Results of the Memorial Sloan-Kettering study in New York. Chest 86: 44–53.
- Miller AB, To T, Baines CJ, and Wall C (2000) Canadian National Breast Screening Study-2: 13-year results of a randomized trial in women aged 50–59 years. Journal of the National Cancer Institute 92: 1490–1499.
- Miller AB, To T, Baines CJ, and Wall C (2002) The Canadian National Breast Screening Study-1: breast cancer mortality after 11 to 16 years of follow-up. A randomized screening trial of mammography in women age 40 to 49 years. Annals of Internal Medicine 137: 305–312.
- Moss S, Cuckle H, Evans A, Johns L, Waller M, and Bobrow L (2006) Effect of mammographic screening from age 40 years on breast cancer mortality at 10 years’ follow-up: a randomised controlled trial. Lancet 368: 2053–2060.
- Nyström L, Andersson I, Bjurstam N, Frisell J, Nordenskjöld B, and Rutqvist LE (2002) Long-term effects of mammography screening: updated overview of the Swedish randomised trials. Lancet 359: 909–919.
- Olsen O and Gøtzsche PC (2001) Cochrane review on screening for breast cancer with mammography. Lancet 358: 1340–1342.
- Sankaranarayanan R, Esmy PO, Rajkumar R, et al. (2007) Effect of visual screening on cervical cancer incidence and mortality in Tamil Nadu, India: a cluster-randomised trial. Lancet 370: 398–406.
- Sankaranarayanan R, Ramadas K, Thomas G, et al. (2005) Effect of screening on oral cancer mortality in Kerala, India: a cluster-randomised controlled trial. Lancet 365: 1927–1933.
- Scholefield JH, Moss S, Sufi F, Mangham CM, and Hardcastle JD (2002) Effect of faecal occult blood screening on mortality from colorectal cancer: results from a randomised controlled trial. Gut 50: 840–844.
- Shapiro S, Venet W, Strax P, and Venet L (1988) Current results of the breast cancer screening randomized trial: The Health Insurance Plan (HIP) of Greater New York Study. In: Day NE and Miller AB (eds.) Screening for Breast Cancer, pp. 3–15. Geneva, Switzerland: International Union Against Cancer.
- Stenman UH, Hakama M, Knekt P, Aromaa A, Teppo L, and Leinonen J (1994) Serum concentrations of prostate specific antigen and its complex with a1-ACT 0–12 years before diagnosis of prostate cancer. Lancet 344: 1594–1598.
- Tabár L, Vitak B, Chen HH, et al. (2000) The Swedish Two-County Trial twenty years later. Updated mortality results and new insights from long-term follow-up. Radiologic Clinics of North America 38: 625–651.
- Zhang BH, Yang BH, and Tang ZY (2004) Randomized controlled trial of screening for hepatocellular carcinoma. Journal of Cancer Research and Clinical Oncology 131: 417–422.