The Field of Sample Surveys
1. Deﬁnition Of Survey Sampling
Survey sampling can be defined as the art of selecting a sample of units from a population of units, creating tools for measuring the units with respect to the survey variables, and drawing precise conclusions about the characteristics of the population or of the process that generated the values of the units. A more specific definition of a survey is the following (Dalenius 1985):
(a) A survey concerns a set of objects comprising a population. One class of population concerns a finite set of objects such as individuals, businesses, and farms. Another concerns events during a specific time period, such as crimes and sales. A third class concerns processes, such as land use or the occurrence of certain minerals in an area. More specifically, one might define a population as, for example, all noninstitutionalized individuals 15–74 years of age living in Sweden on May 1, 2000.
(b) This population has one or more measurable properties. Examples of such properties are individuals' occupations, businesses' revenues, and the number of elks in an area.
(c) A desire to describe the population by one or more parameters defined in terms of these properties. This calls for observing (a sample of) the population. Examples of parameters are the proportion of unemployed individuals in the population, the total revenue of businesses in a certain industry sector during a given time period, and the average number of elks per square mile.
(d) In order to get observational access to the population, a frame is needed, i.e., an operational representation, such as a list of the population objects or a map of the population. Examples of frames are business and population registers; maps where the land has been divided into areas with strictly defined boundaries; or all n-digit numbers, which can be used to link telephone numbers to individuals. Sometimes the frame has to be developed for the occasion because no registers are available and the elements have to be listed. For general populations this is done by combining multistage sampling with the listing procedure, letting the survey field staff list all elements in sampled areas only; other alternatives would be too costly. For special populations, for example the population of professional baseball players in the USA, one would have to combine all club rosters into one frame. In some surveys a number of frames may exist, covering the population to varying extents. For this situation a multiple frame theory has been developed (see Hartley 1974).
(e) A sample of sampling units is selected from the frame in accordance with a sampling design, which speciﬁes a probability mechanism and a sample size. There are numerous sample designs developed for different survey situations. The situation may be such that the design chosen solves a problem (using multistage sampling when not all population elements can be listed, or when interviewer and travel costs prevent the use of simple random sampling of elements) or takes advantage of the circumstances (using systematic sampling, if the population is approximately ordered, or using stratiﬁed sampling if the population is skewed). Every sample design speciﬁes selection probabilities and a sample size. It is imperative that selection probabilities are known, or else the design is nonmeasurable.
(f) Observations are made on the sample in accordance with a measurement design, i.e., a measurement method and a prescription as to its use. This phase is called data collection. There are at least five main modes of data collection: face-to-face interviewing, telephone interviewing, self-administered questionnaires and diaries, administrative records, and direct observation. Each of these modes can be conducted using different levels of technology. Early attempts using the computer took place in the 1970s, in telephone interviewing. The questionnaire was stored in a computer, and a computer program guided the interviewer throughout the interview by automatically presenting questions on the screen and taking care of some interviewer tasks, such as keeping track of skip patterns and personalizing the interview. This technology is called CATI (Computer Assisted Telephone Interviewing). Current levels of technology for the other modes include the use of portable computers for face-to-face interviewing, touch-tone data entry using the telephone keypad, automatic speech recognition, satellite images of land use and crop yields, 'people meters' for TV viewing behaviors, barcode scanning in diary surveys of purchases, electronic exchange of administrative records, and the Internet. Summaries of these developments are provided in Lyberg and Kasprzyk (1991), DeLeeuw and Collins (1997), Couper et al. (1998), and Dillman (2000). Associated with each mode is the survey measurement instrument or questionnaire. The questionnaire is the result of a conceptualization of research objectives, i.e., a set of properly worded and properly ordered questions. The design of the questionnaire is a science of its own. See, for example, Tanur (1992) and Sudman et al. (1996).
(g) Based on the measurements, an estimation design is applied to compute estimates of the parameters when making inferences from the sample to the population. Associated with each sampling design are one or more estimators, functions of the collected data used to make statements about the population parameters. Sometimes estimators rely solely on sample data, but on other occasions auxiliary information is part of the function. All estimators include sample weights that are used to inflate the sample data to the population level. To calculate the error of an estimate, variance estimators are formed, which makes it possible to calculate standard errors and eventually confidence intervals. See Cochran (1977) and Sarndal et al. (1992) for comprehensive reviews of sampling theory.
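The sampling design in (e) can be sketched in code. The following Python sketch selects a stratified simple random sample with proportional allocation, so that the selection probabilities are known by design (the frame, strata, and sample size are all invented for illustration):

```python
import random

# Hypothetical frame of 1,000 units in two strata (labels invented)
random.seed(2000)
frame = {i: ("north" if i < 300 else "south") for i in range(1000)}

# Group the frame units by stratum
strata = {}
for unit, s in frame.items():
    strata.setdefault(s, []).append(unit)

n_total = 100          # overall sample size
N = len(frame)         # population (frame) size

sample, inclusion_prob = [], {}
for s, units in strata.items():
    n_h = round(n_total * len(units) / N)   # proportional allocation
    chosen = random.sample(units, n_h)      # SRS without replacement within stratum
    sample.extend(chosen)
    for u in chosen:
        # Known by design: this is what makes the design measurable
        inclusion_prob[u] = n_h / len(units)

print(len(sample), {s: len(v) for s, v in strata.items()})
```

Because allocation here is proportional, every unit ends up with the same inclusion probability; Neyman (optimum) allocation would instead weight strata by their variability.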
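The estimation design in (g) can be illustrated with the Horvitz-Thompson estimator of a population total under simple random sampling. This is a minimal sketch with invented numbers, showing the design weight, the variance estimator with its finite population correction, and an approximate 95% confidence interval:

```python
import math

# Invented setting: SRS without replacement of n = 5 units from N = 1000
N = 1000
n = 5
y = [12.0, 7.5, 9.0, 14.2, 10.3]   # observed values for the sampled units

pi = n / N                 # inclusion probability under SRS
w = 1 / pi                 # design weight "inflating" each sample value
t_hat = w * sum(y)         # Horvitz-Thompson estimate of the population total

# Variance estimator for the SRS total, with finite population correction
ybar = sum(y) / n
s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)
var_hat = N ** 2 * (1 - n / N) * s2 / n
se = math.sqrt(var_hat)

# Approximate 95% confidence interval for the population total
ci = (t_hat - 1.96 * se, t_hat + 1.96 * se)
print(round(t_hat, 1), round(se, 1))
```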
2. The Status Of Survey Research
There are many types of surveys and survey populations that fit this definition. A large number of surveys are one-time surveys aiming at measuring attitudes or other population characteristics. Some surveys are continuing, thereby allowing the estimation of change over time. An example is a monthly labor force survey. Typically such a survey uses a rotating design in which a sampled person is interviewed a number of times; for example, the person participates 4 months in a row, is rotated out of the sample for the next 4 months, and then rotates back in for a final 4 months. Other surveys aim at comparing different populations regarding a certain characteristic, such as the literacy level in different countries. Business surveys often study populations with a small number of large businesses and many smaller ones. When the survey goal is to estimate a total, it might be worthwhile to deliberately cut off the smallest businesses from the frame, or to select all large businesses with probability one and the smaller ones with other probabilities.
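The 4-4-4 rotation pattern just described can be written as a simple membership rule (a hypothetical sketch; the month indexing is invented for illustration):

```python
# 4-4-4 rotating panel: in sample for months 0-3, resting for months 4-7,
# back in sample for months 8-11 (0-based months since panel entry)
def in_sample(month_since_entry):
    """True if the panel member is interviewed in this month."""
    return 0 <= month_since_entry < 4 or 8 <= month_since_entry < 12

schedule = [in_sample(t) for t in range(12)]
print(schedule)
```

The overlap between consecutive months (and between the same month a year apart) is what makes such designs efficient for estimating change over time.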
Surveys are conducted by many different organizations. There are national statistical offices producing official statistics, there are university-based organizations conducting surveys as part of their educational activities, and there are private organizations conducting surveys on anything from official statistics to marketing. The survey industry employs more than 130,000 people in the USA alone, and the world figure is of course much larger. Survey results are very important to society. Governments get continuing information on parameters like unemployment, national accounts, education, environment, and consumer price indexes. Other sponsors get information on, e.g., political party preferences, consumer satisfaction, child day-care needs, time use, and consumer product preferences.
As pointed out by Groves (1989), the field of survey sampling has evolved through somewhat independent and uncoordinated contributions from many disciplines, including statistics, sociology, psychology, communication, education, and marketing research. Representatives of these disciplines have varying backgrounds and as a consequence tend to emphasize different design aspects. During the last couple of decades, however, survey research groups have come to collaborate more, as manifested by edited volumes such as Groves et al. (1988), Biemer et al. (1991), Lyberg et al. (1997), and Couper et al. (1998). This teamwork will most likely continue: many of the error structures resulting from specific sources must be dealt with by multidisciplinary teams, since the errors stem from problems concerning sampling, recall, survey participation, interviewer practices, question comprehension, and conceptualization.
The justification for sampling (rather than surveying the entire population, a total enumeration) is lower cost and greater efficiency: sampling is faster and less expensive than total enumeration. Perhaps more surprisingly, sampling often allows more precise measurement of each sampled unit than is possible in a total enumeration. As a result, sample surveys often have quality features superior to those of total enumerations.
Sampling as an intuitive tool has probably been used for centuries, but the development of a theory of survey sampling did not start until the late 1800s. Main contributors to this early development, frequently referred to as 'the representative method,' were Kiaer (1897), Bowley (1913, 1926), and Tschuprow (1923). Apart from various inferential aspects, they discussed issues such as stratified sampling, optimum allocation to strata, multistage sampling, and frame construction. In the 1930s and 1940s most of the basic methods used today were developed. Fisher's randomization principle was applied to sample surveys, and Neyman (1934, 1938) introduced the theory of confidence intervals, cluster sampling, ratio estimation, and two-phase sampling. The US Bureau of the Census was perhaps the first national statistical office to embrace and further develop these theoretical ideas. For example, Morris Hansen and William Hurwitz (1943, 1949) and Hansen et al. (1953) helped place the US Labor Force Survey on a full probability-sampling basis, and they also led innovative work on variance estimation and the development of a survey model decomposing the total survey mean squared error into various sampling and bias components. Other important contributions during that era include systematic sampling (Madow and Madow 1944), regression estimation (Cochran 1942), interpenetrating samples (Mahalanobis 1946), and master samples (Dalenius 1957). More recent efforts have concentrated on allocating resources to the control of various sources of error, i.e., methods for total survey design, taking not only sampling but also nonsampling errors into account. A more comprehensive review of historical aspects is provided in Sample Surveys, History of.
3. The Use Of Models
While early developments focused on methods for sample selection in different situations and proper estimation methods, later developments have to a large extent focused on theoretical foundations and the use of probability models for increasing the efficiency of the estimators. There has been a development from implicit modeling to explicit modeling.
The model traditionally used in the early theory is based on the view that what is observed for a unit in the population is basically a fixed value. This approach may be called the 'fixed population approach.' The stochastic nature of the estimators is a consequence of the deliberately introduced randomization among the population units. A specific feature of survey sampling is the existence of auxiliary information, i.e., known values of a concomitant variable that is in some sense related to the variable under study, so that it can be used to improve the precision of the estimators. The relationship between the variable under study and the auxiliary variables is often expressed as a linear regression model, which can often be interpreted as expressing a belief (common or the sampler's own) concerning the structure of the relationship between the variables. Such modeling is used extensively in early textbooks (see Cochran 1953). A somewhat different approach is to view the values of the variables as realizations of random variables using probability models. In combination with the randomization of the units, this constitutes what is called the superpopulation approach. Model-based inference draws conclusions based solely on properties of the probability models, ignoring the randomization of the units.
Design-based inference, on the other hand, ignores the mechanism that generated the data and concentrates on the randomization of the units. In general, model-based inference for estimating population parameters like subgroup means can be very precise if the model is true but may produce biased estimates if the model is false, while design-based inference leads to unbiased, but possibly inferior, estimates of the population parameters. Model-assisted inference is a compromise that aims at utilizing models in such a way that, if the model is true, the precision is high, but if the model is false, the precision will be no worse than if no model had been used.
As an example, suppose we want to study a population of families in a country. We want to analyze the structure of disposable income for the households and find out the relation between factors like age, sex, education, and the number of household members on the one hand and the disposable income of a family on the other. A possible model of the data generating process is that disposable income is a linear function of these background variables, with an element of unexplained variation between families having the same values of the background variables. The income will also fluctuate from year to year depending on external variation in society. All this suggests that the data generating process could be represented by a probability model in which disposable income is a linear function of background variables and random errors over time and between families. The superpopulation model would be the set of models describing how disposable income is generated for the families. For inferential purposes, a sample of families is selected, and different types of inference can be considered. For instance, we might be interested in giving a picture of the actual distribution of disposable income in the population at the specific time when the sample was selected. Alternatively, we might be interested in estimating the coefficients of the relational model, either because we are genuinely interested in the model itself, e.g., for predicting a future total disposable income for the population, which would be of interest to sociologists, economists, and decision makers, or because the model can serve as a tool for creating more efficient estimators of the fixed distribution, given, for example, that the distribution of sex and age is known with reasonable accuracy in the population and can be used as auxiliary information. Evidently, the results would depend on the constellation of families comprising our sample.
If we use a sample design that over-represents the proportion of large households or young households with small children, compared to the population, inference based on the sample can be misleading. Model-based inference ignores the sample selection procedure and assumes that the inference conditional on the sample is a good representation of what would have been the case if all families had been surveyed. Design-based inference ignores the data generation process and concentrates on the artificial randomization induced by the sampling procedure. Model-assisted inference uses models as tools for creating more precise estimates. Broadly speaking, model-based inference is mostly used when the relational model is of primary interest; this is the traditional way of analyzing sample data as given in textbooks on statistical theory. Design-based inference, on the other hand, is the traditional way of treating sample data in survey sampling; it is mainly focused on giving a picture of the present state of the population. Model-assisted inference uses models as tools for selecting estimators but relies on design properties; it too is mainly focused on picturing the present state of the population.
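The three approaches can be contrasted concretely with the classical ratio estimator, one of the simplest model-assisted estimators. In this invented Python example, the auxiliary variable x is assumed known for the whole population (its total t_x), while the study variable y is observed only in the sample; all numbers are hypothetical:

```python
# Invented setting: SRS of n = 4 units from a population of N = 500,
# with a known auxiliary total t_x (e.g., from a register or census)
N = 500
t_x = 25000.0
n = 4
x = [40.0, 55.0, 38.0, 62.0]   # auxiliary values for the sampled units
y = [82.0, 109.0, 75.0, 125.0] # study variable for the same units

# Plain design-based (expansion) estimator ignores the auxiliary data
t_y_expansion = (N / n) * sum(y)

# Ratio estimator: models y as roughly proportional to x
B_hat = sum(y) / sum(x)
t_y_ratio = B_hat * t_x

print(round(t_y_expansion, 1), round(t_y_ratio, 1))
```

If y really is roughly proportional to x, the ratio estimator has much smaller variance than the plain expansion estimator, yet it remains design-consistent when the proportionality model fails, which is the model-assisted compromise described above.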
Modern textbooks such as Cassel et al. (1977) and Sarndal et al. (1992) discuss the foundations of survey sampling and make extensive use of auxiliary information in the survey design. The different approaches mentioned above have their advocates, but most surveys conducted around the world still rely heavily on design-based approaches with implicit modeling. Models are needed, however, to take nonsampling errors into account, since we do not know exactly how such errors are generated. To make measurement errors part of the inference procedure, one has to make assumptions about the error structures. Such error structures concern cognitive issues, question wording and perception, interviewer effects, recall errors, untruthful answers, coding, editing, and so on. Similarly, to make errors of nonobservation (frame coverage and nonresponse errors) part of the inference procedure, one needs to model the mechanisms that generate these errors. The compromise called model-assisted inference takes advantage of both design-based and model-based features.
Analysis of data from complex surveys denotes the situation that occurs when the survey statistician is trying to estimate the parameters of a model used to describe a random phenomenon, for example econometric or sociological models such as time series models, regression models, or structural equation models. It is assumed that the data available are sample survey data generated by some sampling mechanism that does not support the assumption of independent identically distributed (IID) observations on a random variable. The traditional inference developed for estimating the parameters of the model (and not for estimating the population parameters) presupposes that the observations are IID. In some cases, traditional inference based on, e.g., maximum likelihood gives misleading results. Comprehensive reviews of analysis of data from complex surveys are provided by Skinner et al. (1989) and Lehtonen and Pahkinen (1995).
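One concrete way to see why IID-based analysis can mislead is Kish's design effect for cluster samples, deff = 1 + (m - 1)ρ, which measures how much the variance of a mean is inflated relative to simple random sampling when units within sampled clusters resemble each other. A small sketch with invented values:

```python
# Invented values: m interviews per sampled cluster, intracluster
# correlation rho of the survey variable
m = 20
rho = 0.05

deff = 1 + (m - 1) * rho         # variance inflation vs. SRS (Kish)
n_nominal = 2000                 # actual number of interviews
n_effective = n_nominal / deff   # what an IID analysis effectively has

print(deff, round(n_effective, 1))
```

With these values, 2,000 clustered interviews carry roughly the information of only about 1,000 independent observations, so standard errors computed under the IID assumption would be far too small.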
The present state of affairs is that there is a relatively well-developed sampling theory. The theory of nonsampling errors is still in its infancy, however. A typical scenario is that survey methodologists try to reduce potential errors by using, for example, cognitively tested questionnaires and various means to stimulate survey participation, to the extent that available resources permit. However, not all nonsampling error sources are known, and some that are known defy expression. The error reduction strategy can be complemented by sophisticated modeling of error structures. Unfortunately, a rather common implicit model seems to be that nonsampling errors have no serious effect on estimates. In some applications, attempts are made to estimate the total error or error components by evaluation techniques: for a subsample of the units, the survey is replicated using expensive 'gold standard' methods, and the differences between the preferred measurements and the regular ones are used as estimates of the total errors. This is an expensive and time-consuming procedure that is not very suitable for long-range improvements. A more modern and realistic approach is to develop reliable and predictable (stable) survey processes that can be continuously improved (Morganstein and Marker 1997).
Obviously there are a number of future challenges in the field of survey sampling. We will provide just a few examples:
(a) Many surveys are conducted in a primitive way because of limited funding and know-how. The development of more efficient designs taking nonsampling errors into account at the estimation stage is needed. There is also a need for strategies that can help allocate resources to various design stages so that total errors are minimized. Sometimes those in charge of surveys concentrate their efforts on the most visible error sources or on those for which a tool is available. For instance, most survey sponsors know that nonresponse might be harmful. The indicator of nonresponse error, the nonresponse rate, is both simple and visible. Therefore it might be tempting to put most resources into this error source. On the other hand, not many users are aware of the cognitive phenomena that affect the response delivery mechanism. Perhaps, from a total error point of view, more resources should be spent on questionnaire design.
(b) Modern technology permits simultaneous use of multiple data collection modes within a survey. Multiple modes are used to accommodate respondents, to increase response rates, and to allow inexpensive data collection when possible. There are, however, mode effects, and there is a need for calibration techniques that can adjust the measurements or the collection instruments so that the mode effect vanishes.
(c) International surveys are becoming increasingly important. Most methodological problems mentioned are inflated under such circumstances. Especially interesting is the concept of cultural bias, which means that concepts and procedures are not uniformly understood, interpreted, and applied across geographical regions or ethnic subpopulations. To define and measure the impact of such bias is an important challenge.
- Biemer P, Groves R, Lyberg L, Mathiowetz N, Sudman S 1991 Measurement Errors in Surveys. Wiley, New York
- Bowley A L 1913 Working-class households in Reading. Journal of the Royal Statistical Society 76: 672–701
- Bowley A L 1926 Measurement of the precision attained in sampling. Proceedings of the International Statistical Institute XII: 6–62
- Cassel C-M, Sarndal C-E, Wretman J 1977 Foundations of Inference in Survey Sampling. Wiley, New York
- Cochran W G 1942 Sampling theory when the sampling-units are of unequal sizes. Journal of the American Statistical Association 37: 199–212
- Cochran W G 1953 Sampling Techniques, 1st edn. Wiley, New York
- Cochran W G 1977 Sampling Techniques, 3rd edn. Wiley, New York
- Couper M, Baker R, Bethlehem J, Clark C, Martin J, Nicholls W, O'Reilly J 1998 Computer Assisted Survey Information Collection. Wiley, New York
- Dalenius T 1957 Sampling in Sweden. Almqvist and Wiksell, Stockholm, Sweden
- Dalenius T 1985 Elements of Survey Sampling. Notes prepared for the Swedish Agency for Research Cooperation with Developing Countries (SAREC)
- DeLeeuw E, Collins M 1997 Data collection methods and survey quality: An overview. In: Lyberg L, Biemer P, Collins M, DeLeeuw E, Dippo C, Schwarz N, Trewin D (eds.) Survey Measurement and Process Quality. Wiley, New York
- Dillman D 2000 Mail and Internet Surveys: The Tailored Design Method, 2nd edn, Wiley, New York
- Groves R 1989 Survey Errors and Survey Costs. Wiley, New York
- Groves R, Biemer P, Lyberg L, Massey J, Waksberg J (eds.) 1988 Telephone Survey Methodology. Wiley, New York
- Hansen M H, Hurwitz W N 1943 On the theory of sampling from ﬁnite populations. Annals of Mathematical Statistics 14: 333–62
- Hansen M H, Hurwitz W N 1949 On the determination of optimum probabilities in sampling. Annals of Mathematical Statistics 20: 426–32
- Hansen M H, Hurwitz W N, Madow W G 1953 Sample Survey Methods and Theory (I and II). Wiley, New York
- Hartley H O 1974 Multiple methodology and selected applications. Sankhya, Series C 36: 99–118
- Kiaer A N 1897 The representative method for statistical surveys (original in Norwegian). Kristiania Videnskabsselskabets Skrifter. Historisk-ﬁlosoﬁske klasse 4: 37–56
- Lehtonen R, Pahkinen E J 1995 Practical Methods for Design and Analysis of Complex Surveys. Wiley, New York
- Lyberg L, Biemer P, Collins M, DeLeeuw E, Dippo C, Schwarz N, Trewin D (eds.) 1997 Survey Measurement and Process Quality. Wiley, New York
- Lyberg L, Kasprzyk D 1991 Data collection methods and measurement error: An overview. In: Biemer P, Groves R, Lyberg L, Mathiowetz N, Sudman S (eds.) Measurement Errors in Surveys. Wiley, New York
- Madow W G, Madow L H 1944 On the theory of systematic sampling, I. Annals of Mathematical Statistics 15: 1–24
- Mahalanobis P C 1946 On large-scale sample surveys. Philosophical Transactions of the Royal Society London, Series B 231: 329–451
- Morganstein D, Marker D 1997 Continuous quality improvement in statistical agencies. In: Lyberg L, Biemer P, Collins M, DeLeeuw E, Dippo C, Schwarz N, Trewin D (eds.) Survey Measurement and Process Quality. Wiley, New York
- Neyman J 1934 On the two different aspects of the representative method: The method of stratiﬁed sampling and the method of purposive selection. Journal of the Royal Statistical Society 97: 558–625
- Neyman J 1938 Contribution to the theory of sampling human populations. Journal of the American Statistical Association 33: 101–16
- Sarndal C-E, Swensson B, Wretman J 1992 Model Assisted Survey Sampling. Springer-Verlag, New York
- Skinner C, Holt D, Smith T M F (eds.) 1989 Analysis of Complex Surveys. Wiley, New York
- Sudman S, Bradburn N, Schwarz N 1996 Thinking About Answers: The Application of Cognitive Processes to Survey Methodology. Jossey-Bass, San Francisco, CA
- Tanur J (ed.) 1992 Questions About Questions. Russell Sage, New York
- Tschuprow A A 1923 On the mathematical expectation of the moments of frequency distributions in the case of correlated observation. Metron 2: 461–93; 646–80