Survey Design Issues Research Paper

A treatment of survey questions intended to be useful to those wishing to carry out or interpret actual surveys should consider several issues: the basic difference between questions asked in surveys and questions asked in ordinary social interaction; the problems of interpreting tabulations based on single questions; the different types of survey questions that can be asked; the possibility of bias in questioning; and the insights that can be gained by combining standard survey questions with randomized experiments that vary the form, wording, and context of the questions themselves. Each of these issues is treated in this research paper.

1. The Unique Nature Of Survey Questioning

A fundamental paradox of survey research is that we start from the purpose of ordinary questioning as employed in daily life, yet our results are less satisfactory for that purpose than for almost any other. In daily life a question is usually asked because one person wishes information from another. You might ask an acquaintance how many rooms there are in her house, or whether she favors the legalization of abortion in America. The assumption on both sides of the interaction is that you are interested in her answers in and of themselves. We can call such inquiries ordinary questions.

In surveys we use similar inquiries—that is, their form, their wording, and the manner of their asking are seldom sharply distinguishable from ordinary questions. At times we may devise special formats, with names like Likert-type or forced-choice, but survey questions cannot depart too much from ordinary questioning because the essential nature of the survey is communication with people who expect to hear and respond to ordinary questions. Not surprisingly, respondents believe that the interviewer or questionnaire is directly interested in the facts and opinions they give, just as would an acquaintance who asked the same questions. They may not assume a personal interest in their answers, but what they do assume is that their answers will be combined with the answers of all others to give totals that are directly interpretable.

Thus, if attitudes or opinions are inquired into, the survey is viewed as a kind of referendum and the investigator is thought to be interested in how many favor and how many oppose legalized abortion or whatever else is at issue. If facts are being asked about, the respondent expects a report telling how many people have what size homes, or whatever the inquiry is about. (By factual data we mean responses that correspond to a physical reality and could, in principle, be provided by an observer as well as by a respondent, for example, when counting rooms. By attitudinal data we mean responses that concern subjective phenomena and therefore depend on self-reports by respondents. The distinction is not airtight: for example, the designation of a respondent’s ‘race’ can be based on self-report but also on the observations of others, and the two may differ without either being clearly ‘wrong.’) In this research paper the focus will be on attitudes, including opinions, beliefs, and values, though much of the discussion can be applied to factual data as well.

Experienced survey researchers know that a simple tally of responses to a question (what survey researchers refer to as the ‘marginals’) is usually too much a function of the way the question was asked to allow for any simple interpretation. The results of questions on legalized abortion depend heavily on the conditions, definitions, and other subtleties presupposed by the question wording, and the same is true to an extent even for a question on how many rooms there are in a house. Either we must keep a question quite general in phrasing and leave the definitions, qualifications, and conditions up to each respondent, which invites unseen variations in interpretation, or we must try to make the question much more limited in focus than was usually our goal in the first place.

Faced with these difficulties in interpreting univariate results from separate questions, survey investigators can proceed in one or both of two directions. One approach is to ask a wide range of questions on an issue and hope that the results can be synthesized into a general conclusion, even though this necessarily involves a fair amount of judgment on the part of the researcher. The other direction, the one that leads to standard survey analysis, is to hold constant the question (or the index, if more than a single item is being considered) and make comparisons across time or other variables. We may not be sure of exactly what 65 percent means in terms of general support for legalized abortion, but we act on the assumption that if the question wording and survey conditions have been kept constant, we can say, within the limits of sampling error, that it represents such and such an increase or decrease from an earlier survey that asked the same question to a sample from the same population. Or if 65 percent is the figure for men and 50 percent is the figure for women, a sex difference of approximately 15 percentage points exists.
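The phrase ‘within the limits of sampling error’ can be made concrete with a short calculation. The following is a minimal sketch, not a prescribed procedure: it takes the 65 percent and 50 percent figures from the example above, assumes two independent simple random samples, and uses sample sizes of 500 per group that are purely hypothetical. The function name diff_of_proportions is likewise an illustrative choice.

```python
import math

def diff_of_proportions(p1, n1, p2, n2, z=1.96):
    """Return p1 - p2 and an approximate 95 percent confidence interval.

    Assumes two independent simple random samples. Surveys with clustering
    or weighting would need design-adjusted standard errors instead.
    """
    diff = p1 - p2
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, (diff - z * se, diff + z * se)

# Hypothetical sample sizes (not from the text): 500 men and 500 women.
diff, (low, high) = diff_of_proportions(0.65, 500, 0.50, 500)
print(f"difference = {diff * 100:.1f} points, "
      f"95% CI = ({low * 100:.1f}, {high * 100:.1f})")
```

With samples of this assumed size the interval excludes zero, so the difference would not be attributed to sampling error alone. Smaller samples widen the interval, and with small enough samples it would include zero.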

Moreover, research indicates that in most cases relationships are less affected by variations in the form of question than are univariate distributions—generalized as the rule of ‘form-resistant correlations’ (Schuman and Presser 1981). The analytic approach, together with use of multiple questions (possibly further combined on the basis of a factor analytic approach), can provide a great deal of understanding and insight into an attitude, though it militates against a single summary statement of the kind that the respondents expect to hear. (Deming’s (1968, p. 601) distinction between enumerative and analytic studies is similar, but he treats the results from enumerative studies as unproblematic, using a simple factual example of counting number of children. In this research paper, univariate results based on attitude questions are regarded as questionable attempts to simulate actual referenda. Thus the change in terminology is important.)
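The idea of form-resistant correlations can be illustrated with a small tabulation. The counts below are invented solely for illustration and come from no actual survey; the labels form_a and form_b stand for two hypothetical wordings of the same question. The sketch shows a case in which the marginals shift considerably between forms while the difference between groups stays the same.

```python
# Hypothetical counts, invented purely for illustration.
# counts[form][group] = (number favoring, number asked)
counts = {
    "form_a": {"men": (325, 500), "women": (250, 500)},
    "form_b": {"men": (260, 500), "women": (185, 500)},
}

def summarize(form_counts):
    """Return the overall percent favoring (the marginal) and the men-women gap in points."""
    favor = sum(f for f, _ in form_counts.values())
    asked = sum(n for _, n in form_counts.values())
    pct = {group: 100 * f / n for group, (f, n) in form_counts.items()}
    return 100 * favor / asked, pct["men"] - pct["women"]

summary = {form: summarize(c) for form, c in counts.items()}
for form, (marginal, gap) in summary.items():
    print(f"{form}: marginal = {marginal:.1f}%, men-women gap = {gap:.1f} points")
```

In this constructed case the marginal falls from 57.5 percent to 44.5 percent when the form changes, yet the gap between men and women is 15 points under both forms. Real data would rarely be this tidy; the rule describes a tendency, not a guarantee.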

This difference between what respondents expect (the referendum point of view) and what the sophisticated survey researcher expects (the analytic point of view) is often very great. The respondent in a national survey believes that the investigator will add up all the results, item by item, and tell the nation what Americans think. But the survey investigator knows that such a presentation is usually problematic at best and can be dangerously misleading at worst. Moreover, to make matters even more awkward, political leaders often have the same point of view as respondents: they want to know how many people favor and how many oppose an issue that they see themselves as confronting. Yet it may be neither possible nor desirable for the survey to pose exactly the question the policy maker has in mind, and in any case such a question is likely to be only one of a number of possible questions that might be asked on the issue.

2. Problems With The Referendum Point Of View

There are several reasons why answers obtained from isolated questions are usually uncertain in meaning. First, many public issues are discussed at a general level as though there is a single way of framing them and as though there are just two sides. But what is called the abortion issue, to follow our previous example, consists of a large number of different issues having to do with the reasons for abortion, the trimester involved, and so forth. Likewise, what is called ‘gun control’ can involve different types of guns and different kinds of controls. Except at the extremes, exactly which of these particular issues is posed and with what alternatives makes a considerable difference in the univariate results. Indeed, often what is reported as a conflict in findings between two surveys is due to their having asked about different aspects of the same general issue.

A second problem is that answers to survey questions always depend on the form in which the question is asked, because most respondents treat that form as a constraint on their answers. If two alternatives are given by the interviewer, most respondents will choose one, rather than offering a substitute of their own that they might prefer. For example, in one survey-based experiment the authors identified the problems spontaneously mentioned when a national sample of Americans was asked to name the most important problem facing the country. Then a parallel question was formulated for a comparable sample that included none of the four problems mentioned most often spontaneously, but instead four problems that had been mentioned by less than three percent of the population in toto, though with an invitation to respondents to substitute a different problem if they wished. Despite the invitation, the majority of respondents (60 percent) chose one of the rare problems offered explicitly, which reflected their unwillingness to go outside the frame of reference provided by the question (Schuman and Scott 1987). Evidently, the form of a question is treated by most people as setting the ‘rules of the game,’ and these rules are seldom challenged even when encouragement is offered.

It might seem as though the solution to the rules-of-the-game constraint is to keep questions ‘open’—that is, not to provide specific alternatives. This is often a good idea, but not one that is failsafe. In a related experiment on important events and changes from the recent past, ‘the development of computers’ was not mentioned spontaneously nearly as often as economic problems, but when it was included in a list of past events along with economic problems, the development of computers turned out to be the most frequent response (Schuman and Scott 1987). Apparently people asked to name an important recent event or change thought that the question referred only to political events or changes, but when the legitimacy of a different kind of response was made explicit, it was heavily selected. Thus a question can be constraining even when it is entirely open and even when the investigator is unaware of how it affects the answers respondents give.

A third reason for the limitations of univariate results is the need for comparative data to make sense in interpretation. Suppose that a sample of readers of this research paper is asked to answer a simple yes/no question as to its value, and that 60 percent reply positively and 40 percent negatively. Leaving aside all the problems of question wording discussed thus far, such percentages can be interpreted only against the backdrop of other articles. If the average yes percentage for all articles is 40 percent, the author might feel proud of his success. If the average is 80 percent, the author might well hang his head in shame. We are all aware of the fundamental need for this type of comparison, yet it is easy to forget about the difficulty of interpreting absolute percentages when we feel the urge to speak definitively about public reactions to a unique event.

Finally, in addition to all of the above reasons, there are sometimes subtle features of wording that can affect answers. A classic example of a wording effect is the difference between ‘forbidding’ something and ‘not allowing’ the same thing (Rugg 1941). A number of survey experiments have shown that people are more willing to ‘not allow’ a behavior than they are to ‘forbid’ the same behavior, even though the practical effects of the distinction in wording are nil (Holleman 2000). Another subtle feature is context: for example, a question about abortion in the case of a married woman who does not want any more children is answered differently depending on whether or not it is preceded by a question about abortion in the case of a defective fetus (Schuman and Presser 1996 [1981]).

The problems of wording and context appear equally when an actual referendum is to be carried out by a government: considerable effort is made by politicians on all sides of the issue to control the wording of the question to be voted on, as well as its placement on the ballot, with the battle over these decisions sometimes becoming quite fierce. This shows that there is never a single way to phrase a referendum and that even small variations in final wording or context can influence the outcome of the voting. The same is true for survey questions, but with the crucial difference that they are meant to provide information, not to determine policy in a definitive legal sense.

The analytic approach, when combined with use of multiple questions to tap different aspects of an issue, provides the most useful perspective on survey data. Rather than focusing on the responses to individual items as such, the analysis of change over time and of variations across demographic and social background variables provides the surest route to understanding both attitudinal and factual data. Almost all important scholarly work based on surveys follows this path, giving attention to individual percentages only in passing. In addition, in recent years, classic between-subjects experiments have been built into surveys, with different ways of asking a question administered to random subsamples of a larger probability sample in order to learn about the effects of question wording (Schuman and Presser 1996 [1981]). These survey-based experiments, traditionally called ‘split-ballots,’ combine the advantage of a probability sample survey to generalize to a much larger population with the advantage of randomized treatments to test causal hypotheses. Survey-based experiments have been used to investigate a variety of methodological uncertainties about question formulations, as we will see below, and are also employed increasingly to test hypotheses about substantive political and social issues.
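The logic of a split-ballot can be sketched in a few lines. What follows is one plausible arrangement under stated assumptions, not a description of how any particular survey organization proceeds: respondents are randomly divided between two question forms, and the proportions answering ‘yes’ are compared with a pooled two-proportion z statistic. The function names, the sample of 1,000 respondent identifiers, and the tallies of ‘yes’ answers are all hypothetical.

```python
import math
import random

def assign_forms(respondent_ids, forms=("A", "B"), seed=0):
    """Randomly split a sample into near-equal subsamples, one per question form."""
    rng = random.Random(seed)
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    # After shuffling, alternate forms so the subsamples are balanced in size.
    return {rid: forms[i % len(forms)] for i, rid in enumerate(shuffled)}

def two_proportion_z(yes_a, n_a, yes_b, n_b):
    """z statistic for the hypothesis that both forms yield the same proportion."""
    p_a, p_b = yes_a / n_a, yes_b / n_b
    pooled = (yes_a + yes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

assignment = assign_forms(range(1000))
n_a = sum(1 for form in assignment.values() if form == "A")
n_b = len(assignment) - n_a

# Hypothetical tallies of 'yes' answers under each form (illustrative only).
z = two_proportion_z(yes_a=300, n_a=n_a, yes_b=225, n_b=n_b)
print(f"n_a = {n_a}, n_b = {n_b}, z = {z:.2f}")
```

Because assignment to form is random, a large absolute value of z points to the question form itself, and not to differences between the subsamples, as the source of the gap. The test assumes simple random sampling; a complex sample design would call for adjusted standard errors.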

3. Types Of Survey Questions

When investigators construct a questionnaire, they face a number of decisions about the form in which their questions are to be asked, though the decisions are often not made on the basis of much reflection. The effects of such decisions were first explored in survey-based experiments conducted in the mid-twentieth century and reported in books by Cantril (1944) and Payne (1951). Schuman and Presser (1996 [1981]) provided a systematic review of variations due to question form, along with much new experimental data. Recent books by Sudman et al. (1996), Tanur (1992), Tourangeau et al. (2000), and Krosnick and Fabrigar (forthcoming) consider many of these same issues, as well as a number of additional ones, drawing especially on ideas and research from cognitive psychology.

An initial important decision is whether to ask a question in open or closed form. Open questions, where respondents answer in their own words and these are then coded into categories, are more expensive in terms of both time and money than closed questions that present two or more alternatives that respondents choose from. Hence, open questions are not common in present-day surveys, typically being restricted to questions that attempt to capture rapid change and that are easy to code, as in standard inquiries about ‘the most important problem facing the country today.’ In this case, immediate salience is at issue and responses can usually be summarized in keyword codes such as ‘unemployment,’ ‘terrorism,’ or ‘race relations.’ An open-ended approach is also preferable when numerical answers are wanted, for example, how many hours of television a person watches a week. Schwarz (1996) has shown that offering a specific set of alternatives provides reference points that can shape answers, and thus it is probably better to leave such questions open, as recommended by Bradburn et al. (1979).

More generally, open and closed versions of a question often do lead to different univariate response distributions and to different multivariate relations as well (Schuman and Presser 1996 [1981]). Partly this is due to the tendency of survey investigators to write closed questions on the assumption that they themselves know how to frame the main choices, which can lead to their overlooking alternatives or wording especially meaningful to respondents. Many years ago Lazarsfeld (1944) proposed as a practical compromise the use of open questions in the early development of a questionnaire, with the results then drawn on to frame closed alternatives that would be more efficient for use in an actual survey. What has come to be called ‘cognitive interviewing’ takes this same notion into the laboratory by studying carefully how a small number of individuals think about the questions they are asked and about the answers they give (see several chapters in Schwarz and Sudman 1996). This may not eliminate all open/closed differences, but it helps investigators learn what is most salient and meaningful to respondents.

Even after closed questions are developed, it is often instructive to include follow-up ‘why’ probes of answers in order to gain insight into how respondents perceived the questions and what they see their choices as meaning. Since it is not practical to ask such follow-ups to all respondents about all questions, Schuman (1966) recommended the use of a ‘random probe’ technique to obtain answers from a subsample of the larger sample of questions and respondents.
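The selection step of such a probing scheme can be sketched briefly. This is only an illustration of the general idea of drawing a random subsample of respondent-question pairs, not a reconstruction of the procedure in Schuman (1966). The function name, the number of probes per respondent, and the counts of respondents and questions are hypothetical.

```python
import random

def random_probe_plan(respondent_ids, question_ids, probes_per_respondent=2, seed=0):
    """For each respondent, pick a random subset of closed questions to follow with a 'why' probe."""
    rng = random.Random(seed)
    questions = list(question_ids)
    return {
        rid: sorted(rng.sample(questions, probes_per_respondent))
        for rid in respondent_ids
    }

# Hypothetical survey: 5 respondents, 20 closed questions, 2 probes each.
plan = random_probe_plan(respondent_ids=range(1, 6), question_ids=range(1, 21))
for rid, probed in plan.items():
    print(f"respondent {rid}: probe after questions {probed}")
```

Since each respondent receives a different random set of probes, every question is probed for some random subsample of respondents while the burden on any one interview stays small.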

When the focus is on closed questions, as it often is, a number of further decisions must be made. A frequently used format is to state a series of propositions, to each of which the respondent is asked to indicate agreement or disagreement. Although this is an efficient way to proceed, there is considerable evidence that a substantial number of people, especially those with less education, show an ‘acquiescence bias’ when confronted with such statements (Krosnick and Fabrigar forthcoming). The main alternative to the agree/disagree format is to require respondents to make a choice between two or more statements. Such a balanced format encourages respondents to think about the opposing alternatives, though it also requires investigators to reduce each issue to clearly opposing positions.

Another decision faced by question writers is how to handle DK (don’t know) responses. The proportion of DK answers varies not only by the type of issue—there are likely to be more to a remote foreign policy issue than to a widely discussed issue like legalization of abortion (Converse 1976–77)—but also by how much DK responses are encouraged or discouraged. At one extreme, the question may offer a DK alternative as one of the explicit choices for respondents to consider, even emphasizing the desirability of it being given if the respondent lacks adequate information on the matter. At the other extreme interviewers may be instructed to urge those who give DK responses to think further in order to provide a more substantive answer. In between, a DK response may not be mentioned by the interviewer but can be accepted when volunteered.

Which approach is chosen depends on one’s beliefs about the meaning of a DK response. Those who follow Converse’s (1964) emphasis on the lack of knowledge that the majority of people possess about many public issues tend to encourage respondents to consider a DK response as legitimate. Those who argue, like Krosnick and Fabrigar (forthcoming), that giving a DK response is mainly due to ‘satisficing’ prefer to press respondents to come up with a substantive choice. Still another possibility is that DKs can involve evasion in the case of a sensitive issue (e.g., racial attitudes), and in such cases it is unclear what respondents will do if prevented from giving a DK response.

A more general methodological issue is the desirability of measuring attitude strength. Attitudes are typically defined as favorable or unfavorable evaluations of objects, but the evaluations can also be seen as varying in strength. One can strongly favor or oppose the legalization of abortion, for example, but hold an attitude toward gun registration that is weaker, or vice versa. Further, there is more than one way to measure the dimension of strength, as words like ‘extremity,’ ‘importance,’ ‘certainty,’ and ‘strength’ itself suggest, and thus far it appears that the different methods of measurement are far from perfectly correlated (Petty and Krosnick 1995). Moreover, although there is evidence that several of these strength measures are related to actual behavior, for example, donating money to one side of the dispute about the legalization of abortion, in the case of gun registration the relation has been much weaker, apparently because other social factors (i.e., the effectiveness of gun lobbying organizations) play a large role independent of attitude strength (Schuman and Presser 1996 [1981]).

4. Question Wording And Bias

Every question in a survey must be conveyed in words, and words are never wholly neutral and unproblematic. Words have tone, connotation, and implication, which is why a referendum and a survey are alike in always coming down to a specific way of describing an issue. Of course, sometimes a question seems to have been deliberately biased, as when a ‘survey’ mailed out by a conservative organization included the following question:

Do you believe that smut peddlers should be protected by the courts and the Congress, so they can openly sell pornographic materials to your children?

But more typical are two versions of a question that was asked during the Vietnam War:

If a situation like Vietnam were to develop in another part of the world, do you think the United States should or should not send troops [to stop a communist takeover]?

Mueller (1973) found that some 15 percent more Americans favored military action when the bracketed words were included than when they were omitted. Yet it is not entirely clear how one should regard the phrase ‘to stop a communist takeover’ during that period. Was it ‘biasing’ responses to include the phrase, or was it simply informing respondents of something they might like to have in mind as they answered? Furthermore, political leaders wishing to encourage or discourage an action can choose how to phrase policy issues, and surveys cannot ignore the force of such framing if they wish to be relevant to important political outcomes.

Another instructive example was the attempt to study attitudes during the 1982 war between Argentina and Britain over ownership of a small group of islands in the South Atlantic. It was virtually impossible to phrase a question that did not include the name of the islands, but for the Argentines they were named the Malvinas and for the British the Falkland Islands. Whichever name was used in a question could be seen as prejudicing the issue of ownership. This is an unusual example, but it shows that bias in survey questions is not a simple matter.

5. Conclusion

In this sense, we return to the fundamental paradox of survey research—the referendum point of view vs. the analytic point of view. We all wish at times to know what the public as a whole feels about an important issue—whether it involves military intervention in a distant country or a domestic issue like government support for health care. Therefore, we need to remind both ourselves and the public of the limitations of univariate survey results, while at the same time taking whatever steps we can to reduce those limitations. Above all, this means avoiding the tendency to reduce a complex issue to one or two simple closed questions, because every survey question imposes a unique perspective on responses, whether we think of this as ‘bias’ or not.

Moreover, survey data are most meaningful when they involve comparisons, especially comparisons over time and across important social groups—provided that the questions have been kept as constant in wording and meaning as possible. From a practical standpoint, probably the most useful way to see how the kinds of problems discussed here can be addressed is to read significant substantive analyses of survey data, for example, classics like Stouffer (1955) and Campbell et al. (1960), and more recent works that grapple with variability and bias, for example, Page and Shapiro (1992) and Schuman et al. (1997).


