Sequential Decision Making Research Paper

Sequential decision making describes a situation where the decision maker (DM) makes successive observations of a process before a final decision is made, in contrast to dynamic decision making which is more concerned with controlling a process over time.

Formally, a sequential decision problem is one in which the DM can take observations X1, X2, … one at a time. After each observation Xn, the DM can decide to terminate the process and make a final decision from a set of decisions D, or to continue the process and take the next observation Xn+1. If the observations X1, X2, … form a random sample, the procedure is called sequential sampling.

In most sequential decision problems there is an implicit or explicit cost associated with each observation. The procedure to decide when to stop taking observations and when to continue is called the stopping rule. The objective in sequential decision making is to find a stopping rule that optimizes the decision in terms of minimizing losses or maximizing gains including observation costs. The optimal stopping rule is also called the optimal strategy or the optimal policy.




A wide variety of sequential decision problems have been discussed in the statistics literature, among them search problems, inventory problems, gambling problems, and secretary-type problems, under sampling both with and without recall. Several methods have been proposed to solve the optimization problem under specified conditions, including dynamic programming, Markov chains, and Bayesian analysis.

In the psychological literature, sequential decision problems are better known as optional stopping problems. One line of research using sequential decision making is concerned with information seeking in situations such as buying a house, searching for a job candidate, price searching, or target search. The DM continues taking observations until a decision criterion for acceptance is reached. Another line of research applies sequential decision making to account for information processing in binary choice tasks and in hypothesis testing, such as in signal detection tasks. The DM continues taking observations until either of two decision criteria is reached. Depending on the particular research area, observations are also called offers, options, items, applicants, information, and the like. Observation costs explicitly include not only money but also time, effort, aggravation, discomfort, and so on.

Contrary to the objective of statisticians or economists, psychologists are less interested in determining the optimal stopping rule, and more interested in discussing the variables that affect human decision behavior in sequential decision tasks. Optimal decision strategies are considered as normative models, and their predictions are compared to actual choice behavior.

1. Sequential Decision Making With One Decision Criterion

In sequential decision making with one decision criterion the DM takes costly observations Xn, n = 1,… of a random process one at a time. After observing Xn = xn the DM has to decide whether to continue sampling observations or to stop. In the former case, the observation Xn+1 is taken at a cost of cn+1; in the latter case the DM receives a net payoff that consists of the payoff minus the observation costs. The DM’s objective is to find a stopping rule that maximizes the expected net payoff.

The optimal stopping rule depends on the specific assumptions made about the situation: (a) whether the distribution of X is known, unknown, or partly known, (b) whether the Xi are distributed identically for all i, come from the same family of distributions with different parameters, or have different distributions, (c) whether the number of possible observations, n, is bounded or unbounded, (d) the sampling procedure, e.g., whether the highest value observed so far can be taken when stopping (sampling with recall) or only the last value (sampling without recall), and (e) whether the cost function, cn, is fixed for each observation or is a function of n. Many of these problems have been studied theoretically by mathematicians and experimentally by psychologists. Pioneering experimental work was done in a series of papers by Rapoport and colleagues (1966, 1969, 1970, 1972).

1.1 Unknown Sample Distribution: Secretary-Type Problems

Kahan et al. (1967) investigated decision behavior in a sequential search task in which the DM had to find the largest of a set of n = 200 numbers, observed one at a time from a deck of cards. The observations were taken in random order without replacement. The DM could declare only the current observation to be the largest number (sampling without recall), but could compare it with the previously presented numbers. There was no explicit cost per observation, i.e., c = 0. The sample distribution was unknown to the DM. A reward was paid only when the card with the highest number was selected, and nothing otherwise. This decision situation is known as the secretary problem (a job candidate search problem; for various other names, see, e.g., Freeman 1983), which, in its simplest form, makes the following assumptions explicit (Ferguson 1989): (a) only one position is available, (b) the number n of applicants is known, (c) applicants are interviewed sequentially in random order, each order being equally likely, (d) all applicants can be ranked without ties, and the decision to reject or accept an applicant must be based only on the relative ranks of the applicants interviewed so far, (e) an applicant once rejected cannot later be recalled, and (f) the payoff is 1 when choosing the best of the n applicants and 0 otherwise.

The optimal strategy for this kind of problem is to reject the first s – 1, s ≥ 1, items (cards, applicants, draws) and then choose the first item that is best in the relative ranking so far. With

$$a_s = \frac{1}{s} + \frac{1}{s+1} + \cdots + \frac{1}{n-1} \qquad (1)$$

the optimal strategy is to stop if a_s < 1 and to continue if a_s > 1, so the optimal s can easily be determined for small n. For large n, the probability of choosing the best item is approximately 1/e and the optimal s is approximately n/e (e = 2.71…). For derivations, see, e.g., DeGroot (1970), Freeman (1983), and Gilbert and Mosteller (1966).
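The stopping criterion can be checked numerically for small n. The following is a minimal sketch, assuming that a_s is the harmonic tail sum 1/s + 1/(s+1) + … + 1/(n − 1), which is the standard form of this criterion. The function names `optimal_cutoff` and `win_probability` are illustrative and do not come from the cited sources.

```python
from fractions import Fraction


def optimal_cutoff(n):
    """Smallest s with a_s <= 1, assuming a_s = 1/s + ... + 1/(n-1).

    The DM rejects the first s - 1 items and then takes the first item
    that is best so far.
    """
    for s in range(1, n + 1):
        a_s = sum(Fraction(1, k) for k in range(s, n))
        if a_s <= 1:
            return s


def win_probability(n, s):
    """Probability of picking the overall best item with cutoff s."""
    if s == 1:
        # Taking the very first item wins with probability 1/n.
        return Fraction(1, n)
    return Fraction(s - 1, n) * sum(Fraction(1, k) for k in range(s - 1, n))


if __name__ == "__main__":
    for n in (3, 4, 10, 100):
        s = optimal_cutoff(n)
        print(n, s, float(win_probability(n, s)), s / n)
```

For n = 10 the sketch gives s = 4, i.e., the first three items are rejected. For n = 100 it gives s = 38 with a success probability of about 0.371, close to the large-n approximation 1/e.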

Kahan et al. (1967) reported that about 40 percent of their subjects did not follow the optimal strategy but stopped too late and rejected a card that should have been accepted. The failure of the strategy to describe behavior was attributed to its inadequacy for the task as implemented. Although the participants knew nothing about the distribution at the beginning of the experiment (as the problem requires), they could learn about it by taking observations (partial information). To guarantee ignorance of the distribution, Gilbert and Mosteller (1966) recommended supplying only the rank of each observation relative to those made so far and not its actual value. Seale and Rapoport (1997) conducted an experiment following this advice. They found that participants (with n = 40 and n = 80) stopped earlier than prescribed by the optimal stopping rule. They proposed simple decision rules or heuristics to describe the actual choice behavior. Using a cutoff rule, the DM rejects the first s – 1 applicants and then chooses the next top-ranked applicant, i.e., the next candidate; the DM simply counts the number of applicants and stops on the first candidate after observing s – 1 applicants. Under a candidate count rule, the DM counts the number of candidates and chooses the jth candidate. A successive non-candidate rule requires the DM to choose the first candidate after observing at least k consecutive non-candidates following the last candidate.
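The three heuristics can be stated compactly in code. This is only a sketch of the verbal rule descriptions above, not the procedure used by Seale and Rapoport (1997). It assumes that applicants are given as a list of overall ranks (1 = best), that a candidate is an applicant who is best so far, and that the DM must accept the last applicant if the rule never triggers. The function name `choose` and the rule labels are hypothetical.

```python
def choose(ranks, rule, param):
    """Return the 0-based index of the applicant selected by a heuristic.

    ranks: overall ranks in order of appearance (1 = best), no ties.
    rule:  "cutoff" (param = s), "candidate_count" (param = j), or
           "successive_noncandidate" (param = k).
    """
    best = None          # best rank seen so far
    n_candidates = 0     # number of candidates seen so far
    run = 0              # consecutive non-candidates since the last candidate
    for i, r in enumerate(ranks):
        if best is None or r < best:      # applicant i is a candidate
            best = r
            n_candidates += 1
            if rule == "cutoff" and i >= param - 1:
                return i
            if rule == "candidate_count" and n_candidates == param:
                return i
            if rule == "successive_noncandidate" and run >= param:
                return i
            run = 0
        else:
            run += 1
    # Assumption: if the rule never fires, the last applicant is taken.
    return len(ranks) - 1


if __name__ == "__main__":
    ranks = [3, 5, 2, 4, 1]
    print(choose(ranks, "cutoff", 2))
    print(choose(ranks, "candidate_count", 3))
    print(choose(ranks, "successive_noncandidate", 1))
```

Only the relative order of the ranks seen so far enters each decision, which matches the requirement that the DM has no access to the actual values.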

The secretary problem has been extended and generalized in many different directions within the mathematical statistics field. Each of the above assumptions has been relaxed in one way or another (Ferguson 1989, Freeman 1983). However, the label of secretary problem tends to be used only when the distribution is unknown and the decision to stop or to continue depends only on the relative ranking of the observations taken so far and not on their actual values.

1.2 Known Sample Distribution

Rapoport and Tversky (1966, 1970) investigated choice behavior when the mean and the variance of the distribution were known to the DM. The cost of each observation was fixed, but its amount varied across experimental conditions, and the number of possible observations n was unbounded (1966) or bounded and known (1970). Behavior for sampling with and without recall was compared. When sampling is without recall, only the value of the last observation, Xn = xn, can be received, and the payoff is this value minus the total sampling cost, i.e., xn – cn. The optimal strategy is to find a stopping rule that maximizes the expected payoff E(XN – cN). When sampling is with recall, the highest value observed so far can be selected, the payoff is max(x1,…, xn) – cn, and the optimal strategy is to find a stopping rule that maximizes E(max(X1,…, XN) – cN). In the following, v with subscripts and v* denote the expected gain from an (optimal) procedure.

1.2.1 Number Of Observations Unbounded. If n is unbounded, i.e., if the number of observations that can be taken is unlimited, and X1, X2… are sampled from a known distribution function F(x), the optimal strategy is the same for both sampling with and without recall. In particular, the optimal strategy is to continue to take observations whenever the observed value xj < v*, and to stop taking observations as soon as an observed value xj ≥ v*, where v* is the unique solution of

$$\int_{v^*}^{\infty} (x - v^*)\, dF(x) = c \qquad (2)$$

When the observations are taken from a standard normal distribution with density function φ(x) and distribution function Φ(x), we have that

$$\varphi(v^*) - v^* \left[ 1 - \Phi(v^*) \right] = c \qquad (3)$$
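For a given cost c, the threshold v* in the standard normal case can be found numerically. The sketch below assumes that the condition takes the form φ(v*) − v*[1 − Φ(v*)] = c, i.e., that the expected excess of an observation over v* equals the cost of one more observation. The left-hand side is strictly decreasing in v*, so bisection is sufficient. The function names and the search interval are illustrative choices.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def upper_tail(x):
    """1 - Phi(x), computed with erfc for accuracy in the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def expected_excess(v):
    """E[max(X - v, 0)] for X ~ N(0, 1); strictly decreasing in v."""
    return phi(v) - v * upper_tail(v)


def reservation_value(c, lo=-50.0, hi=50.0, tol=1e-10):
    """Solve expected_excess(v) = c by bisection (assumes 0 < c < 50)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_excess(mid) > c:
            lo = mid   # excess still above the cost: root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for c in (0.01, 0.1, 0.5):
        print(c, reservation_value(c))
```

Higher observation costs give lower thresholds, so the DM should accept smaller values and, on average, take fewer observations as c increases.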

Although sampling with and without recall have the same solution, they seem to be different from a psychological point of view. Rapoport and Tversky (1966) found that the group sampling without recall took significantly fewer observations than the group sampling with recall. The mean number of observations for both groups decreased with increasing cost c, and the difference between the groups in the number of observations taken diminished. However, the participants in both groups took fewer observations than prescribed by the optimal strategy. This nonoptimal behavior of the participants was attributed to a lack of thorough knowledge of the distributions.

1.2.2 Number Of Observations Bounded. If n, n ≥ 2, is bounded, i.e., if not more than n observations can be taken, the optimal stopping rules for sampling with and without recall are different. For sampling without recall, an optimal procedure is to continue taking observations whenever x_j < v_{n−j} − c and to stop as soon as x_j ≥ v_{n−j} − c, where j = 1, 2, …, n and the subscript n − j indicates the number of observations which remain available, and

[Formula 4]

With v_1 = E(X) − c, the sequence can be computed successively. Again assuming a standard normal distribution, v_{j+1} = φ(v_j − c) + (v_j − c)Φ(v_j − c).

For sampling with recall, the optimal strategy is to continue the process whenever a value xj < v* and to stop taking observations as soon as an observed value xj ≥ v*, where v* is as in Eqn. (2), which is the same solution as for n unbounded. (For derivations of the strategies, see DeGroot 1970, Sakaguchi 1961.)

Rapoport and Tversky (1970) investigated choice behavior within this scenario. Sampling was done both with and without recall. The number of observations that could be taken as well as observation cost varied across experimental groups. One third of the participants did not follow the optimal strategy. Under both sampling procedures and all cost conditions, they took on average fewer observations than predicted by the corresponding optimal stopping rules. There were no systematic differences due to cost, as observed in their previous study. They concluded that ‘the optimal model provides a reasonable good account of the behavior of the subjects’ (p. 119).

1.3 Different Sample Distributions For Each Observation

Most research concerned with sequential decision making assumes that the observations are sampled from the same distribution, i.e., that the Xi are distributed identically for all i. In many decision situations, however, the observations may be sampled from the same distribution family with different parameters, or from different distributions. Especially in economic areas, such as price search, it is reasonable to assume that the distributions from which observations are taken change over time. Such a sequence of samples has been called a nonstationary series. Of particular interest are two special nonstationary series: ascending and descending series. For ascending series, the observations are drawn from distributions, usually normal distributions, whose mean increases as i increases; for descending series the mean of the distribution decreases as i increases, i indicating the sample index. For both cases, experiments have been conducted to investigate choice behavior in a changing environment. Shapira and Venezia (1981) compared choice behavior for ascending, descending, and constant (identically distributed Xi) series. In one experiment (numbers from a deck of cards), the distributions were known to the DM; no explicit observation costs were imposed; sampling occurred without recall; and the number of observations that could be taken was limited to n = 7. The variance of the distributions varied across experimental groups. The assumed optimal procedure was to continue taking observations whenever x_j < v_{n−j} and to stop as soon as x_j ≥ v_{n−j}, where j = 1, 2, …, n and the subscript n − j indicates the number of observations which remain available, and k = 1, …, n indicates the specific distribution for the jth observation. Thus

[Formula 5]

With v_1 = E(X_1), the sequence can be computed successively. Assuming a standard normal distribution, v_{j+1} = φ_k(v_j) + v_jΦ_k(v_j).
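The successive computation can be illustrated for the simplest case, a constant series. The sketch below assumes independent standard normal observations, no observation cost, sampling without recall, and the recursion v_{j+1} = E[max(X, v_j)], which for a standard normal X equals φ(v_j) + v_jΦ(v_j). It does not cover ascending or descending series, where the distribution changes with the observation index. The function names are illustrative.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def continuation_values(n):
    """Return [v_1, ..., v_n] for iid N(0, 1) observations and zero cost.

    v_j is the expected payoff of an optimal procedure when j
    observations remain available.
    """
    v = [0.0]                      # v_1 = E(X) = 0
    for _ in range(n - 1):
        last = v[-1]
        v.append(phi(last) + last * Phi(last))   # E[max(X, v_j)]
    return v


def should_stop(x, remaining_after, v):
    """Stop on x if it is at least the value of continuing.

    remaining_after: number of observations still available after x.
    """
    if remaining_after == 0:
        return True                # the last observation must be taken
    return x >= v[remaining_after - 1]


if __name__ == "__main__":
    v = continuation_values(7)     # n = 7 as in the experiment described
    print([round(x, 4) for x in v])
```

The thresholds rise with the number of observations still available, so the DM should be more demanding early in the sequence and accept lower values as the end approaches.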

Across all conditions, 58 percent of the participants behaved in an optimal way. The proportion of optimal stopping did not depend on the type of series but on the size of the variance. Nonoptimal stopping (24 percent stopped too early; 18 percent too late) depended on the series and on the size of the variance. In particular, participants stopped too early on ascending and too late on descending series. A similar result was observed by Brickman (1972). In this study, departures from the optimal stopping rule were attributed to the inadequacy of the assumed stopping rule for the particular experimental conditions (it assumes complete knowledge of the distributions). In a secretary problem design (see Sect. 1.1), Corbin et al. (1975) were less concerned with optimal choice behavior than with the processes by which the participants made their selections, and with factors that influenced those processes. The emphasis of the investigation was on decision making heuristics rather than on the adequacy of optimal models. With the same optimal stopping rule for all experimental conditions, they found that stopping behavior depended on contextual variables such as the ascending or descending trend of the inspected numbers of the stack.

2. Search Problems—Multiple Information Sources

In a sequential decision making task with multiple information sources, the DM has the option to take information sequentially from different sources. Each information source may provide valid information with a particular probability and at different cost. The task is not only to decide to stop or to continue the process but also, if continuing, which source of information to consult.

Early experimental studies were done by Kanarick et al. (1969), Rapoport (1969), and Rapoport et al. (1972). A typical task is to find an object (e.g., a black ball) which is hidden in one of several possible locations (e.g., in one of several bins containing white balls). The optimal search strategy depends on further task specifications, such as whether the object can move from one location to another, how many objects are to be found, and whether the search process may stop before the object has been found. Rapoport (1969) investigated the case in which a single object that could not move was to be found in one of r, r ≥ 2, possible locations. The DM was not allowed to stop the process before the target was found. All of the following were known to the DM: the a priori probability p_i, p_i > 0, that the object is in location i, i = 1, 2, …, r, with Σ_{i=1}^{r} p_i = 1; a miss probability α_i, 0 < α_i < 1, that even if the object is in location i it will not be found in a particular search of that location (1 – α_i is referred to as the respective detection probability); and a cost, c_i, for a single observation at location i. The objective of the DM is to find a search strategy that minimizes the expected cost. For i = 1, …, r and j = 1, 2, … let Π_ij denote the probability that the object is found at location i during the jth search and the search is terminated. Then

$$\Pi_{ij} = p_i \, \alpha_i^{\,j-1} (1 - \alpha_i) \qquad (6)$$

If the values of Π_ij/c_i for all i and j are arranged in order of decreasing magnitude, the optimal strategy is to search according to this ordering (for derivations, see DeGroot 1970). Ties may be ordered arbitrarily among themselves. The optimal strategy is determined by the detection probabilities and observation costs, and optimal search behavior implies a balance between maximizing the detection probability and minimizing the observation cost. Rapoport (1969) found that participants did not behave optimally. They were more concerned with maximizing the probability of detecting the target than with minimizing observation cost. When the differences in observation cost c_i among the four locations (i = 1, 2, 3, 4) were increased, the deviation from the optimal strategy became even larger. Rapoport et al. (1972) varied the search problems by allowing the DM to terminate the search at any time and by adding a terminal reward, R, for finding the target and a terminal penalty, B, for not finding the target. Most participants showed a bias toward maximizing detection probability rather than minimizing search cost per observation, similar to the previous study.
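The ordering rule can be sketched as follows, assuming Π_ij = p_i α_i^(j−1) (1 − α_i), i.e., the object is in location i, is missed on the first j − 1 searches there, and is detected on the jth. Because this ratio decreases in j for every location, the full ordering can be generated greedily with a priority queue. The function name `search_order` and the example numbers are hypothetical, and locations are indexed from 0 in the code.

```python
import heapq


def search_order(p, alpha, cost, steps):
    """Return the first `steps` locations of the optimal search sequence.

    p[i]:     prior probability that the object is in location i
    alpha[i]: miss probability for a single search of location i
    cost[i]:  cost of a single search of location i
    """
    # Heap entries are (-ratio, location); the ratio for the first
    # search of location i is p_i * (1 - alpha_i) / c_i.
    heap = [(-(p[i] * (1.0 - alpha[i]) / cost[i]), i) for i in range(len(p))]
    heapq.heapify(heap)
    order = []
    for _ in range(steps):
        neg_ratio, i = heapq.heappop(heap)
        order.append(i)
        # The next search of the same location has its ratio scaled by alpha_i.
        heapq.heappush(heap, (neg_ratio * alpha[i], i))
    return order


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    alpha = [0.5, 0.2, 0.1]
    cost = [1.0, 1.0, 1.0]
    print(search_order(p, alpha, cost, 6))   # [0, 1, 2, 0, 0, 1]
```

Ties are broken by location index here, which is one of the arbitrary orderings the rule allows. Raising the cost of a location pushes all of its searches later in the sequence, which is the trade-off between detection probability and cost described above.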

3. Sequential Decision Making With Two Or More Possible Decisions

A random sample X1, X2, … is generated by an unknown state of nature, Θ. The DM can take observations one at a time. After observing Xn = xn the DM makes inferences about Θ based on the values of X1, …, Xn and can decide whether to continue sampling observations or to stop the process. In the former case, observation Xn+1 is taken; in the latter, the DM makes a final decision d ∈ D. The consequences to the DM depend on the decision d and the value θ of Θ.

The statistical theory for this situation was developed by Wald during the 1940s. It has been used to test hypotheses and estimate parameters. In psychological research, sequential decision making of this kind is usually limited to two decisions D = {d1, d2}, and applied to binary choice tasks.

The standard theory of sequential analysis by Wald (1947) does not include considerations of observation costs C(n), losses for terminal decisions L(θ, d ), and a priori (subjective) probabilities π of the alternative states of nature. Deferred decision theory generalizes the original theory by including these variables explicitly. The objective of the DM is to find a stopping rule that minimizes expected loss (called risk) and expected observation cost. The form of that optimal stopping rule depends mainly on the assumptions about the number of observations that can be taken (bounded or unbounded), and on the assumption of cost per observation (fixed or not) (see DeGroot 1970). Birdsall and Roberts (1965), Edwards (1965), and Rapoport and Burkheimer (1971) introduced the idea of deferred decision theory as normative models of choice behavior to the psychological community. Experiments investigating human behavior in deferred decision tasks have been carried out by Pitz and colleagues (e.g., Pitz et al. 1969), and by Busemeyer and Rapoport (1988). Rapoport and Wallsten (1972) summarize experimental findings.

For illustration, consider the decision problem in its simplest form. Suppose there are two possible states of nature, θ1 and θ2, and two possible decisions, d1 and d2. The cost c per observation is fixed and the number of observations is unbounded. The DM does not know which of the states of nature, θ1 or θ2, is generating the observations, but there are a priori probabilities π that it is θ1 and (1 – π) that it is θ2. Let wi denote the loss incurred by the DM in deciding that θi is not the correct state of nature when it actually is (i = 1, 2). No losses are assumed when the DM makes a correct decision. Let πn denote the posterior probability, after n observations have been made, that θ1 is the state of nature generating the observations. The total posterior expected loss is rn = min{w1πn, w2(1 – πn)} + nc. The DM's objective is to minimize the expected loss. An optimal stopping rule is specified in terms of decision boundaries, α and β. If the posterior probability is greater than or equal to α, then decision d1 is made; if the posterior probability is smaller than or equal to β, then d2 is selected; otherwise sampling continues.
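The mechanics of this rule can be sketched in a few lines. The example assumes binary observations that equal 1 with probability q1 under θ1 and q2 under θ2; this likelihood model, the function names, and the boundary values used below are illustrative assumptions. The boundaries are treated as given inputs, since deriving the optimal α and β from the losses and the cost requires a separate optimization that is not shown.

```python
def posterior_update(prior, x, q1, q2):
    """Posterior probability of theta1 after one binary observation x."""
    like1 = q1 if x == 1 else 1.0 - q1
    like2 = q2 if x == 1 else 1.0 - q2
    return prior * like1 / (prior * like1 + (1.0 - prior) * like2)


def posterior_expected_loss(post, n, w1, w2, c):
    """r_n = min{w1 * pi_n, w2 * (1 - pi_n)} + n * c."""
    return min(w1 * post, w2 * (1.0 - post)) + n * c


def deferred_decision(observations, prior, q1, q2, upper, lower):
    """Sample until the posterior leaves the open interval (lower, upper).

    Returns (decision, n, posterior); decision is "d1", "d2", or None
    if the observations run out before a boundary is reached.
    """
    post, n = prior, 0
    stream = iter(observations)
    while lower < post < upper:
        try:
            x = next(stream)
        except StopIteration:
            return None, n, post
        post = posterior_update(post, x, q1, q2)
        n += 1
    return ("d1" if post >= upper else "d2"), n, post


if __name__ == "__main__":
    decision, n, post = deferred_decision(
        [1, 1, 1, 1], prior=0.5, q1=0.7, q2=0.3, upper=0.9, lower=0.1
    )
    print(decision, n, round(post, 4))
    print(posterior_expected_loss(post, n, w1=10.0, w2=10.0, c=0.1))
```

With these numbers the posterior moves from 0.5 to 0.7, to about 0.845, and then to about 0.927, so the upper boundary is crossed after three observations and d1 is chosen.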

Bibliography:

  1. Birdsall T G, Roberts R A 1965 Theory of signal detectability: Deferred decision theory. The Journal of the Acoustical Society of America 37: 1064–74
  2. Brickman P 1972 Optional stopping on ascending and descending series. Organizational Behavior and Human Performance 7: 53–62
  3. Busemeyer J R, Rapoport A 1988 Psychological models of deferred decision making. Journal of Mathematical Psychology 32(2): 91–133
  4. Corbin R M, Olson C L, Abbondanza M 1975 Context effects in optional stopping decisions. Organizational Behavior and Human Performance 14: 207–16
  5. DeGroot M H 1970 Optimal Statistical Decisions. McGraw-Hill, New York
  6. Edwards W 1965 Optimal strategies for seeking information: Models for statistics, choice response times, and human information processing. Journal of Mathematical Psychology 2: 312–29
  7. Ferguson T S 1989 Who solved the secretary problem? Statistical Science 4(3): 282–96
  8. Freeman P R 1983 The secretary problem and its extensions: A review. International Statistical Review 51: 189–206
  9. Gilbert J P, Mosteller F 1966 Recognizing the maximum of a sequence. Journal of the American Statistical Association 61: 35–73
  10. Kahan J P, Rapoport A, Jones L E 1967 Decision making in a sequential search task. Perception & Psychophysics 2(8): 374–6
  11. Kanarick A F, Huntington J M, Peterson R C 1969 Multisource information acquisition with optional stopping. Human Factors 11: 379–85
  12. Pitz G F, Reinhold H, Geller E S 1969 Strategies of information seeking in deferred decision making. Organizational Behavior and Human Performance 4: 1–19
  13. Rapoport A 1969 Effects of observation cost on sequential search behavior. Perception & Psychophysics 6(4): 234–40
  14. Rapoport A, Tversky A 1966 Cost and accessibility of offers as determinants of optional stopping. Psychonomic Science 4: 45–6
  15. Rapoport A, Tversky A 1970 Choice behavior in an optional stopping task. Organizational Behavior and Human Performance 5: 105–20
  16. Rapoport A, Burkheimer G J 1971 Models of deferred decision making. Journal of Mathematical Psychology 8: 508–38
  17. Rapoport A, Lissitz R W, McAllister H A 1972 Search behavior with and without optional stopping. Organizational Behavior and Human Performance 7: 1–17
  18. Rapoport A, Wallsten T S 1972 Individual decision behavior. Annual Review of Psychology 23: 131–76
  19. Sakaguchi M 1961 Dynamic programming of some sequential sampling design. Journal of Mathematical Analysis and Applications 2: 446–66
  20. Seale D A, Rapoport A 1997 Sequential decision making with relative ranks: An experimental investigation of the ‘secretary problem.’ Organizational Behavior and Human Decision Processes 69(3): 221–36
  21. Shapira Z, Venezia I 1981 Optional stopping on nonstationary series. Organizational Behavior and Human Performance 27: 32–49
  22. Wald A 1947 Sequential Analysis. Wiley, New York