Sequential Decision Making Research Paper


Sequential decision making describes a situation where the decision maker (DM) makes successive observations of a process before a ﬁnal decision is made, in contrast to dynamic decision making which is more concerned with controlling a process over time.


Formally, a sequential decision problem is defined such that the DM can take observations X₁, X₂, … one at a time. After each observation X_n, the DM can decide to terminate the process and make a final decision from a set of decisions D, or continue the process and take the next observation X_{n+1}. If the observations X₁, X₂, … form a random sample, the procedure is called sequential sampling.

In most sequential decision problems there is an implicit or explicit cost associated with each observation. The procedure to decide when to stop taking observations and when to continue is called the stopping rule. The objective in sequential decision making is to ﬁnd a stopping rule that optimizes the decision in terms of minimizing losses or maximizing gains including observation costs. The optimal stopping rule is also called the optimal strategy or the optimal policy.

A wide variety of sequential decision problems have been discussed in the statistics literature, including search problems, inventory problems, gambling problems, and secretary-type problems, including sampling with and without recall. Several methods have been proposed to solve the optimization problem under speciﬁed conditions, including dynamic programming, Markov chains, and Bayesian analysis.

In the psychological literature, sequential decision problems are better known as optional stopping problems. One line of research using sequential decision making is concerned with seeking information in situations such as buying houses, searching for a job candidate, price searching, or target search. The DM continues taking observations until a decision criterion for acceptance is reached. Another line of research applies sequential decision making to account for information processing in binary choice tasks and in hypothesis testing, such as in signal detection tasks. The DM continues taking observations until either of two decision criteria is reached. Depending on the particular research area, observations are also called offers, options, items, applicants, information, and the like. Observation costs explicitly include not only money but also time, effort, aggravation, discomfort, and so on.

Contrary to the objective of statisticians or economists, psychologists are less interested in determining the optimal stopping rule, and more interested in discussing the variables that aﬀect human decision behavior in sequential decision tasks. Optimal decision strategies are considered as normative models, and their predictions are compared to actual choice behavior.

1. Sequential Decision Making With One Decision Criterion

In sequential decision making with one decision criterion the DM takes costly observations X_n, n = 1, 2, … of a random process one at a time. After observing X_n = x_n the DM has to decide whether to continue sampling observations or to stop. In the former case, the observation X_{n+1} is taken at a cost of c_{n+1}; in the latter case the DM receives a net payoff that consists of the payoff minus the observation costs. The DM's objective is to find a stopping rule that maximizes the expected net payoff.

The optimal stopping rule depends on the specific assumptions made about the situation: (a) whether the distribution of X is known, unknown, or partly known; (b) whether the X_i are distributed identically for all i, come from the same family of distributions with different parameters, or have different distributions; (c) whether the number of possible observations, n, is bounded or unbounded; (d) the sampling procedure, e.g., whether the highest value observed so far may be taken when stopping (sampling with recall) or only the last value (sampling without recall); and (e) whether the cost function, c_n, is fixed for each observation or is a function of n. Many of these problems have been studied theoretically by mathematicians and experimentally by psychologists. Pioneering experimental work was done in a series of papers by Rapoport and colleagues (1966, 1969, 1970, 1972).

1.1 Unknown Sample Distribution: Secretary-Type Problems

Kahan et al. (1967) investigated decision behavior in a sequential search task where the DM had to find the largest of a set of n = 200 numbers, observed one at a time from a deck of cards. The observations were taken in random order without replacement. The DM could only declare the current observation as the largest number (sampling without recall), and could compare the number with the previously presented numbers. No explicit cost was imposed for each observation, i.e., c = 0. The sample distribution was unknown to the DM. A reward was paid only when the card with the highest number was selected, and nothing otherwise. This describes a decision situation that is known as the secretary problem (a job candidate search problem; for various other names, see, e.g., Freeman 1983) which, in its simplest form, makes explicit the following assumptions (Ferguson 1989): (a) only one position is available, (b) the number n of applicants is known, (c) applicants are interviewed sequentially in random order, each order being equally likely, (d) all applicants can be ranked without ties, and the decision to reject or accept an applicant must be based only on the relative ranks of the applicants interviewed so far, (e) an applicant once rejected cannot later be recalled, and (f) the payoff is 1 when choosing the best of the n applicants, 0 otherwise.

The optimal strategy for this kind of problem is to reject the first s − 1, s ≥ 1, items (cards, applicants, draws) and then choose the first item that is best in the relative ranking so far. With

$$a_s = \frac{1}{s} + \frac{1}{s+1} + \cdots + \frac{1}{n-1}, \tag{1}$$

the optimal strategy is to stop if a_s < 1 and to continue if a_s > 1, which can easily be determined for small n. For large n, the probability of choosing the best item is approximated by 1/e and the optimal s by n/e (e = 2.71…). (For derivations, see, e.g., DeGroot 1970, Freeman 1983, Gilbert and Mosteller 1966.)
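As a small illustration, the cutoff s can be computed directly for a given n. The sketch below assumes the standard form a_s = 1/s + 1/(s+1) + … + 1/(n−1) and takes the optimal s to be the smallest s with a_s < 1; the function names are illustrative only and do not come from the cited sources.

```python
from fractions import Fraction


def a(s, n):
    """a_s = 1/s + 1/(s+1) + ... + 1/(n-1), computed exactly (assumed form)."""
    return sum(Fraction(1, k) for k in range(s, n))


def optimal_cutoff(n):
    """Smallest s with a_s < 1: reject the first s-1 items, then take the next candidate."""
    for s in range(1, n + 1):
        if a(s, n) < 1:
            return s
    return n


def success_probability(s, n):
    """Probability of ending up with the best of n items when using cutoff s."""
    if s == 1:
        # No item is rejected, so the first item is chosen; it is best with probability 1/n.
        return Fraction(1, n)
    # ((s-1)/n) * sum_{j=s-1}^{n-1} 1/j
    return Fraction(s - 1, n) * sum(Fraction(1, j) for j in range(s - 1, n))


if __name__ == "__main__":
    for n in (4, 10, 100):
        s = optimal_cutoff(n)
        print(n, s, float(success_probability(s, n)))
```

For n = 100 the success probability is already close to the limiting value 1/e, which is why the n/e approximation is usually good enough in practice.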

Kahan et al. (1967) reported that about 40 percent of their subjects did not follow the optimal strategy but stopped too late and rejected a card that should have been accepted. The failure of the strategy to describe behavior was attributed to its inadequacy for the described task. Although at the beginning of the experiment the participants knew nothing about the distribution (as the problem requires), they could learn about it by taking observations (partial information). To guarantee ignorance of the distribution, Gilbert and Mosteller (1966) recommended supplying only the rank of the observation made so far and not the actual value. Seale and Rapoport (1997) conducted an experiment following this advice. They found that participants (with n = 40 and n = 80) stopped earlier than prescribed by the optimal stopping rule. They proposed simple decision rules or heuristics to describe the actual choice behavior. Using a cutoff rule, the DM rejects the first s − 1 applicants and then chooses the next top-ranked applicant, i.e., the candidate. The DM simply counts the number of applicants and then stops on the first candidate after observing s − 1 applicants. Under a candidate count rule, the DM counts the number of candidates and chooses the jth candidate. A successive non-candidate rule requires the DM to choose the first candidate after observing at least k consecutive noncandidates following the last candidate.
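The three heuristics can be written down in a few lines. The following is a minimal sketch, not the procedure used by Seale and Rapoport (1997): it assumes applicants are given as a list of absolute ranks (1 = best), that a candidate is an applicant who is the best seen so far, and that the DM is forced to take the last applicant if the rule never fires. That last convention and all function names are assumptions made for illustration.

```python
def candidate_flags(order):
    """True at each position where the applicant is the best seen so far (rank 1 = best)."""
    best = float("inf")
    flags = []
    for rank in order:
        flags.append(rank < best)
        best = min(best, rank)
    return flags


def cutoff_rule(order, s):
    """Reject the first s-1 applicants, then choose the next candidate."""
    flags = candidate_flags(order)
    for i in range(s - 1, len(order)):
        if flags[i]:
            return i
    return len(order) - 1  # assumption: forced to accept the last applicant


def candidate_count_rule(order, j):
    """Choose the j-th candidate encountered."""
    count = 0
    for i, is_candidate in enumerate(candidate_flags(order)):
        if is_candidate:
            count += 1
            if count == j:
                return i
    return len(order) - 1  # assumption: forced to accept the last applicant


def successive_noncandidate_rule(order, k):
    """Choose the first candidate that follows at least k consecutive noncandidates."""
    run = 0  # noncandidates seen since the last candidate
    for i, is_candidate in enumerate(candidate_flags(order)):
        if is_candidate:
            if run >= k:
                return i
            run = 0
        else:
            run += 1
    return len(order) - 1  # assumption: forced to accept the last applicant


if __name__ == "__main__":
    order = [3, 5, 2, 4, 1, 6]  # hypothetical sequence of absolute ranks
    print(cutoff_rule(order, 3))                   # position chosen (0-based)
    print(candidate_count_rule(order, 3))
    print(successive_noncandidate_rule(order, 1))
```

Each function returns the 0-based position of the selected applicant, so the three rules can be compared on the same sequence.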

The secretary problem has been extended and generalized in many different directions within the mathematical statistics field. Each of the above assumptions has been relaxed in one way or another (Ferguson 1989, Freeman 1983). However, the label of secretary problem tends to be used only when the distribution is unknown and the decision to stop or to continue depends only on the relative ranking of the observations taken so far and not on their actual values.

1.2 Known Sample Distribution

Rapoport and Tversky (1966, 1970) investigated choice behavior when the mean and the variance of the distribution were known to the DM. The cost for each observation was fixed but the amount varied across experimental conditions, and the number of possible observations n was unbounded (1966) or bounded and known (1970). Behavior for sampling with and without recall was compared. When sampling is without recall, only the value of the last observation, X_n = x_n, can be received, and the payoff is this value minus the total sampling cost, i.e., x_n − cn. The optimal strategy is to find a stopping rule that maximizes the expected payoff E(X_N − cN). When sampling is with recall, the highest value observed so far can be selected, the payoff is max(x₁, …, x_n) − cn, and the optimal strategy is to find a stopping rule that maximizes E(max(X₁, …, X_N) − cN). In the following, v with subscripts and v* denote the expected gain from an (optimal) procedure.

1.2.1 Number Of Observations Unbounded. If n is unbounded, i.e., if the number of observations that can be taken is unlimited, and X₁, X₂, … are sampled from a known distribution function F(x), the optimal strategy is the same for both sampling with and without recall. In particular, the optimal strategy is to continue to take observations whenever the observed value x_j < v*, and to stop taking observations as soon as an observed value x_j ≥ v*, where v* is the unique solution of

$$\int_{v^*}^{\infty} (x - v^*)\, dF(x) = c. \tag{2}$$

When the observations are taken from a standard normal distribution with density function φ(x) and distribution function Φ(x), we have that

$$c = \varphi(v^*) - v^*\,[1 - \Phi(v^*)].$$
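For the standard normal case, v* has no closed form but is easy to find numerically. The sketch below assumes the usual condition that the expected excess E[max(X − v*, 0)] = φ(v*) − v*[1 − Φ(v*)] equals the cost c, and solves it by bisection; the function names and the search interval are illustrative choices.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def upper_tail(x):
    """1 - Phi(x), computed with erfc for accuracy in the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def expected_excess(v):
    """E[max(X - v, 0)] for a standard normal X."""
    return phi(v) - v * upper_tail(v)


def reservation_value(c, lo=-50.0, hi=50.0, tol=1e-10):
    """Solve expected_excess(v) = c by bisection (assumes 0 < c < about 50).

    expected_excess is strictly decreasing in v, so the root is unique.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_excess(mid) > c:
            lo = mid  # threshold too low: continuing is still worth more than c
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for cost in (0.05, 0.1, 0.2, 0.4):
        print(cost, round(reservation_value(cost), 4))
```

The printed values fall as the cost rises, which matches the intuition that expensive observations make the DM accept lower offers.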

Although sampling with and without recall have the same solution, they seem to be diﬀerent from a psychological point of view. Rapoport and Tversky (1966) found that the group sampling without recall took signiﬁcantly fewer observations than the participants sampling with recall. The mean number of observations for both groups decreased with increasing cost c, and the diﬀerence with respect to the number of observations taken was diminished. However, the participants in both groups took fewer observations than prescribed by the optimal strategy. This nonoptimal behavior of the participants was attributed to a lack of thorough knowledge of the distributions.

1.2.2 Number Of Observations Bounded. If n, n ≥ 2, is bounded, i.e., if not more than n observations can be taken, the optimal stopping rules for sampling with and without recall are different. For sampling without recall, an optimal procedure is to continue taking observations whenever x_j < v_{n−j} − c and to stop as soon as x_j ≥ v_{n−j} − c, where j = 1, 2, …, n indicates the number of observations which remain available and

$$v_{j+1} = (v_j - c)\,F(v_j - c) + \int_{v_j - c}^{\infty} x\, dF(x). \tag{3}$$

With v₁ = E(X) − c, the sequence can be computed successively. Again, assuming a standard normal distribution, v_{j+1} = φ(v_j − c) + (v_j − c)Φ(v_j − c).
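The backward recursion for sampling without recall can be sketched as follows. To keep the example self-consistent it uses one common textbook convention, which may differ in indexing from the notation above: u[m] is the expected net gain of playing optimally when m observations remain, with u[1] = E(X) − c and u[m+1] = E[max(X, u[m])] − c, and the DM stops when the current value is at least u[m]. Standard normal observations are assumed, and the names are illustrative.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def expected_max(s):
    """E[max(X, s)] for a standard normal X, equal to phi(s) + s * Phi(s)."""
    return phi(s) + s * Phi(s)


def continuation_values(n, c):
    """Return u with u[m] = expected net gain when m observations remain (u[0] unused)."""
    u = [None, -c]  # u[1] = E(X) - c, and E(X) = 0 for the standard normal
    for m in range(1, n):
        u.append(expected_max(u[m]) - c)
    return u


def should_stop(x, remaining, u):
    """Stop if no observations remain or x is at least the value of continuing."""
    if remaining == 0:
        return True
    return x >= u[remaining]


if __name__ == "__main__":
    u = continuation_values(5, 0.1)
    for m in range(1, 6):
        print(m, round(u[m], 4))
```

Under this convention the thresholds rise with the number of remaining observations and approach the reservation value of the unbounded problem as that number grows.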

For sampling with recall, the optimal strategy is to continue the process whenever a value x_j < v* and to stop taking observations as soon as an observed value x_j ≥ v*, where v* is as in Eqn. (2), which is the same solution as for n unbounded. (For derivations of the strategies, see DeGroot 1970, Sakaguchi 1961.)

Rapoport and Tversky (1970) investigated choice behavior within this scenario. Sampling was done both with and without recall. The number of observations that could be taken as well as observation cost varied across experimental groups. One third of the participants did not follow the optimal strategy. Under both sampling procedures and all cost conditions, they took on average fewer observations than predicted by the corresponding optimal stopping rules. There were no systematic diﬀerences due to cost, as observed in their previous study. They concluded that ‘the optimal model provides a reasonable good account of the behavior of the subjects’ (p. 119).

1.3 Diﬀerent Sample Distributions For Each Observation

Most research concerned with sequential decision making assumes that the observations are sampled from the same distribution, i.e., X_i are distributed identically for all i. For many decision situations, however, the observations may be sampled from the same distribution family with different parameters, or from different distributions. Especially in economic areas, such as price search, it is reasonable to assume that the distributions from which observations are taken change over time. The sequence of those samples has been called a nonstationary series. Of particular interest are two special nonstationary series: ascending and descending series. For ascending series, the observations are drawn from distributions, usually normal distributions, whose mean increases as i increases; for descending series the mean of the distribution decreases as i increases, i indicating the sample index. For both cases, experiments have been conducted to investigate choice behavior in a changing environment. Shapira and Venezia (1981) compared choice behavior for ascending, descending, and constant (identically distributed X_i) series. In one experiment (numbers from a deck of cards), the distributions were known to the DM; no explicit observation costs were imposed; sampling occurred without recall; and the number of observations that could be taken was limited to n = 7. The variance of the distributions varied across experimental groups. An optimal procedure was assumed to continue taking observations whenever x_j < v_{n−j}, and to stop as soon as x_j ≥ v_{n−j}, where j = 1, 2, …, n indicates the number of observations which remain available and k = 1, …, n indicates the specific distribution for the jth observation. Thus

$$v_{j+1} = v_j\,F_k(v_j) + \int_{v_j}^{\infty} x\, dF_k(x). \tag{4}$$

With v₁ = E(X₁) the sequence can be computed successively. Assuming a standard normal distribution, v_{j+1} = φ_k(v_j) + v_j Φ_k(v_j).
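The same backward induction extends to nonstationary series. The sketch below assumes normal distributions with a common standard deviation and position-dependent means, no observation cost, and no recall; the means used in the usage example are hypothetical and are not the values from Shapira and Venezia (1981). It uses the identity E[max(X, s)] = μ + σ[φ(z) + zΦ(z)] with z = (s − μ)/σ for X normal with mean μ and standard deviation σ.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def expected_max_normal(s, mu, sigma):
    """E[max(X, s)] for X ~ Normal(mu, sigma^2)."""
    z = (s - mu) / sigma
    return mu + sigma * (phi(z) + z * Phi(z))


def thresholds(means, sigma):
    """t[i] = expected value of continuing after observation i (0-based).

    The last entry is None because the final observation must be accepted.
    """
    n = len(means)
    t = [None] * n
    cont = means[-1]  # continuing before the last draw yields its mean
    for i in range(n - 2, -1, -1):
        t[i] = cont
        # Value of facing observation i with the option to continue afterwards.
        cont = expected_max_normal(cont, means[i], sigma)
    return t


if __name__ == "__main__":
    ascending = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical means, n = 7
    descending = list(reversed(ascending))
    print([None if v is None else round(v, 3) for v in thresholds(ascending, 2.0)])
    print([None if v is None else round(v, 3) for v in thresholds(descending, 2.0)])
```

The DM stops at observation i when x_i is at least t[i]; on an ascending series the thresholds stay high early on, so stopping early is rarely optimal.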

Across all conditions, 58 percent of the participants behaved in an optimal way. The proportion of optimal stopping did not depend on the type of series but on the size of the variance. Nonoptimal stopping (24 percent stopped too early; 18 percent too late) depended on the series and on the size of the variance. In particular, participants stopped too early on ascending and too late on descending series. A similar result was observed by Brickman (1972). In this study, departing from the optimal stopping rule was attributed to an inadequacy of the stopping rule taken for the particular experimental conditions (assuming complete knowledge of the distributions). In a secretary problem design (see Sect. 1.1), Corbin et al. (1975) were less concerned with optimal choice behavior than with the processes by which the participants made their selections, and with factors that influenced those processes. The emphasis of the investigation was on decision making heuristics rather than on the adequacy of optimal models. With the same optimal stopping rule for all experimental conditions, they found that stopping behavior depended on contextual variables such as the ascending or descending trend of the inspected numbers of the stack.

2. Search Problems—Multiple Information Sources

In a sequential decision making task with multiple information sources, the DM has the option to take information sequentially from diﬀerent sources. Each information source may provide valid information with a particular probability and at diﬀerent cost. The task is not only to decide to stop or to continue the process but also, if continuing, which source of information to consult.

Early experimental studies were done by Kanarick et al. (1969), Rapoport (1969), and Rapoport et al. (1972). A typical task is to find an object (e.g., a black ball) which is hidden in one of several possible locations (e.g., in one of several bins containing white balls). The optimal search strategy depends on further task specifications, such as whether the object can move from one location to another, how many objects are to be found, and whether the search process may stop before the object has been found. Rapoport (1969) investigated the case when a single object that could not move was to be found in one of r, r ≥ 2, possible locations. The DM was not allowed to stop the process before the target was found. All of the following were known to the DM: the a priori probability p_i, p_i > 0, that the object is in location i, i = 1, 2, …, r, with ∑_{i=1}^{r} p_i = 1; a miss probability α_i, 0 < α_i < 1, that even if the object is in location i it will not be found in a particular search of that location (1 − α_i is referred to as the respective detection probability); and a cost, c_i, for a single observation at location i. The objective of the DM is to find a search strategy that minimizes the expected cost. For i = 1, …, r and j = 1, 2, … let Π_ij denote the probability that the object is found at location i during the jth search of that location and the search is terminated. Then

$$\Pi_{ij} = p_i\, \alpha_i^{\,j-1}\, (1 - \alpha_i). \tag{5}$$

If all values of Π_ij / c_i for all values of i and j are arranged in order of decreasing magnitude, the optimal strategy is to search according to this ordering (for derivations, see DeGroot 1970). Ties may be ordered arbitrarily among themselves. The optimal strategy is determined by the detection probabilities and observation costs, and optimal search behavior implies a balance between maximizing the detection probability and minimizing the observation cost. Rapoport (1969) found that participants did not behave optimally. They were more concerned with maximizing the probability of detecting the target than with minimizing observation cost. Increasing the differences in observation cost c_i among the i = 1, 2, 3, 4 locations increased the deviation from the optimal strategy even further. Rapoport et al. (1972) varied the search problems by allowing the DM to terminate the search at any time, adding a terminal reward, R, for finding the target, and adding a terminal penalty, B, for not finding the target. Most participants showed a bias toward maximizing detection probability vs. minimizing search cost per observation, similar to the previous study.
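The ordering rule lends itself to a short sketch. It assumes Π_ij = p_i α_i^(j−1) (1 − α_i), the probability that the object is in location i, is missed on the first j − 1 searches there, and is found on the jth. Because Π_ij / c_i decreases in j for every location, sorting all ratios is equivalent to repeatedly picking the location whose next search has the largest ratio, which is what the heap does. Locations are 0-based, and the numbers in the usage example are hypothetical.

```python
import heapq


def search_order(p, alpha, cost, steps):
    """Return the first `steps` locations to search, in decreasing order of Pi_ij / c_i.

    p[i]     : prior probability that the object is in location i
    alpha[i] : miss probability for a single search of location i
    cost[i]  : cost of a single search of location i
    """
    heap = []
    for i in range(len(p)):
        pi_ij = p[i] * (1.0 - alpha[i])          # Pi_i1
        heapq.heappush(heap, (-pi_ij / cost[i], i, pi_ij))
    order = []
    for _ in range(steps):
        _, i, pi_ij = heapq.heappop(heap)         # largest remaining ratio
        order.append(i)
        pi_next = pi_ij * alpha[i]                # Pi_i,j+1 = alpha_i * Pi_ij
        heapq.heappush(heap, (-pi_next / cost[i], i, pi_next))
    return order


if __name__ == "__main__":
    p = [0.6, 0.4]        # hypothetical priors
    alpha = [0.5, 0.2]    # hypothetical miss probabilities
    cost = [1.0, 1.0]     # equal costs, so only detection probabilities matter
    print(search_order(p, alpha, cost, 5))
```

With equal costs the less likely location is searched first here because its detection probability is much higher; raising its cost pushes it down the ordering, which is the trade-off between detection probability and observation cost that participants tended to neglect.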

3. Sequential Decision Making With Two Or More Possible Decisions

A random sample X₁, X₂, … is generated by an unknown state of nature, Θ. The DM can take observations one at a time. After observing X_n = x_n the DM makes inferences about Θ based on the values of X₁, …, X_n and can decide whether to continue sampling observations or to stop the process. In the former case, observation X_{n+1} is taken; in the latter, the DM makes a final decision d ∈ D. The consequences to the DM depend on the decision d and the value θ.

The statistical theory for this situation was developed by Wald during the 1940s. It has been used to test hypotheses and estimate parameters. In psychological research, sequential decision making of this kind is usually limited to two decisions D = {d₁, d₂}, and applied to binary choice tasks.

The standard theory of sequential analysis by Wald (1947) does not include considerations of observation costs C(n), losses for terminal decisions L(θ, d ), and a priori (subjective) probabilities π of the alternative states of nature. Deferred decision theory generalizes the original theory by including these variables explicitly. The objective of the DM is to ﬁnd a stopping rule that minimizes expected loss (called risk) and expected observation cost. The form of that optimal stopping rule depends mainly on the assumptions about the number of observations that can be taken (bounded or unbounded), and on the assumption of cost per observation (ﬁxed or not) (see DeGroot 1970). Birdsall and Roberts (1965), Edwards (1965), and Rapoport and Burkheimer (1971) introduced the idea of deferred decision theory as normative models of choice behavior to the psychological community. Experiments investigating human behavior in deferred decision tasks have been carried out by Pitz and colleagues (e.g., Pitz et al. 1969), and by Busemeyer and Rapoport (1988). Rapoport and Wallsten (1972) summarize experimental ﬁndings.

For illustration, assume the decision problem in its simplest form. Suppose there are two possible states of nature, θ₁ and θ₂, and two possible decisions, d₁ and d₂. Cost c per observation is fixed and the number of observations is unbounded. The DM does not know which of the states of nature, θ₁ or θ₂, is generating the observations, but there are a priori probabilities π that it is θ₁ and (1 − π) that it is θ₂. Let w_i denote the loss for a terminal decision incurred by the DM in deciding that θ_i is not the correct state of nature when it actually is (i = 1, 2). No losses are assumed when the DM makes a correct decision. Let π_n denote the posterior probability that θ₁ is the correct state of nature generating the observations after n observations have been made. The total posterior expected loss is r_n = min{w₁π_n, w₂(1 − π_n)} + nc. The DM's objective is to minimize the expected loss. An optimal stopping rule is specified in terms of decision boundaries, α and β. If the posterior probability is greater than or equal to α, then decision d₁ is made; if the posterior probability is smaller than or equal to β, then d₂ is selected; otherwise sampling continues.
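A minimal sketch of this stopping rule is given below. It assumes that the likelihood of each observation under θ₁ and θ₂ is available as a function, updates the posterior by Bayes' rule, and stops at the first crossing of a boundary. The boundaries are taken as given inputs rather than derived from the losses and costs, and the Bernoulli likelihoods in the usage example are hypothetical.

```python
def update_posterior(prior, x, lik1, lik2):
    """Posterior probability of theta_1 after observing x (Bayes' rule)."""
    numerator = prior * lik1(x)
    return numerator / (numerator + (1.0 - prior) * lik2(x))


def posterior_expected_loss(post, n, w1, w2, c):
    """r_n = min{w1 * pi_n, w2 * (1 - pi_n)} + n * c."""
    return min(w1 * post, w2 * (1.0 - post)) + n * c


def deferred_decision(observations, prior, lik1, lik2, upper, lower):
    """Sample until the posterior reaches `upper` (decide d1) or `lower` (decide d2).

    Returns (decision, n, posterior); decision is None if the data run out first.
    """
    post = prior
    n = 0
    for x in observations:
        post = update_posterior(post, x, lik1, lik2)
        n += 1
        if post >= upper:
            return "d1", n, post
        if post <= lower:
            return "d2", n, post
    return None, n, post


if __name__ == "__main__":
    # Hypothetical Bernoulli example: P(x = 1) is 0.8 under theta_1 and 0.2 under theta_2.
    lik1 = lambda x: 0.8 if x == 1 else 0.2
    lik2 = lambda x: 0.2 if x == 1 else 0.8
    decision, n, post = deferred_decision([1, 1, 1], 0.5, lik1, lik2, upper=0.9, lower=0.1)
    print(decision, n, round(post, 4))
    print(round(posterior_expected_loss(post, n, w1=10.0, w2=10.0, c=0.1), 4))
```

Finding the boundaries that minimize expected loss plus observation cost is the harder part of the problem and is not attempted here; see DeGroot (1970) for the derivations.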

Bibliography:

Birdsall T G, Roberts R A 1965 Theory of signal detectability: Deferred decision theory. The Journal of the Acoustical Society of America 37: 1064–74
Brickman P 1972 Optional stopping on ascending and descending series. Organizational Behavior and Human Performance 7: 53–62
Busemeyer J R, Rapoport A 1988 Psychological models of deferred decision making. Journal of Mathematical Psychology 32(2): 91–133
Corbin R M, Olson C L, Abbondanza M 1975 Context eﬀects in optional stopping decisions. Organizational Behavior and Human Performance 14: 207–16
DeGroot M H 1970 Optimal Statistical Decisions. McGraw-Hill, New York
Edwards W 1965 Optimal strategies for seeking information: Models for statistics, choice response times, and human information processing. Journal of Mathematical Psychology 2: 312–29
Ferguson T S 1989 Who solved the secretary problem? Statistical Science 4(3): 282–96
Freeman P R 1983 The secretary problem and its extensions: A review. International Statistical Review 51: 189–206
Gilbert J P, Mosteller F 1966 Recognizing the maximum of a sequence. Journal of the American Statistical Association 61: 35–73
Kahan J P, Rapoport A, Jones L E 1967 Decision making in a sequential search task. Perception & Psychophysics 2(8): 374–6
Kanarick A F, Huntington J M, Peterson R C 1969 Multisource information acquisition with optional stopping. Human Factors 11: 379–85
Pitz G F, Reinhold H, Geller E S 1969 Strategies of information seeking in deferred decision making. Organizational Behavior and Human Performance 4: 1–19
Rapoport A 1969 Eﬀects of observation cost on sequential search behavior. Perception & Psychophysics 6(4): 234–40
Rapoport A, Tversky A 1966 Cost and accessibility of oﬀers as determinants of optional stopping. Psychonomic Science 4: 45–6
Rapoport A, Tversky A 1970 Choice behavior in an optional stopping task. Organizational Behavior and Human Performance 5: 105–20
Rapoport A, Burkheimer G J 1971 Models of deferred decision making. Journal of Mathematical Psychology 8: 508–38
Rapoport A, Lissitz R W, McAllister H A 1972 Search behavior with and without optional stopping. Organizational Behavior and Human Performance 7: 1–17
Rapoport A, Wallsten T S 1972 Individual decision behavior. Annual Review of Psychology 23: 131–76
Sakaguchi M 1961 Dynamic programming of some sequential sampling design. Journal of Mathematical Analysis and Applications 2: 446–66
Seale D A, Rapoport A 1997 Sequential decision making with relative ranks: An experimental investigation of the ‘secretary problem.’ Organizational Behavior and Human Decision Processes 69(3): 221–36
Shapira Z, Venezia I 1981 Optional stopping on nonstationary series. Organizational Behavior and Human Performance 27: 32–49
Wald A 1947 Sequential Analysis. Wiley, New York