Conditioning and Learning Research Paper

Earth’s many microenvironments change over time, often creating conditions less hospitable to current life-forms than conditions that existed prior to the change. Initially, life-forms adjusted to these changes through the mechanisms now collectively called evolution. Importantly, evolution improves a life-form’s functionality (i.e., so-called biological fitness as measured in terms of reproductive success) in the environment across generations. It does nothing directly to enhance an organism’s fit to the environment within the organism’s life span. However, animals did evolve a mechanism to improve their fit to the environment within each animal’s life span. Specifically, animals have evolved the potential to change their behavior as a function of experienced relationships among events, with events here referring to both events under the control of the animal (i.e., responses) and events not under the direct control of the animal (i.e., stimuli). Changing one’s behavior as a function of prior experience is what we mean by conditioning and learning (used here synonymously). The observed behavioral changes frequently are seemingly preparatory for an impending, often biologically significant event that is contingent upon immediately preceding stimuli, and sometimes the behavioral changes serve to modify the impending event in an adaptive way.

In principle, there are many possible sets of rules by which an organism might modify its behavior to increase its biological fitness (preparing for and modifying impending events) as a result of prior exposure to specific event contingencies. However, organisms use only a few of these sets of rules; these constitute what we call biological intelligence. Here we summarize, at the psychological level, the basic principles of elementary biological intelligence: conditioning and elementary learning. At the level of the basic learning described here, research has identified a set of rules (laws) that appear to apply quite broadly across many species, including humans. Moreover, within subjects these laws appear to apply, with only adjustments of parameters being required, across motivational systems and tasks (e.g., Domjan, 1983; Logue, 1979). Obviously, as we look at more complex behavior, species and task differences have greater influence, which seemingly reflects the differing parameters previously mentioned interacting with one another. For example, humans as well as dogs readily exhibit conditioned salivation or conditioned fear, whereas social interactions are far more difficult to describe through a general set of laws.

Learning is the intervening process that mediates between an environmental experience and a change in the behavior of the organism. More precisely, learning is ordinarily defined as a relatively permanent change in a subject’s response potential, resulting from experience, that is specific to the presence of stimuli similar to those from that experience, and cannot be attributed entirely to changes in receptors or effectors. Notably, the term response potential allows for learning that is not necessarily immediately expressed in behavior (i.e., latent learning), and the requirement that a stimulus from the experience be present speaks to learning being stimulus specific as opposed to a global change in behavior. Presumably, more complex changes in behavior are built from a constellation of such elementary learned relationships (hereafter called associations).

Interest in the analysis of basic learning began a century ago with its roots in several different controversies. Among these was the schism between empiricism, represented by the British empiricist philosophers, Hume and J. S. Mill, and rationalism, represented by Descartes and Kant. The empiricists assumed that knowledge about the world was acquired through interaction with events in the world, whereas rationalists argued that knowledge was inborn (at least in humans) and experience merely helped us organize and express that knowledge. Studies of learning were performed in part to determine the degree to which beliefs about the world could be modified by experience. Surely demonstrations of behavioral plasticity as a function of experience were overtly more compatible with the empiricist view, but the rationalist position never denied that experience influenced knowledge and behavior. It simply held that knowledge arose within the organism, rather than directly from the experiencing of events. Today, this controversy (reflected in more modern terms as the nature vs. nurture debate) has faded due to the realization that experience provides the content of knowledge about the world, but extracting relationships between events from experience requires a nervous system that is predisposed to extract these relationships. Predispositions to identify relationships between events, although strongly modulated during development by experience, are surely influenced by genetic composition. Hence, acquired knowledge, as revealed through a change in behavior, undoubtedly reflects an interaction of genes (rationalism-nature) and experience (empiricism-nurture).

The second controversy that motivated studies of learning was a desire to understand whether acquired thought and behavior could better be characterized by mechanism, which left the organism as a vessel in which simple laws of learning operated, or by mentalism, which often attributed to the organism some sort of conscious control of its thought and behavior. The experimental study of learning that began in the early twentieth century was partly in reaction to the mentalism implicit in the introspective approach to psychology that prevailed at that time (Watson, 1913). Mechanism was widely accepted as providing a compelling account of simple reflexes. The question was whether it also sufficed to account for behaviors that were more complex and seemingly volitional. Mechanism has been attacked for ignoring the (arguably obvious) active role of the organism in determining its behavior, whereas mentalism has been attacked for passing the problem of explaining behavior to a so-called homunculus. Mentalism starts out with a strong advantage in this dispute because human society, culture, and religion are all predicated on people’s being free agents who are able to determine and control their behavior. In contrast, most theoretical accounts of learning (see Tolman, e.g., 1932, as an exception) are mechanistic and try to account for acquired behavior uniquely in terms of (a) past experience, which is encoded in neural representations; (b) present stimulation; and (c) genetic predispositions (today at least), notably excluding any role for free will. To some degree, the mechanism-mentalism controversy has been confounded with levels of analysis, with mechanistic accounts of learning tending to be more molecular. Obviously, different levels of analysis may be complementary rather than contradictory.

The third controversy that stimulated interest in learning was the relationship of humans to other species. Human culture and religion have traditionally treated humans as superior to animals on many dimensions. At the end of the nineteenth century, however, acceptance of Darwin’s theory of evolution by natural selection challenged the uniqueness of humans. Defenders of tradition looked at learning capacity as a demonstration of the superiority of humans over animals, whereas Darwinians looked to basic learning to demonstrate continuity across species. A century of research has taught us that, although species do differ appreciably in behavioral plasticity, with parametric adjustment a common set of laws of learning appears to apply across at least all warm-blooded animals (Domjan, 1983). Moreover, these parametric adjustments do not always reflect a greater learning capacity in humans than in other species. As a result of evolution in concert with species-specific experience during maturation, each species is adept at dealing with the tasks that the environment commonly presents to that particular species in its ecological niche. For example, Clark’s nutcrackers (birds that cache food) are able to remember where they have stored thousands of edible items (Kamil & Clements, 1990), a performance that humans would be hard-pressed to match.

The fourth factor that stimulated an interest in the study of basic learning was a practical one. Researchers such as Thorndike (1949) and Guthrie (1938) were particularly concerned with identifying principles that might be applied in our schools and toward other needs of our society. Surely this goal has been fulfilled at least in part, as can be seen for example in contemporary use of effective procedures for behavior modification.

Obviously, the human-versus-animal question (third factor listed) required that nonhuman animals be studied, but the other questions in principle did not. However, animal subjects were widely favored for two reasons. First, the behavior of nonhuman subjects was assumed by some researchers to be governed by the same basic laws that apply to human behavior, but in a simpler form which made them more readily observable. Although many researchers today accept the assumption of evolutionary continuity, research has demonstrated that the behavior of nonhumans is sometimes far from simple. The second reason for studying learning in animals has fared better. When seeking general laws of learning that obtain across individuals, individual differences can be an undesirable source of noise in one’s data. Animals permit better control of irrelevant differences in genes and prior experience, thereby reducing individual differences, than is ethically or practically possible with humans.

The study of learning in animals within simple Pavlovian situations (stimulus-stimulus learning) had many parallels with the study of simple associative learning in humans that was prevalent from the 1880s to the 1960s. The so-called cognitive revolution that began in the 1960s largely ended such research with humans and caused the study of basic learning in animals to be viewed by some as irrelevant to our understanding of human learning. The cognitive revolution was driven largely by (a) a shift from trying to illuminate behavior with the assistance of hypothetical mental processes, to trying to understand mental processes through the study of behavior, and (b) the view that the simple tasks that were being studied until that time told us little about learning and memory in the real world (i.e., lacked ecological validity). However, many of today’s cognitive psychologists often return to the constructs that were initially developed before the advent of the field now called cognitive psychology (e.g., McClelland, 1988). Of course, issues of ecological validity are not to be dismissed lightly. The real question is whether complex behavior in natural situations can better be understood by reducing the behavior into components that obey the laws of basic learning, or whether a more molar approach will be more successful. Science would probably best be served by our pursuing both approaches. Clearly, the approach of this research paper is reductionist. Representative of the potential successes that might be achieved through application of the laws of basic learning, originally identified in the confines of the sterile laboratory, are a number of quasi-naturalistic studies of seemingly functional behaviors. Some examples are provided by Domjan’s studies of how Pavlovian conditioning improves the reproductive success of Japanese quail (reviewed in Domjan & Hollis, 1988), Kamil’s studies of how the laws of learning facilitate the feeding systems of different species of birds (reviewed in Kamil, 1983), and Timberlake’s studies of how different components of rats’ behavior, each governed by general laws of learning, are organized to yield functional feeding behavior in quasi-naturalistic settings (reviewed in Timberlake & Lucas, 1989).

Although this research paper focuses on the content of learning and the conditions that favor its occurrence and expression rather than the function of learning, it is important to emphasize that the capacity for learning evolved because it enhances an animal’s biological fitness (reviewed in Shettleworth, 1998). The vast majority of instances of learning are clearly functional. However, there are many documented cases in which specific instances of learned behavior are detrimental to the well-being of an organism (e.g., Breland & Breland, 1961; Gwinn, 1949). Typically, these instances arise in situations with contingencies contrary to those prevailing in the animal’s natural habitat or inconsistent with its past experience (see this research paper’s section entitled “Predispositions: Genetic and Experiential”). An increased understanding of when learning will result in dysfunctional behavior is currently contributing to contemporary efforts to design improved forms of behavior therapy.

This research paper selectively reviews research on both Pavlovian (i.e., stimulus-stimulus) and instrumental (response-stimulus) learning. In many respects, an organism’s response may be functionally similar to a discrete stimulus, as demonstrated by the fact that most phenomena identified in Pavlovian conditioning have instrumental counterparts. However, one important difference is that Pavlovian research has generally studied qualitative relationships (e.g., whether the frequency or magnitude of an acquired response increases or decreases with a specific treatment). In contrast, much instrumental research has sought quantitative relations between the frequency of a response and its (prior) environmental consequences. Readers interested in the preparations that have traditionally been used to study acquired behavior should consult Hearst’s (1988) excellent review, which in many ways complements this research paper.

Empirical Laws Of Pavlovian Responding

Given appropriate experience, a stimulus will come to elicit behavior that is not characteristic of responding to that stimulus, but is characteristic for a second stimulus (hereafter called an outcome). For example, in Pavlov’s (1927) classic studies, dogs salivated at the sound of a bell if previously the bell had been rung before food was presented. That is, the bell acquired stimulus control over the dogs’ salivation. Here we summarize the relationships between stimuli that promote such acquired responding, although we begin with changes in behavior that occur to a single stimulus.

Single-Stimulus Phenomena

The simplest type of learning is that which results from exposure to a single stimulus. For example, if you hear a loud noise, you are apt to startle. But if that noise is presented repeatedly, the startle reaction will gradually decrease, a process called habituation. Occasionally, responding may increase with repeated presentations of a stimulus, a phenomenon called sensitization. Habituation is far more common than sensitization, with sensitization ordinarily being observed only with very intense stimuli. Habituation is regarded as a primitive form of learning, and is sometimes studied explicitly because researchers thought that its simplicity might allow the essence of the learning process to be observed more readily than in situations involving multiple stimuli. Consistent with this view, habituation exhibits many of the same characteristics of learning seen with multiple stimuli (Thompson & Spencer, 1966). These include (a) decelerating acquisition per trial over increasing numbers of trials; (b) a so-called spontaneous loss of habituation over increasing retention intervals; (c) more rapid reacquisition of habituation over repeated series of habituation trials; (d) slower habituation over trials if the trials are spaced, but slower spontaneous loss of habituation thereafter (rate sensitivity); (e) further habituation trials after behavioral change over trials has ceased retard spontaneous loss from habituation (i.e., overtraining results in some sort of initially latent learning); (f) generalization to other stimuli in direct relation to the similarity of the habituated stimulus to the test stimulus; and (g) temporary masking by an intense stimulus (i.e., strong responding to a habituated stimulus is observed if the stimulus is presented immediately following presentation of an intense novel stimulus). As we shall see, these phenomena are shared with learning involving multiple events.

Traditionally, sensitization was viewed as simply the opposite of habituation. But as noted by Groves and Thompson (1970), habituation is highly stimulus-specific, whereas sensitization is not. Stimulus specificity is not an all-or-none matter; however, sensitization clearly generalizes more broadly to relatively dissimilar stimuli than does habituation. Because of this difference in stimulus specificity and because different neural pathways are apparently involved, Groves and Thompson suggested that habituation and sensitization are independent processes that summate for any test stimulus. Habituation is commonly viewed as nonassociative. However, Wagner (1978) has suggested that long-term habituation (that which survives long retention intervals) is due to an association between the habituated stimulus and the context in which habituation occurred (but see Marlin & Miller, 1981).

Phenomena Involving Two Stimuli: Single Cue–Single Outcome

Factors Influencing Acquired Stimulus Control of Behavior

Stimulus Salience and Attention. The rate at which stimulus control by a conditioned stimulus (CS) is achieved (in terms of number of trials) and the asymptote of control attained are both positively related to the salience of both the CS and the outcome (e.g., Kamin, 1965). Salience here refers to a composite of stimulus intensity, size, contrast with background, motion, and stimulus change, among other factors. Salience is not only a function of the physical stimulus, but also a function of the state of the subject (e.g., food is more salient to a hungry than a sated person). Ordinarily, the salience of a cue has greater influence on the rate at which stimulus control of behavior develops (as a function of number of training trials), whereas the salience of the outcome has greater influence on the ultimate level of stimulus control that is reached over many trials. Clearly, the hybrid construct of salience as used here has much in common with what is commonly called attention, but we avoid that construct because of its additional implications. Stimulus salience is not only important during training; conditioned responding is directly influenced by the salience of the test stimulus, a point long ago noted by Hull (1952).

Predispositions: Genetic and Experiential. The construct of salience speaks to the ease with which a cue will come to control behavior, but it does not take into account the nature of the outcome. In fact, some stimuli more readily become cues for a specific outcome than do other stimuli. For example, Garcia and Koelling (1966) gave thirsty rats access to flavored water that was accompanied by sound and light stimuli whenever they drank. For half the animals, drinking was immediately followed with foot shock, and for the other half it was followed by an agent that induced gastric distress. Although all subjects received the same audiovisual-plus-flavor compound stimulus, the subjects that received the foot shock later exhibited greater avoidance of the audiovisual cues, whereas the subjects that received the gastric distress exhibited greater avoidance of the flavor. These observations cannot be explained in terms of the relative salience of the cues. Although Garcia and Koelling interpreted this cue-to-consequence effect in terms of genetic predispositions reflecting the importance of flavor cues with respect to gastric consequences and audiovisual cues with respect to cutaneous consequences, later research suggests that pretraining experience interacts with genetic factors in creating predispositions that allow stimulus control to develop for some stimulus dyads more readily than for others. For example, Dalrymple and Galef (1981) found that rats forced to make a visual discrimination for food were more apt to associate visual cues with an internal malaise.

Spatiotemporal Contiguity (Similarity). Stimulus control of acquired behavior is a strong direct function of the proximity of a potential Pavlovian cue to an outcome in space (Rescorla & Cunningham, 1979) and time (Pavlov, 1927). Contiguity is so powerful that some researchers have suggested that it is the only nontrivial determinant of stimulus control (e.g., Estes, 1950; Guthrie, 1935). However, several conditioning phenomena appear to violate the so-called law of contiguity. One long-standing challenge arises from the observation that simultaneous presentation of a cue and outcome results in weaker conditioned responding to the cue than when the cue slightly precedes the outcome. However, this simultaneous conditioning deficit has now been recognized as reflecting a failure to express information acquired during simultaneous pairings rather than a failure to encode the simultaneous relationship (i.e., most conditioned responses are anticipatory of an outcome, and are temporally inappropriate for a cue that signals that the outcome is already present). For example, Matzel, Held, and Miller (1988) demonstrated that simultaneous pairings do in fact result in robust learning, but that this information is behaviorally expressed only if an assessment procedure sensitive to simultaneous pairings is used.

A second challenge to the law of contiguity has been based on the observation that conditioned taste aversions yield stimulus control even when cues (flavors) and outcome (internal malaise) are separated by hours (Garcia, Ervin, & Koelling, 1966). However, even with conditioned taste aversions, stimulus control (i.e., aversion to the flavor) decreases as the interval between the flavor and internal malaise increases. All that differs here from other conditioning preparations is the rate of decrease in stimulus control as the interstimulus interval in training increases. Thus, conditioned taste aversion is merely a parametric variation of the law of contiguity, not a violation of it.

Another challenge to the law of contiguity that is not so readily dismissed is based on the observation that the effect of interstimulus interval is often inversely related to the average interval between outcomes (e.g., an increase in the CS-US interval has less of a decremental effect on conditioned responding if the intertrial interval is correspondingly increased). That is, stimulus control appears to depend not so much on the absolute interval between a cue and an outcome (i.e., absolute temporal contiguity) as on the ratio of this interval to that between outcomes (i.e., relative contiguity; e.g., Gibbon, Baldock, Locurto, Gold, & Terrace, 1977). A further challenge to the law of contiguity is discussed in this research paper’s section entitled “Mediation.”
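The notion of relative contiguity can be made concrete with a small numerical sketch. The function name and the example interval values below are hypothetical and chosen only for illustration; the sketch simply computes the ratio described above and is not a model taken from Gibbon et al. (1977).

```python
def relative_contiguity(cue_outcome_interval, inter_outcome_interval):
    """Ratio of the cue-outcome interval to the average interval between outcomes.

    Illustrative assumption: both intervals are in the same time unit (e.g., seconds).
    A smaller ratio corresponds to better relative contiguity.
    """
    if cue_outcome_interval < 0 or inter_outcome_interval <= 0:
        raise ValueError("intervals must be non-negative, and the inter-outcome interval positive")
    return cue_outcome_interval / inter_outcome_interval


# Hypothetical values: a 10-s cue-outcome interval with outcomes every 60 s.
baseline = relative_contiguity(10.0, 60.0)

# Doubling only the cue-outcome interval worsens (raises) the ratio.
longer_cue_only = relative_contiguity(20.0, 60.0)

# Doubling both intervals leaves the ratio unchanged.
both_doubled = relative_contiguity(20.0, 120.0)
```

On this reading, lengthening the cue-outcome interval alone degrades relative contiguity, whereas a corresponding increase in the spacing of outcomes restores the original ratio, which is the pattern the paragraph above describes.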

According to the British empiricist philosophers, associations between elements were more readily formed when the elements were similar (Berkeley, 1710/1946). More recently, well-controlled experiments have confirmed that development of stimulus control is facilitated if paired cues and outcome are made more similar (e.g., Rescorla & Furrow, 1977). The neural representations of paired stimuli seemingly include many attributes of the stimuli, including their temporal and spatial relationships. This is evident in conditioned responding reflecting not only an expectation of a specific outcome, but the outcome occurring at a specific time and place (e.g., Saint Paul, 1982; Savastano & Miller, 1998). If temporal and spatial coordinates are viewed as stimulus attributes, contiguity can be viewed as similarity on the temporal and spatial dimensions, thereby subsuming spatiotemporal contiguity within a general conception of similarity. Thus, the law of similarity appears able to encompass the law of contiguity.

Objective Contingency. When a cue is consistently followed by an outcome and these pairings are punctuated by intertrial intervals in which neither the cue nor the outcome occurs, stimulus control of behavior ordinarily develops over trials. However, when cues or outcomes sometimes occur by themselves during the training sessions, conditioned responding to the cue (reflecting the outcome) is often slower to develop (measured in number of cue-outcome pairings) and is asymptotically weaker (Rescorla, 1968).

There are four possibilities for each trial in which a dichotomous cue or outcome might be presented, as shown in Figure 13.1:

1. Cue–outcome.
2. Cue–no outcome.
3. No cue–outcome.
4. No cue–no outcome.

The frequencies of trials of type 1, 2, 3, and 4 are a, b, c, and d, respectively. The objective contingency is usually defined in terms of the difference in conditional probabilities of the outcome in the presence (a/[a + b]) and in the absence (c/[c + d]) of the cue. If the conditional probability of the outcome is greater in the presence rather than absence of the cue, the contingency is positive; conversely, if the conditional probability of the outcome is less in the presence than absence of the cue, the contingency is negative. Alternatively stated, contingency increases with the occurrence of a- and d-type trials and decreases with b- and c-type trials. In terms of stimulus control, excitatory responding is observed to increase and behavior indicative of conditioned inhibition (see this research paper’s later section on that topic) is seen to decrease with increasing contingency, and vice versa with decreasing contingency. Empirically, the four types of trials are seen to have unequal influence on stimulus control, with Type 1 trials having the greatest impact and Type 4 trials having the least impact (e.g., Wasserman, Elek, Chatlosh, & Baker, 1993). Note that although we previously described the effect of spaced versus massed cue-outcome pairings as a qualifier of contiguity, such trial spacing effects are readily subsumed under objective contingency because long intertrial intervals are the same as Type 4 trials, provided these intertrial intervals occur in the training context.
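The definition of objective contingency given above can be sketched in a few lines of code. The function names and the trial counts used below are hypothetical and serve only as illustration; the difference in conditional probabilities is labeled delta_p here as a naming assumption, since the text itself does not name the quantity.

```python
def delta_p(a, b, c, d):
    """Objective contingency as the difference in conditional probabilities.

    a: cue-outcome trials          b: cue-no outcome trials
    c: no cue-outcome trials       d: no cue-no outcome trials
    Returns a/(a + b) - c/(c + d).
    """
    if a + b == 0 or c + d == 0:
        raise ValueError("need at least one trial with the cue and one without it")
    p_outcome_given_cue = a / (a + b)
    p_outcome_given_no_cue = c / (c + d)
    return p_outcome_given_cue - p_outcome_given_no_cue


def contingency_sign(a, b, c, d):
    """Classify the contingency as 'positive', 'negative', or 'zero'."""
    value = delta_p(a, b, c, d)
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "zero"


# Hypothetical trial counts, for illustration only.
perfect = delta_p(20, 0, 0, 20)     # outcome occurs only with the cue
random_ = delta_p(10, 10, 10, 10)   # outcome equally likely with or without the cue
negative = delta_p(5, 15, 15, 5)    # outcome more likely without the cue
```

Note that this computation weights all four trial types equally, whereas the empirical finding described above is that the trial types have unequal influence on stimulus control; the sketch captures the objective definition only, not the behavioral weighting.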

Conditioned responding can be attenuated by presentations of the cue alone before the cue-outcome pairings, intermingled with the pairings, or after the pairings. If they occur before the pairings, the attenuation is called the CS-preexposure (also called latent inhibition) effect (Lubow & Moore, 1959); if they occur during the pairings, they (in conjunction with the pairings) are called partial reinforcement (Pavlov, 1927); and if they occur after the pairings, the attenuation is called extinction (Pavlov, 1927). Notably, the operations that produce the CS-preexposure effect and habituation (i.e., presentation of a single stimulus) are identical; the difference is in how behavior is subsequently assessed. Additionally, based on the two phenomena being doubly dissociable, Hall (1991) has argued that habituation and the CS-preexposure effect arise from different underlying processes. That is, a change in context between treatment and testing attenuates the CS-preexposure effect more than it does habituation, whereas increasing retention interval attenuates habituation more than it does the CS-preexposure effect.

Conditioned responding can also be attenuated by presentations of the outcome alone before the cue-outcome pairings, intermingled with the pairings, or after the pairings. If they occur before the pairings, the attenuation is called the US-preexposure effect (e.g., Randich & LoLordo, 1979); if they occur during the pairings, it (in conjunction with the pairings) is called the degraded contingency effect (in the narrow sense, as any presentation of the cue or outcome alone degrades the objective contingency; Rescorla, 1968); and if they occur after the pairings, it is an instance of retrospective revaluation (e.g., Denniston, Miller, & Matute, 1996). The retrospective revaluation effect has proven far more elusive than any of the other five means of attenuating excitatory conditioned responding through degraded contingency, but it occurs at least under select conditions (Miller & Matute, 1996).

If compounded, these different types of contingency-degrading treatments have a cumulative effect on conditioned responding that is at least summative (Bonardi & Hall, 1996) and possibly greater than summative (Bennett, Wills, Oakeshott, & Mackintosh, 2000). A prime example of such a compound contingency-degrading treatment is so-called learned irrelevance, in which cue and outcome presentations truly random with respect to one another precede a series of cue-outcome pairings (Baker & Mackintosh, 1977). This pretraining treatment has a decremental effect on conditioned responding greater than either CS preexposure or US preexposure.

Objective contingency effects are not merely a function of the frequency of different types of trials depicted in Figure 13.1. Two important factors that influence contingency effects are (a) trial order and spacing, and (b) modulatory stimuli. When contingency-degrading Type 2 and 3 trials are administered phasically (rather than interspersed with cue-outcome pairings), recency effects are pronounced. The trials that occur closest to testing have a relatively greater impact on responding; such recency effects fade with time (i.e., longer retention intervals, or at least as a function of the intervening events that occur during longer retention intervals). Additionally, if there are stimuli that are present during the pairings but not the contingency-degrading treatments (or vice versa), presentation of these stimuli immediately prior to or during testing with the target cue causes conditioned responding to better reflect the trials that occurred in the presence of the stimuli. These modulatory stimuli can be either contextual stimuli (i.e., the static environmental cues present during training: the so-called renewal effect, Bouton & Bolles, 1979) or discrete stimuli (e.g., Brooks & Bouton, 1993). Such modulatory stimuli appear to have much in common with so-called priming cues in cognitive research.

Modulatory effects can be obtained even when the cue-outcome pairings are interspersed with the contingency-degrading events. For example, if stimulus A always precedes pairings of cue X and an outcome, and does not precede presentations of cue X alone, subjects will come to respond to the cue if and only if it is preceded by stimulus A; this effect is called positive occasion setting (Holland, 1983a). If stimulus A only precedes the nonreinforced presentations of cue X, subjects will come to respond to cue X only when it has not been preceded by stimulus A; this effect is called negative occasion setting. Surprisingly, behavioral modulation by contexts appears to be acquired in far fewer trials than with discrete stimuli, perhaps reflecting the important role of contextual modulation of behavior in each species’ ecological niche.

Attenuation of stimulus control through contingency-degrading events is often at least partially reversible without further cue-outcome pairings. This is most evident in the case of extinction, for which (so-called) spontaneous recovery from extinction and external disinhibition (i.e., temporary release from extinction treatment as a result of presenting an unrelated intense stimulus immediately prior to the extinguished stimulus) are examples of recovery of behavior indicative of the cue-outcome pairings without the occurrence of further pairings (e.g., Pavlov, 1927). Similarly, spontaneous recovery from the CS-preexposure effect has been well documented (e.g., Kraemer, Randall, & Carbary, 1991). These phenomena suggest that the pairings of cue and outcome are encoded independently of the contingency-degrading events, but the behavioral expression of information regarding the pairings can be suppressed by additional learning during the contingency-degrading events.

Cue and Outcome Duration. Cue and outcome durations have great impact on stimulus control of behavior. The effects are complex, but generally speaking, increased cue or outcome duration reduces behavioral control (provided one controls for any greater hedonic value of the outcome due to increased duration). What makes these variables complex is that different components of a stimulus can contribute differentially to stimulus control. The onset, presence, and termination of a cue can each influence behavior through its own relationship to the outcome; this tendency towards fragmentation of behavioral control appears to increase with the length of the duration of the cue (e.g., Romaniuk & Williams, 2000). Similarly, outcomes have components that can differentially contribute to control by a stimulus. As an outcome is prolonged, its later components are further removed in time from the cue and presumably are less well-associated to the cue.

Response Topology and Timing

The hallmark of conditioned responding is that the observed response to the cue reflects the nature of the outcome. For example, pigeons peck an illuminated key differently depending on whether the key signals delivery of food or water, and their manner of pecking is similar to that required to ingest the specific outcome (Jenkins & Moore, 1973). However, the nature of the signal also may qualitatively modulate the conditioned response. For instance, Holland (1977) has described how rats’ conditioned responses to a light and an auditory cue differ, despite their having been paired with the same outcome.

Conditioned responding not only indicates that the cue and outcome have been paired, but also reflects the spatial and temporal relationships that prevailed between the cue and outcome during those pairings (giving rise to the mentalistic view that subjects anticipate, so to speak, when and where the outcome will occur). If a cue has been paired with a rewarding outcome in a particular location, subjects are frequently observed to approach the location at which the outcome had been delivered (so-called goal tracking). For example, Burns and Domjan (1996) observed that Japanese quail, as part of their conditioned response to a cue for a potential mate, oriented to the absolute location in which the mate would be introduced, independent of their immediate location in the experimental apparatus. The temporal relationship between a cue and outcome that existed in training is evidenced in two ways. First, with asymptotic training, the conditioned response ordinarily is emitted just prior to the time at which the outcome would occur based on the prior pairings (Pavlov, 1927). Second, the nature of the response often changes with different cue-outcome intervals. In some instances, when an outcome (e.g., food) occurs at regular intervals, during the intertrial interval subjects emit a sequence of behaviors with a stereotypic temporal structure appropriate for that outcome in the species’ ecological niche (e.g., Staddon & Simmelhag, 1970; Timberlake & Lucas, 1991).

Pavlovian conditioned responding often closely resembles a diminished form of the response to the unconditioned outcome (e.g., conditioned salivation with food as the outcome). Such a response topology is called mimetic. However, conditioned responding is occasionally diametrically opposed to the unconditioned response (e.g., conditioned freezing with pain as the outcome, or a conditioned increase in pain sensitivity with delivery of morphine as the outcome; Siegel, 1989). Such a conditioned response topology is called compensatory. We do not yet have a full understanding of when one or the other type of responding will occur (but see this research paper’s section entitled “What Is a Response?”).

Stimulus Generalization

No perceptual event is ever exactly repeated because of variation in both the environment and in the nervous system. Thus, learning would be useless if organisms did not generalize from stimuli in training to stimuli that are perceptually similar. Therefore, it is not surprising that conditioned responding is seen to decrease in an orderly fashion as the physical difference between the training and test stimuli increases. This reduction in responding is called stimulus generalization decrement. Response magnitude or frequency plotted as a function of training-to-test stimulus similarity yields a symmetric curve that is called a generalization gradient (e.g., Guttman & Kalish, 1956). Such gradients resulting from simple cue-outcome pairings can be made steeper by introducing trials with a second stimulus that is not paired with the outcome. Such discrimination training not only steepens the generalization gradient between the reinforced stimulus and nonreinforced stimulus, but often shifts the stimulus value at which maximum responding is observed from the reinforced cue in the direction away from the value of the nonreinforced stimulus (the so-called peak shift; e.g., Weiss & Schindler, 1981). With increasing retention intervals between the end of training and a test trial, stimulus generalization gradients tend to grow broader (e.g., Riccio, Richardson, & Ebner, 1984).
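The relation between discrimination training and peak shift can be illustrated with a small numerical sketch. It assumes, purely for illustration, that responding reflects an excitatory gradient centered on the reinforced stimulus minus a weaker inhibitory gradient centered on the nonreinforced stimulus. The Gaussian shape, the stimulus values (arbitrary units), and every parameter below are hypothetical and are not taken from the cited studies.

```python
import math

def gaussian(x, center, width):
    """Bell-shaped gradient with height 1.0 at its center (assumed shape)."""
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))

def net_response(x, s_plus=50.0, s_minus=60.0, width=15.0, inhibition=0.6):
    """Excitatory gradient around S+ minus a scaled inhibitory gradient around S-.

    All values are illustrative assumptions in arbitrary stimulus units.
    """
    return gaussian(x, s_plus, width) - inhibition * gaussian(x, s_minus, width)

def peak_location(response, lo=0, hi=100):
    """Integer stimulus value in [lo, hi] at which the response is largest."""
    return max(range(lo, hi + 1), key=response)

# Simple cue-outcome pairings: responding peaks at the trained value.
simple_peak = peak_location(lambda x: gaussian(x, 50.0, 15.0))

# Discrimination training: the peak moves away from the nonreinforced value.
shifted_peak = peak_location(net_response)

print(simple_peak, shifted_peak)
```

Under these assumptions the maximum after discrimination training falls below the reinforced value of 50, that is, on the side opposite the nonreinforced value of 60.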

Phenomena Involving More Than Two Stimuli: Competition, Interference, Facilitation, and Summation

When more than two stimuli are presented in close proximity during training, one might expect that the representation of each stimulus-outcome dyad would be treated independently according to the laws described above. Surely these laws do apply, but the situation becomes more complex because interactions between stimuli also occur. That is, when stimuli X, Y, and Z are trained together, behavioral control by X based on X’s relationship to Y is often influenced by the presence of Z during training. Although these interactions (described in the following sections) are often appreciable, they are neither ubiquitous (i.e., they are more narrowly parameter dependent) nor generally as robust as any of the phenomena described under “Phenomena Involving Two Stimuli.”

Multiple Cues With a Common Outcome

Cues Trained Together and Tested Apart: Competition and Facilitation. For the last 30 years, much attention has been focused on cue competition between cues trained in compound, particularly overshadowing and blocking. Overshadowing refers to the observed attenuation in conditioned responding to an initially novel cue (X) paired with an outcome in the presence of an initially novel second cue (Y), relative to responding to X given the same treatment in the absence of Y (Pavlov, 1927). The degree to which Y will overshadow X depends on their relative saliences; the more salient Y is compared to X, the greater the degree of overshadowing of X (Mackintosh, 1976). When two cues are equally salient, overshadowing is sometimes observed, but is rarely a large effect. Blocking refers to attenuated responding to a cue (X) that is paired with an outcome in the presence of a second cue (Y) when Y was previously paired with the same outcome in the absence of X, relative to responding to X when Y had not been pretrained (Kamin, 1968). That is, learning as a result of the initial Y-outcome association blocks (so to speak) responding to X that the XY-outcome pairings would otherwise support. (Thus, observation of blocking requires good responding to X by the control group, which necessitates the use of parameters that minimize overshadowing of X by Y in the control group.)
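The blocking design can be made concrete with a short simulation. The sketch below uses the error-correction rule of Rescorla and Wagner (1972), in which every cue present on a trial changes in associative strength in proportion to the difference between the outcome received and the summed strength of the cues present. This is only one of several accounts of cue competition, it does not capture the recovery effects reported in this literature, and the learning rate, asymptote, and trial counts are arbitrary assumptions.

```python
def rescorla_wagner(trials, alpha=0.3, lam=1.0):
    """Return associative strengths after a sequence of trials.

    trials: list of (cues_present, reinforced) pairs.
    alpha (learning rate) and lam (asymptote) are illustrative values,
    and every cue is assumed to be equally salient.
    """
    strength = {}
    for cues, reinforced in trials:
        total = sum(strength.get(c, 0.0) for c in cues)
        error = (lam if reinforced else 0.0) - total
        for c in cues:
            strength[c] = strength.get(c, 0.0) + alpha * error
    return strength

# Blocking group: Y-outcome pretraining, then XY-outcome compound trials.
blocking = rescorla_wagner([(("Y",), True)] * 20 + [(("X", "Y"), True)] * 10)

# Control group: XY-outcome compound trials only, with no pretraining of Y.
control = rescorla_wagner([(("X", "Y"), True)] * 10)

print(round(blocking["X"], 3), round(control["X"], 3))
```

In this sketch, X ends with far less associative strength in the blocking group than in the control group, because pretrained Y already predicts the outcome when the compound trials begin.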

Both overshadowing and blocking can be observed with a single compound training trial (e.g., Balaz, Kasprow, & Miller, 1982; Mackintosh & Reese, 1979), are usually greatest with a few compound trials, and tend to wane with many compound trials (e.g., Azorlosa & Cicala, 1988). Notably, recovery from each of these cue competition effects can sometimes be obtained without further training trials through various treatments including (a) lengthening the retention interval (i.e., so-called spontaneous recovery; Kraemer, Lariviere, & Spear, 1988); (b) administration of so-called reminder treatments, which consist of presentation of either the outcome alone, the cue alone, or the training context (e.g., Balaz, Gutsin, Cacheiro, & Miller, 1982); and (c) posttraining massive extinction of the overshadowing or blocking stimulus (e.g., Matzel, Schachtman, & Miller, 1985). The theoretical implications of such recovery (paralleling the recovery often observed following the degradation of contingency in the two-stimulus situation) are discussed later in this research paper (see sections entitled “Expression-Focused Models” and “Accounts of Retrospective Revaluation”).

Although competition is far more commonly observed, under certain circumstances the presence of a second cue during training has exactly the opposite effect; that is, it enhances (i.e., facilitates) responding to the target cue. When this effect is observed within the overshadowing procedure, it is called potentiation (Clarke, Westbrook, & Irwin, 1979); and when it is seen in the blocking procedure, it is called augmentation (Batson & Batsell, 2000). Potentiation and augmentation are most readily observed when the outcome is an internal malaise (usually induced by a toxin), the target cue is an odor, and the companion cue is a taste. However, enhancement is not restricted to these modalities (e.g., J. S. Miller, Scherer, & Jagielo, 1995). Another example of enhancement, although possibly with a different underlying mechanism, is superconditioning, which refers to enhanced responding to a cue that is trained in the presence of a cue previously established as a conditioned inhibitor for the outcome, relative to responding to the target cue when the companion cue was novel. In most instances, enhancement appears to be mediated at test by the companion stimulus that was present during training, in that degrading the associative status of the companion stimulus between training and testing often attenuates the enhanced responding (Durlach & Rescorla, 1980).

Cues Trained Apart and Tested Apart. Although theory and research in learning over the past 30 years have focused on the interaction of cues trained together, there is an older literature concerning the interaction of cues with common outcomes trained apart (i.e., X→A, Y→A). This research was conducted largely in the tradition of associationistic studies of human verbal learning that was popular in the mid-twentieth century. A typical example is the attenuated responding to cue X observed when X→A training is either preceded (proactive interference) or followed (retroactive interference) by Y→A training, relative to subjects receiving no Y→A training (e.g., Slamecka & Ceraso, 1960). The stimuli used in the original verbal learning studies were usually consonant trigrams, nonsense syllables, or isolated words. However, recent research using nonverbal preparations has found that such interference effects occur quite generally in both humans (Matute & Pineño, 1998) and nonhumans (Escobar, Matute, & Miller, 2001). Importantly, Y→A presentations degrade the X→A objective contingency because they include presentations of A in the absence of X. This degrading of the X-A contingency sometimes does contribute to the attenuation of responding based on the X→A relationship (as seen in subjects who receive A-alone as the disruptive treatment relative to subjects who receive no disruptive treatment). However, Y→A treatment ordinarily produces a larger deficit, suggesting that, in addition to contingency effects, associations with a common element interact to reduce target stimulus control (e.g., Escobar et al., 2001). Although interference is the more frequent result of the X→A, Y→A design, facilitation is sometimes observed, most commonly when X and Y are similar (e.g., Osgood, 1949).

Cues Trained Apart and Tested Together. When two independently trained cues are compounded at test, responding is usually at least as vigorous as, and often more vigorous than, when only one of the cues is tested (see Kehoe & Gormezano, 1980). When the response to the compound is greater than to either element, the phenomenon is called response summation. Presumably, a major factor limiting response summation is that compounding two cues creates a test situation different from that of training with either cue; thus, attenuated responding to the compound due to generalization decrement is expected. The question is under what conditions generalization decrement will counteract the summation of the tendencies to respond to the two stimuli. Research suggests that when subjects treat the compound as a unique stimulus in itself, distinct from the original stimuli (i.e., configuring), summation will be minimized (e.g., Kehoe, Horne, Horne, & Macrae, 1994). Well-established rules of perception (e.g., gestalt principles; Köhler, 1947) describe the conditions that favor and oppose configuring.

Multiple Outcomes With a Single Cue

Just as Y→A trials can interact with behavior based on X→A training, so too can X→B trials interact with behavior based on X→A training.

Multiple Outcomes Trained Together With a Single Cue. When a cue X is paired with a compound of outcomes (i.e., X→AB), tests of the X→A relationship often yield less responding than that of a control group for which B was omitted, provided A and B are sufficiently different. Such a result might be expected based on either distraction during training or response competition at test, both of which are well-established phenomena. However, some studies have been designed to minimize these two potential sources of outcome competition. For example, Burger, Mallemat, and Miller (2000) used a sensory preconditioning procedure (see this research paper’s section entitled “Second-Order Conditioning and Sensory Preconditioning”) in which the competing outcomes were not biologically significant; and only just before testing did they pair A with a biologically significant stimulus so that the subjects’ learning could be assessed. As neither A nor B was biologically significant during training, (a) distraction by B from A was less apt to occur (although it cannot be completely discounted), and (b) B controlled no behavior that could have produced response competition at test. Despite minimization of distraction and response competition, Burger et al. still observed competition between outcomes (i.e., the presence of B during training attenuated responding based on X and A having been paired). To our knowledge, no one to date has reported facilitation from the presence of B during training. But analogy with the multiple-cue case suggests that facilitation might occur if the two outcomes had strong within-compound links (i.e., A and B were similar or strongly associated to each other).

Multiple Outcomes Trained Apart With a Single Cue: Counterconditioning. Just as multiple cues trained apart with a common outcome can result in an interaction, so too can an interaction be observed when multiple outcomes are trained apart with a common cue. Alternatively stated, responding based on X→A training can be disrupted by X→B training. The best known example of this is counterconditioning (e.g., responding to a cue based on cue→food training is disrupted by cue→footshock training). The interfering training (X→B) can occur before, among, or after the target training trials (X→A). Although response competition is a likely contributing factor, there is good evidence that such interference effects are due to more than simple response competition (e.g., Dearing & Dickinson, 1979). Just as interference produced by Y→A in the X→A, Y→A situation can be due in part to degrading the X-A contingency, so attenuated responding produced by X→B in the X→A, X→B situation can arise in part from the degrading of the X-A contingency that is inherent in the presentations of X during X→B trials. However, research has found that the response attenuation produced by the X→B trials is sometimes greater than that produced by X-alone presentations; hence, this sort of interference cannot be treated as simply an instance of degraded contingency (Escobar, Arcediano, & Miller, 2001).

Resolving Ambiguity

The magnitude of the interference effects described in the two previous sections is readily controlled by conditions at the time of testing. If the target and interfering treatments have been given in different contexts (i.e., competing elements trained apart), presentation at test of contextual cues associated with the interfering treatment enhances interference, whereas presentation of contextual cues associated with target training reduces interference. These contextual cues can be either diffuse background cues or discrete stimuli that were presented with the target (Escobar et al., 2001). Additionally, more recent training experience typically dominates behavior (i.e., a recency effect), all other factors being equal. Such recency effects fade with increasing retention intervals, with the consequence that retroactive interference fades and, correspondingly, proactive interference increases when the posttraining retention interval is increased (Postman, Stark, & Fraser, 1968).

Notably, the contextual and temporal modulation of interference effects is highly similar to the modulation observed with degraded contingency effects (see this research paper’s section entitled “Factors Influencing Acquired Stimulus Control of Behavior”). This similarity is grounds for revisiting the issue of whether interference effects are really different from degraded contingency effects. We previously cited grounds for rejecting the view that interference effects were no more than degraded contingency effects (see this research paper’s section on that topic). However, if the training context is regarded as an element that can become associated with a cue on a cue-alone trial or with an outcome on an outcome-alone trial, contingency-degrading trials could be viewed as target cue-context or context-outcome trials that interfere with behavior promoted by target cue-outcome trials much as Y-outcome or target-B trials do within the interference paradigm. In principle, this allows degraded contingency effects to be viewed as a subset of interference effects. However, due to the vagueness of context as a stimulus, this approach has not received widespread acceptance.

Mediation

Mediated changes in control of behavior by a stimulus refer to situations in which responding to a target cue is at least partially a function of the training history of a second cue that has at one time or another been paired with the target. Depending on the specific situation, mediational interaction between the target and the companion cues can occur either at the time that they are paired during training (e.g., aversively motivated second-order conditioning, see section entitled “Second-Order Conditioning and Sensory Preconditioning”; Holland & Rescorla, 1975) or at test (e.g., sensory preconditioning, see same section; Rizley & Rescorla, 1972). As discussed below, the mediated control transferred to the target can be either consistent with the status of the companion cue (e.g., second-order conditioning) or inverse to the status of the companion cue (e.g., conditioned inhibition, blocking). Testing whether a mediational relationship between two cues exists usually takes the form of presenting the companion cue with or without the outcome in the absence of the target and seeing whether that treatment influences responding to the target. This manipulation of the companion cue can be done before, interspersed among, or after the target training trials. However, sometimes posttarget-training revaluation of the companion does not alter responding to the target, suggesting that the mediational process occurs during training (e.g., aversively motivated second-order conditioning).

Second-Order Conditioning and Sensory Preconditioning

If cue Y is paired with a biologically significant outcome (A) such that Y comes to control responding, and subsequently cue X is paired with Y (i.e., Y→A, X→Y), responding to X will be observed. This phenomenon is called second-order conditioning (Pavlov, 1927). Cue X can similarly be imbued with behavioral control if the two phases of training above are reversed (i.e., X→Y, followed by Y→A). This latter phenomenon is called sensory preconditioning (Brogden, 1939). Second-order conditioning and sensory preconditioning are important for two reasons. First, these phenomena are simple examples of mediated responding—that is, acquired behavior that depends on associations between stimuli that are not of inherent biological significance. Second, these phenomena pose a serious challenge to the principle of contiguity. For example, consider sensory preconditioning: A light is paired with a tone, then the tone is paired with an aversive event (i.e., electric shock); at test, the light evokes a conditioned fear response. Thus, the light is controlling a response appropriate for the aversive event, despite its never having been paired with that event. This is a direct violation of contiguity in its simplest form. Based on the observation of mediated behavior, the law of contiguity must be either abandoned or modified. Given the enormous success of contiguity in describing the conditions that foster acquired behavior, researchers generally have elected to redefine contiguity as spatiotemporal proximity between the cue or its surrogate and the outcome or its surrogate, thereby incorporating mediation within the principle of contiguity.

Mediation appears to occur when two different types of training share a common element (e.g., X→Y, Y→A). Importantly, the mediating stimulus ordinarily does not simply act as a (weak) substitute for the outcome (as might be expected of a so-called simple surrogate). Rather, the mediating stimulus (i.e., first-order cue) carries with it its own spatiotemporal relationship to the outcome, such that the second-order cue supports behavior appropriate for a summation of the mediator-outcome spatiotemporal relationship and the second-order cue-mediator spatiotemporal relationship (for spatial summation, see Etienne, Berlie, Georgakopoulos, & Maurer, 1998; for temporal summation, see Matzel, Held et al., 1988). In effect, subjects appear to integrate the two separately experienced relationships to create a spatiotemporal relationship between the second-order cue and the outcome, despite their never having been physically paired.
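The summation described above amounts to simple addition of intervals or displacements. The following sketch assumes that each relationship can be coded as a signed onset-to-onset interval in seconds (negative when the second event precedes the first) or as a two-dimensional displacement. The function names and all numerical values are hypothetical and do not reproduce the parameters of the cited experiments.

```python
def integrate_temporal(x_to_y_s, y_to_outcome_s):
    """Expected X-to-outcome interval implied by the two trained intervals.

    Intervals are signed onset-to-onset times in seconds; a negative
    y_to_outcome_s means the outcome preceded Y (backward pairing).
    """
    return x_to_y_s + y_to_outcome_s

def integrate_spatial(x_to_y, y_to_outcome):
    """Expected X-to-outcome displacement as the sum of two (dx, dy) vectors."""
    return (x_to_y[0] + y_to_outcome[0], x_to_y[1] + y_to_outcome[1])

# X precedes Y by 5 s and Y precedes the outcome by 10 s:
# the outcome is expected 15 s after X.
forward = integrate_temporal(5.0, 10.0)

# X precedes Y by 5 s but the outcome preceded Y by 5 s:
# the outcome is expected at the onset of X.
backward = integrate_temporal(5.0, -5.0)

# Hypothetical displacements in arbitrary units.
displacement = integrate_spatial((1.0, 0.0), (0.0, 2.0))

print(forward, backward, displacement)
```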

The mediating process that links two stimuli that were never paired could occur in principle either during training or during testing. To address this issue, researchers have asked what happens to the response potential of a second-order cue X when its first-order cue is extinguished between training and testing. Rizley and Rescorla (1972) reported that such posttraining extinction of Y did not degrade responding to a second-order cue (X), but subsequent research has under some conditions found attenuated responding to X (Cheatle & Rudy, 1978). The basis for this difference is not yet completely clear, but Nairne and Rescorla (1981) have suggested that it depends on the valence of the outcome (i.e., appetitive or aversive).

Conditioned Inhibition

Conditioned inhibition refers to situations in which a subject behaves as if it has learned that a particular stimulus (a so-called inhibitor) signals the omission of an outcome. Conditioned inhibition is ordinarily assessed by a combination of (a) a summation test in which the putative inhibitor is presented in compound with a known conditioned excitor (different from any excitor that was used in training the inhibitor) and seen to reduce responding to the excitor; and (b) a retardation test in which the inhibitor is seen to be slow in coming to serve as a conditioned excitor in terms of required number of pairings with the outcome (Rescorla, 1969). Because the standard tests for conditioned excitation and conditioned inhibition are operationally distinct, stimuli sometimes can pass tests for both excitatory and inhibitory status after identical treatment. The implication is that conditioned inhibition and conditioned excitation are not mutually exclusive (e.g., Matzel, Gladstein, & Miller, 1988), which is contrary to some theoretical formulations (e.g., Rescorla & Wagner, 1972).

There are several different procedures that appear to produce conditioned inhibition (LoLordo & Fairless, 1985). Among them are (a) explicitly unpaired presentations of the cue (inhibitor) and outcome (described in the discussion of objective contingency); (b) Pavlov’s (1927) procedure in which a training excitor (Y) is paired with an outcome, interspersed with trials in which the training excitor and intended inhibitor (X) are presented in nonreinforced compound; and (c) so-called backward pairings of a cue with an outcome (outcome→X; Heth, 1976). What appears similar across these various procedures is that the inhibitor is present at a time that another cue (discrete or contextual) signals that the outcome is apt to occur, but in fact it does not occur. Conditioned inhibition is stimulus-specific in that it generates relatively narrow generalization gradients, similar to conditioned excitation (Spence, 1936). Additionally, it is outcome-specific in that an inhibitor will transfer its response-attenuating influence on behavior between different cues for the same outcome, but not between cues for different outcomes (Rescorla & Holland, 1977). Hence, conditioned inhibition, like conditioned excitation, is a form of stimulus-specific learning about a relationship between a cue and an outcome. But because it is necessarily mediated (the cue and outcome are never paired), conditioned inhibition is more similar to second-order conditioning than it is to simple (first-order) conditioning. Moreover, just as responding to a second-order conditioned stimulus appears as if the subject expects the outcome at a time and place specified conjointly by the spatiotemporal relationships between X and Y and between Y and the outcome (e.g., Matzel, Held et al., 1988), so too does a conditioned inhibitor seemingly signal not only the omission of the outcome but also the time and place of that omission (e.g., Denniston, Blaisdell, & Miller, 1998).
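Pavlov’s procedure and the two standard tests for inhibition can be sketched with the same kind of error-correction rule proposed by Rescorla and Wagner (1972). This is one theoretical reading only: in this rule an inhibitor simply acquires negative associative strength, so the sketch cannot represent a stimulus that passes tests for both excitation and inhibition. The cue names Z (transfer excitor) and N (novel cue), the learning rate, the criterion, and the trial counts are all illustrative assumptions.

```python
def rw_update(strength, cues, reinforced, alpha=0.3, lam=1.0):
    """Apply one error-correction trial in place (alpha and lam are assumed)."""
    total = sum(strength.get(c, 0.0) for c in cues)
    error = (lam if reinforced else 0.0) - total
    for c in cues:
        strength[c] = strength.get(c, 0.0) + alpha * error

def trials_to_criterion(strength, cue, criterion=0.5, limit=1000):
    """Count cue-outcome pairings needed for the cue to exceed the criterion."""
    working = dict(strength)  # copy so the test does not alter training state
    for n in range(1, limit + 1):
        rw_update(working, (cue,), True)
        if working[cue] > criterion:
            return n
    return limit

strength = {}
# Pavlov's procedure: Y reinforced, interspersed with nonreinforced XY trials.
for _ in range(50):
    rw_update(strength, ("Y",), True)
    rw_update(strength, ("X", "Y"), False)
# Separately trained transfer excitor Z for the summation test.
for _ in range(20):
    rw_update(strength, ("Z",), True)

# Summation test: predicted responding to Z alone versus the ZX compound.
summation_alone = strength["Z"]
summation_compound = strength["Z"] + strength["X"]

# Retardation test: pairings needed by the inhibitor X versus a novel cue N.
retardation_inhibitor = trials_to_criterion(strength, "X")
retardation_novel = trials_to_criterion(strength, "N")

print(summation_alone, summation_compound, retardation_inhibitor, retardation_novel)
```

Under these assumptions X reduces predicted responding to the separately trained excitor and needs more pairings than a novel cue to reach the criterion, which corresponds to passing the summation and retardation tests.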

One might ask about the behavioral consequences for conditioned inhibition of posttraining extinction of the mediating cue. Similar to corresponding tests with second-order conditioning, the results have been mixed. For example, Rescorla and Holland (1977) found no alteration of behavior indicative of inhibition, whereas others (e.g., Best, Dunn, Batson, Meachum, & Nash, 1985; Hallam, Grahame, Harris, & Miller, 1992) observed a decrease in inhibition. Yin, Grahame, and Miller (1993) suggested that the critical difference between these studies is that massive posttraining extinction of the mediating stimulus is necessary to obtain changes in behavioral control by an inhibitor.

Despite these operational and behavioral similarities of conditioned inhibition and second-order conditioning, there is one fundamental difference. Responding to a second-order cue is appropriate for the occurrence of the outcome, whereas responding to an inhibitor is appropriate for the omission of the outcome. In sharp contrast to second-order conditioning (and sensory preconditioning), which are examples of positive mediation (seemingly passing information, so to speak, concerning an outcome from one cue to a second cue), conditioned inhibition is an example of negative mediation (seemingly inverting the expectation of the outcome conveyed by the first-order cue as the information is passed to the second-order cue). Why positive mediation should occur in some situations and negative mediation in other apparently similar situations is not yet fully understood. Rashotte, Marshall, and O’Connell (1981) and Yin, Barnet, and Miller (1994) have suggested that the critical variable may be the number of nonreinforced X-Y trials. A second difference between inhibition and second-order excitation that is likely related to the aforementioned one is that nonreinforced exposure to an excitor produces extinction, whereas nonreinforced exposure to an inhibitor not only does not reduce its inhibitory potential, but also sometimes increases it (DeVito & Fowler, 1987).

Retrospective Revaluation

Mediated changes in stimulus control of behavior can often be achieved by treatment (reinforcement or extinction) of a target cue’s companion stimulus either before, during, or after the pairings of the target and companion stimuli (reinforced or nonreinforced). Recent interest has focused on treatment of the companion stimulus alone after the completion of the compound trials, because in this case the observed effects on responding to the target are particularly problematic to most conventional associative theories of acquired behavior. A change in stimulus control following the termination of training with the target cue is called retrospective revaluation. Importantly, both positive and negative mediation effects have been observed with the retrospective revaluation procedure. Sensory preconditioning is a long-known but frequently ignored example of retrospective revaluation in its simplest form. It is an example of positive retrospective revaluation because the posttarget-training treatment with the companion stimulus produces a change in responding to the target that mimics the change in control by the companion stimulus. Other examples of positive retrospective revaluation include the decrease in responding sometimes seen to a cue trained in compound when its companion cue is extinguished (i.e., mediated extinction; Holland & Forbes, 1982). In contrast, there are also many reports of negative retrospective revaluation, in which the change in control by the target is in direct opposition to the change produced in the companion during retrospective revaluation. Examples of negative retrospective revaluation include recovery from overshadowing as a result of extinction of the overshadowing stimulus (e.g., Matzel et al., 1985), decreases in conditioned inhibition as a result of extinction of the inhibitor’s training excitor (e.g., DeVito & Fowler, 1987), and backward blocking (AX→outcome, followed by A→outcome, e.g., Denniston et al., 1996).

The occurrence of both positive and negative mediation in retrospective revaluation parallels the two opposing effects that are observed when the companion cue is treated before or during the compound stimulus trials. In the section entitled “Multiple Cues With a Common Outcome,” we described not only overshadowing but also potentiation, which, although operationally similar to overshadowing, has a converse behavioral result. Notably, the positive mediation apparent in potentiation can usually be reversed by posttraining extinction of the mediating (potentiating) cue (e.g., Durlach & Rescorla, 1980). Similarly, the negative mediation apparent in overshadowing can sometimes be reversed by massive posttraining extinction of the mediating (overshadowing) cue (e.g., Kaufman & Bolles, 1981; Matzel et al., 1985). However, currently there are insufficient data to specify a rule for the changes in control by a cue that will be observed when its companion cue is reinforced or extinguished. That is to say, we do not know the critical variables that determine whether mediation will be positive or negative. As previously mentioned (see section titled “Conditioned Inhibition”), the two prime candidates for determining the direction of mediation are the number of pairings of the target with the mediating cue and whether those pairings are simultaneous or serial. Whatever the outcome of future studies, research on retrospective revaluation has clearly demonstrated that the previously accepted view, namely that the response potential of a cue cannot change if it is not presented, was incorrect.

Models of Pavlovian Responding: Theory

Here we turn from our summary of variables that influence acquired behavior based on cue-outcome (Pavlovian) relationships to a review of accounts of this acquired behavior. In this section, we contrast the major variables that differentiate among models, and we refer back to our list of empirical variables (see section titled “Factors Influencing Acquired Stimulus Control of Behavior”) to ask how the different families of models account for the roles of these variables. Citations are provided for the interested reader wishing to pursue the specifics of one or another model.

Units of Analysis

What Is a Stimulus?

Before we review specific theories, we must briefly consider how an organism perceives a stimulus and processes its representation. Different models of acquired behavior use different definitions of stimuli. In some models, the immediate perceptual field is composed of a vast number of microelements (e.g., we learn not about a tree, but each branch, twig, and leaf; Estes & Burke, 1953; McLaren & Mackintosh, 2000). In other models, the perceptual field at any given moment consists of a few integrated sources of receptor stimulation (e.g., the oak tree, the maple tree; Rescorla & Wagner, 1972; Gallistel & Gibbon, 2000). For yet other models, the perceptual field at any given moment is fully integrated and contains only one so-called configured stimulus, which consists of all that immediately impinges on the sensorium (the forest; Pearce, 1987). Although each approach offers its own distinct merits and demerits, they have all proven viable. Generally speaking, the larger the number of elements assumed, the more readily can behavior be explained post hoc, but the more difficult it is to make testable a priori predictions. By increasing the number of stimuli, each of which can have its own associative status, one is necessarily increasing the number of variables and often the number of parameters. Thus, it may be difficult to distinguish between models that are correct in the sense that they faithfully represent some fundamental relationship between acquired behavior and events in the environment, and models that succeed because there is enough flexibility in the model’s parameters to account for virtually any result (i.e., curve fitting). Most models assume that subjects process representations of a small number of integrated stimuli at any one time. That is, the perceptual field might consist of a tone and a light and a tree, each represented as an integrated and inseparable whole.

Worthy of special note here is the McLaren and Mackintosh (2000) model with its elemental approach. This model not only addresses the fundamental phenomena of acquired behavior, but also accounts for perceptual learning, thereby providing an account of how and by what mechanism organisms weave the stimulation provided by many microelements into the perceptual fabric of lay usage. In other words, the model offers an explanation of how experience causes us to merge representations of branches, twigs, and leaves into a compound construct like a tree.

What Is a Response?

In Pavlovian learning, the conditioned response reflects the nature of the outcome, which is ordinarily a biologically significant unconditioned stimulus (but see Holland, 1977). However, this is not sufficient to predict the form of conditioned behavior. Although responding is often of the same form as the unconditioned response to the unconditioned stimulus (i.e., mimetic), it is sometimes in the opposite direction (i.e., compensatory). Examples of mimetic conditioned responding include eyelid conditioning, conditioned salivation, and conditioned release of endogenous endorphins with aversive stimulation as the unconditioned stimulus. Examples of compensatory conditioned responding include conditioned freezing with foot shock as the unconditioned stimulus, and conditioned opiate withdrawal symptoms with opiates as the unconditioned stimulus. The question of under what conditions will conditioned responding be compensatory as opposed to mimetic has yet to be satisfactorily answered. Eikelboom and Stewart (1982) argued that all conditioned responding is mimetic, and that compensatory instances simply reflect our misidentifying the unconditioned stimulus; that is, for unconditioned stimuli that impinge primarily on efferent neural pathways of the peripheral nervous system, the real reinforcer is the feedback to the central nervous system. Thus, what is often called the unconditioned response precedes a later behavior that constitutes the effective unconditioned response. This approach is stimulating, but encounters problems: Most unconditioned stimuli impinge on both afferent and efferent pathways, and there are complex feedback loops at various anatomical levels between these two pathways.

Conditioned responding is not just a reflection of past experience with a cue indicating a change in the probability of an outcome. Acquired behavior reflects not only the likelihood that a reinforcer will occur, but when and where the reinforcer will occur. This is evident in most learning situations (see “Response Topology and Timing”). For example, Clayton and Dickinson (1999) have reported that scrub jays, which cache food, remember not only what food items have been stored, but where and when they were stored. Additionally, there is evidence that subjects can integrate temporal and spatial information from different learning experiences to create spatiotemporal relationships between stimuli that were never paired in actual experience (e.g., Etienne et al., 1998; Savastano & Miller, 1998). Alternatively stated, in mediated learning, not only does the mediating stimulus become a surrogate for the occurrence of the outcome, it carries with it information concerning where and when the outcome will occur, as is evident in the phenomenon of goal tracking (e.g., Burns & Domjan, 1996).

What Mental Links Are Formed?

In the middle of the twentieth century, there was considerable controversy about whether cue-outcome, cue-response, or response-outcome relationships were learned (i.e., associations, links). The major strategies used to resolve this question were to either (a) use test conditions that differed from those of training by pitting one type of association against another (e.g., go towards a specific stimulus, or turn right); or (b) degrade one or another component after training (e.g., satiation or habituation of the outcome or extinction of the eliciting cue) and observe its effect on acquired behavior. The results of such studies indicated that subjects could readily learn all three types of associations, and ordinarily did to various degrees, depending on which allowed the easiest solution of the task facing the subject (reviewed by Kimble, 1961). That is, subjects are versatile in their information processing strategies, opportunistic, and ordinarily adept at using whichever combination of environmental relationships is most adaptive.

Although much stimulus control of behavior can be described in terms of simple associations among cues, responses, and outcomes, occasion setting (described under the section entitled “Objective Contingency”) does not yield to such analyses. One view of how occasion setting works is that occasion setters serve to facilitate (or inhibit) the retrieval of associations (e.g., Holland, 1983b). Thus, they involve hierarchical associations; that is, they are associated with associations rather than with simple representations of stimuli or responses (cf. section entitled “Hierarchical Associations”). Such a view introduces a new type of learning, thereby adding complexity to the compendium of possible learned relationships. The leading alternative to this view of occasion setting is that occasion setters join into configural units with the stimuli that they are modulating (Schmajuk, Lamoureux, & Holland, 1998). This latter approach suffices to explain behavior in most occasion-setting situations, but to date has led to few novel testable predictions. Both approaches appear strained when used to account for transfer of modulation by an occasion setter from the association with which it was trained to another association. Such transfer is successful only if the transfer association itself was previously occasion set (Holland, 1989).

Acquisition-Focused (Associative) Models

All traditional models of acquired behavior have assumed that critical processing of information occurs exclusively when target stimuli occur—that is, at training, at test, or at both. The various contemporary models of acquired behavior can be divided into those that emphasize processing that occurs during training (hereafter called acquisition-focused models) and those that emphasize processing that occurs during testing (hereafter called expression-focused models). For each of these two families of models in their simplest forms, there are phenomena that are readily explained and other phenomena that are problematic. However, theorists have managed to explain most observed phenomena within acquired behavior in either framework (see R. R. Miller & Escobar, 2001) when allowed to modify models after new observations are reported (see section entitled “Where Have the Models Taken Us?”).

The dominant tradition since Thorndike (1932) has been the acquisition-focused approach, which assumes that learning consists of the development of associations. In theoretical terms, each association is characterized by an associative strength or value, which is a kind of summary statistic representing the cumulative history of the subject with the associated events. Hull (1943) and Rescorla and Wagner (1972) provide two examples of acquisition-focused models, with the latter being the most influential model today (see R. R. Miller, Barnet, & Grahame, 1995, for a critical review of this model). Contemporary associative models are perhaps best represented by that of Rescorla and Wagner, who proposed that time was divided into (training) trials and that, on each trial for which a cue of interest was present, there was a change in that cue’s association to the outcome equal to the product of the saliences of the cue and outcome, times the difference between the outcome experienced and the expectation of the outcome based on all cues present on that trial. Notably, in acquisition-focused models, subjects are assumed not to recall specific experiences (i.e., training trials) at test; rather they have accessible only the current associative strength between events. Models within this family differ primarily in the rules used to calculate associative strength, and whether other summary statistics are also computed. For example, Pearce and Hall (1980) proposed that on each training trial, subjects not only update the associative strength between stimuli present on that trial, but also recalculate the so-called associability of each stimulus present on that trial. What all contemporary acquisition-focused models share is that new experience causes an updating of associative strength; hence, recent experience is expected to have a greater impact on behavior than otherwise equivalent earlier experience.
The result is that these models are quite adept at accounting for those trial-order effects that can be viewed as recency effects; conversely, they are challenged by primacy effects (which, generally speaking, are far less frequent than recency effects). In the following section, we discuss some of the major variables that differentiate among the various acquisition-focused models. Specifics of individual models are not described here, but relevant citations are provided.
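
The Rescorla and Wagner rule described in words above can be written compactly. The notation is conventional and is an assumption here, because this paper states the rule only verbally: V_X is the associative strength of cue X, alpha_X and beta are the saliences of the cue and the outcome, lambda is the outcome experienced on the trial, and the sum is the expectation of the outcome based on all cues present on that trial.

```latex
\Delta V_X \;=\; \alpha_X \,\beta \,\Bigl(\lambda \;-\; \sum_{i \,\in\, \text{cues present}} V_i\Bigr)
```

When the summed expectation already equals the experienced outcome, the bracketed term is zero and no cue present on that trial changes in strength.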

Addressing Critical Factors of Acquired Behavior

Stimulus Salience and Attention

Nearly all models (acquisition- and expression-focused) represent the saliencies of the cue and outcome through one conjoint (e.g., Bush & Mosteller, 1951) or two independent parameters (one for the cue and the other for the outcome, e.g., Rescorla & Wagner, 1972). A significant departure from this standard treatment of salience-attention is Pearce and Hall’s (1980) model, which sharply differentiates between salience, which is a constant for each cue, and associability, which changes with experience and affects the rate (per trial) at which new information about the cue is encoded.
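
The distinction between a fixed salience and an experience-dependent associability can be illustrated with a small simulation. The sketch below is a deliberately simplified, Pearce-Hall-style rule for a single cue and is not taken from this paper: it assumes that the associability used on a trial equals the absolute prediction error on the preceding trial, and the function name, parameter values, and starting associability are all hypothetical.

```python
def pearce_hall_style(n_trials, salience=0.2, outcome=1.0, initial_associability=1.0):
    """Simulate repeated cue -> outcome pairings for one cue.

    Simplifying assumption: associability on the next trial equals the
    absolute prediction error on the current trial. Salience stays constant.
    Returns a list of (strength, associability) pairs, one per trial.
    """
    strength = 0.0
    associability = initial_associability
    history = []
    for _ in range(n_trials):
        prediction = strength  # what the cue predicted before this trial's update
        # Learning on this trial is scaled by fixed salience and current associability.
        strength += salience * associability * outcome
        # Associability for the next trial tracks how surprising the outcome was.
        associability = abs(outcome - prediction)
        history.append((strength, associability))
    return history


history = pearce_hall_style(20)
for trial, (strength, associability) in enumerate(history, start=1):
    print(trial, round(strength, 3), round(associability, 3))
```

Under these assumptions the cue's strength rises toward the outcome value while its associability falls, so the same cue is learned about more slowly once it has become a good predictor.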

Predispositions: Genetic and Experiential. Behavioral predispositions, which depend on evolutionary history, specific prior experience, or both, are very difficult to capture in models meant to have broad generality across individuals within a species and across species. In fact, most models of acquired behavior (acquisition- and expression-focused) have ignored the issue of predispositions. However, those models that use a single parameter to describe the conjoint associability (growth parameter) for both the cue and outcome (as opposed to separate associabilities for the cue and outcome) can readily incorporate predispositions within this parameter. For example, in the well-known Garcia and Koelling (1966) demonstration of flavors joining into association with gastric distress more readily than with electric shock and audiovisual cues entering into association more readily with electric shock than with gastric distress, separate (constant) associabilities for the flavor, audiovisual cue, electric shock, and gastric distress cannot account for the observed predispositions. In contrast, this example of cue-to-consequence effects is readily accounted for by high conjoint associabilities for flavor–gastric distress and for audiovisual cues–electric shock, and low conjoint associabilities for flavor–electric shock and for audiovisual cues–gastric distress. However, to require a separate associability parameter for every possible cue-outcome dyad creates a vastly greater number of parameters than simply having a single parameter for each cue and each outcome with changes in behavior being in part a function of these two parameters (usually their product). Hence, we see here the recurring trade-off between oversimplifying (separate parameters for each cue and each outcome) and reality (a unique parameter for each cue-outcome dyad).
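
Why separate constant associabilities cannot reproduce the Garcia and Koelling pattern can be shown in two steps. As a notational assumption not used in this paper, let the growth rate for a cue-outcome pair be the product of a positive cue parameter a and a positive outcome parameter b (AV stands for the audiovisual cue, illness for gastric distress).

```latex
a_{\text{flavor}}\, b_{\text{illness}} > a_{\text{flavor}}\, b_{\text{shock}}
  \;\Longrightarrow\; b_{\text{illness}} > b_{\text{shock}},
\qquad
a_{\text{AV}}\, b_{\text{shock}} > a_{\text{AV}}\, b_{\text{illness}}
  \;\Longrightarrow\; b_{\text{shock}} > b_{\text{illness}}
```

The two conclusions contradict each other, so no choice of per-cue and per-outcome constants yields the crossover. A conjoint parameter for every dyad removes the contradiction, but for n cues and m outcomes it requires n times m parameters rather than n plus m.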

An alternative to models aiming for broad generality over tasks and species is to develop separate models for each task (e.g., foraging, mating, defense, shelter from the elements) and species, consistent with the view that the mind is modular (e.g., Garcia, Lasiter, Bermudez-Rattoni, & Deems, 1985). This approach has been championed by some researchers (Cosmides & Tooby, 1994), but faces challenges because the resulting models can become very complex and are limited in their potential to generate unambiguous testable predictions.

Spatiotemporal Contiguity (Similarity). Despite the empirical importance of contiguity as a determinant of acquired behavior, it is surprising that many associative models give short shrift to this critical variable. One common tactic has been to incorporate contiguity indirectly through changes in the predictive status of the context that on subsequent trials modulates the associative status of the cue (e.g., Mackintosh, 1975; Pearce & Hall, 1980; Rescorla & Wagner, 1972). The associative models that do squarely address the effects of temporal contiguity are real-time models (see section titled “Temporal Window of Analysis”; e.g., McLaren & Mackintosh, 2000; Sutton & Barto, 1981; Wagner, 1981).

Objective Contingency. The attenuation of acquired behavior through degradation of contingency has rarely been addressed as a unified problem. Most associative models of acquired behavior have accounted for extinction through either (a) weakening of the cue-outcome association (e.g., Rescorla & Wagner, 1972), or (b) the development of an inhibitory relationship between the cue and outcome that opposes the expression of the initial excitatory association (e.g., Hull, 1952; Pearce & Hall, 1980; Wagner, 1981). Attenuated responding due to partial reinforcement (i.e., nonreinforced cues interspersed among the cue-outcome pairings) is ordinarily explained through mechanisms similar to those used to account for extinction. The CS-preexposure effect has been explained both in terms of (a) a decrease in the associability (attention) to the cue as a result of nonreinforced pretraining exposure (e.g., Pearce & Hall, 1980); and (b) the development of a strong context-cue association that attenuates acquisition of the cue-outcome association (e.g., Wagner, 1981). The context specificity of the CS-preexposure effect seemingly lends support to this latter view, but at least one attentional approach can also accommodate it (Lubow, 1989). Notably, some prominent models simply fail to account for the CS-preexposure effect (e.g., Rescorla & Wagner, 1972).

Attenuated responding achieved by degrading contingency through unsignaled USs interspersed among the CS-US pairings and the US-preexposure effect are both explained by most associative models in terms of context-outcome associations, which then compete with the cue-outcome association. This is consistent with the context specificity of these effects (i.e., CS preexposure in one context retards subsequent stimulus control during cue-outcome pairings much less if the preexposure occurred outside of the training context). However, habituation to the outcome can also contribute to the effect in certain cases (Randich & LoLordo, 1979). Only a few associative models can account for reduced responding as a result of unsignaled outcome exposures after the termination of cue training (Dickinson & Burke, 1996; Van Hamme & Wasserman, 1994). However, confirmation of this prediction is only a limited success because the effect is difficult to obtain experimentally (see Denniston et al., 1996).

Cue and Outcome Durations. Models that parse time into trials usually account for the generally weaker stimulus control observed when cue duration is increased by changing the cue’s associability-salience parameter (e.g., Rescorla & Wagner, 1972). This mechanism is largely post hoc. Changes in outcome duration might be addressed in the same manner, but they have received little attention because results of studies that have varied outcome duration are mixed, presumably because the motivational properties of the outcome changed with the duration of its presentation. A far better account of cue and outcome durations is provided by real-time associative models (McLaren & Mackintosh, 2000; Sutton & Barto, 1981; Wagner, 1981). According to these models, the associative strength of a cue changes continuously when it is present, depending on the activity of the outcome representation.

Reinforcement Theory. For the first 60 years of the twentieth century, various forms of reinforcement theory dominated the study of acquired behavior. The history of reinforcement theory can be traced from Thorndike’s strong law of effect (1911; see section entitled “Instrumental Responding”) through Hull’s several models (e.g., 1952). The basic premise of reinforcement theory was that learning did not occur without a biologically significant reinforcer. Although this view was long dominant, as early as Tolman (1932) there were objections, often framed in terms of reinforcement’s having more impact on the expression of knowledge than on the encoding of it. Although reinforcement during training may well accelerate the rate at which a cue-outcome relationship is learned, encoding of stimulus relationships does occur in the absence of reinforcement (unless one insists on making esoteric arguments that every stimulus about which organisms can learn has some minimal reinforcing value). This is readily demonstrated in Pavlovian situations by the sensory preconditioning effect (X→A training followed by A→US training SPC, with a subsequent test on X; Brogden, 1939) and in instrumental situations by latent learning effects in which the subject is not motivated when exposed to the learning relationships (Tolman & Honzik, 1930).

Conditioned Inhibition. The operations and consequent changes in behavior indicative of conditioned inhibition were described previously in this research paper. At the theoretical level, there are three different ways that acquisition-focused models have accounted for conditioned inhibition. Konorski (1948) suggested that inhibitory cues elevate the activation threshold of the US representation required for generation of a conditioned response. Later, Konorski (1967) proposed that inhibitory cues activated a no-US representation that countered activation of a US representation by excitatory associations to that stimulus or other stimuli present at test. Subsequently, Rescorla and Wagner (1972) proposed that conditioned inhibitors were cues with negative associative strength. According to this view, for a specific stimulus conditioned inhibition and excitation are mutually exclusive. This position has been widely adopted, perhaps in part because of its simplicity. However, considerable data (e.g., Matzel, Gladstein, et al., 1988) demonstrate that inhibition and excitation are not mutually exclusive (i.e., a given stimulus can pass tests for both excitation and inhibition without intervening training). Most acquisition-focused theories other than the Rescorla-Wagner model allow stimuli to possess both excitatory and inhibitory potential simultaneously (e.g., Pearce & Hall, 1980; Wagner, 1981).
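
The Rescorla and Wagner (1972) treatment of an inhibitor as a cue with negative associative strength can be illustrated with a short simulation. This is a minimal sketch under stated assumptions: the function name, the single salience value shared by all cues, the trial counts, and the choice of design (cue A reinforced alone, compound AX nonreinforced) are illustrative and are not specified in this paper.

```python
def rescorla_wagner(trials, alpha=0.3, beta=1.0):
    """Run Rescorla-Wagner updates over a sequence of (cues, outcome) trials.

    alpha and beta stand for cue and outcome salience; one alpha is shared
    by every cue here purely to keep the sketch small.
    """
    strengths = {}
    for cues, outcome in trials:
        # Expectation is the summed strength of every cue present on the trial.
        expectation = sum(strengths.get(cue, 0.0) for cue in cues)
        change = alpha * beta * (outcome - expectation)
        for cue in cues:
            strengths[cue] = strengths.get(cue, 0.0) + change
    return strengths


# A alone is followed by the outcome; the AX compound is not.
inhibition_trials = [(("A",), 1.0), (("A", "X"), 0.0)] * 200
strengths = rescorla_wagner(inhibition_trials)
print(strengths)
```

With these settings A ends with positive strength and X with negative strength of about the same size, so the compound predicts no outcome. Because each cue carries a single signed number, this sketch also shows why the model cannot represent a stimulus that is excitatory and inhibitory at the same time.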

Response Rules. Any model of acquired behavior must include both learning rules (to encode experience) and response rules (to express this encoded information). Acquisition-focused models, by their nature, generally have simple response rules and leave accounts of behavioral phenomena largely to differences in what is learned during training. For example, the Rescorla-Wagner (1972) model simply states that responding will be a monotonic function of associative strength. In practice, most researchers who have tried to test the model quantitatively have assumed that response magnitude is proportional to associative strength. The omission of a specific response rule in the Rescorla-Wagner model was not an oversight. They wanted to focus attention on acquisition processes and did not want researchers to be distracted by concerns that were not central to their model. However, the lack of a specific response rule leaves the Rescorla-Wagner model less of a quantitative model than is sometimes acknowledged.

Information Value. The view that cues acquire associative strength to the extent that they are informative about (i.e., predict) an outcome was first suggested by Egger and Miller (1963), who observed less responding to X after A→X→US trials than after equivalent training in the absence of A (X→US; i.e., serial overshadowing). Kamin (1968) developed the position, and it was later formalized in the Rescorla-Wagner (1972) model. Rescorla and Wagner’s primary concern was competition between cues trained in compound (e.g., overshadowing and blocking). They argued that a cue would acquire associative strength with respect to an outcome to the extent that the outcome was not already predicted (i.e., was surprising). If another cue that was present during training of the target already predicted the outcome, there was no new information about the outcome to be provided by the cue, and hence no learning occurred. This position held sway for several decades, became central to many subsequent models of learning (e.g., Mackintosh, 1975; Pearce, 1987; Pearce & Hall, 1980; Wagner, 1981), and is still popular today. The informational hypothesis has been invoked to account for many observations, including the weak responding observed to cues presented simultaneously with an outcome (i.e., the simultaneous conditioning deficit). But it has been criticized for failing to distinguish between learning and expression of what was learned. Demonstrations of recovery (without further training) from competition between cues trained in compound challenge the informational hypothesis (e.g., reminder cues; Kasprow, Cacheiro, Balaz, & Miller, 1982; extinction of the competing cue; Kaufman & Bolles, 1981; and spontaneous recovery; J. S. Miller, McKinzie, Kraebel, & Spear, 1996).
Similarly problematic is the observation that simultaneous presentations of a cue (X) and outcome appear to result in latent learning that can later be revealed by manipulations that create a forward relationship to a stimulus presented at test (e.g., X and US simultaneous, Y→X, test on Y; Matzel, Held et al., 1988). Thus, both cue competition and the simultaneous conditioning deficit appear to be, at least in part, deficits in expression of acquired knowledge rather than deficits in acquisition, contrary to the informational hypothesis. Certainly, predictive power (the focus of the informational hypothesis) is the primary function of learning, but the process underlying learning appears to be dissociated from this important function.
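
The acquisition-deficit account of blocking that the informational hypothesis implies can be made concrete with a short simulation of the Rescorla-Wagner rule. This is a sketch under stated assumptions: the function name, the shared salience value, and the trial counts are hypothetical, and the simulation shows only what the model predicts, not the recovery effects cited above.

```python
def train(trials, strengths=None, alpha=0.3, beta=1.0):
    """Apply Rescorla-Wagner updates, optionally starting from prior strengths."""
    strengths = dict(strengths or {})
    for cues, outcome in trials:
        # The outcome is "surprising" only to the extent it exceeds this sum.
        expectation = sum(strengths.get(cue, 0.0) for cue in cues)
        change = alpha * beta * (outcome - expectation)
        for cue in cues:
            strengths[cue] = strengths.get(cue, 0.0) + change
    return strengths


pretraining = [(("A",), 1.0)] * 50      # A -> outcome
compound = [(("A", "X"), 1.0)] * 50     # AX -> outcome

blocked = train(compound, strengths=train(pretraining))
control = train(compound)               # no pretraining of A

print("X after blocking treatment:", blocked["X"])
print("X in control condition:", control["X"])
```

In the blocking condition X gains almost no strength because A already predicts the outcome, whereas in the control condition A and X share the available strength. On this account nothing about X is stored, which is exactly the assumption that recovery from cue competition without further training calls into question.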

Element Emphasized

Contemporary associative models of acquired behavior were designed in large part to account for cue competition between cues trained in compound. Although there is considerable reason to think that cue competition is due to factors other than deficient acquisition (see “Multiple Cues With a Common Outcome”), most contemporary associative models have attempted to account for cue competition through either the outcome’s or the cue’s becoming less effective in supporting new learning. Outcome-limited associative models are ordinarily based on the informational hypothesis, and assume that the outcome becomes less effective in promoting new learning because it is already predicted by the competing cues that are presented concurrently with the target (e.g., Rescorla & Wagner, 1972). In contrast, cue-limited models assume that attention to (or associability of) the target cue decreases as a result of the concurrent presence of competing cues that are better predictors of the outcome than is the target (e.g., Pearce & Hall, 1980).

As both outcome- and cue-limited models have their advantages, some theorists have created hybrid models that employ both mechanisms (e.g., Mackintosh, 1975; Wagner, 1981). Obviously, such hybrid models tend to be more successful in providing post hoc accounts of phenomena. But because they incorporate multiple mechanisms, their a priori predictions tend to be dependent on specific parameters. Thus, in some cases their predictions can be ambiguous unless extensive preliminary work is done to determine the appropriate parameters for the specific situation.

Temporal Window of Analysis

A central feature of any model of acquired behavior is the frequency with which new perceptual input is integrated with previously acquired knowledge. Most acquisition-focused models of learning are discrete-trial models, which assume that acquired behavior on any trial depends on pretrial knowledge, and that the information provided on the trial is integrated with this knowledge immediately after the trial (i.e., after the occurrence or nonoccurrence of the outcome; e.g., Mackintosh, 1975; Pearce & Hall, 1980; Rescorla & Wagner, 1972). Such an assumption contrasts with real-time models, which assume that new information is integrated continuously with prior knowledge (e.g., McLaren & Mackintosh, 2000; Sutton & Barto, 1981; Wagner, 1981). In practice, most implementations of real-time models do not integrate information instantaneously, but rather do so very frequently (e.g., every 0.1 s) throughout each training session. A common weakness of all discrete-trial models (expression- as well as acquisition-focused) is that they cannot account for the powerful effects of cue-outcome temporal contiguity. Parsing an experimental session into trials in which cues and outcomes do or do not occur necessarily implies that temporal information is lost. In contrast, real-time models (expression- as well as acquisition-focused) can readily account for temporal contiguity effects. Real-time models are clearly more realistic, but discrete-trial models are more tractable, hence less ambiguous, and consequently stimulate more research.

Expression-Focused Models

In contrast to acquisition-focused models, in which summary statistics representing prior experience are assumed to be all that is retained, expression-focused models assume that a more or less veridical representation of past experience is retained, and that on each test trial subjects process all (or a sample) of this large store of information to determine their immediate behavior (R. R. Miller & Escobar, 2001). Hence, these models can be viewed more as response rules than as rules for learning per se. This approach makes far greater demands upon memory, but perhaps there is little empirical reason to believe that limits on long-term memory capacity constrain how behavior is modified as a function of experience. In many respects, this difference between acquisition- and expression-focused models is analogous (perhaps homologous) to the distinction between prototype and exemplar models in category learning (Ross & Makin, 1999). A consistent characteristic of contemporary expression-focused models of acquired behavior is that they all involve some sort of comparison between the likelihood of the outcome in the presence of the cue and the likelihood of the outcome in the absence of the cue.

Contingency Models

One of the earliest and best known contingency models is that of Rescorla (1968; also see Kelley, 1967). This discrete-trial model posits that subjects behave as if they record the frequencies of (a) cue-outcome pairings, (b) cues alone, (c) outcomes alone, and (d) trials with neither (see Figure 13.1). Based on these frequencies, conditioned responding reflects the difference between the conditional probability of the outcome given the presence of the cue, and the conditional probability of the outcome in the absence of the cue (i.e., the base rate of the outcome). Alternatively stated, stimulus control is assumed to be directly related to the change in outcome probability signaled by the cue. A conditioned excitor is a cue that signals an increase in the probability of the outcome, whereas a conditioned inhibitor is a cue that signals a decrease in that probability. This model is often quite successful in describing conditioned responding (and causal inference, which appears to follow much the same rules as Pavlovian conditioning; see Shanks, 1994, for a review). However, researchers have found that differentially weighting the four types of trial frequencies (with Type 1 receiving the greatest weight and Type 4 the least), provides an improved description of the data (e.g., Wasserman et al., 1993).
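
The contingency computation described above can be sketched directly from the four trial frequencies. The function name and argument names are hypothetical, and the optional weights are placeholders: the text reports only that Type 1 trials receive the greatest weight and Type 4 the least, not any particular values.

```python
def contingency(cue_outcome, cue_alone, outcome_alone, neither,
                weights=(1.0, 1.0, 1.0, 1.0)):
    """Difference between P(outcome | cue) and P(outcome | no cue).

    The four counts correspond to trial types (a) through (d) in the text.
    `weights` optionally rescales each trial type before the probabilities
    are computed; equal weights give the unweighted contingency.
    """
    a, b, c, d = (w * n for w, n in zip(weights, (cue_outcome, cue_alone,
                                                  outcome_alone, neither)))
    if a + b == 0 or c + d == 0:
        raise ValueError("need at least one trial with the cue and one without")
    p_outcome_given_cue = a / (a + b)
    p_outcome_given_no_cue = c / (c + d)
    return p_outcome_given_cue - p_outcome_given_no_cue


print(contingency(15, 5, 5, 15))    # positive: cue signals an increase (excitor)
print(contingency(5, 15, 15, 5))    # negative: cue signals a decrease (inhibitor)
print(contingency(15, 5, 15, 5))    # zero: outcome equally likely without the cue
```

The third call shows contingency degradation: the cue-outcome pairings are unchanged from the first call, but added outcome-alone trials raise the base rate until the cue signals no change in outcome probability.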

Rescorla’s contingency (1968) model is elegant in its simplicity (e.g., contingency effects are explained as increases in trial types 2 and 3), but suffers from several problems. Unlike most associative models, it cannot account for (a) the powerful effects of trial order (e.g., recency effects) because it ignores the order in which trials occur; or (b) cue competition effects (e.g., blocking) because it addresses only single cue situations. For these reasons, Rescorla abandoned his contingency model in favor of the Rescorla-Wagner (1972) model. However, other researchers have addressed these deficits by proposing variants of Rescorla’s contingency model. For example, Cheng and Novick (1992) developed a contingency model that, rather than incorporating all trials, includes selection rules for which trials contribute to the frequencies used to compute the conditional probabilities. Their focal set model succeeds in accounting for cue competition. Additionally, if trials are differentially weighted as a function of recency, contingency models are able to address trial-order effects (e.g., Maldonado, Cátena, Cándido, & García, 1999). Finally, although simple contingency models cannot explain cue-outcome contiguity effects, this problem is shared with most models (acquisition- as well as expression-focused) that decompose experience into discrete trials.

Comparator Models

Comparator models are similar to contingency models in emphasizing a comparison at the time of testing between the likelihood of the outcome in the presence and absence of the cue. However, these models are not based on computation of event frequencies. Currently, there are two types of comparator models. One focuses exclusively on comparisons of temporal relationships (e.g., rates of outcome occurrence), whereas the other assumes that comparisons occur on many dimensions, with time as only one of them.

The best-known timing model of acquired behavior is Gibbon and Balsam’s (1981; also see Balsam, 1984) scalar-expectancy theory (SET). According to SET, conditioned responding is directly related to the average interval between outcomes during training (i.e., an inverse measure of the prediction of the outcome based on the context), and inversely related to the interval between cue onset and the outcome (i.e., a measure of the prediction of the outcome based on the cue). Like all timing models (in contrast to the other expression-focused models), SET is highly successful in explaining cue-outcome contiguity effects and also does well in predicting the effects of contingency degradation that occur when the outcome is presented in the absence of the cue. Although the model accounts for the CS-preexposure effect if context exposure is held constant, it fails to explain extinction, because latencies to the outcome are assumed to be updated only when an outcome occurs. Scalar-expectancy theory also fails to account for stimulus competition-interference effects.
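The two relations that SET emphasizes can be illustrated with a small Python sketch. It assumes that the two intervals are combined as a simple ratio, and the function name, argument names, and interval values are hypothetical choices made for illustration only.

```python
def expectancy_ratio(mean_inter_outcome_interval, cue_to_outcome_interval):
    """Ratio of the context-based waiting time to the cue-based waiting time.

    Larger values mean the cue predicts the outcome much sooner than the
    context alone does. Both intervals are in the same time unit.
    """
    if mean_inter_outcome_interval <= 0 or cue_to_outcome_interval <= 0:
        raise ValueError("intervals must be positive")
    return mean_inter_outcome_interval / cue_to_outcome_interval


baseline = expectancy_ratio(120.0, 10.0)       # outcomes every 120 s, cue lasts 10 s
degraded = expectancy_ratio(40.0, 10.0)        # extra unsignaled outcomes shorten the interval
long_cue = expectancy_ratio(120.0, 30.0)       # weaker cue-outcome contiguity
print(baseline, degraded, long_cue)            # 12.0 4.0 4.0
```

Under this assumption, both contingency degradation (unsignaled outcomes) and a longer cue-outcome interval lower the ratio, which is the direction of the effects described above.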

A recent expression-focused timing model proposed by Gallistel and Gibbon (2000), called rate-expectancy theory (RET), incorporates many of the principles of SET, but emphasizes rates of outcome occurrence (in the presence and absence of the cue), rather than latencies between outcomes. This inversion from waiting times (i.e., latencies) to rates allows the model to account for stimulus competition-interference effects because rates of reinforcement associated with different cues are assumed to summate; in contrast to SET, RET considers outcome rates attributed to nontarget discrete cues as well as background cues. Moreover, reinforcement rates are assumed to change continuously with exposure to the cue or to the background stimuli in the absence of as well as with the occurrence of the outcome, thereby accounting for extinction as well as the CS-preexposure effect and partial reinforcement.

A comparator model that does not focus exclusively on timing is the comparator hypothesis of R. R. Miller and Matzel (1988; also see Denniston, Savastano, & Miller, 2001). In this model, responding is also assumed to be directly related to the degree to which the target cue predicts the outcome and inversely related to the degree to which background (discrete and contextual) cues present during training of the cue predict the outcome. The down-modulating effect of the background cues on acquired responding depends on the similarity of the outcome (in all aspects, including temporal and spatial attributes) that these cues predict relative to the outcome that the target cue predicts. Thus, this model (along with contingency theory) brings to acquired responding the principle of relativity that is seen in many other subfields concerned with information processing by organisms (e.g., Fechner’s law, the marginal value theorem of economics, contrast effects in motivational theory, the matching law of behavioral choice as discussed in this research paper’s section entitled “Instrumental Responding”). The timing expression-focused models also emphasize relativity (so-called timescale invariance), but only in the temporal domain. The comparator hypothesis accounts for both contingency degradation and cue competition effects through links between the cue and background stimuli (discrete and contextual) and links between these background stimuli and the outcome.
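The response rule of the comparator hypothesis can be sketched as follows. The sketch assumes that the indirect (comparator) path is the product of the two links and that it is subtracted from the direct cue-outcome link; both the arithmetic form and the numerical values are illustrative assumptions, not the authors’ formal statement.

```python
def comparator_response(target_outcome, target_comparator, comparator_outcome):
    """Response potential of a target cue at the time of testing.

    target_outcome:     strength of the direct cue-outcome link
    target_comparator:  strength of the link from the cue to a background stimulus
    comparator_outcome: strength of the link from that stimulus to the outcome
    A negative value is read here as inhibitory response potential.
    """
    indirect = target_comparator * comparator_outcome  # activation via the comparator
    return target_outcome - indirect


# Hypothetical values after compound training with a strong companion cue.
after_training = comparator_response(0.8, 0.5, 0.8)
# Same encoded cue-outcome link, but the companion has since been extinguished.
after_companion_extinction = comparator_response(0.8, 0.5, 0.1)
print(after_training < after_companion_extinction)  # True
```

Note that the cue-outcome link is identical in both calls; only the companion’s link to the outcome changes, so the difference in responding arises entirely at expression.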

Conditioned Inhibition. In all of the comparator models, a conditioned inhibitor is viewed as a stimulus that signals a reduction in the rate or probability of reinforcement relative to the baseline occurrence of the reinforcer during training in the absence of the cue. This position avoids the theoretical quandary faced by the associative views of conditioned inhibition concerning the representation of (a) no-outcome, or (b) a below-zero expectation of the outcome.

Acquisition Rules. As previously stated (see the section entitled “Acquisition-Focused (Associative) Models”), models of acquired behavior must include both acquisition rules and response rules. In contrast to acquisition-focused models, which generally have simple response rules and leave accounts of behavioral differences largely to differences in what is encoded during training, expression-focused models have simple rules for acquisition and rely on response rules for an account of most behavioral differences. Thus, the attenuated responding to a target cue observed, for example, in a blocking or contingency-degrading treatment is assumed to arise not from a failure to encode the target cue-outcome pairings, but rather from a failure to express this information in behavior.

Accounts of Retrospective Revaluation

In the section entitled “Retrospective Revaluation,” we described retroactive revaluation of response potential, in which, after training with a target cue in the presence of other stimuli (discrete or contextual), treatment of the companion stimuli (i.e., presentation of a companion stimulus with or without the outcome) can alter responding to the target cue. Examples include such mediational phenomena as sensory preconditioning—in which the mediating stimulus is paired with the outcome; see section entitled “Second-Order Conditioning and Sensory Preconditioning”—and recovery from overshadowing as a result of extinguishing the overshadowing cue (e.g., Dickinson & Charnock, 1985; Kaufman & Bolles, 1981; Matzel et al., 1985).

Expression-focused models that accommodate multiple cues (e.g., the comparator hypothesis and RET) generally have no difficulty accounting for retrospective revaluation because new experience with a companion stimulus changes its predictive value, and responding to the cue is usually assumed to be inversely related to the response potential of companion stimuli. Thus, a retrospective change in a cue’s response potential does not represent new learning about the absent cue, but rather new learning concerning the companion stimuli.

In contrast, empirical retrospective revaluation is problematic to most traditional acquisition-focused models. This is because these models assume that responding reflects the associative status of the target cue, which is generally assumed not to change during retrospective revaluation trials (on which the cue is absent). But given growing evidence of empirical retrospective revaluation, several researchers have proposed models that allow changes in the associative status of a cue when it is absent. One of the first of these was a revision of the Rescorla-Wagner (1972) model by Van Hamme and Wasserman (1994), which allows changes in the associative strength of an absent target cue, provided that some associate of the target cue was present. This simple modification successfully accounts for most instances of retrospective revaluation, but otherwise has the same failings and successes as the Rescorla-Wagner model (see R. R. Miller et al., 1995). An alternative associative approach to retrospective revaluation is provided by Dickinson and Burke (1996), who modified Wagner’s (1981) SOP model to allow new learning about absent stimuli. As might be expected, the Dickinson and Burke model has many of the same successes and problems as Wagner’s model (see section entitled “Where Have the Models Taken Us?”). A notable problem for these associative accounts of retrospective revaluation is that other researchers have attempted to explain mediated learning (e.g., sensory preconditioning and mediated extinction) with similar models, except that absent cues have an associability opposite in sign to that assumed by Van Hamme and Wasserman and by Dickinson and Burke (Hall, 1996; Holland, 1981, 1983b). Without a principled rule for deciding when mediation will be positive (e.g., second-order conditioning) as opposed to negative (e.g., recovery from overshadowing achieved through extinction of the overshadowing cue), there seems to be an arbitrariness to this approach. In contrast, the expression-focused models unambiguously predict negative mediation (and fail to account for positive mediation when it is observed). That is, a change in the response potential of a companion stimulus is always expected to be inversely related to the resulting change in the response potential of the target cue.
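The logic of the Van Hamme and Wasserman revision can be illustrated with a short simulation. The sketch below applies the usual error-correction update to present cues and the same update with a negative learning rate to absent cues whose associate is present. The function name, the cue labels, and all parameter values are hypothetical and chosen only to make the direction of the effect visible.

```python
def revised_rw_trial(strengths, present, absent_associates, outcome,
                     alpha_present=0.3, alpha_absent=-0.15, beta=1.0):
    """One trial of an error-correction update that also changes absent cues.

    strengths: dict mapping cue name to associative strength
    present: cues physically presented on the trial
    absent_associates: absent cues whose associate is present on the trial
    outcome: True if the outcome occurs on the trial
    The learning rates are assumed values; alpha_absent is negative.
    """
    asymptote = 1.0 if outcome else 0.0
    error = asymptote - sum(strengths[cue] for cue in present)
    updated = dict(strengths)
    for cue in present:
        updated[cue] += alpha_present * beta * error
    for cue in absent_associates:
        updated[cue] += alpha_absent * beta * error  # opposite sign to present cues
    return updated


v = {"A": 0.0, "X": 0.0}
for _ in range(50):                      # Phase 1: compound AX paired with the outcome
    v = revised_rw_trial(v, ["A", "X"], [], outcome=True)
x_after_compound = v["X"]
for _ in range(50):                      # Phase 2: A alone, no outcome; X absent
    v = revised_rw_trial(v, ["A"], ["X"], outcome=False)
print(x_after_compound < v["X"])         # True: X gains strength while absent
```

Reversing the sign of alpha_absent makes the absent cue lose strength during Phase 2 instead, which is the ambiguity about positive versus negative mediation noted in the paragraph above.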

Where Have the Models Taken Us?

As previously noted (in our discussion of acquisition-focused models), theorists have been able to develop models of acquired behavior that can potentially account for many observations after the fact. Any specific model can, in principle, be refuted, but classes of models, such as the families of acquisition-focused or expression-focused models, allow nearly unlimited possibilities for future models within that family (R. R. Miller & Escobar, 2001). If the goal is to determine precisely how the mind processes information at the psychological level, contemporary theories of learning have not been successful because viable post hoc alternatives are often possible and in retrospect may appear as plausible as the a priori model that inspired the research.

Nevertheless, models have succeeded in stimulating experiments that identify new empirical relationships. The models most successful in this respect are often among the least successful in actually accounting for behavioral change. This is because a model stimulates research only to the extent that it makes unambiguous predictions. Models with many parameters and variables (e.g., McLaren & Mackintosh, 2000; Wagner, 1981) can be tuned post hoc to account for almost any observation; hence, few attempts are made to test such models, however plausible they might appear. In contrast, oversimplified models such as Rescorla and Wagner (1972) make unambiguous predictions that can be tested, with the result that the model is often refuted. For the foreseeable future, a dialectical path towards theory development, in which relatively simple models are used to generate predictions which, when refuted, lead to the development of relatively complex models that are more difficult to test, is likely to persist.

Instrumental Responding

This research paper has so far focused almost exclusively on Pavlovian (i.e., stimulus-outcome) conditioning. By definition, in a Pavlovian situation the contingency between a subject’s responding and an outcome is zero, but in many situations outcomes are in fact dependent upon specific responses. That is, behavior is sensitive to the contingency between a response and an outcome. It is obvious that such sensitivity is often adaptive. For example, a rat will quickly learn to press a lever for food pellets; conversely, a child who touches a hot stove will rarely do so again. A situation in which an organism’s behavior changes after exposure to a response-outcome contingency is termed instrumental conditioning. After reviewing Thorndike’s early work on the law of effect and some basic definitions, this section considers research on instrumental conditioning from three different perspectives: associationistic, functional, and ecological-economic.

Law of Effect: What Is Learned?

Although the idea that rewards and punishments control behavior dates back to antiquity, the modern scientific study of instrumental conditioning was begun by Thorndike (1898). He placed hungry cats in so-called puzzle boxes in which the animal had to perform a response (e.g., pulling a loop of cord) in order to open a door and gain access to food. Over repeated trials, he found that the time necessary to escape gradually decreased. To explain this result, Thorndike (1911) proposed the law of effect, which states that stimulus-response (S-R) connections are strengthened by a “satisfying consequence” that follows the response. Thus, the pairing of the cats’ escape response with food increased the likelihood that the cats would subsequently perform the response. Aversive consequences have symmetric but opposite effects; S-R connections would be weakened if an “annoying consequence” (e.g., shock) followed a response. The law of effect represents the most important empirical generalization of instrumental conditioning, but its theoretical significance remains in dispute. The three perspectives considered in this section (associationistic, functional, and ecological-economic) provide different interpretations of the law of effect.

The Three-Term Contingency

Unlike the contingencies used in Pavlovian conditioning, which depend on two stimuli (the cue and outcome) scheduled independently of the subjects’ behavior, the contingencies considered here depend on the occurrence of a response. Such contingencies are called instrumental (i.e., the subjects’ behavior is instrumental in producing the outcome) or operant (i.e., the subjects’ behavior operates on the environment). Because different stimuli can be used to signal particular contingencies (e.g., illumination of a light above a lever signals that a rat’s pressing the lever will result in the delivery of food), the three-term contingency has been proposed as the fundamental unit of instrumental behavior: In the presence of a particular stimulus (discriminative stimulus), a response produces an outcome (reinforcer; Skinner, 1969).

In an instrumental situation, the environmentally imposed reinforcement contingency defines a response and, not surprisingly, the frequency of that response ordinarily changes in a functional manner. Instrumental behavior can sometimes be dysfunctional (i.e., a different response is observed than that defined by the functional contingency), but this is the exception rather than the rule. When dysfunctional acquired behavior is observed, it usually reflects a prevailing contingency that is unusual to the subject’s ecological niche or contrary to its prior experience. Two good examples of dysfunctional responding are vicious circle behavior (Gwinn, 1949) and negative automaintenance (D. R. Williams & Williams, 1969). In the former case, a previously learned response obstructs the subject from coming in contact with a newly introduced contingency, and in the latter case the reinforcement contingency (reward omission) imposed by the experiment is diametrically opposed by a species-specific predisposition that is highly functional in the natural habitat. Such dysfunctional behaviors may provide models of select instances of human psychopathology.

Instrumental Contingencies and Schedules of Reinforcement

There are four basic types of instrumental contingencies, depending on whether the response either produces or eliminates the outcome and whether the outcome is of positive or negative hedonic value. Positive reinforcement (i.e., reward) is a contingency in which responding produces a positive outcome with the result that there is an increase in response frequency (e.g., when a rat’s lever press results in food presentation, or a student’s studying before an exam produces an A grade). Punishment is a contingency in which responding results in the occurrence of an aversive outcome with the result that there is a decrease in response frequency (e.g., when a child is scolded for reaching into the cookie jar or a rat’s lever press produces foot shock). Omission (or negative punishment) describes a situation in which responding cancels or prevents the occurrence of a positive outcome with the result that there is a decrease in response frequency. Finally, escape or avoidance conditioning (also called negative reinforcement) is a contingency in which responding leads to the termination of an ongoing aversive stimulus or the prevention of an expected one, with the result that there is an increase in response frequency (e.g., if a rat’s lever presses cancel a scheduled shock). Both positive and negative reinforcement contingencies by definition result in increased responding, whereas omission and punishment contingencies by definition lead to decreased responding. For various reasons, including obvious ethical concerns, it is desirable whenever possible to use alternatives to punishment for behavior modification. For this reason and practical considerations, there has been an increasing emphasis in the basic and applied research literature on positive reinforcement; research on punishment and aversive conditioning is not discussed here (for reviews, see Ayres, 1998; Dinsmoor, 1998).
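The four contingencies form a two-by-two classification, which can be written as a small lookup table. The dictionary keys and the function name below are labels invented for this sketch; the contingency names and the direction of the behavioral change follow the definitions given in this section.

```python
# Keys: (effect of the response on the outcome, hedonic value of the outcome).
CONTINGENCIES = {
    ("produces", "positive"): ("positive reinforcement", "increase"),
    ("produces", "aversive"): ("punishment", "decrease"),
    ("prevents", "positive"): ("omission", "decrease"),
    ("prevents", "aversive"): ("escape or avoidance (negative reinforcement)", "increase"),
}


def classify_contingency(response_effect, outcome_value):
    """Return (contingency name, expected change in response frequency)."""
    try:
        return CONTINGENCIES[(response_effect, outcome_value)]
    except KeyError:
        raise ValueError("expected 'produces'/'prevents' and 'positive'/'aversive'")


print(classify_contingency("produces", "positive"))  # lever press yields food
print(classify_contingency("prevents", "aversive"))  # lever press cancels a shock
```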

A reinforcement schedule is a rule for determining whether a particular response by a subject will be reinforced (Ferster & Skinner, 1957). There are two criteria that have been widely studied: the number of responses emitted since the last reinforced response (ratio schedules), and the time since the last reinforced response (interval schedules). Use of these criteria provides for four basic schedules of reinforcement, which depend on whether the contingency is fixed or variable: fixed interval (FI), fixed ratio (FR), variable interval (VI), and variable ratio (VR). Under an FI x schedule, the first response after x seconds have elapsed since the last reinforcement is reinforced. After reinforcement there is typically a pause in responding; responding then resumes at a slowly increasing rate and, about two-thirds of the way through the interval, increases to a high rate (Schneider, 1969). The temporal control evidenced by FI performance has led to extensive use of these schedules in research on timing (e.g., the peak procedure; Roberts, 1981). With an FR x schedule, the xth response is reinforced. After a postreinforcement pause, responding begins and generally continues at a high rate until reinforcement. When x is large enough, responding may cease entirely with FR schedules (ratio strain; Ferster & Skinner, 1957). Under a VI x schedule, the first response after y seconds have elapsed is reinforced, where y is a value sampled from a distribution that has an average of x seconds. Typically, VI schedules generate steady, moderate rates of responding (Catania & Reynolds, 1968). When a VR x schedule is arranged, the yth response is reinforced, where y is a value sampled from a distribution with an arithmetic mean of x. Variable ratio schedules maintain the highest overall rates of responding of these four common schedule types, even when rates of reinforcement are equated (e.g., Baum, 1993).
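The four schedule rules can be expressed compactly in code. The following Python sketch is an illustration under stated assumptions: the class and helper names are invented here, the VR requirement is drawn from a uniform distribution over 1 to 2x - 1 (which has an arithmetic mean of x), and the VI requirement is drawn from an exponential distribution with a mean of x. Actual experiments have used a variety of distributions.

```python
import random


class RatioSchedule:
    """Reinforce the response that completes the current response requirement."""

    def __init__(self, draw_requirement):
        self.draw = draw_requirement
        self.required = self.draw()
        self.count = 0

    def respond(self, t):
        self.count += 1                      # time t is ignored by ratio schedules
        if self.count >= self.required:
            self.count = 0
            self.required = self.draw()
            return True
        return False


class IntervalSchedule:
    """Reinforce the first response once the current interval has elapsed."""

    def __init__(self, draw_requirement, start=0.0):
        self.draw = draw_requirement
        self.required = self.draw()
        self.last_reinforcer = start

    def respond(self, t):
        if t - self.last_reinforcer >= self.required:
            self.last_reinforcer = t
            self.required = self.draw()
            return True
        return False


def fixed(x):
    return lambda: x                             # FR x or FI x


def variable_ratio(x, rng):
    return lambda: rng.randint(1, 2 * x - 1)     # assumed uniform, mean x


def variable_interval(x, rng):
    return lambda: rng.expovariate(1.0 / x)      # assumed exponential, mean x


fr5 = RatioSchedule(fixed(5))
print([n for n in range(1, 11) if fr5.respond(n)])             # [5, 10]

fi10 = IntervalSchedule(fixed(10))
print([t for t in (3, 9, 10, 12, 19, 21) if fi10.respond(t)])  # [10, 21]

rng = random.Random(0)                                         # seeded for repeatability
vr10 = RatioSchedule(variable_ratio(10, rng))
vi10 = IntervalSchedule(variable_interval(10.0, rng))
```

Each schedule object answers a single question, whether the response made at time t is reinforced, which mirrors the definition of a schedule as a rule applied to individual responses.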

Reinforcement schedules have been a major focus of research in instrumental conditioning (for review, see Zeiler, 1984). Representative questions include why VR schedules maintain higher response rates than comparable VI schedules (the answer seems to be that short interresponse times are reinforced under VR schedules; Cole, 1999), and whether schedule effects are best understood in terms of momentary changes in reinforcement probability or of the overall relationship between rates of responding and reinforcement (i.e., molecular vs. molar level of analysis; Baum, 1973). In addition, because of the stable, reliable behaviors they produce, reinforcement schedules have been widely adopted for use in related disciplines as baseline controls (e.g., behavioral pharmacology, behavioral neuroscience).

Comparing Pavlovian and Instrumental Conditioning

Many of the phenomena identified in Pavlovian conditioning have instrumental counterparts. For example, the basic relations of acquisition as a result of response-outcome pairings and extinction as a result of nonreinforcement of the response, as well as spontaneous recovery from extinction, are found in instrumental conditioning (see Dickinson, 1980; R. R. Miller & Balaz, 1981, for more detailed comparisons). Blocking and overshadowing may be obtained for instrumental responses (St. Claire-Smith, 1979; B. A. Williams, 1982). Stimulus generalization and discrimination characterize instrumental conditioning (Guttman & Kalish, 1956). Temporal contiguity is important for instrumental conditioning; response rate decreases rapidly as the response-reinforcer delay increases, so long as an explicit stimulus does not fill the interval (e.g., B. A. Williams, 1976). If a stimulus does fill the interval, it may function as a conditioned reinforcer and acquire reinforcing power in its own right (e.g., Schaal & Branch, 1988; although under select conditions it can attenuate [i.e., overshadow] the response, e.g., Pearce & Hall, 1978). This provides a parallel to second-order Pavlovian conditioning. Latent learning, in which learning occurs in the absence of explicit reinforcement (Tolman & Honzik, 1930), is analogous to sensory preconditioning. Learned helplessness, in which a subject first exposed to inescapable shock later fails to learn an escape response (Maier & Seligman, 1976), provides a parallel to learned irrelevance. Instrumental conditioning varies directly with the response-outcome contingency (e.g., Hammond, 1980). Cue-response-consequence specificity (Foree & LoLordo, 1975) is similar to cue-to-consequence predispositions in Pavlovian conditioning (see the section entitled “Predispositions”). Overall, the number of parallels between Pavlovian and instrumental conditioning encourages the view that an organism’s response can function like a stimulus, and that learning fundamentally concerns the development of associative links between mental representations of events (responses and stimuli).

Associationistic Analyses of Instrumental Conditioning

Researchers have attempted to determine what kind of associations are formed in instrumental conditioning situations. From an associationistic perspective, the law of effect implies that stimulus-response (S-R) associations are all that is learned. However, this view was challenged by Tolman (1932), who argued that S-R associations were insufficient to account for instrumental conditioning. He advocated a more cognitive approach in which the organism was assumed to form expectancies about the relation between the response and outcome. Contemporary research has confirmed and elaborated Tolman’s claim, showing that in addition to S-R associations, three other types of associations are formed in instrumental conditioning: response-outcome, stimulus-outcome, and hierarchical associations.

Response-Outcome Associations

Several studies using outcome devaluation procedures have found evidence for response-outcome associations. For example, Adams and Dickinson (1981) trained rats to press a lever for one of two outcomes (food or sugar pellets, counterbalanced across groups), while the other outcome was delivered independently of responding (i.e., noncontingent). After responding had been acquired, they devalued one of the outcomes by pairing it with induced gastric distress. In a subsequent extinction test, rats for which the response-contingent outcome had been devalued responded less compared with rats for which the noncontingent outcome had been devalued. Because the outcomes were never presented during testing, Adams and Dickinson argued that the difference in responding must have been mediated by learning of the response-outcome contingency. However, substantial residual responding was still observed for the groups with the devalued contingent outcome, leading Dickinson (1994, p. 52) to conclude that instrumental training “established lever pressing partly as a goal-directed action, mediated by knowledge of the instrumental relation, and partly as an S-R habit impervious to outcome devaluation.”

Stimulus-Outcome Associations

Evidence for (Pavlovian) stimulus-outcome (S-O) associations has been obtained in studies that have shown greater transfer of stimulus control to a new response that has been trained with the same outcome than with a different outcome. Colwill and Rescorla (1988) trained rats to make a common response (nose poking) in the presence of two different stimuli (light and noise). Nose poking produced different outcomes, depending on the stimulus (food pellets or sucrose solution, counterbalanced across groups). The rats were then trained to make two new responses (lever press and chain pull), each of which produced either food or sucrose. Finally, a transfer test was conducted in which rats could choose between lever pressing and chain pulling in the presence of the light and noise stimuli. Colwill and Rescorla found that the response that led to the outcome signaled by the stimulus in the original training with the nose-poke response occurred more frequently during test. Thus, rats were more likely to make whichever response led to the outcome that had been experienced in the presence of the stimulus during the nose-poke training, which suggests they had formed stimulus-outcome associations during that training.

Hierarchical Associations

In addition to binary associations involving the stimulus, response, and outcome, there is evidence that organisms encode a hierarchical association involving all three elements. Rescorla (1991) trained rats to make two responses (lever press and chain pull) for two different outcomes (food and sucrose) in the presence of a stimulus (light or noise). Rats were also trained with the opposite response-outcome relations in the presence of a different stimulus. Subsequently, one of the outcomes was devalued by pairing with LiCl. The rats were then given a test in which they could perform either response in the presence of each of the stimuli. The result was that responding was selectively suppressed; the response that led to the devalued outcome in the presence of the particular stimulus occurred less frequently. This result cannot be explained in terms of binary associations because individual stimuli and responses were paired equally often with both outcomes. It suggests that the rats had formed hierarchical associations, which encoded each three-term contingency [i.e., S – (R-O)]. Thus, the role of instrumental discriminative stimuli may be similar to occasion setters in Pavlovian conditioning (Davidson, Aparicio, & Rescorla, 1988).

Incentive Learning

Associations between stimuli, responses, and outcomes may comprise part of what is learned in instrumental conditioning, but clearly the organism must also be motivated to perform the response. Although motivation was an important topic for the neobehaviorists of the 1930s and 1940s (e.g., Hull, 1943), the shift towards more cognitively oriented explanations of behavior in the 1960s led to a relative neglect of motivation. More recently, however, Dickinson and colleagues (see Dickinson & Balleine, 1994, for review) have provided evidence that in some circumstances, subjects must learn the incentive properties of outcomes in instrumental conditioning.

For example, Balleine (1992) trained sated rats to press a lever for a novel food item. Half of the rats were later exposed to the novel food while hungry. Subsequently, an extinction test was conducted in which half of the rats were hungry (thus generating four groups, depending on whether the rats had been preexposed to the novel food while hungry, and whether they were hungry during the extinction test). The results were that the rats given preexposure to the novel food item while hungry and tested in a deprived state responded at the highest rate during the extinction test. This suggests that exposure to the novel food while in the deprived state contributed to that food’s serving as an effective reinforcer. However, Dickinson, Balleine, Watt, Gonzalez, and Boakes (1995) found that the magnitude of the incentive learning effect diminished when subjects received extended instrumental training prior to test. Thus, motivational control of behavior may change, depending on experience with the instrumental contingency.

In summary, efforts to elucidate the nature of associative structures underlying instrumental conditioning have found evidence for all the possible binary associations (i.e., stimulus-response, response-outcome, and stimulus-outcome), as well as for a hierarchical association involving all three elements (stimulus: response-outcome). Additionally, in some situations, whether an outcome has incentive value is apparently learned. From this perspective, it seems reasonable to assume that these associations are acquired in the same fashion as stimulus-outcome associations in Pavlovian conditioning. In this view, instrumental conditioning may be considered an elaboration of fundamental associative processes.

Functional Analyses of Instrumental Conditioning

A second approach to instrumental conditioning is derived from Skinner’s (1938) interpretation of the law of effect. Rather than construe the law literally in terms of S-R connections, Skinner interpreted the law of effect to mean only that response strength increases with reinforcement and decreases with punishment. Exactly how response strength could be measured thus became a major concern. Skinner (1938) developed an apparatus (i.e., experimental chambers called Skinner boxes and cumulative recorders) that allowed the passage of time as well as lever presses and reward deliveries to be recorded. This allowed a shift in the dependent variable from the probability of a response’s occurring on a particular trial to the rate of that response over a sustained period of time. Such procedures are sometimes called free-operant (as opposed to discrete-trial). The ability to study intensively the behavior of individual organisms has led researchers in the Skinnerian tradition to emphasize molar rather than molecular measures of responding (i.e., response rate aggregated over several sessions), to examine responding at stability (i.e., asymptote) rather than during acquisition, and to use a relatively small number of subjects in their research designs (Sidman, 1960). This research tradition, often called the experimental analysis of behavior, has led to an emphasis on various formal arrangements for instrumental conditioning (for example, reinforcement schedules) and to the export of technologies for effective behavior modification (e.g., Sulzer-Azaroff & Mayer, 1991).

Choice and the Matching Law

Researchers have attempted to quantify the law of effect by articulating the functional relationships between behavior (measured as response rate) and parameters of reinforcement (specifically, the rate, magnitude, delay, and probability of reinforcement). The goal has been to obtain a quantitative expression that summarizes these relationships and that is broadly applicable to a range of situations. Interestingly, this pursuit has been inspired by research on choice—situations in which more than one reinforced instrumental response is available at the same time.

Four experimental procedures have figured prominently in research on the quantitative determiners of instrumental responding. In the single-schedule procedure, the subject may make a specific response that produces a reinforcer according to a given schedule. In concurrent schedules, two or more schedules are available simultaneously and the subject is free to allocate its behavior across the alternatives. In multiple schedules, access to different reinforcement schedules occurs successively, with each schedule signaled by a distinctive (discriminative) stimulus. Finally, in the concurrent-chains procedure (and a discrete-trial variant, the adjusting-delay procedure), subjects choose between two discriminative stimuli that are correlated with different reinforcement schedules.

A seminal study by Herrnstein (1961) was the first parametric investigation of concurrent schedules. He arranged two VI schedules in a Skinner box for pigeons, each schedule associated with a separate manipulandum (i.e., plastic pecking key). Reinforcement was a brief period (3 s) of access to grain. Pigeons were given extensive training (often 30 or more sessions) with a given pair of schedules (e.g., VI 1-min, VI 3-min schedules) until response allocation was stable. The schedules were then changed across a number of experimental conditions, such that the relative rate of reinforcement provided by responding to the left and right keys was varied while keeping constant the overall programmed reinforcement rate (40/hr). Herrnstein found that the relative rate of responding to each key was approximately equal to the relative rate of reinforcement associated with each key. His data, shown in Figure 13.2, demonstrate what has come to be known as the matching law:

In Equation 13.1, B_L and B_R are the number of responses made to the left and right keys, and R_L and R_R are the reinforcements earned by responding at those keys. Although Equation 13.1 might appear tautological, it is important to note that the matching relation was not forced in Herrnstein’s study, because responses substantially outnumbered reinforcers. Subsequent empirical support for the matching law has been obtained with a variety of different species, responses, and reinforcers, and thus it may represent a general principle of choice (for reviews, see Davison & McCarthy, 1988; B. A. Williams, 1988, 1994a). The matching law seems to embody a relativistic law of effect: The relative strength of an instrumental response depends on the relative rate of reinforcement maintaining it, which parallels the relativism evident in most expression-focused models of Pavlovian conditioning (see this research paper’s section entitled “Expression-Focused Models”) and probability matching in the decision-making literature.

Why Does Matching Occur?

Many investigators have accepted the matching relation as an empirical rule for choice under concurrent VI-VI schedules. An important goal, then, is to discover exactly why matching should occur. Because an answer to this question might provide insight into the fundamental behavioral processes determining choice, testing different theories of matching has been a vigorous topic of research over the past 35 years.

Shimp (1966, 1969) showed that if subjects always responded to the alternative with the momentarily higher probability of reinforcement, then matching would be obtained. According to his theory, called momentary maximizing, responses should show a definite sequential dependence. The reason is that both schedules run concurrently, so the longer the subject stays on the richer alternative, the more likely it becomes that a reinforcer is waiting on the leaner one. For example, with concurrent Left Key VI 1-min, Right Key VI 3-min schedules, a response sequence of LLLR maximizes the likelihood that each response will be reinforced. To evaluate this prediction, Nevin (1969) arranged a discrete-trials concurrent VI 1-min, VI 3-min procedure. Matching to relative reinforcement rate was closely approximated, but the probability of a response to the lean (i.e., VI 3-min) schedule remained roughly constant as a function of consecutive responses made to the rich schedule. Thus, Nevin’s results demonstrate that matching can occur in the absence of sequential dependency (see also Jones & Moore, 1999).

Other studies, however, obtained evidence of a local structure in time allocation consistent with a momentary maximizing strategy (e.g., Hinson & Staddon, 1983). Although reasons for the presence or absence of this strategy are not yet clear, B. A. Williams (1992) found that, in a discrete-trials VI-VR procedure with rats as subjects, sequential dependencies consistent with momentary maximizing were found with short intertrial intervals (ITIs), but data that approximated matching without sequential dependencies were found with longer ITIs. The implication seems to be that organisms use a maximizing strategy if possible, depending on the temporal characteristics of the procedure; otherwise matching is obtained.

A second explanation for matching in concurrent schedules was offered by Rachlin, Green, Kagel, and Battalio (1976). They proposed that matching was a by-product of overall reinforcement rate maximization within a session. According to Rachlin et al., organisms are sensitive to the reinforcement obtained from both alternatives, and they distribute their responding so as to obtain the maximum overall reinforcement rate. This proposal is called molar maximizing because it assumes that matching is determined by an adaptive process that yields the outcome with the overall greatest utility for the organism (see section in this research paper entitled “Behavioral Economics”). In support of their view, Rachlin et al. presented computer simulations demonstrating that the behavior allocation yielding maximum overall reinforcement rate coincided with matching for concurrent VI schedules (cf. Heyman & Luce, 1979).

A large number of studies have evaluated predictions of matching versus molar maximizing. Several studies have arranged concurrent VI-VR schedules (e.g., Herrnstein & Heyman, 1979). To optimize overall reinforcement rate on concurrent VI-VR, subjects should spend most of their time responding on the VR schedule, occasionally switching over to the VI to obtain reinforcement. This implies that subjects should show a strong bias towards the VR schedule. However, such a bias has typically not been found. Instead, Herrnstein and Heyman (1979) reported that their subjects approximately matched without maximizing. Similar data with humans were reported by Savastano and Fantino (1994). Proponents of molar maximizing (e.g., Rachlin, Battalio, Kagel, & Green, 1981) have countered that Herrnstein and Heyman’s results can be explained in terms of the value of leisure time. When certain assumptions are made about the value of leisure and temporal discounting of delayed reinforcers, it may be difficult, if not impossible, to determine whether matching is fundamental or a by-product of imperfect maximizing (Rachlin, Green, & Tormey, 1988).

A recent experiment by Heyman and Tanz (1995) shows that under appropriate conditions, both matching and molar maximizing may characterize choice. In their experiment, pigeons were exposed to a concurrent-schedules procedure in which the overall rate of reinforcement depended on the response allocation in the recent past (last 360 responses). Heyman and Tanz found that when no stimuli were differentially correlated with overall reinforcement rates, the pigeons approximately matched rather than maximized. However, when the color of the chamber house-light signaled when response allocation was increasing the reinforcement rate, the pigeons maximized, deviating from matching apparently without limit. In other words, when provided with an analogue instructional cue, the pigeons maximized. Heyman and Tanz’s results strongly suggest that organisms maximize when they are able to do so, but match when they are not, implying that maximizing and matching are complementary rather than contradictory accounts of choice.

A third theory of matching, melioration, was proposed by Herrnstein and Vaughan (1980). The basic idea of melioration (meaning to make better) is that organisms switch their preference to whichever alternative provides the higher local reinforcement rate (i.e., the number of reinforcers earned divided by the time spent responding at the alternative). Because the local reinforcement rates change depending on how much time is allocated to the alternatives, matching is eventually obtained when the local reinforcement rates are equal. Although the time window over which local reinforcement rates are determined is left unspecified, it is understood to be a relatively brief duration (e.g., 4 min; Vaughan, 1981). Thus, melioration occupies essentially an intermediate level between momentary and molar maximizing in terms of the time scale over which the variable determining choice is calculated. Applications of melioration to human decision making have been particularly fruitful. For example, Herrnstein and Prelec (1992) proposed a model for drug addiction based on melioration, which has been elaborated by Heyman (1996) and Rachlin (1997).
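
The way melioration leads to matching can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not part of the theory as stated above: the obtained reinforcement rates on the two alternatives are treated as fixed (roughly true for VI schedules that are sampled often), and the update rule, step size, and function name `meliorate` are hypothetical choices made for illustration.

```python
def meliorate(r_left, r_right, p=0.5, step=0.1, iterations=200):
    """Shift time allocation toward the alternative with the higher local rate.

    r_left, r_right: obtained reinforcers per hour (assumed constant).
    p: proportion of session time spent on the left alternative.
    """
    for _ in range(iterations):
        # Local rate = reinforcers earned divided by time spent at the alternative.
        local_left = r_left / p
        local_right = r_right / (1.0 - p)
        # Move a little toward whichever alternative is locally richer.
        p += step * (local_left - local_right) / (local_left + local_right)
        # Keep the allocation strictly inside (0, 1) so local rates stay defined.
        p = min(max(p, 0.01), 0.99)
    return p


# VI 1-min (60 reinforcers/hr) versus VI 3-min (20 reinforcers/hr).
p_left = meliorate(60.0, 20.0)
print(round(p_left, 3))  # proportion of time on the left alternative
```

Under these assumptions the allocation settles where the two local rates are equal, which is the point at which time allocation equals the relative reinforcement rate, 60 / (60 + 20) = 0.75.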

Several studies have attempted to test the prediction of melioration that local reinforcement rates determine preference by arranging two pairs of concurrent schedules within each session and then testing preference for stimuli between pairs from different concurrent schedules in probe tests. For example, B. A. Williams and Royalty (1989) conducted several experiments in which probes compared stimuli correlated with different local and overall reinforcement rates. However, they found that the overall, not local, reinforcement rates correlated with stimuli determined preference in the probes. In a similar study, Belke (1992) arranged a procedure with VI 20-s, VI 40-s schedules in one component and VI 40-s, VI 80-s schedules in the other component. After baseline training, pigeons’ preference approximately matched relative reinforcement rate in both components (i.e., a 2:1 ratio). Belke then presented the two VI 40-s stimuli together in occasional choice probes. The pigeons demonstrated a strong (4:1) preference for the VI 40-s stimulus paired with the VI 80-s. This result is contrary to the predictions of melioration, because the VI 40-s paired with VI 20-s is correlated with a greater local reinforcement rate (see also Gibbon, 1995).

Gallistel and Gibbon (2000) have argued that the results of Belke (1992) pose a serious challenge not only to melioration, but also to the matching law as empirical support for the law of effect. They described a model for instrumental choice that was based on Gibbon (1995; see also Mark & Gallistel, 1994). According to their model, pigeons learn the interreinforcement intervals for responding on each alternative and store these intervals in memory. Decisions to switch from one alternative to another are made by a sample-and-comparison process that operates on the stored intervals. They showed that their model could predict Belke’s (1992) and Gibbon’s (1995) probe results. However, these data may not be decisive evidence against melioration, or indeed against any theory of matching. According to Gallistel and Gibbon, when separately trained stimuli are paired in choice probes, the same changeover patterns that were established in baseline training to particular stimuli are carried over. If carryover of baseline can account for probe preference, then the probes provide no new information beyond baseline responding. The implication is that any theory that can account for matching in baseline can potentially explain the probe results of Belke (1992) and Gibbon (1995).

Extensions of the Matching Law

Generalized Matching. Since Herrnstein’s (1961) original study, the matching law has been extended in several ways to provide a quantitative framework for describing data from various procedures. Baum (1974) noted that some deviations from the strict equality of response and reinforcement ratios required by the matching law could be described by Equation 13.2, a power function generalization of Equation 13.1:

\[
\frac{B_L}{B_R} = b \left( \frac{R_L}{R_R} \right)^{a} \tag{13.2}
\]

Equation 13.2 is known as the generalized matching law. There are two parameters: bias (b), which represents a constant proportionality in responding unrelated to reinforcement rate (e.g., position preference); and an exponent (a), which represents sensitivity to reinforcement rate. Typically, a logarithmic transformation of Equation 13.2 is used, resulting in a linear relation in which sensitivity and bias correspond to the slope and intercept, respectively. Baum (1979) reviewed over 100 data sets and found that the generalized matching law commonly accounted for over 90% of the variance in behavior allocation (for a review of comparable human research, see Kollins, Newland, & Critchfield, 1997). Thus, in the generalized form represented in Equation 13.2, the matching law provides an excellent description of choice in concurrent schedules. Although undermatching (i.e., a < 1) is the most common result, this may result from a variety of factors, including imperfect discriminability of the contingencies (Davison & Jenkins, 1985).
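
The logarithmic transformation mentioned above can be sketched in a few lines. The example assumes the power-function form B_L/B_R = b(R_L/R_R)^a, so that log(B_L/B_R) = a log(R_L/R_R) + log b, and estimates a and b by ordinary least squares. The data are synthetic, generated from hypothetical values a = 0.8 and b = 1.2, and the function name is an illustrative choice.

```python
import math


def fit_generalized_matching(reinforcer_ratios, response_ratios):
    """Return (sensitivity a, bias b) from a least-squares fit in log-log space."""
    xs = [math.log10(r) for r in reinforcer_ratios]
    ys = [math.log10(b) for b in response_ratios]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                   # sensitivity (a)
    intercept = mean_y - slope * mean_x  # log10 of bias (b)
    return slope, 10 ** intercept


# Synthetic conditions: reinforcer ratios from 1:4 to 4:1 (hypothetical data).
reinforcer_ratios = [0.25, 0.5, 1.0, 2.0, 4.0]
response_ratios = [1.2 * r ** 0.8 for r in reinforcer_ratios]

sensitivity, bias = fit_generalized_matching(reinforcer_ratios, response_ratios)
print(round(sensitivity, 3), round(bias, 3))
```

A fitted slope below 1, as in this synthetic case, corresponds to undermatching; a fitted bias different from 1 indicates a constant preference for one alternative that is unrelated to reinforcement rate.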

Matching in Single and Multiple Schedules. If the law of effect is a general principle of behavior, and the matching law is a quantitative expression of the law of effect, then the matching principle should apply to situations other than concurrent schedules. Herrnstein (1970) proposed an extension of the matching law that applied to single and multiple schedules. His starting point was Catania and Reynolds’ (1968) data showing that response rate was an increasing, negatively accelerated function of reinforcement rate on single VI schedules (see Figure 13.3).

Herrnstein (1970) reasoned that when a single schedule was arranged, a variety of behaviors other than the target response were available to the organism (e.g., grooming, pacing, defecating, contemplation). Presumably, these so-called extraneous behaviors were maintained by extraneous (i.e., unmeasured) reinforcers. Herrnstein then made two assumptions: (a) that the total amount of behavior in any situation was constant, that is, the frequencies of target and extraneous behaviors varied inversely; and (b) that “all behavior is choice” and obeys the matching law. The first assumption implies that the target and extraneous response rates sum to a constant (B + B_e = k), and are maintained by rates of scheduled and extraneous reinforcement (R and R_e), respectively. Based on the second assumption, B / (B + B_e) = R / (R + R_e), and substituting k for B + B_e gives

\[
B = \frac{k R}{R + R_e} \tag{13.3}
\]

Equation 13.3 defines a hyperbola, with two parameters, k and R_e. The denominator represents the context of reinforcement for a particular response, that is, the total amount of reinforcement in the situation. De Villiers and Herrnstein (1976) fit Equation 13.3 to a large number of data sets and found that it generally gave an excellent description of response rates under VI schedules. Subsequent research has generally confirmed the hyperbolic relation between response rate and reinforcement rate, although lower-than-predicted response rates are sometimes observed at very high reinforcement rates (Baum, 1993). In addition, Equation 13.3 has been derived from a number of different theoretical perspectives (Killeen, 1994; McDowell & Kessel, 1979; Staddon, 1977).

Herrnstein (1970) also developed a version of the matching law that was applicable to multiple schedules. In a multiple schedule, access to two (or more) different schedules occurs successively, and each schedule is signaled by a discriminative stimulus. A well-known result in multiple schedules is behavioral contrast: Response rate in a component that provides a constant rate of reinforcement varies inversely with the reinforcement rate in the other component (see B. A. Williams, 1983, for review). Herrnstein suggested that the reinforcement rate in the alternative component served as part of the reinforcement context for behavior in the constant component. However, the contribution of alternative component reinforcement was attenuated by a parameter (m), which describes the degree of interaction at a temporal distance,

\[
B_1 = \frac{k R_1}{R_1 + m R_2 + R_e} \tag{13.4}
\]

with subscripts referring to the components of the multiple schedule. Equation 13.4 correctly predicts most behavioral contrast, but has difficulties with some other phenomena (see McLean & White, 1983, for review). Alternative models for multiple-schedule performance also based on the matching law have been proposed that alleviate these problems (McLean, 1995; McLean & White, 1983; B. A. Williams & Wixted, 1986).

Matching to Relative Value. The effects of variables other than reinforcement rate were examined in several early studies, which found that response allocation in concurrent schedules obeyed the matching relation when magnitude (i.e., seconds of access to food; Catania, 1963) and delay of reinforcement (Chung & Herrnstein, 1967) were varied. Baum and Rachlin (1969) then proposed that the matching law might apply most generally to reinforcement value, with value being defined as a multiplicative combination of reinforcement parameters,

\[
\frac{B_L}{B_R} = \frac{R_L}{R_R} \cdot \frac{M_L}{M_R} \cdot \frac{1/D_L}{1/D_R} = \frac{V_L}{V_R} \tag{13.5}
\]

with M being reinforcement magnitude, D being delay, and V being value.

Equation 13.5 represents a significant extension of the matching law, enabling it to apply to a broader range of choice situations (note that frequently a generalized version of Equation 13.5 with exponents, analogous to Equation 13.2, has been used here; e.g., Logue, Rodriguez, Peña-Correal, & Mauro, 1984). One of the most important of these is self-control, which has been a major focus of research because of its obvious relevance for human behavior. In a self-control situation, subjects are confronted with a choice between a small reinforcer available immediately (or after a short delay), and a larger reinforcer available after a longer delay. Typically, overall reinforcement gain is maximized by choosing the delayed, larger reinforcer, which is defined as self-control (Rachlin & Green, 1972; see Rachlin, 1995, for review). By contrast, choice of the smaller, less delayed reinforcer is termed impulsivity. For example, if pigeons are given a choice between a small reinforcer (2-s access to grain) delayed by 1 s or a large reinforcer (6-s access to grain) delayed by 6 s, then Equation 13.5 predicts that 67% of the choice responses will be for the small reinforcer (i.e., the 6:1 delay ratio is greater than the 2:6 magnitude ratio). However, if the delays to both the small and large reinforcers are increased by the same amount, then Equation 13.5 predicts a reversal of preference. For example, if the delays are both increased by 10 s, then predicted preference for the small reinforcer is only 33% (16:11 delay ratio is no longer enough to compensate for the 2:6 magnitude ratio). Empirical support for such preference reversals has been obtained in studies of both human and nonhuman choice (Green & Snyderman, 1980; Kirby & Herrnstein, 1995). These data suggest that the temporal discounting function (that is, the function that relates the value of a reward to its delay) is not exponential, as assumed by normative economic theory, but rather hyperbolic in form (Myerson & Green, 1995).
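
The arithmetic behind the preference reversal can be checked directly. The sketch assumes that value is magnitude divided by delay (the multiplicative combination of magnitude and immediacy, with equal reinforcement rates on both alternatives) and that choice proportions match relative value; the function names are illustrative.

```python
def value(magnitude, delay):
    """Value as magnitude multiplied by immediacy (1 / delay)."""
    return magnitude / delay


def preference_for_small(small, large, added_delay=0.0):
    """Predicted proportion of choices for the small reinforcer.

    small, large: (magnitude in s of grain access, delay in s) pairs.
    added_delay: constant delay added to both alternatives.
    """
    v_small = value(small[0], small[1] + added_delay)
    v_large = value(large[0], large[1] + added_delay)
    return v_small / (v_small + v_large)


small = (2.0, 1.0)   # 2-s access to grain after 1 s
large = (6.0, 6.0)   # 6-s access to grain after 6 s

print(round(preference_for_small(small, large), 2))        # original delays
print(round(preference_for_small(small, large, 10.0), 2))  # both delays + 10 s
```

With the original delays the values are 2 and 1, giving 2/3 of choices to the small reinforcer. After 10 s is added to both delays the values are 2/11 and 6/16, giving 16/49 (about one third), so preference reverses toward the large reinforcer.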

Choice Between Stimuli of Acquired Value

Concurrent Chains. A more complex procedure that has been widely used in research on choice is concurrent chains, which is a version of concurrent schedules in which responses are reinforced not by food but by stimuli that are correlated with different schedules of food reinforcement. In concurrent chains, subjects respond during a choice phase (initial links) to obtain access to one of two reinforcement schedules (terminal links). The stimuli that signal the onset of the terminal links are analogous to Pavlovian CSs and are often called conditioned reinforcers, as their potential to reinforce initial-link responding derives from a history of pairing with food. Conditioned reinforcement has been a topic of long-standing interest because it is recognized that many of the reinforcers that maintain human behavior (e.g., money) are not of inherent biological significance (see B. A. Williams, 1994b, for review). Preference in the initial links of concurrent chains is interpreted as a measure of the relative value of the schedules signaled by the terminal links.

Herrnstein (1964) found that ratios of initial-link response rates matched the ratios of reinforcement rates in the terminal links, suggesting that the matching law might be extended to concurrent chains. However, subsequent studies showed that the overall duration of the initial and terminal links—the temporal context of reinforcement—affected preference in ways not predicted by the matching law. To account for these data, Fantino (1969) proposed the delay-reduction hypothesis, which states that the effectiveness of a terminal-link stimulus as a conditioned reinforcer depends on the reduction in delay to reinforcement signaled by the terminal link. According to Fantino’s model, the value of a stimulus depends inversely on the reinforcement context in which it occurs (i.e., value is enhanced by a lean context, and vice versa). Fantino (1977) showed that the delay-reduction hypothesis provided an excellent qualitative account of preference in concurrent chains. Moreover, there is considerable evidence for the generality of the temporal context effects predicted by the model, as shown by the delay-reduction hypothesis’s having been extended to a variety of different situations (see Fantino, Preston, & Dunn, 1993, for a review).
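
The temporal context effect can be illustrated with one commonly cited algebraic form of the delay-reduction hypothesis, which is not given in the text above and is used here as an assumption: preference for the left alternative equals (T - t_L) / ((T - t_L) + (T - t_R)), where t_L and t_R are the terminal-link delays and T is the average time to food from the onset of the initial links. The way T is computed below (independent VI initial links with terminal-link entries in proportion to their programmed rates) is a further simplification, and the schedule values are hypothetical.

```python
def delay_reduction_preference(initial_left, initial_right,
                               terminal_left, terminal_right):
    """Predicted initial-link preference for the left alternative.

    All arguments are mean durations in seconds.
    """
    rate_left = 1.0 / initial_left
    rate_right = 1.0 / initial_right
    # Mean time spent in the initial links before either terminal link is entered.
    time_in_initial = 1.0 / (rate_left + rate_right)
    # Assumed proportion of terminal-link entries on the left.
    p_left = rate_left / (rate_left + rate_right)
    total = time_in_initial + p_left * terminal_left + (1.0 - p_left) * terminal_right
    reduction_left = total - terminal_left
    reduction_right = total - terminal_right
    # A terminal link that signals no reduction in delay should never be chosen.
    if reduction_right <= 0.0:
        return 1.0
    if reduction_left <= 0.0:
        return 0.0
    return reduction_left / (reduction_left + reduction_right)


# Same terminal links (30 s versus 90 s) with short and long initial links.
print(round(delay_reduction_preference(120, 120, 30, 90), 2))
print(round(delay_reduction_preference(600, 600, 30, 90), 2))
```

In this sketch, lengthening the initial links from 120 s to 600 s moves predicted preference for the shorter terminal link from 0.75 toward indifference (0.55), even though the terminal links are unchanged. Simple matching to terminal-link reinforcement rates would predict the same preference in both cases.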

Preference for Variability, Temporal Discounting, and the Adjusting-Delay Procedure. Studies with pigeons and rats have consistently found evidence of preference for variability in reinforcement delays: Subjects prefer a VI terminal link in concurrent chains over an FI terminal link that provides the same average reinforcement rate. This implies that animals are risk-prone when choosing between different reinforcement delays (e.g., Killeen, 1968). Interestingly, when given a choice between a variable or fixed amount of food, animals are often risk-averse, although this preference appears to be modulated by deprivation level as predicted by risk-sensitive foraging theory from behavioral ecology (see Kacelnik & Bateson, 1996, for a review). For example, Caraco, Martindale, and Whittam (1980) found that juncos’ preference for a variable versus constant number of seeds increased when food deprivation was greater.

Mazur (1984) introduced an adjusting-delay procedure that has become widely used to study preference for variability. His procedure is similar to concurrent chains in that the subject chooses between two stimuli that are correlated with different delays to reward, but the dependent variable is an indifference point, that is, a delay to reinforcement that is equally preferred to a particular schedule. Mazur determined fixed-delay indifference points for a series of variable-delay schedules, and found that the following model (Equation 13.6) gave an excellent account of his results:

\[
V = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{1 + K d_i} \tag{13.6}
\]

In Equation 13.6, V is the conditioned value of the stimulus that signals a delay to reinforcement, d_1, ..., d_n, and K is a sensitivity parameter. Equation 13.6 is called the hyperbolic-decay model because it assumes that the value of a delayed reinforcer decreases according to a hyperbola (see Figure 13.4). The hyperbolic-decay model has become the leading behavioral model of temporal discounting, and has been extensively applied to human choice between delayed rewards (e.g., Kirby, 1997).
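
A short calculation shows how hyperbolic decay produces a preference for variable delays. The sketch assumes that the value of a variable-delay schedule is the average of 1 / (1 + K d_i) over its equally likely delays, and that the indifference point is the fixed delay with the same value. The delays and the choice K = 1 are hypothetical.

```python
def hyperbolic_value(delays, k=1.0):
    """Average hyperbolically discounted value over equally likely delays."""
    return sum(1.0 / (1.0 + k * d) for d in delays) / len(delays)


def indifference_delay(delays, k=1.0):
    """Fixed delay whose value equals that of the variable-delay schedule.

    Solves 1 / (1 + k * D) = V for D.
    """
    v = hyperbolic_value(delays, k)
    return (1.0 / v - 1.0) / k


variable_delays = [2.0, 18.0]   # mean delay of 10 s
equivalent = indifference_delay(variable_delays)
print(round(equivalent, 2))      # fixed delay equally preferred to the variable schedule
```

Under these assumptions the variable schedule with a 10-s mean delay is worth as much as a fixed delay of 46/11 s (about 4.2 s), because the short delay contributes disproportionately to the average value. A fixed 10-s delay would therefore be less preferred than the variable schedule.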

General Models for Choice

Recently, several general models for choice have been proposed. These models may be viewed as extensions of the matching law, and they are integrative in the sense that they provide a quantitative description of data from a variety of choice procedures. Determining the optimal choice model may have important implications for a variety of issues, including how conditioned value is influenced by parameters of reinforcement, as well as the nature of the temporal discounting function.

Grace (1994, 1996) showed how the temporal context effects predicted by Fantino’s delay-reduction theory could be incorporated in an extension of the generalized matching law. His contextual choice model can describe choice in concurrent schedules, concurrent chains, and the adjusting-delay procedure, on average accounting for over 90% of the variance in data from these procedures. The success of Grace’s model as applied to the nonhuman-choice data suggests that temporal discounting may be best described in terms of a model with a power function component; moreover, such a model accounts for representative human data at least as well as the hyperbolic-decay model does (Grace, 1999). However, Mazur (2001) has recently proposed an alternative model based on the hyperbolic-decay model. Mazur’s hyperbolic value-addition model is based on a principle similar to delay-reduction theory, and it provides an account of the data of comparable accuracy to that of Grace’s model. Future research will determine which of these models (or whether an entirely different model) provides the best overall account of behavioral choice and temporal discounting.

Resistance to Change: An Alternative View of Response Strength

Although response rate has long been considered the standard measure of the strength of an instrumental response, it is not without potential problems. Response strength represents the product of the conditioning process. In terms of the law of effect, it should vary directly with parameters that correspond to intuitive notions of hedonic value. For example, response strength should be a positive function of reinforcement magnitude. However, studies have found that response rate often decreases with increases in magnitude (Bonem & Crossman, 1988). In light of this and other difficulties, researchers have sought other measures of response strength that are more consistently related to intuitive parameters of reinforcement.

One such alternative measure is resistance to change. Nevin (1974) conducted several experiments in which pigeons responded in multiple schedules. After baseline training, he disrupted responding in both components by either home-cage prefeeding or extinction. He found that responding in the component that provided the relatively richer reinforcement—in terms of greater rate, magnitude, or immediacy of reinforcement—decreased less compared with baseline responding for that component than did responding in the leaner component. Based on these results and others, Nevin and his colleagues have proposed behavioral momentum theory, which holds that resistance to change and response rate are independent aspects of behavior analogous to mass and velocity in classical physics (Nevin, Mandell, & Atak, 1983). According to this theory, reinforcement increases a mass-like aspect of behavior which can be measured as resistance to change.
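
The comparison "decreased less compared with baseline" is often summarized as the logarithm of responding under disruption expressed as a proportion of baseline. That summary measure is an assumption of this sketch rather than something stated above, and the response rates are hypothetical.

```python
import math


def log_proportion_of_baseline(disrupted_rate, baseline_rate):
    """Log10 of responding during disruption relative to baseline.

    Values closer to zero indicate greater resistance to change.
    """
    return math.log10(disrupted_rate / baseline_rate)


# Hypothetical responses per minute in two multiple-schedule components.
rich = log_proportion_of_baseline(disrupted_rate=60.0, baseline_rate=80.0)
lean = log_proportion_of_baseline(disrupted_rate=20.0, baseline_rate=50.0)

print(round(rich, 3), round(lean, 3))
```

Because the measure is relative to each component's own baseline, it separates resistance to change from baseline response rate, which is the distinction behavioral momentum theory draws between mass and velocity. In this hypothetical case the rich component is the more resistant of the two.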

From a procedural standpoint, the components in multiple schedules resemble terminal links in concurrent chains because differential conditions of reinforcement are signaled by distinctive stimuli and are available successively. Moreover, the same variables (e.g., reinforcement rate, magnitude, and immediacy) that increase resistance to change also increase preference in concurrent chains (Nevin, 1979). Nevin and Grace (2000) proposed an extension of behavioral momentum theory in which behavioral mass (measured as resistance to change) and value (measured as preference in concurrent chains) are different expressions of a single construct representing the reinforcement history signaled by a particular stimulus. Their model describes how stimulus-reinforcer (i.e., Pavlovian) contingencies determine the strength of an instrumental response, measured as resistance to change. Thus, it complements Herrnstein’s (1970) quantitative law of effect, which describes how response strength measured as response rate depends on response-reinforcer (i.e., instrumental) contingencies.

Ecological-Economic Analyses of Instrumental Conditioning

A third approach towards the study of instrumental behavior was inspired by criticisms of the apparent circularity of the law of effect: If a reinforcer is identified solely through its effects on behavior, then there is no way to predict in advance what outcomes will serve as reinforcers (Postman, 1947). Meehl (1950) suggested that this difficulty could be overcome if reinforcers were transsituational; an outcome identified as a reinforcer in one situation should also act as a reinforcer in other situations. However, Premack (1965) demonstrated experimentally that transsituationality could be violated. Central to Premack’s analysis is the identification of the reinforcer with the consummatory response, and the importance of obtaining a free-operant baseline measure of allocation among different responses. His results led to several important reconceptualizations of instrumental behavior, which emphasize the wider ecological or economic context of reinforcement in which responding—both instrumental (e.g., lever press) and contingent (e.g., eating)—occurs. According to this view, reinforcement is considered to be a molar adaptation to the constraints imposed by the instrumental contingency, rather than a molecular strengthening process as implied by the law of effect. Two examples of such reconceptualizations are behavior regulation theory and behavioral economics.

Behavior Regulation

Timberlake and Allison (1974) noted that the increase in responding associated with reinforcement occurred only if the instrumental contingency required that the animal perform more of the instrumental response in order to restore the level of the contingent (consummatory) response to baseline levels. For example, consider a situation in which a rat is allowed free access to a running wheel and drinking tube during baseline. After recording the time allocated to these activities when both were freely available, a contingency is imposed such that running and drinking must occur in a fixed proportion (e.g., 30 s of running gives access to a brief period of drinking). If the rat continued to perform both responses at baseline levels, it would spend far less time drinking—a condition Timberlake and Allison (1974) termed response deprivation. Because of the obvious physiological importance of water intake, the solution is for the rat to increase its rate of wheel running so as to maintain, as far as possible, its baseline level of drinking. Thus, reinforcement is viewed as an adaptive response to environmental constraint.

According to behavior regulation theory (Timberlake, 1984), there is an ideal combination of activities in any given situation, which can be assessed by an organism’s baseline allocation of time across all possible responses. The allocation defines a set point in a behavior space. The determiners of set points may be complex and depend on the feeding ecology of the particular species (e.g., Collier, Hirsch, & Hamlin, 1972). The effect of the instrumental contingency is to constrain the possible allocations in the behavior space. For example, the reciprocal ratio contingency between running and drinking previously described implies that the locus of possible allocations is a straight line in the two-dimensional behavior space (i.e., running and drinking). If the set point is no longer possible under the contingency, the organism adjusts its behavior so as to minimize the distance between obtained allocation and the set point. Similar regulatory theories have been proposed by Allison, Miller, and Wozny (1979), Staddon (1979), and Rachlin and Burkhard (1978). Although regulatory theories have been very successful at describing instrumental performance at the molar level, they have proven somewhat controversial. For example, the critical role of deviations from the set point seems to imply that organisms are able to keep track of potentially thousands of different responses made during the session, and able to adjust their allocation accordingly. Opponents of regulatory theories (e.g., see commentaries following Timberlake, 1984) claim this is unlikely and that the effects of reinforcement are better understood at a more molecular level. Perhaps the most likely outcome of this debate is that molar and molecular accounts of instrumental behavior will prove complementary, not contradictory.
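
The minimum-distance idea can be sketched as a projection of the set point onto the constraint line. The example assumes an unweighted Euclidean distance in a two-dimensional behavior space and a contingency of the form drinking = ratio × running; the baseline times and the 10 s of drinking earned per 30 s of running are hypothetical values.

```python
def closest_allocation(set_point, ratio):
    """Point on the line drink = ratio * run nearest to the set point.

    set_point: (baseline running time, baseline drinking time) in seconds.
    ratio: seconds of drinking earned per second of running.
    """
    run_0, drink_0 = set_point
    # Orthogonal projection of the set point onto the constraint line.
    run = (run_0 + ratio * drink_0) / (1.0 + ratio ** 2)
    return run, ratio * run


baseline = (200.0, 600.0)   # hypothetical free-access baseline: run 200 s, drink 600 s
ratio = 10.0 / 30.0         # hypothetical schedule: 30 s running earns 10 s drinking

run, drink = closest_allocation(baseline, ratio)
print(round(run, 1), round(drink, 1))
```

With these values the predicted allocation is 360 s of running and 120 s of drinking. Running rises well above its 200-s baseline, while drinking stays below baseline but above the roughly 67 s that baseline running alone would earn, which is the pattern described as an adaptive response to response deprivation.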

Behavioral Economics

An alternative interpretation of set points is that they represent the combination of activities with the highest subjective value or utility to the organism (e.g., so-called bliss points). One of the fundamental assumptions of economic choice theory is that humans maximize utility when allocating their resources among various commodities. Thus, perhaps it is not surprising that economics would prove relevant for the study of instrumental behavior. Indeed, over the last 25 years researchers have systematically applied the concepts of microeconomic theory to laboratory experiments with both human and nonhuman subjects. The result has been the burgeoning field of behavioral economics (for review, see Green & Freed, 1998). Here, we consider the application of two important economic concepts, demand and substitutability, to instrumental behavior.

Demand. In economics, demand is the amount of a commodity that is purchased at a given price. The extent to which consumption changes as a function of price is a demand curve. When consumption of a particular commodity shows little or no change when its price is increased, demand is said to be inelastic. Conversely, elastic demand refers to a situation in which consumption falls with increases in price. Researchers have studied elasticity of demand in nonhumans by manipulating price in terms of reinforcement schedules. For instance, if rats’ lever pressing is reinforced with food according to an FR 10 schedule, changing the schedule to FR 100 represents an increase in price.

For example, Hursh and Natelson (1981) trained rats to press a lever for food reinforcement; a second lever was also available that produced a train of pulses to electrodes implanted in the lateral hypothalamus (ESB). Responses to both levers were reinforced by concurrent (and equal) VI, VI schedules. As the VI schedule values were increased, consumption of food remained constant, whereas the number of ESB reinforcers earned decreased dramatically (see Figure 13.5). Thus, demand for food was inelastic, whereas demand for ESB was highly elastic. In economic terms, differences in elasticity can be used to identify necessities (i.e., food) and luxuries (i.e., ESB).

Substitutability. Another concept from economics that has proven useful for understanding instrumental behavior is substitutability. In Herrnstein’s (1961) original research leading to the matching law and in many subsequent studies (see this research paper’s section titled “Choice and the Matching Law”), the reinforcers delivered by the concurrently available alternatives were identical and therefore perfectly substitutable. However, organisms must often choose between alternatives that are qualitatively different and perhaps not substitutable. In economics, substitutability is assessed by determining how the consumption of a given commodity changes when the price of another commodity is increased. To the extent that the commodities are substitutable, consumption should increase. For example, Rachlin et al. (1976) trained rats to press two levers for liquid reinforcement (root beer, or a nonalcoholic Tom Collins mix) on concurrent FR 1, FR 1 schedules. Rats were given a budget of 300 lever presses that they could allocate to either lever. Baseline consumption for one rat is shown in the left panel of Figure 13.6, together with the budget line (heavy line) indicating the possible range of choices that the rat could make. Rachlin et al. then doubled the price of root beer (by reducing the amount of liquid per reinforcer) while cutting the price of the Tom Collins mix in half (by increasing the amount). Simultaneously they increased the budget of lever presses so that rats could still obtain the same quantity of each reinforcer as in baseline. Under these conditions, the rats increased their consumption of Tom Collins mix relative to root beer. Next, the investigators cut the price of root beer in half and doubled the price of Tom Collins mix, and the rats increased consumption of root beer. This shows that root beer and Tom Collins mix were highly substitutable as reinforcers; rats’ choice shifted towards whichever commodity was cheaper. 
In a second baseline condition, the rats chose between food and water. Rachlin et al. then increased the price of food by 67% by reducing the number of pellets per reinforcer. Again the budget of lever presses was increased so that the rats could continue to earn the same quantities as in baseline. However, as the right panel of Figure 13.6 shows, increasing the price of food had little effect on consumption. Although water was now relatively cheaper, the rats continued to earn approximately the same amount of food, demonstrating that food and water are nonsubstitutable as reinforcers. Thus, the concept of substitutability is useful for understanding choice between qualitatively different reinforcers, as it helps to specify how allocation will shift when the instrumental contingencies (i.e., prices) are changed.
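The budget adjustment described above amounts to what economists call an income-compensated price change. A minimal sketch of the arithmetic follows, assuming prices are expressed as lever presses per unit of liquid. The baseline quantities and the function name are hypothetical and are not the values reported by Rachlin et al.

```python
def compensated_budget(baseline, new_prices):
    """Lever presses needed to buy the baseline bundle at the given prices."""
    return sum(new_prices[item] * qty for item, qty in baseline.items())


# Hypothetical baseline bundle, in units of liquid, bought at 1 press per unit.
baseline = {"root_beer": 200, "tom_collins": 100}
old_prices = {"root_beer": 1.0, "tom_collins": 1.0}
# Root beer doubles in price; Tom Collins mix is cut to half price.
new_prices = {"root_beer": 2.0, "tom_collins": 0.5}

old_budget = compensated_budget(baseline, old_prices)
new_budget = compensated_budget(baseline, new_prices)
print(old_budget, new_budget)
```

Because the baseline bundle remains exactly affordable under the new budget, a shift in consumption after the price change can be attributed to the change in relative prices rather than to a loss of purchasing power, which is why such a shift is read as evidence of substitutability.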

Summary

As noted in the introduction to this section, Thorndike’s pioneering studies with cats in puzzle boxes were the first systematic investigation of instrumental conditioning. Research on instrumental conditioning since then may be viewed as a series of attempts to understand the empirical generalization of positive reinforcement that Thorndike expressed as the law of effect. The associationistic tradition (see this research paper’s section titled “Associative Analyses of Instrumental Conditioning”) describes the content of learning in instrumental situations in terms of associations that develop according to processes similar to those of Pavlovian conditioning. The experimental analysis of behavior (see this research paper’s section titled “Functional Analyses of Instrumental Conditioning”), derived from the work of B. F. Skinner, represents a more functional approach and attempts to describe the relations between behavior and its environmental determiners, often in quantitative terms. A third perspective is offered by research that has emphasized the importance of the wider ecological or economic context of the organism for understanding instrumental responding (see this research paper’s section titled “Ecological-Economic Analyses of Instrumental Conditioning”). These research traditions illuminate different aspects of instrumental behavior and demonstrate the richness and continued relevance of the apparently simple contingencies first studied by Thorndike over a century ago.

Conclusions

The study of learning and conditioning—basic information processing—is less in the mainstream of psychology today than it was 30–50 years ago. Yet progress continues, and there are unanswered questions of considerable importance to many other endeavors, including treatment of psychopathology (particularly behavior modification), behavioral neuroscience, and education, to name but a few. New animal models of psychopathology are the starting points of most new forms of therapeutic psychopharmacology. In behavioral neuroscience, researchers are attempting to identify the neural substrates of behavior. Surely this task demands an accurate description of the behavior to be explained. Thus, the study of basic behavior sets the agenda for much of neuroscience. Additionally, the study of basic learning and information processing has many messages for educators. For example, research has repeatedly demonstrated that distractor events, changes in context during training, and spacing of training trials all attenuate the rate at which behavior is initially altered. But these very procedures also result in improved retention over time and better transfer to new test situations. These are but a few of the continuing contributions stemming from the ongoing investigation of the principles of learning and basic information processing.

Bibliography:

Adams, C. D., & Dickinson, A. (1981). Instrumental responding following reinforcer devaluation. Quarterly Journal of Experimental Psychology, 33B, 109–122.
Allison, J., Miller, M., & Wozny, M. (1979). Conservation in behavior. Journal of Experimental Psychology: General, 108, 4–34.
Ayres, J. J. B. (1998). Fear conditioning and avoidance. In W. T. O’Donohue (Ed.), Learning and behavior therapy (pp. 122–145). Boston: Allyn & Bacon.
Azorlosa, J. L., & Cicala, G. A. (1988). Increased conditioning in rats to a blocked CS after the first compound trial. Bulletin of the Psychonomic Society, 26, 254–257.
Baker, A. G., & Mackintosh, N. J. (1977). Excitatory and inhibitory conditioning following uncorrelated presentations of the CS and UCS. Animal Learning & Behavior, 5, 315–319.
Balaz, M. A., Gutsin, P., Cacheiro, H., & Miller, R. R. (1982). Blocking as a retrieval failure: Reactivation of associations to a blocked stimulus. Quarterly Journal of Experimental Psychology, 34B, 99–113.
Balaz, M. A., Kasprow, W. J., & Miller, R. R. (1982). Blocking with a single compound trial. Animal Learning & Behavior, 10, 271–276.
Balleine, B. (1992). Instrumental performance following a shift in primary motivation depends upon incentive learning. Journal of Experimental Psychology: Animal Behavior Processes, 18, 236–250.
Balsam, P. D. (1984). Relative time in trace conditioning. In J. Gibbon & L. Allan (Eds.), Annals of the New York Academy of Sciences: Timing and Time Perception (Vol. 243, pp. 211–227). Cambridge, MA: Ballinger.
Batson, J. D., & Batsell, W. R., Jr. (2000). Augmentation, not blocking, in an A+/AX+ flavor-conditioning procedure. Psychonomic Bulletin & Review, 7, 466–471.
Baum, W. M. (1973). The correlation-based law of effect. Journal of the Experimental Analysis of Behavior, 20, 137–153.
Baum, W. M. (1974). On two types of deviation from the matching law: Bias and undermatching. Journal of the Experimental Analysis of Behavior, 22, 231–242.
Baum, W. M. (1979). Matching, undermatching, and overmatching in studies of choice. Journal of the Experimental Analysis of Behavior, 32, 269–281.
Baum, W. M. (1993). Performances on ratio and interval schedules of reinforcement: Data and theory. Journal of the Experimental Analysis of Behavior, 59, 245–264.
Baum, W. M., & Rachlin, H. C. (1969). Choice as time allocation. Journal of the Experimental Analysis of Behavior, 12, 861–874.
Belke, T. W. (1992). Stimulus preference and the transitivity of preference. Animal Learning & Behavior, 20, 401–406.
Bennett, C. H., Wills, S. J., Oakeshott, S. M., & Mackintosh, N. J. (2000). Is the context specificity of latent inhibition a sufficient explanation of learned irrelevance? Quarterly Journal of Experimental Psychology, 53B, 239–253.
Berkeley, G. (1946). A treatise concerning the principles of human knowledge. La Salle, IL: Open Court Publication Co. (Reprinted from 1710, Dublin, Ireland: Jeremy Pepyat)
Best, M. R., Dunn, D. P., Batson, J. D., Meachum, C. L., & Nash, S. M. (1985). Extinguishing conditioned inhibition in flavour-aversion learning: Effects of repeated testing and extinction of the excitatory element. Quarterly Journal of Experimental Psychology, 37B, 359–378.
Bonardi, C., & Hall, G. (1996). Learned irrelevance: No more than the sum of CS and US preexposure effects? Journal of Experimental Psychology: Animal Behavior Processes, 22, 183–191.
Bonem, M., & Crossman, E. K. (1988). Elucidating the effects of reinforcement magnitude. Psychological Bulletin, 104, 348–362.
Bouton, M. E., & Bolles, R. C. (1979). Contextual control of the extinction of conditioned fear. Learning and Motivation, 10, 445–466.
Breland, K., & Breland, M. (1961). The misbehavior of organisms. American Psychologist, 16, 681–684.
Brogden, W. J. (1939). Sensory pre-conditioning. Journal of Experimental Psychology, 25, 323–332.
Brooks, D. C., & Bouton, M. E. (1993). A retrieval cue for extinction attenuates spontaneous recovery. Journal of Experimental Psychology: Animal Behavior Processes, 19, 77–89.
Burger, D. C., Mallemat, H., & Miller, R. R. (2000). Overshadowing of subsequent events and recovery thereafter. Quarterly Journal of Experimental Psychology, 53B, 149–171.
Burns, M., & Domjan, M. (1996). Sign tracking versus goal tracking in the sexual conditioning of male Japanese quail (Coturnix japonica). Journal of Experimental Psychology: Animal Behavior Processes, 22, 297–306.
Bush, R. R., & Mosteller, F. (1951). A mathematical model for simple learning. Psychological Review, 58, 313–323.
Caraco, T., Martindale, S., & Whittam, T. S. (1980). An empirical demonstration of risk-sensitive foraging preferences. Animal Behaviour, 28, 820–830.
Catania, A. C. (1963). Concurrent performances: A baseline for the study of reinforcement magnitude. Journal of the Experimental Analysis of Behavior, 6, 299–300.
Catania, A. C., & Reynolds, G. S. (1968). A quantitative analysis of responding maintained by interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 11, 327–383.
Cheatle, M. D., & Rudy, J. W. (1978). Analysis of second-order odor-aversion conditioning in neonatal rats: Implications for Kamin’s blocking effect. Journal of Experimental Psychology: Animal Behavior Processes, 4, 237–249.
Cheng, P. W., & Novick, L. R. (1992). Covariation in natural causal induction. Psychological Review, 99, 365–382.
Chung, S.-H., & Herrnstein, R. J. (1967). Choice and delay of reinforcement. Journal of the Experimental Analysis of Behavior, 10, 67–74.
Clarke, J. C., Westbrook, R. F., & Irwin, J. (1979). Potentiation instead of overshadowing in the pigeon. Behavioral and Neural Biology, 25, 18–29.
Clayton, N. S., & Dickinson, A. (1999). Scrub jays (Aphelocoma coerulescens) remember the relative time of caching as well as the location and content of their caches. Journal of Comparative Psychology, 113, 403–416.
Cole, M. R. (1999). Molar and molecular control in variable-interval and variable-ratio schedules. Journal of the Experimental Analysis of Behavior, 71, 319–328.
Collier, G., Hirsch, E., & Hamlin, P. H. (1972). The economic determinants of reinforcement in the rat. Physiology and Behavior, 9, 705–716.
Colwill, R. M., & Rescorla, R. A. (1988). Associations between the discriminative stimulus and the reinforcer in instrumental learning. Journal of Experimental Psychology: Animal Behavior Processes, 14, 155–164.
Cosmides, L., & Tooby, J. (1994). Origins of domain-specificity: The evolution of functional organization. In L. A. Hirschfeld & S. A. Gelman (Eds.), Mapping the mind: Domain specificity in cognition and culture (pp. 85–116). New York: Cambridge University Press.
Dalrymple, A. J., & Galef, B. G. (1981). Visual discrimination pretraining facilitates subsequent visual cue/toxicosis conditioning in rats. Bulletin of the Psychonomic Society, 18, 267–270.
Davidson, T. L., Aparicio, J., & Rescorla, R. A. (1988). Transfer between Pavlovian facilitators and instrumental discriminative stimuli. Animal Learning & Behavior, 16, 285–291.
Davison, M., & Jenkins, P. E. (1985). Stimulus discriminability, contingency discriminability, and schedule performance. Animal Learning & Behavior, 13, 77–84.
Davison, M., & McCarthy, D. (1988). The matching law: A research review. Hillsdale, NJ: Erlbaum.
Dearing, M. F., & Dickinson, A. (1979). Counterconditioning of shock by a water reinforcer in rabbits. Animal Learning & Behavior, 7, 360–366.
Denniston, J. C., Blaisdell, A. P., & Miller, R. R. (1998). Temporal coding affects transfer of serial and simultaneous inhibitors. Animal Learning & Behavior, 26, 336–350.
Denniston, J. C., Miller, R. R., & Matute, H. (1996). Biological significance as a determinant of cue competition. Psychological Science, 7, 325–331.
Denniston, J. C., Savastano, H. I., & Miller, R. R. (2001). The extended comparator hypothesis: Learning by contiguity, responding by relative strength. In R. R. Mowrer & S. B. Klein (Eds.), Handbook of contemporary learning theories (pp. 65–117). Hillsdale, NJ: Erlbaum.
de Villiers, P. A., & Herrnstein, R. J. (1976). Toward a law of response strength. Psychological Bulletin, 83, 1131–1153.
DeVito, P. L., & Fowler, H. (1987). Enhancement of conditioned inhibition via an extinction treatment. Animal Learning & Behavior, 15, 448–454.
Dickinson, A. (1980). Contemporary animal learning theory. Cambridge, UK: Cambridge University Press.
Dickinson, A. (1994). Instrumental conditioning. In N. J. Mackintosh (Ed.), Animal learning and cognition (pp. 45–79). New York: Academic Press.
Dickinson, A., & Balleine, B. (1994). Motivational control of goal-directed action. Animal Learning & Behavior, 22, 1–18.
Dickinson, A., Balleine, B., Watt, A., Gonzalez, F., & Boakes, R. A. (1995). Motivational control after extended instrumental training. Animal Learning & Behavior, 23, 197–206.
Dickinson, A., & Burke, J. (1996). Within-compound associations mediate the retrospective revaluation of causality judgments. Quarterly Journal of Experimental Psychology, 49B, 60–80.
Dickinson, A., & Charnock, D. J. (1985). Contingency effects with maintained instrumental reinforcement. Quarterly Journal of Experimental Psychology, 37B, 397–416.
Dinsmoor, J. A. (1998). Punishment. In W. T. O’Donohue (Ed.), Learning and behavior therapy (pp. 188–204). Boston: Allyn & Bacon.
Domjan, M. (1983). Biological constraints on instrumental and classical conditioning: Implications for general process theory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 17, pp. 215–277). New York: Academic Press.
Domjan, M., & Hollis, K. L. (1988). Reproductive behavior: A potential model system for adaptive specializations in learning. In R. C. Bolles & M. D. Beecher (Eds.), Evolution and learning (pp. 213–237). Hillsdale, NJ: Erlbaum.
Durlach, P. J., & Rescorla, R. A. (1980). Potentiation rather than overshadowing in flavor-aversion learning: An analysis in terms of within-compound associations. Journal of Experimental Psychology: Animal Behavior Processes, 6, 175–187.
Egger, M. D., & Miller, N. E. (1963). When is a reward reinforcing? An experimental study of the information hypothesis. Journal of Comparative and Physiological Psychology, 56, 132–137.
Eikelboom, R., & Stewart, J. (1982). Conditioning of drug-induced physiological responses. Psychological Review, 89, 507–528.
Escobar, M., Arcediano, F., & Miller, R. R. (2001). Conditions favoring retroactive interference between antecedent events and between subsequent events. Psychonomic Bulletin & Review, 8, 691–697.
Escobar, M., Matute, H., & Miller, R. R. (2001). Cues trained apart compete for behavioral control in rats: Convergence with the associative interference literature. Journal of Experimental Psychology: General, 130, 97–115.
Estes, W. K. (1950). Toward a statistical theory of learning. Psychological Review, 57, 94–107.
Estes, W. K., & Burke, C. J. (1953). A theory of stimulus variability in learning. Psychological Review, 60, 276–286.
Etienne, A. S., Berlie, J., Georgakopoulos, J., & Maurer, R. (1998). Role of dead reckoning in navigation. In S. Healy (Ed.), Spatial representation in animals (pp. 54–68). Oxford, UK: Oxford.
Fantino, E. (1969). Choice and rate of reinforcement. Journal of the Experimental Analysis of Behavior, 12, 723–730.
Fantino, E. (1977). Conditioned reinforcement: Choice and information. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 313–339). Englewood Cliffs, NJ: Prentice-Hall.
Fantino, E., Preston, R. A., & Dunn, R. (1993). Delay reduction: Current status. Journal of the Experimental Analysis of Behavior, 60, 159–169.
Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement. New York: Appleton-Century-Crofts.
Foree, D. D., & LoLordo, V. M. (1975). Stimulus-reinforcer interactions in the pigeon: The role of electric shock and the avoidance contingency. Journal of Experimental Psychology: Animal Behavior Processes, 1, 39–46.
Gallistel, C. R., & Gibbon, J. (2000). Time, rate and conditioning. Psychological Review, 107, 219–275.
Garcia, J., & Koelling, R. A. (1966). Relation of cue to consequence in avoidance learning. Psychonomic Science, 4, 123–124.
Garcia, J., Ervin, F. R., & Koelling, R. A. (1966). Learning with prolonged delay of reinforcement. Psychonomic Science, 5, 121–122.
Garcia, J., Lasiter, P. S., Bermudez-Rattoni, F., & Deems, D. A. (1985). A general theory of aversion learning. Annals of the New York Academy of Sciences, 443, 8–21.
Gibbon, J. (1995). Dynamics of time matching: Arousal makes better seem worse. Psychonomic Bulletin & Review, 2, 208–215.
Gibbon, J., & Balsam, P. (1981). Spreading association in time. In C. M. Locurto, H. S. Terrace, & J. Gibbon (Eds.), Autoshaping and conditioning theory (pp. 219–253). New York: Academic Press.
Gibbon, J., Baldock, M. D., Locurto, C., Gold, L., & Terrace, H. S. (1977). Trial and intertrial durations in autoshaping. Journal of Experimental Psychology: Animal Behavior Processes, 3, 264–284.
Grace, R. C. (1994). A contextual model of concurrent-chains choice. Journal of the Experimental Analysis of Behavior, 61, 113–129.
Grace, R. C. (1996). Choice between fixed and variable delays to reinforcement in the adjusting-delay procedure and concurrent chains. Journal of Experimental Psychology: Animal Behavior Processes, 22, 362–383.
Grace, R. C. (1999). The matching law and amount-dependent exponential discounting as accounts of self-control choice. Journal of the Experimental Analysis of Behavior, 71, 27–44.
Green, L., & Freed, D. E. (1998). Behavioral economics. In W. O’Donohue (Ed.), Learning and behavior therapy (pp. 274–300). Boston: Allyn & Bacon.
Green, L., & Snyderman, M. (1980). Choice between rewards differing in amount and delay: Toward a choice model of self control. Journal of the Experimental Analysis of Behavior, 34, 135–147.
Groves, P. M., & Thompson, R. F. (1970). Habituation: A dual-process theory. Psychological Review, 77, 419–450.
Guthrie, E. R. (1935). The psychology of learning. New York: Harper.
Guthrie, E. R. (1938). The psychology of human conflict. New York: Harper.
Guttman, N., & Kalish, H. I. (1956). Discriminability and stimulus generalization. Journal of Experimental Psychology, 51, 79–88.
Gwinn, G. T. (1949). The effects of punishment on acts motivated by fear. Journal of Experimental Psychology, 39, 260–269.
Hall, G. (1991). Perceptual and associative learning. Oxford, UK: Oxford University Press.
Hall, G. (1996). Learning about associatively activated stimulus representations: Implications for acquired equivalence in perceptual learning. Animal Learning & Behavior, 24, 233–255.
Hallam, S. C., Grahame, N. J., Harris, K., & Miller, R. R. (1992). Associative structures underlying enhanced negative summation following operational extinction of a Pavlovian inhibitor. Learning and Motivation, 23, 43–62.
Hammond, L. J. (1980). The effect of contingency upon the appetitive conditioning of free-operant behavior. Journal of the Experimental Analysis of Behavior, 34, 297–304.
Hearst, E. (1988). Fundamentals of learning and conditioning. In R. C. Atkinson, R. J. Herrnstein, G. Lindzey, & R. D. Luce (Eds.), Stevens’ handbook of experimental psychology: Vol. 2. Learning and cognition (pp. 3–109). New York: Wiley.
Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267–272.
Herrnstein, R. J. (1964). Secondary reinforcement and rate of primary reinforcement. Journal of the Experimental Analysis of Behavior, 7, 27–36.
Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243–266.
Herrnstein, R. J., & Heyman, G. M. (1979). Is matching compatible with reinforcement maximization on concurrent variable interval, variable ratio? Journal of the Experimental Analysis of Behavior, 31, 209–223.
Herrnstein, R. J., & Prelec, D. (1992). A theory of addiction. In G. Loewenstein & J. Elster (Eds.), Choice over time (pp. 331–360). New York: Russell Sage.
Herrnstein, R. J., & Vaughan, W. (1980). Melioration and behavior allocation. In J. E. R. Staddon (Ed.), Limits to action (pp. 143–176). New York: Academic Press.
Heth, C. D. (1976). Simultaneous and backward fear conditioning as a function of number of CS-UCS pairings. Journal of Experimental Psychology: Animal Behavior Processes, 2, 117–129.
Heyman, G. M. (1996). Resolving the contradictions of addiction. Behavioral and Brain Sciences, 19, 561–610.
Heyman, G. M., & Luce, R. D. (1979). Operant matching is not a logical consequence of maximizing reinforcement rate. Animal Learning & Behavior, 7, 133–140.
Heyman, G. M., & Tanz, L. (1995). How to teach a pigeon to maximize overall reinforcement rate. Journal of the Experimental Analysis of Behavior, 64, 277–297.
Hinson, J. M., & Staddon, J. E. R. (1983). Hill-climbing by pigeons. Journal of the Experimental Analysis of Behavior, 39, 25–47.
Holland, P. C. (1977). Conditioned stimulus as a determinant of the form of the Pavlovian conditioned response. Journal of Experimental Psychology: Animal Behavior Processes, 3, 77–104.
Holland, P. C. (1981). Acquisition of representation-mediated conditioned food aversions. Learning and Motivation, 12, 1–18.
Holland, P. C. (1983a). Occasion setting in Pavlovian feature positive discriminations. In M. L. Commons, R. J. Herrnstein, & A. R. Wagner (Eds.), Quantitative analyses of behavior: Discrimination processes (pp. 183–206). Cambridge, MA: Ballinger.
Holland, P. C. (1983b). Representation-mediated overshadowing and potentiation of conditioned aversions. Journal of Experimental Psychology: Animal Behavior Processes, 9, 1–13.
Holland, P. C. (1989). Feature extinction enhances transfer of occasion setting. Animal Learning & Behavior, 17, 269–279.
Holland, P. C., & Forbes, D. T. (1982). Representation-mediated extinction of conditioned flavor aversions. Learning and Motivation, 13, 454–471.
Holland, P. C., & Rescorla, R. A. (1975). The effect of two ways of devaluing the unconditioned stimulus after first- and second-order appetitive conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 1, 355–363.
Hull, C. L. (1943). Principles of behavior: An introduction to behavior theory. New York: Appleton-Century.
Hull, C. L. (1952). A behavior system: An introduction to behavior theory concerning the individual organism. New Haven, CT: Yale University Press.
Hursh, S. R., & Natelson, B. H. (1981). Electrical brain stimulation and food reinforcement dissociated by demand elasticity. Physiology & Behavior, 26, 509–515.
Jenkins, H. M., & Moore, B. R. (1973). The form of the autoshaped response with food or water reinforcers. Journal of the Experimental Analysis of Behavior, 20, 163–181.
Jones, J. R., & Moore, J. (1999). Some effects of intertrial-duration on discrete-trial choice. Journal of the Experimental Analysis of Behavior, 71, 375–393.
Kacelnik, A., & Bateson, M. (1996). Risky theories—the effects of variance on foraging decisions. American Zoologist, 36, 402–434.
Kamil, A. C. (1983). Optimal foraging theory and the psychology of learning. American Zoologist, 23, 291–302.
Kamil, A. C., & Clements, K. C. (1990). Learning, memory, and foraging behavior. In D. A. Dewsbury (Ed.), Contemporary issues in comparative psychology (pp. 7–30). Sunderland, MA: Sinauer.
Kamin, L. J. (1965). Temporal and intensity characteristics of the conditioned stimulus. In W. F. Prokasy (Ed.), Classical conditioning (pp. 118–147). New York: Appleton-Century-Crofts.
Kamin, L. J. (1968). “Attention-like” processes in classical conditioning. In M. R. Jones (Ed.), Miami Symposium on the Prediction of Behavior: Aversive stimulation (pp. 9–33). Miami, FL: University of Miami Press.
Kasprow, W. J., Cacheiro, H., Balaz, M. A., & Miller, R. R. (1982). Reminder-induced recovery of associations to an overshadowed stimulus. Learning and Motivation, 13, 155–166.
Kaufman, M. A., & Bolles, R. C. (1981). A nonassociative aspect of overshadowing. Bulletin of the Psychonomic Society, 18, 318–320.
Kehoe, E. J., & Gormezano, I. (1980). Configuration and combination laws in conditioning with compound stimuli. Psychological Bulletin, 87, 351–378.
Kehoe, E. J., Horne, A. J., Horne, P. S., & Macrae, M. (1994). Summation and configuration between and within sensory modalities in classical conditioning of the rabbit. Animal Learning & Behavior, 22, 19–26.
Kelley, H. H. (1967). Attribution theory in social psychology. In D. Levine (Ed.), Nebraska Symposium on Motivation (Vol. 15, pp. 192–240). Lincoln: University of Nebraska Press.
Killeen, P. (1968). On the measurement of reinforcement frequency in the study of preference. Journal of the Experimental Analysis of Behavior, 11, 263–269.
Killeen, P. (1994). Mathematical principles of reinforcement. Behavioral and Brain Sciences, 17, 105–172 (includes commentary).
Kimble, G. A. (1961). Hilgard and Marquis’ “Conditioning and Learning.” New York: Appleton-Century-Crofts.
Kirby, K. N. (1997). Bidding on the future: Evidence against normative discounting of delayed rewards. Journal of Experimental Psychology: General, 126, 54–70.
Kirby, K. N., & Herrnstein, R. J. (1995). Preference reversals due to myopic discounting of delayed reward. Psychological Science, 6, 83–89.
Köhler, W. (1947). Gestalt psychology: An introduction to new concepts in modern psychology. New York: Liveright Publication Co.
Kollins, S. H., Newland, M. C., & Critchfield, T. S. (1997). Human sensitivity to reinforcement in operant choice: How much do consequences matter? Psychonomic Bulletin & Review, 4, 208–220.
Konorski, J. (1948). Conditioned reflexes and neuron organization. Cambridge, UK: Cambridge University Press.
Konorski, J. (1967). Integrative activity of the brain: An interdisciplinary approach. Chicago: University of Chicago Press.
Kraemer, P. J., Lariviere, N. A., & Spear, N. E. (1988). Expression of a taste aversion conditioned with an odor-taste compound: Overshadowing is relatively weak in weanlings and decreases over a retention interval in adults. Animal Learning & Behavior, 16, 164–168.
Kraemer, P. J., Randall, C. K., & Carbary, T. J. (1991). Release from latent inhibition with delayed testing. Animal Learning & Behavior, 19, 139–145.
Logue, A. W. (1979). Taste aversion and the generality of the laws of learning. Psychological Bulletin, 86, 276–296.
Logue, A. W., Rodriguez, M. L., Pena-Correal, T. E., & Mauro, C. (1984). Choice in a self-control paradigm: Quantification of experience-based differences. Journal of the Experimental Analysis of Behavior, 41, 53–67.
LoLordo, V. M., & Fairless, J. L. (1985). Pavlovian conditioned inhibition: The literature since 1969. In R. R. Miller & N. E. Spear (Eds.), Information processing in animals: Conditioned inhibition (pp. 1–49). Hillsdale, NJ: Erlbaum.
Lubow, R. E. (1989). Latent inhibition and conditioned attention theory. Cambridge, UK: Cambridge University Press.
Lubow, R. E., & Moore, A. U. (1959). Latent inhibition: The effect of nonreinforced preexposure to the conditioned stimulus. Journal of Comparative and Physiological Psychology, 52, 415–419.
Mackintosh, N. J. (1975). A theory of attention: Variations in the associability of stimuli with reinforcement. Psychological Review, 82, 276–298.
Mackintosh, N. J. (1976). Overshadowing and stimulus intensity. Animal Learning & Behavior, 4, 186–192.
Mackintosh, N. J., & Reese, B. (1979). One-trial overshadowing. Quarterly Journal of Experimental Psychology, 31, 519–526.
Maier, S. F., & Seligman, M. E. P. (1976). Learned helplessness: Theory and evidence. Journal of Experimental Psychology: General, 105, 3–46.
Maldonado, A., Cátena, A., Cándido, A., & García, I. (1999). The belief revision model: Asymmetrical effects of noncontingency on human covariation learning. Animal Learning & Behavior, 27, 168–180.
Mark, T. A., & Gallistel, C. R. (1994). Kinetics of matching. Journal of Experimental Psychology: Animal Behavior Processes, 20, 79–95.
Marlin, N. A., & Miller, R. R. (1981). Associations to contextual stimuli as a determinant of long-term habituation. Journal of Experimental Psychology: Animal Behavior Processes, 7, 313–333.
Matute, H., & Pineño, O. (1998). Stimulus competition in the absence of compound conditioning. Animal Learning & Behavior, 26, 3–14.
Matzel, L. D., Gladstein, L., & Miller, R. R. (1988). Conditioned excitation and conditioned inhibition are not mutually exclusive. Learning and Motivation, 19, 99–121.
Matzel, L. D., Held, F. P., & Miller, R. R. (1988). Information and expression of simultaneous and backward associations: Implications for contiguity theory. Learning and Motivation, 19, 317–344.
Matzel, L. D., Schachtman, T. R., & Miller, R. R. (1985). Recovery of an overshadowed association achieved by extinction of the overshadowed stimulus. Learning and Motivation, 16, 398–412.
Mazur, J. E. (1984). Tests for an equivalence rule for fixed and variable reinforcer delays. Journal of Experimental Psychology: Animal Behavior Processes, 10, 426–436.
Mazur, J. E. (2001). Hyperbolic value addition and general models of animal choice. Psychological Review, 108, 96–112.
McClelland, J. L. (1988). Connectionist models and psychological evidence. Journal of Memory and Language, 27, 107–123.
McDowell, J. J., & Kessel, R. (1979). A multivariate rate equation for variable-interval performance. Journal of the Experimental Analysis of Behavior, 31, 267–283.
McLaren, I. P. L., & Mackintosh, N. J. (2000). An elemental model of associative learning. I. Latent inhibition and perceptual learning. Animal Learning & Behavior, 28, 211–246.
McLean, A. P. (1995). Contrast and reallocation of extraneous reinforcers as a function of component duration and baseline rate of reinforcement. Journal of the Experimental Analysis of Behavior, 63, 203–224.
McLean, A. P., & White, K. G. (1983). Temporal constraint on choice: Sensitivity and bias in multiple schedules. Journal of the Experimental Analysis of Behavior, 39, 405–426.
Meehl, P. E. (1950). On the circularity of the law of effect. Psychological Bulletin, 47, 52–75.
Miller, J. S., McKinzie, D. L., Kraebel, K. S., & Spear, N. E. (1996). Changes in the expression of stimulus selection: Blocking represents selective memory retrieval rather than selective associations. Learning and Motivation, 27, 307–316.
Miller, J. S., Scherer, S. L., & Jagielo, J. A. (1995). Enhancement of conditioning by a nongustatory CS: Ontogenetic differences in the mechanisms underlying potentiation. Learning and Motivation, 26, 43–62.
Miller, R. R., & Balaz, M. A. (1981). Differences in adaptiveness between classically conditioned responses and instrumentally acquired responses. In N. E. Spear & R. R. Miller (Eds.), Information processing in animals: Memory mechanisms (pp. 49–80). Hillsdale, NJ: Erlbaum.
Miller, R. R., Barnet, R. C., & Grahame, N. J. (1995). Assessment of the Rescorla-Wagner model. Psychological Bulletin, 117, 363–386.
Miller, R. R., & Escobar, M. (2001). Contrasting acquisition-focused and performance-focused models of behavior change. Current Directions in Psychological Science, 10, 141–145.
Miller, R. R., & Matute, H. (1996). Biological significance in forward and backward blocking: Resolution of a discrepancy between animal conditioning and human causal judgment. Journal of Experimental Psychology: General, 125, 370–386.
Miller, R. R., & Matzel, L. D. (1988). The comparator hypothesis: A response rule for the expression of associations. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 22, pp. 51–92). San Diego, CA: Academic Press.
Myerson, J., & Green, L. (1995). Discounting of delayed rewards: Models of individual choice. Journal of the Experimental Analysis of Behavior, 64, 263–276.
Nairne, J. S., & Rescorla, R. A. (1981). Second-order conditioning with diffuse auditory reinforcers in the pigeon. Learning and Motivation, 12, 65–91.
Neely, J. H. (1977). Semantic priming and retrieval from lexical memory: Roles of inhibitionless spreading activation and limited-capacity attention. Journal of Experimental Psychology: General, 106, 226–254.
Nevin, J. A. (1969). Interval reinforcement of behavior in discrete trials. Journal of the Experimental Analysis of Behavior, 12, 875–885.
Nevin, J. A. (1974). Response strength in multiple schedules. Journal of the Experimental Analysis of Behavior, 21, 389–408.
Nevin, J. A. (1979). Reinforcement schedules and response strength. In M. D. Zeiler & P. Harzem (Eds.), Reinforcement and the organization of behaviour (pp. 117–158). Chichester, UK: Wiley.
Nevin, J. A., & Grace, R. C. (2000). Behavioral momentum and the law of effect. Behavioral and Brain Sciences, 23, 73–130 (includes commentary).
Nevin, J. A., Mandell, C., & Atak, J. R. (1983). The analysis of behavioral momentum. Journal of the Experimental Analysis of Behavior, 39, 49–59.
Osgood, C. E. (1949). The similarity paradox in human learning: A resolution. Psychological Review, 56, 132–143.
Pavlov, I. P. (1927). Conditioned reflexes. London: Oxford University Press.
Pearce, J. M. (1987). A model for stimulus generalization in Pavlovian conditioning. Psychological Review, 94, 61–73.
Pearce, J. M., & Hall, G. (1978). Overshadowing the instrumental conditioning of a lever press response by a more valid predictor of reinforcement. Journal of Experimental Psychology: Animal Behavior Processes, 4, 356–367.
Pearce, J. M., & Hall, G. (1980). A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review, 87, 532–552.
Postman, L. (1947). The history and present status of the law of effect. Psychological Bulletin, 44, 489–563.
Postman, L., Stark, K., & Fraser, J. (1968). Temporal changes in interference. Journal of Verbal Learning and Verbal Behavior, 7, 672–694.
Premack, D. (1965). Reinforcement theory. In D. Levine (Ed.), Nebraska Symposium on Motivation (Vol. 13, pp. 123–180). Lincoln: University of Nebraska Press.
Rachlin, H. (1995). Self-control: Beyond commitment. Behavioral and Brain Sciences, 18, 101–159 (includes commentary).
Rachlin, H. (1997). Four teleological theories of addiction. Psychonomic Bulletin & Review, 4, 462–473.
Rachlin, H., Battalio, R., Kagel, J., & Green, L. (1981). Maximization theory in behavioral psychology. Behavioral and Brain Sciences, 4, 371–417 (includes commentary).
Rachlin, H., & Burkhard, B. (1978). The temporal triangle: Response substitution in instrumental conditioning. Psychological Review, 85, 22–47.
Rachlin, H., & Green, L. (1972). Commitment, choice and self-control. Journal of the Experimental Analysis of Behavior, 17, 15–22.
Rachlin, H., Green, L., Kagel, J. H., & Battalio, R. C. (1976). Economic demand theory and psychological studies of choice. In G. Bower (Ed.), The psychology of learning and motivation (Vol. 10, pp. 129–154). New York: Academic Press.
Rachlin, H., Green, L., & Tormey, B. (1988). Is there a decisive test between matching and maximizing? Journal of the Experimental Analysis of Behavior, 50, 113–123.
Randich, A., & LoLordo, V. M. (1979). Associative and nonassociative theories of the UCS preexposure phenomenon. Psychological Bulletin, 86, 523–548.
Rashotte, M. E., Marshall, B. S., & O’Connell, J. M. (1981). Signaling functions of the second-order CS: Partial reinforcement during second-order conditioning of the pigeon’s keypeck. Animal Learning & Behavior, 9, 253–260.
Rescorla, R. A. (1968). Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology, 66, 1–5.
Rescorla, R. A. (1969). Pavlovian conditioned inhibition. Psychological Bulletin, 72, 77–94.
Rescorla, R. A. (1991). Associative relations in instrumental learning: The Eighteenth Bartlett Memorial Lecture. Quarterly Journal of Experimental Psychology, 43B, 1–23.
Rescorla, R. A., & Cunningham, C. L. (1979). Spatial contiguity facilitates Pavlovian second-order conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 5, 152–161.
Rescorla, R. A., & Furrow, D. R. (1977). Stimulus similarity as a determinant of Pavlovian conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 3, 203–215.
Rescorla, R. A., & Holland, P. C. (1977). Associations in Pavlovian conditioned inhibition. Learning and Motivation, 8, 429–447.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and non-reinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning: Vol. 2. Current theory and research (pp. 64–99). New York: Appleton-Century-Crofts.
Riccio, D. C., Richardson, R., & Ebner, D. L. (1984). Memory retrieval deficits based upon altered contextual cues: A paradox. Psychological Bulletin, 96, 152–165.
Rizley, R. C., & Rescorla, R. A. (1972). Associations in second-order conditioning and sensory preconditioning. Journal of Comparative and Physiological Psychology, 81, 1–11.
Roberts, S. (1981). Isolation of an internal clock. Journal of Experimental Psychology: Animal Behavior Processes, 7, 242–268.
Romaniuk, C. B., & Williams, D. A. (2000). Conditioning across the duration of a backward conditioned stimulus. Journal of Experimental Psychology: Animal Behavior Processes, 26, 454–461.
Ross, B. H., & Makin, V. S. (1999). Prototype versus exemplar models in cognition. In R. J. Sternberg (Ed.), The nature of cognition (pp. 206–241). Cambridge, MA: MIT Press.
St. Claire-Smith, R. (1979). The overshadowing and blocking of punishment. Quarterly Journal of Experimental Psychology, 31, 51–61.
Saint Paul, U. v. (1982). Do geese use path integration for walking home? In F. Papi & H. G. Wallraff (Eds.), Avian navigation (pp. 298–307). New York: Springer.
Savastano, H. I., & Fantino, E. (1994). Human choice in concurrent ratio-interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 61, 453–463.
Savastano, H. I., & Miller, R. R. (1998). Time as content in Pavlovian conditioning. Behavioural Processes, 44, 147–162.
Schaal, D. W., & Branch, M. N. (1988). Responding of pigeons under variable-interval schedules of unsignaled, briefly signaled, and completely signaled delays to reinforcement. Journal of the Experimental Analysis of Behavior, 50, 33–54.
Schmajuk, N. A., Lamoureux, J. A., & Holland, P. C. (1998). Occasion setting: A neural network approach. Psychological Review, 105, 3–32.
Schneider, B. (1969). A two-state analysis of fixed-interval responding in the pigeon. Journal of the Experimental Analysis of Behavior, 12, 667–687.
Shanks, D. R. (1994). Human associative learning. In N. J. Mackintosh (Ed.), Animal learning and cognition (pp. 335–374). San Diego, CA: Academic Press.
Shettleworth, S. J. (1998). Cognition, evolution, and behavior. New York: Oxford.
Shimp, C. P. (1966). Probabilistically reinforced choice behavior in pigeons. Journal of the Experimental Analysis of Behavior, 9, 433–455.
Shimp, C. P. (1969). Optimum behavior in free-operant experiments. Psychological Review, 76, 97–112.
Sidman, M. (1960). Tactics of scientific research. New York: Basic Books.
Siegel, S. (1989). Pharmacological conditioning and drug effects. In A. J. Goudie & M. W. Emmet-Oglesby (Eds.), Psychoactive drugs: Tolerance and sensitization (pp. 115–185). Clifton, NJ: Humana Press.
Skinner, B. F. (1938). The behavior of organisms. New York: Appleton-Century-Crofts.
Skinner, B. F. (1969). Contingencies of reinforcement: A theoretical analysis. New York: Appleton-Century-Crofts.
Slamecka, N. J., & Ceraso, J. (1960). Retroactive and proactive inhibition of verbal learning. Psychological Bulletin, 57, 449–475.
Spence, K. W. (1936). The nature of discrimination learning in animals. Psychological Review, 43, 427–449.
Staddon, J. E. R. (1977). On Herrnstein’s equation and related forms. Journal of the Experimental Analysis of Behavior, 28, 163–170.
Staddon, J. E. R. (1979). Operant behavior as adaptation to constraint. Journal of Experimental Psychology: General, 108, 48–67.
Staddon, J. E. R., & Simmelhag, V. L. (1970). The “superstition” experiment: A reexamination of its implications for the principles of adaptive behavior. Psychological Review, 78, 3–43.
Sulzer-Azaroff, B., & Mayer, R. G. (1991). Behavior analysis for lasting change. Fort Worth, TX: Holt, Rinehart, & Winston.
Sutton, R. S., & Barto, A. G. (1981). Toward a modern theory of adaptive networks: Expectation and prediction. Psychological Review, 88, 135–170.
Thompson, R. F., & Spencer, W. A. (1966). Habituation: A model phenomenon for the study of neuronal substrates of behavior. Psychological Review, 73, 16–43.
Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Psychological Monographs, 2, 8.
Thorndike, E. L. (1911). Animal intelligence: Experimental studies. New York: Macmillan.
Thorndike, E. L. (1932). Fundamentals of learning. New York: Columbia University.
Thorndike, E. L. (1949). Selected writings from a connectionist’s psychology. East Norwalk, CT: Appleton-Century-Crofts.
Timberlake, W. (1984). Behavior regulation and learned performance: Some misapprehensions and disagreements. Journal of the Experimental Analysis of Behavior, 41, 355–375.
Timberlake, W., & Allison, J. (1974). Response deprivation: An empirical approach to instrumental performance. Psychological Review, 81, 146–164.
Timberlake, W., & Lucas, G. A. (1989). Behavior systems and learning: From misbehavior to general principles. In S. B. Klein & R. R. Mowrer (Eds.), Contemporary learning theories: Instrumental conditioning theory and the impact of biological constraints in learning (pp. 237–275). Hillsdale, NJ: Erlbaum.
Timberlake, W., & Lucas, G. A. (1991). Periodic water, interwater interval, and adjunctive behavior in a 24-hour multiresponse environment. Animal Learning & Behavior, 19, 369–380.
Tolman, E. C. (1932). Purposive behavior in animals and men. London: Century/Random House.
Tolman, E. C., & Honzik, C. H. (1930). Introduction and removal of reward, and maze performance in rats. University of California Publications in Psychology, 4, 257–275.
Van Hamme, L. J., & Wasserman, E. A. (1994). Cue competition in causality judgments: The role of nonpresentation of compound stimulus elements. Learning and Motivation, 25, 127–151.
Vaughan, W. (1981). Melioration, matching, and maximization. Journal of the Experimental Analysis of Behavior, 36, 141–149.
Wagner, A. R. (1978). Expectancies and the priming of STM. In S. H. Hulse, H. Fowler, & W. K. Honig (Eds.), Cognitive processes in animal behavior (pp. 177–209). Hillsdale, NJ: Erlbaum.
Wagner, A. R. (1981). SOP: A model of automatic memory processing in animal behavior. In N. E. Spear & R. R. Miller (Eds.), Information processing in animals: Memory mechanisms (pp. 5–47). Hillsdale, NJ: Erlbaum.
Wasserman, E. A., Elek, S. M., Chatlosh, D. L., & Baker, A. G. (1993). Rating causal relations: Role of probability in judgments of response-outcome contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 174–188.
Watson, J. B. (1913). Psychology as a behaviorist views it. Psychological Review, 20, 158–177.
Weiss, S. J., & Schindler, C. W. (1981). Generalization peak shift in rats under conditions of positive reinforcement and avoidance. Journal of the Experimental Analysis of Behavior, 35, 175–185.
Williams, B. A. (1976). The effects of unsignalled delayed reinforcement. Journal of the Experimental Analysis of Behavior, 26, 441–449.
Williams, B. A. (1982). Blocking the response-reinforcer association. In M. L. Commons, R. J. Herrnstein, & A. R. Wagner (Eds.), Quantitative analyses of behavior: Vol. 3. Acquisition (pp. 427–447). Cambridge, MA: Ballinger.
Williams, B. A. (1983). Another look at contrast in multiple schedules. Journal of the Experimental Analysis of Behavior, 39, 345–384.
Williams, B. A. (1988). Reinforcement, choice, and response strength. In R. C. Atkinson, R. J. Herrnstein, G. Lindzey, & R. D. Luce (Eds.), Stevens’ handbook of experimental psychology: Vol. 2. Learning and cognition (2nd ed., pp. 167–244). New York: Wiley.
Williams, B. A. (1992). Dissociation of theories of choice via temporal spacing of choice opportunities. Journal of Experimental Psychology: Animal Behavior Processes, 18, 287–297.
Williams, B. A. (1994a). Reinforcement and choice. In N. J. Mackintosh (Ed.), Animal learning and cognition (pp. 81–108). San Diego, CA: Academic Press.
Williams, B. A. (1994b). Conditioned reinforcement: Neglected or outmoded explanatory construct? Psychonomic Bulletin & Review, 1, 457–475.
Williams, B. A., & Royalty, P. (1989). A test of the melioration theory of matching. Journal of Experimental Psychology: Animal Behavior Processes, 15, 99–113.
Williams, B. A., & Wixted, J. T. (1986). An equation for behavioral contrast. Journal of the Experimental Analysis of Behavior, 45, 47–62.
Williams, D. R., & Williams, H. (1969). Auto-maintenance in the pigeon: Sustained pecking despite contingent non-reinforcement. Journal of the Experimental Analysis of Behavior, 12, 511–520.
Yin, H., Barnet, R. C., & Miller, R. R. (1994). Second-order conditioning and Pavlovian conditioned inhibition: Operational similarities and differences. Journal of Experimental Psychology: Animal Behavior Processes, 20, 419–428.
Yin, H., Grahame, N. J., & Miller, R. R. (1993). Extinction of comparator stimuli during and after acquisition: Differential facilitative effects on Pavlovian responding. Learning and Motivation, 24, 219–241.
Zeiler, M. D. (1984). The sleeping giant: Reinforcement schedules. Journal of the Experimental Analysis of Behavior, 42, 485–493.