Sample Motion Perception Research Paper.
Visual motion perception involves much more than just seeing movement. Though people are not generally aware of it, a great deal also goes on ‘behind the scenes,’ involving a series of elaborate computational stages that not only detect image movements, but also analyze image velocity patterns to extract surprisingly complex and useful information about the outside world. For example, watching a person walk through a crowd requires calculations, probably done in early stages of cortical processing, that would cripple the fastest of modern computers. The goal of this review is to describe the crucial role motion processing plays in visual perception, to tell how our present understanding of the visual motion system came about, and to suggest exciting future directions that motion perception studies are likely to take.
Intuitively, it would seem very easy to detect motion. As a car drives past us, we note that its position is not constant; therefore, it must be moving. But the apparent simplicity of this is misleading. Tracking an object involves ﬁrst recognizing it as an object, a daunting challenge even under stationary conditions. To measure motion by continuously updating a position estimate would require us to parse an image (ﬁnd the objects), recognize an object (while it is moving), make sure it is the same object as in the last sample, calculate its position, then ﬁnd the derivative with respect to time (or at least divide by the time since the last update). Even then, how would we know that snow is blowing towards the East, or that bubbles on the surface of a stream ﬂow from left to right? In such cases, there would be no objects to recognize and track.
Fortunately, there is a way of detecting motion that is as simple and robust as sensing the orientation of a line. It does not require object recognition, and it does not require us to know anything about position or time. Wertheimer was perhaps the ﬁrst to suggest this kind of fundamental motion perception: the raw sense of movement, irreducible to separate senses of object, space, and time (Wertheimer 1912). We will see below that the visual motion system does indeed work like this. First, however, it is important to understand the fundamental and perhaps unexpected role visual motion plays in our daily existence.
2. Functions Of Motion Processing
Arguably the most basic judgment one has to make about the world is what is moving and what is not. This is the most obvious role for visual motion processing. Less obvious, perhaps, is the need for accurate velocity measurements when our eyes track a moving object; it simply does not work to keep adjusting the eye position according to where the target appears on the retina. Even simple things like catching a baseball, ﬁlling a cup of coﬀee, crossing the street, or even interpreting facial expressions rely on motion processing (Zihl et al. 1983). If such functions seem specialized, consider the complexity of static vision, where we recognize colors, shapes, textures, and 3D form. It is no more remarkable that the visual system should have developed specialized processing for motion, which, through evolution, has shown itself to be rich in information.
In some types of motion processing, the quantity of interest is not really motion at all. For example, as one moves through the environment, the visual motion system analyzes the expanding image on the retina to compute the direction of self-motion, which is (or can be) a static thing. Also, motion cues give rich information about where object boundaries are, and about the relative distance of objects in a scene. Humans can even use expanding image cues to predict their precise moment of collision with an object; this remarkable computation is equivalent to taking the spatial derivative of a temporal derivative (the expansion rate) and does not require knowledge of target distance or approach speed! There are other important roles for visual motion; the point is that visual motion is a valuable source of information about the environment, and about our relationship to the environment, and our visual system has clearly evolved to exploit it.
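The time-to-collision computation can be illustrated with a minimal sketch. It assumes a small object of fixed physical size approaching the eye head-on, and it uses the ratio of the object's angular size to its rate of expansion as the estimate. The function names and the numbers are illustrative assumptions, not values from the studies cited; distance and speed are used only to generate the simulated retinal measurements, never by the estimator itself.

```python
import math

def angular_size(width, distance):
    """Visual angle (radians) subtended by an object of a given width and distance."""
    return 2.0 * math.atan(width / (2.0 * distance))

def time_to_contact(theta_prev, theta_now, dt):
    """Estimate time to contact from two successive angular sizes only."""
    expansion_rate = (theta_now - theta_prev) / dt  # finite-difference d(theta)/dt
    return theta_now / expansion_rate

# Hypothetical scenario, used only to simulate what the retina would measure.
width, speed, dt = 0.5, 10.0, 0.01          # metres, metres/second, seconds
d_prev = 20.0
d_now = d_prev - speed * dt

tau = time_to_contact(angular_size(width, d_prev), angular_size(width, d_now), dt)
true_ttc = d_now / speed                    # ground truth, for comparison only
print(tau, true_ttc)
```

The estimate stays close to the true value even though the estimator never sees the object's distance or approach speed, which is the point made in the text.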
3. Anatomy Of Motion Processing
Image contrasts are captured at high resolution in the retina and sent back, via a relay in the lateral geniculate nucleus, to the primary visual cortex (V1). In primates, V1 neurons are the ﬁrst in the visual hierarchy to show directionality: a heightened response to objects moving in a particular direction. These neurons are mainly packed into layer 4B, which makes a direct projection to the next cortical stage, the middle temporal area (MT). MT is now understood to be a specialized motion-processing area. Ascending projections out of layer 4 are atypical, usually coming instead from layers 2/3 and 6, and this suggests the importance of a fast, specialized motion pathway. MT itself projects to the medial superior temporal area (MST), another motion processing area whose function will be discussed in Sect. 6.
The motion pathway described above represents only the core of what has so far been discovered. Other areas, such as V2, V3, and VIP, clearly have roles in motion processing, but they are less understood. The goal of the present review is to explain basic principles of motion perception; thus, it will suffice to concentrate on the basic pathway described.
4. Theory Of Motion Detection
Reichardt proposed a model of motion detection that supposes a pair of spatially oﬀset contrast detectors as the input (Reichardt 1961). When these are activated, they send their outputs to a multiplier, but not at the same speed. One of the connections to the multiplier is presumed to have some kind of delay. Therefore, if this slow line is triggered ﬁrst, and the quicker line is triggered second, the two signals may arrive simultaneously at the multiplier, creating a very large, multiplicative response. There is no other way to get this strong response; one simply has to trigger the detectors with the appropriate timing. Therefore, the Reichardt model is a velocity detector. (Note: we use the term velocity in the vector sense here, indicating both direction and speed).
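The delay-and-multiply scheme can be sketched in a few lines. This is a simplified, discrete-time illustration of the half-detector described above (one delayed line, one direct line, one multiplier), not Reichardt's original formulation; the function names, the pulse stimulus, and the delay of three samples are assumptions made for the example.

```python
def delayed(signal, delay):
    """Shift a sampled signal later in time by `delay` samples, padding with zeros."""
    return [0.0] * delay + list(signal[:len(signal) - delay])

def reichardt_half_detector(left, right, delay):
    """Multiply the delayed left input by the undelayed right input and sum over time."""
    slow = delayed(left, delay)
    return sum(s * r for s, r in zip(slow, right))

def pulse(length, onset):
    """A unit contrast pulse at sample `onset`."""
    return [1.0 if t == onset else 0.0 for t in range(length)]

n, delay = 20, 3
# Left detector first, right detector `delay` samples later: signals coincide.
rightward = reichardt_half_detector(pulse(n, 5), pulse(n, 5 + delay), delay)
# Right detector first: the delayed signal arrives far too late.
leftward = reichardt_half_detector(pulse(n, 5 + delay), pulse(n, 5), delay)
# Correct direction but half the speed: the signals miss each other.
too_slow = reichardt_half_detector(pulse(n, 5), pulse(n, 5 + 2 * delay), delay)
print(rightward, leftward, too_slow)
```

Only the stimulus with the matching direction and timing produces a nonzero product, which is why the unit behaves as a velocity detector rather than a simple direction detector.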
Later theoretical work by several groups led to a diﬀerent conceptualization, which we will call the spatiotemporal energy model (Adelson and Bergen 1985). It is based on the idea of a linear space-time ﬁlter. In the same way that a neuron’s spatial receptive ﬁeld can be thought of as a weighting function (a region of varying sensitivity) over space, we can deﬁne a space-time ﬁeld as a weighting function over space and time. Referring to Fig. 1, the neuron that owns this space-time receptive ﬁeld will be excited when the white parts of the ﬁeld are contacted and suppressed when the dark parts are contacted. If we place a bar, representing stimulus contrast, with the appropriate angle, it can be made to cover the white parts only; i.e., it is a purely excitatory stimulus (Fig. 1A). Now consider that any oblique orientation in space-time represents motion: a ﬁnite change in position over a ﬁnite change in time. Thus, the obliquely oriented lobes of the space-time receptive ﬁelds can be thought of as velocity detectors.
It is now generally believed that in primates, a subset of V1 neurons act as velocity detectors. Various experiments have demonstrated some degree of linearity in these detectors (Reid et al. 1987), which argues against the strict Reichardt model since its directional output occurs at a nonlinear (multiplicative) stage. But the Reichardt model and the spatiotemporal energy model are not really so diﬀerent; in fact with minor modiﬁcations to the Reichardt model, the models produce equivalent output (Adelson and Bergen 1985, van Santen and Sperling 1985). The diﬀerences are in the order of computational steps.
Note that even the spatiotemporal energy model is only linear in its earliest stages; at some point a nonlinearity must occur. A sine-wave grating moving in the optimal (preferred) direction across a linear space-time ﬁlter generates large outputs (Fig. 1B), but these oscillate around zero as the positive and negative phases of the wave move through the ﬁlter. The total output—the integrated area—is zero. Some kind of nonlinear operation is needed to prevent this; squaring, for example, works well (thus the ‘energy’ nomenclature). Physiologically, the nonlinearity is probably a combination of response rectiﬁcation, threshold and saturation, gain normalization, and perhaps other factors.
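The need for a nonlinearity can be checked numerically. The sketch below assumes one common way of formulating the energy computation: a pair of space-time oriented Gabor filters in quadrature (phases 0 and 90 degrees), whose outputs are squared and summed. The filter shape, its parameters, and all function names are assumptions chosen for illustration; they are not taken from Fig. 1.

```python
import math

def gabor(x, t, fx, ft, phase, sigma=3.0):
    """Space-time oriented linear filter: Gaussian envelope times a tilted carrier."""
    envelope = math.exp(-(x * x + t * t) / (2.0 * sigma * sigma))
    return envelope * math.cos(2.0 * math.pi * (fx * x - ft * t) + phase)

def grating(x, t, fx, ft):
    """Sine-wave grating drifting at velocity ft / fx (negative ft reverses direction)."""
    return math.cos(2.0 * math.pi * (fx * x - ft * t))

def linear_response(t0, stim_ft, phase, fx=0.125, ft=0.125, half=8):
    """Weighted sum of the stimulus over the filter's space-time field at time t0."""
    total = 0.0
    for x in range(-half, half + 1):
        for t in range(-half, half + 1):
            total += gabor(x, t, fx, ft, phase) * grating(x, t0 + t, fx, stim_ft)
    return total

def responses(stim_ft, steps=32):
    """Linear output of the even filter and the energy of the quadrature pair over time."""
    even = [linear_response(t0, stim_ft, 0.0) for t0 in range(steps)]
    odd = [linear_response(t0, stim_ft, math.pi / 2.0) for t0 in range(steps)]
    energy = [e * e + o * o for e, o in zip(even, odd)]
    return even, energy

pref_linear, pref_energy = responses(0.125)    # grating drifts in the preferred direction
null_linear, null_energy = responses(-0.125)   # same grating, opposite direction

print(sum(pref_linear))                        # oscillating output integrates to about zero
print(min(pref_energy), max(null_energy))      # energy is steady and direction-selective
```

In this sketch the linear output swings between large positive and negative values whose sum over whole cycles is approximately zero, whereas the squared-and-summed output remains positive throughout and is far larger for the preferred direction than for the opposite one.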
5. Integration Of Local Velocities
One of the most important revelations in the study of motion perception has been that motion processing goes well beyond simple detection (though detection is clearly not simple at all). Theorists have known this for many years; anyone trying to create a computerized motion processor realizes that local motion cues have to be integrated to reduce noise and to solve the aperture problem (see below), and this integration creates new problems because separate objects risk being mixed together. Independent visual cues such as disparity (Bradley et al. 1995) or color (Croner and Albright 1999) may be used to sort motion signals from diﬀerent objects, but then there is the issue of how these cues are incorporated; and so on. Thus, before discussing the fancier aspects of motion processing, such as kinetic depth and heading computation, we must ﬁrst examine the integration steps.
Whether through direct projection or via V2 (or other areas), much of the output of V1 directional cells ends up in area MT. Whereas about one quarter of the neurons in V1 are direction-selective, the great majority of cells in MT are direction-selective; thus, MT as an area is clearly preoccupied with visual motion. Still, V1 is much bigger—about 20 times bigger—so there are more directional neurons overall in V1. What, then, is the function of MT?
It is, at least in part, to combine local velocity samples intelligently. V1 neurons have tiny receptive fields, so when a moving edge appears, it always looks like it is moving perpendicularly to the edge (Fig. 2), even when it is not. This dilemma, called the aperture problem, could be resolved with large receptive fields, but then motion detection could only be done at coarse resolution. The solution is to combine the outputs of many directional V1 cells. Individually their outputs are meaningless; there is no probabilistic relationship between edge orientation and object direction (although, in the absence of noise, a single direction sample does narrow the range of possible object directions to 180°). Collectively, however, the local velocity samples (the V1 outputs) can give accurate information about the overall direction of movement.
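The aperture problem can be made concrete with a small geometric sketch. It assumes that a detector viewing a straight edge through a small aperture can register only the component of the object's velocity along the edge normal; the function names and the two example orientations are assumptions made for illustration.

```python
import math

def normal_component(velocity, normal_angle_deg):
    """Velocity seen through a small aperture: projection onto the edge normal."""
    a = math.radians(normal_angle_deg)
    nx, ny = math.cos(a), math.sin(a)
    speed = velocity[0] * nx + velocity[1] * ny   # signed speed along the normal
    return (speed * nx, speed * ny)

def direction_deg(v):
    """Direction of a 2D vector in degrees."""
    return math.degrees(math.atan2(v[1], v[0]))

true_velocity = (1.0, 0.0)                        # object moves rightward (0 degrees)
local_a = normal_component(true_velocity, 45.0)   # edge whose normal points at 45 degrees
local_b = normal_component(true_velocity, -30.0)  # edge whose normal points at -30 degrees

print(direction_deg(local_a), math.hypot(*local_a))
print(direction_deg(local_b), math.hypot(*local_b))
```

Both samples come from the same rigidly moving object, yet each reports the direction of its own edge normal and a speed below the true speed, so neither is informative by itself.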
The manner in which MT neurons integrate input has been the subject of some debate. One possibility is simply to average the local velocities; in many cases, at least, this gives a good approximation of the object’s direction. But to the extent that visible edge orientations are not evenly distributed about the object direction, the vector average tends to be wrong. Another possibility is to ﬁnd an explicit solution, which is in theory given by the direction and speed of any two vector samples. The geometric form of this calculation is called the intersection of constraints (IOC). Or, one might approximate the IOC by emphasizing higher speeds; since the apparent speed of local velocity samples falls oﬀ to the extent that their apparent direction diﬀers from the object direction, this would have the eﬀect of favoring motion signals that really indicate the object’s direction. Finally, one could isolate features, such as line intersections or texture changes, and track them. Features always appear to move in the object direction so they are a trustworthy source of motion information. Note that the vector average diﬀers from the other computations in that it is linear.
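The difference between the vector average and the intersection of constraints can be shown with two local samples. The sketch below is a geometric illustration under the assumption that each sample supplies an edge normal and a speed along that normal; it is not a model of how MT neurons perform the computation, and the names and angles are hypothetical.

```python
import math

def unit(angle_deg):
    """Unit vector at the given angle."""
    a = math.radians(angle_deg)
    return (math.cos(a), math.sin(a))

def local_sample(velocity, normal_deg):
    """Return (normal, speed along normal) for one edge: all that an aperture reveals."""
    n = unit(normal_deg)
    return n, velocity[0] * n[0] + velocity[1] * n[1]

def vector_average(samples):
    """Mean of the local normal velocities (a linear combination)."""
    vx = sum(s * n[0] for n, s in samples) / len(samples)
    vy = sum(s * n[1] for n, s in samples) / len(samples)
    return (vx, vy)

def intersection_of_constraints(sample_a, sample_b):
    """Solve n_a . v = s_a and n_b . v = s_b for v using Cramer's rule."""
    (ax, ay), sa = sample_a
    (bx, by), sb = sample_b
    det = ax * by - ay * bx
    if abs(det) < 1e-12:
        raise ValueError("edges are parallel; the constraints do not intersect")
    return ((sa * by - ay * sb) / det, (ax * sb - sa * bx) / det)

true_velocity = (1.0, 0.0)
# Edge normals at 45 and -30 degrees: unevenly distributed about the object direction.
samples = [local_sample(true_velocity, 45.0), local_sample(true_velocity, -30.0)]

avg = vector_average(samples)
ioc = intersection_of_constraints(samples[0], samples[1])
print(avg, ioc)
```

With these unevenly distributed orientations the vector average is biased in direction and underestimates speed, while the intersection of constraints recovers the object velocity from the same two samples.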
Movshon and colleagues constructed a stimulus that can logically be seen as either a single, moving plaid, or as two sine gratings sliding over each other. They discovered that some MT neurons were capable of ﬁnding the overall plaid direction, which requires integrating information from the individual gratings. They called these ‘pattern cells’ (Movshon et al. 1985). From those studies, we cannot tell how the integration is done, only that it is done. But recently Pack and Born created stimuli that appear to be moving in the wrong direction if one simply takes their vector average. Responding to these stimuli, MT neurons computed the correct direction, or at least came close to it, which indicates one of the three nonlinear computations mentioned above. Remarkably, this semi-exact solution required some 140 msec to compute, and in the meantime, the neurons generated a vector average result. Thus, MT neurons compute a fast, approximate solution followed by a slower, more accurate solution.
Any operation by MT that combines multiple inputs risks crossing object boundaries. Since it is generally useless to know the ‘overall’ direction of independent objects, it is critical that MT confine its integrative operations to a particular, coherently moving object. Several lines of evidence suggest that MT neurons constrain their integration in this way. First, MT neurons have antagonistic regions surrounding their (classical) receptive fields that tend to suppress responsiveness when stimuli are uniform in their direction, speed, and/or disparity (depth) (Allman et al. 1985, Bradley and Andersen 1998). That is, neurons tend to be more active when their receptive fields detect discontinuities, and thus probable object boundaries. Second, the mutual cancellation of preferred- and non-preferred-direction signals in MT (suggesting that they are being integrated) decreases or disappears when the movements occur at different depths (Bradley et al. 1995). This makes sense in that cues from different depths are probably from different objects. Finally, the ability of MT neurons to extract a coherent motion signal from noise is facilitated by color cues that separate the noise from the genuine signal (Croner and Albright 1999).
6. Higher Motion Processing
One of the more useful quantities one can compute from visual motion is depth, or relative distance from the observer. This is called kinetic depth, or structure-from-motion. Overall, it is based on lawful relationships between retinal motion and 3D image structure, presumably discovered by the brain sometime during its evolution. These relationships, for now, are understood to be of three basic types, though their neural mechanisms may not be distinct. In one type, the presence of different directions is sufficient to evoke the perception of a depth slice between them, particularly if the signals are spatially overlapped (like screen doors sliding over each other). This effect probably occurs in MT, possibly through competitive circuitry between neurons tuned for different directions (Bradley et al. 1998). The depth-slice effect is logical when one considers that if two objects move through the same part of a 2D image, one is probably in front of the other; otherwise they would collide.
Another form of kinetic depth involves large speed gradients. Looking out the window of a moving car, for example, invokes a strong sense of near and far, and this is largely due to the speed diﬀerence between the immediate foreground, which rushes by, and the distant background, which appears to move slowly. Orban and colleagues found MT neurons that respond selectively to this type of stimulus (Xiao et al. 1997). Interestingly, the neurons speciﬁcally sense the speed gradient that spans their receptive ﬁeld center and surround, an important result because it is among the most sophisticated functions yet found for a surround. Finally, localized speed gradients give a sense of curvature; this fact is exploited in CAD programs that rotate an object on the screen to give a better sense of its 3D structure (Wallach and O’Connell 1953). It is not known where these localized speed gradients are computed.
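The moving-car example can be reduced to a one-line relationship. The sketch below assumes an observer translating in a straight line and considers only points directly abeam (perpendicular to the path), for which the angular image speed equals observer speed divided by distance. The scenario, names, and numbers are illustrative assumptions.

```python
def angular_speed(observer_speed, distance):
    """Angular speed (rad/s) of a point directly abeam of a translating observer."""
    return observer_speed / distance

def relative_depth(omega_near, omega_far):
    """Ratio far_distance / near_distance recovered from the two image speeds alone."""
    return omega_near / omega_far

car_speed = 25.0                                 # metres per second (hypothetical)
omega_fence = angular_speed(car_speed, 5.0)      # roadside fence, 5 m away
omega_hill = angular_speed(car_speed, 500.0)     # distant hill, 500 m away

print(omega_fence, omega_hill, relative_depth(omega_fence, omega_hill))
```

The ratio of the two image speeds gives the ratio of the two distances without knowledge of the car's speed, which is one way to see why a large speed gradient carries a strong sense of near and far.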
Another remarkable output of the visual motion system is the direction of self-motion, or heading. Gibson realized that when an observer moves through the environment, the retinal image expands, and the focus (origin) of this expansion corresponds to the direction of heading (1950). His suggestion that humans sense heading in this way was not initially accepted, however, because of the problem that when the eyes move—as in tracking any object not on the motion path—the expansion focus shifts in the direction of eye movement, and thus no longer corresponds to the heading. But Warren, Banks, and their colleagues later showed that subjects do in fact rely on the expansion focus to know their direction of heading, even during eye movements. Remarkably, they use an internal signal linked to the eye movement to approximate where the focus would have been without the eye movement, thus recovering the heading.
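The shift of the expansion focus during eye movements, and its correction by an internal eye-movement signal, can be sketched as follows. The example makes several simplifying assumptions that are not in the original studies: the scene is a single frontoparallel surface (so the expansion rate is uniform), the eye movement adds a uniform flow to the image, and the focus is found by a least-squares fit of the point through which all flow lines pass. All names and values are hypothetical.

```python
def expansion_flow(points, focus, rate):
    """Radial flow field: each point moves directly away from the focus of expansion."""
    return [(rate * (x - focus[0]), rate * (y - focus[1])) for x, y in points]

def estimate_focus(points, flows):
    """Least-squares point through which all flow lines pass."""
    # Each flow vector (u, v) at (x, y) gives one constraint: v*fx - u*fy = v*x - u*y.
    saa = sab = sbb = sac = sbc = 0.0
    for (x, y), (u, v) in zip(points, flows):
        a, b, c = v, -u, v * x - u * y
        saa += a * a
        sab += a * b
        sbb += b * b
        sac += a * c
        sbc += b * c
    det = saa * sbb - sab * sab
    return ((sac * sbb - sab * sbc) / det, (saa * sbc - sab * sac) / det)

points = [(-2.0, -1.0), (3.0, -2.0), (1.0, 4.0), (-3.0, 2.0), (2.0, 1.0)]
heading = (0.5, -0.25)          # true focus of expansion in image coordinates
rate = 0.5                      # uniform expansion rate
pursuit_flow = (-0.4, 0.0)      # rightward eye movement slides the image leftward

pure = expansion_flow(points, heading, rate)
retinal = [(u + pursuit_flow[0], v + pursuit_flow[1]) for u, v in pure]

# Without compensation, the focus on the retina is displaced toward the eye movement.
shifted_focus = estimate_focus(points, retinal)

# With an internal estimate of the eye-movement flow, the heading is recovered.
compensated = [(u - pursuit_flow[0], v - pursuit_flow[1]) for u, v in retinal]
recovered_heading = estimate_focus(points, compensated)
print(shifted_focus, recovered_heading)
```

In this sketch the uncompensated focus lies to the right of the true heading, in the direction of the eye movement, and subtracting the flow attributed to the eye movement restores the original focus.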
Following work by Saito et al., Duffy and Wurtz, and others, it became clear that heading computation might be carried out in the dorsal part of MST, called MSTd (Duffy and Wurtz 1991, Saito et al. 1986). The primary reasons were that: (a) MSTd neurons have large receptive fields, often covering most of the visual image; (b) many MSTd neurons are particularly responsive to expanding images; and (c) expansion-selective MSTd neurons are sensitive to the position of an expansion focus. The role of MSTd in heading perception was confirmed when MSTd neurons were shown to adjust their focus tuning in a way that compensates for eye movements (Bradley et al. 1996), paralleling the psychophysical results, and when it was discovered that microstimulation of MSTd neurons produces biases in heading percepts (Britten and van Wezel 1998). The combination of theoretical, psychophysical, and neurophysiological efforts that led to our current understanding of heading perception is a good example of how neuroscience benefits from the convergence of ideas.
Other kinds of information are extracted from visual motion cues in the primate brain. For example, motion cues suggest object boundaries; they provide an estimate of time-to-collision; and they provide retinal slip information needed to track moving objects. Unfortunately, these cannot all be discussed thoroughly here.
7. Neural Correlates Of Motion Perception
Studies by Newsome and colleagues have clearly demonstrated the role of MT and MST in motion perception (Celebrini and Newsome 1994, Newsome et al. 1989). These researchers varied the signal-to-noise ratio of an image moving right or left and required monkeys to report the overall direction. They found a clear correlation between the direction reported by the monkey and the activity of direction-selective cells in MT and MST. But for some stimuli, the S/N ratio was actually zero; i.e., they contained no information about direction. The monkeys were required to answer anyway. Remarkably, in such cases, neural activity was still correlated with perception.
One possibility is that stochastic changes in MT/MST activity resulted in slight directional imbalances; for example, on a given trial, right-tuned cells might have been accidentally slightly more active than left-tuned cells, leading to a net sense of rightward motion. The other possibility is that monkeys, through cognitive mechanisms we can only speculate about, imagined right or left motion, and this in turn inﬂuenced MT/MST activities. In either case, the inseparability of direction judgment and MT/MST activity leaves little doubt about the crucial role of these areas in motion perception.
Reinforcing these findings, microstimulation studies have shown that direction judgments by monkeys can be reproducibly and predictably biased by stimulating MT or MST neurons with a particular direction preference (for example, stimulating cells selective for rightward motion tends to make monkeys see rightward motion) (Celebrini and Newsome 1995, Salzman et al. 1992). And lesion studies suggest that MT, at least, is crucial for normal motion perception. Monkeys with MT ablation, for example, have difficulty tracking moving targets (Dursteler et al. 1987), and motion perception is profoundly disturbed in humans with bilateral damage to the MT/MST region (Zihl et al. 1983).
None of the above proves that motion perception comes from MT or MST. In fact, there is no reason to assume that perception occurs in any speciﬁc place. But the evidence presented above does suggest that in the processing that occurs between retinal motion and behavioral output, MT and MST have crucial roles.
8. Future Questions
Research on visual motion perception has advanced rapidly in the last few decades, from basic detection studies in the 1960s (Hubel and Wiesel 1968) to recent results concerning heading perception (Bradley et al. 1996), decision centers (Gold and Shadlen 2000), solutions to the aperture problem (Pack and Born 2001), and attentional gating of direction signals (Treue et al. 1999). As exciting as these developments have been, it is important not to leave the most basic questions behind, as many of these have yet to be answered. For instance, what kind of neural circuitry accounts for MT and MST response properties? How is speed measured? And what are the roles of directional neurons in areas outside MT and MST? Some of the most important results will probably derive from such basic studies.
It also seems that exciting things are about to happen in the study of population coding. MT and MST clearly play important roles in motion processing, which is to say that they compute something useful. This implies that the output of these areas— whatever it is they derive from their input—takes some recognizable form. To suggest otherwise would be to imply that other parts of the brain that use this information, such as the smooth-pursuit system and the decision centers that lead to direction judgments, understand a magical code that is not knowable to us. Rejecting this, the logical question is: what is the code? That is, in what format is the result of the neural computations expressed?
In the case of unidirectional percepts, that information looks something like a bell-shaped curve that expresses neurons’ responses as a function of their preferred direction (Fig. 3). The idea is simply that neurons whose preferred direction matches the stimulus direction should have the strongest response. Whether this is interpreted internally in terms of the distribution mean or peak remains to be determined. But more basic questions apply to multidirectional percepts. When two movements occur near each other in an image, how does the brain know this? Because of noise and the aperture problem, the presence of diﬀerent direction signals in the same part of an image never comes as a surprise and certainly is not a reliable indicator that independent movements have occurred.
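The two candidate read-outs of a unidirectional population response, peak and mean, can be compared in a short sketch. It assumes a von Mises (circular bell-shaped) tuning curve and 24 noiseless neurons with evenly spaced preferred directions; the tuning function, its width, and the function names are assumptions for illustration and are not taken from Fig. 3.

```python
import math

def population_response(stimulus_deg, preferred_degs, kappa=2.0):
    """Bell-shaped (von Mises) response of each neuron, indexed by preferred direction."""
    return [math.exp(kappa * (math.cos(math.radians(stimulus_deg - p)) - 1.0))
            for p in preferred_degs]

def decode_peak(preferred_degs, responses):
    """Read-out 1: preferred direction of the most active neuron."""
    best = max(range(len(responses)), key=lambda i: responses[i])
    return preferred_degs[best]

def decode_vector_mean(preferred_degs, responses):
    """Read-out 2: direction of the response-weighted vector sum (circular mean)."""
    x = sum(r * math.cos(math.radians(p)) for p, r in zip(preferred_degs, responses))
    y = sum(r * math.sin(math.radians(p)) for p, r in zip(preferred_degs, responses))
    return math.degrees(math.atan2(y, x)) % 360.0

preferred = [i * 15.0 for i in range(24)]   # 24 hypothetical neurons tiling 360 degrees
rates = population_response(100.0, preferred)

peak_estimate = decode_peak(preferred, rates)
mean_estimate = decode_vector_mean(preferred, rates)
print(peak_estimate, mean_estimate)
```

In this noiseless example the peak read-out is limited by the spacing of preferred directions, while the mean read-out interpolates between neurons. Which of the two, if either, the brain uses is the open question raised in the text.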
Could we peer into MT, MST, or other areas and know whether an animal perceives one direction or two? Currently, no, because we do not know what to look for; we do not know the code. But recent studies suggest that we are on the way. Treue et al. have evidence that the actual shape of the MT response distribution may be decomposed by the visual system in deciding how many moving objects are present (2000). Pursuing a different hypothesis, Castelo-Branco et al. discovered synchronized oscillations in MT cortex of anesthetized cats in response to moving patterns that normally look coherent, but not with stimuli that look like two separate movements (2000). Recent preliminary results from awake monkeys, in contrast, showed no such oscillations (Thiele and Stoner 2000). As channel capacity for multi-neuron recording continues to increase, the future holds great promise for unraveling some of the basic principles of neural coding. Much of this insight will no doubt come from studies of visual motion perception.
- Adelson E H, Bergen J R 1985 Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A 2: 284–99
- Allman J, Miezin F, McGuinness E 1985 Direction-speciﬁc and velocity-speciﬁc responses from beyond the classical receptive ﬁeld in the middle temporal visual area (MT). Perception 14: 105–26
- Bradley D C, Andersen R A 1998 Center-surround antagonism based on disparity in primate area MT. The Journal of Neuroscience 18: 7552–65
- Bradley D C, Chang G C, Andersen R A 1998 Encoding of three-dimensional structure-from-motion by primate area MT neurons. Nature 392: 714–7
- Bradley D C, Maxwell M, Andersen R A, Banks M S, Shenoy K V 1996 Mechanisms of heading perception in primate visual cortex. Science 273: 1544–7
- Bradley D C, Qian N, Andersen R A 1995 Integration of motion and stereopsis in middle temporal cortical area of macaques. Nature 373: 609–11
- Britten K H, van Wezel R J A 1998 Electrical microstimulation of cortical area MST biases heading perception in monkeys. Nature Neuroscience 1: 59–63
- Castelo-Branco M, Goebel R, Neuenschwander S, Singer W 2000 Neural synchrony correlates with surface segregation rules. Nature 405: 685–9
- Celebrini S, Newsome W T 1994 Neuronal and psychophysical sensitivity to motion signals in extrastriate area MST of the macaque monkey. The Journal of Neuroscience 14: 4109–24
- Celebrini S, Newsome W T 1995 Microstimulation of extrastriate area MST influences performance on a direction discrimination task. Journal of Neurophysiology 73: 437–48
- Croner L J, Albright T D 1999 Segmentation by color inﬂuences responses of motion-sensitive neurons in the cortical middle temporal visual area. The Journal of Neuroscience 19: 3935–51
- Duffy C J, Wurtz R H 1991 Sensitivity of MST neurons to optic flow stimuli. I. A continuum of response selectivity to large-field stimuli. Journal of Neurophysiology 65: 1329–45
- Dursteler M R, Wurtz R H, Newsome W T 1987 Directional pursuit deficits following lesions of the foveal representation within the superior temporal sulcus of the macaque monkey. Journal of Neurophysiology 57: 1262–87
- Gibson J J 1950 The Perception of the Visual World. Houghton Mifflin, Boston
- Gold J I, Shadlen M N 2000 Representation of a perceptual decision in developing oculomotor commands. Nature 404: 390–4
- Hubel D H, Wiesel T N 1968 Receptive ﬁelds and functional architecture of monkey striate cortex. Journal of Physiology 195: 215–43
- Movshon J A, Adelson E H, Gizzi M, Newsome W T 1985 The analysis of moving visual patterns. In: Chagas C, Gattass R, Gross C (eds.) Study Group on Pattern Recognition Mechanisms. Pontiﬁca Academia Scientiarum, Vatican City, Rome, pp. 117–51
- Newsome W T, Britten K H, Movshon J A 1989 Neuronal correlates of a perceptual decision. Nature 341: 52–4
- Pack C C, Born R T 2001 Temporal dynamics of a neural solution to the aperture problem in visual area MT of macaque brain. Nature 409: 1040–2
- Reichardt W 1961 Autocorrelation, a principle for the evaluation of sensory information by the central nervous system. In: Rosenblith W A (ed.) Sensory Communication. Wiley, New York
- Reid R C, Soodak R E, Shapley R M 1987 Linear mechanisms of directional selectivity in simple cells of cat striate cortex. Proceedings of the National Academy of Sciences 84: 8740–4
- Saito H, Yukie M, Tanaka K, Hikosaka K, Fukada Y, Iwai E 1986 Integration of direction signals of image motion in the superior temporal sulcus of the macaque monkey. The Journal of Neuroscience 6: 145–57
- Salzman C D, Murasugi C M, Britten K H, Newsome W T 1992 Microstimulation in visual area MT: Effects on direction discrimination performance. The Journal of Neuroscience 12: 2331–55
- Thiele A, Stoner G 2000 Neural synchrony in macaque area MT does not correlate with plaid pattern coherence. Society of Neuroscience Abstracts 30: 777–8
- Treue S, Hol K, Rauber H J 2000 Seeing multiple directions of motion-physiology and psychophysics. Nature Neuroscience 3: 270–6
- Treue S, Martinez Trujillo J C 1999 Feature-based attention inﬂuences motion processing gain in macaque visual cortex. Nature 399: 575–9
- van Santen J P, Sperling G 1985 Elaborated Reichardt detectors. Journal of the Optical Society of America A 2: 300–21
- Wallach H, O’Connell D N 1953 The kinetic depth effect. The Quarterly Journal of Experimental Psychology 45: 205–17
- Wertheimer M 1912 Experimentelle Studien über das Sehen von Bewegung. Zeitschrift für Psychologie 61: 161–265
- Xiao D K, Marcar V L, Raiguel S E, Orban G A 1997 Selectivity of macaque MT/V5 neurons for surface orientation in depth specified by motion. European Journal of Neuroscience 9: 956–64
- Zihl J, von Cramon D, Mai N 1983 Selective disturbance of movement vision after bilateral brain damage. Brain 106: 313–40