The primate visual system comprises over 30 cortical areas in addition to the retina and other subcortical structures. To model the lower levels of form vision or motion perception, it therefore becomes necessary to develop neural network theories of retinal, striate, and extrastriate cortical function that are consistent with known anatomy and physiology and that are capable of predicting relevant psychophysical data. The mathematical techniques necessary for this enterprise include convolution and ‘nonlinear dynamical systems’ theory. This research paper will review current theories of retinal and striate cortical function followed by a discussion of neural models related to areas ‘V2’ and ‘V4’ in the form vision hierarchy.
2. Theories Of Retinal Function
Retinal neurons transform the optical image in order to extract biologically relevant visual information concerning light intensity changes in space (i.e., contours and edges), changes in time, and chromaticity. Furthermore, the retina must shift its operating point so as to signal useful information over a brightness range of more than 10^6, a process termed light adaptation. These transformations are carried out by networks comprising at least five different types of neurons: photoreceptors, bipolar cells, horizontal cells, amacrine cells, and ganglion cells (see Wassle and Boycott 1991). As the extraction of color information is covered elsewhere, let us focus here on light adaptation and the extraction of contours.
Light intensities in natural scenes cover more than a 10^6 range, yet the ganglion cells providing the retinal output to the brain can only respond over about a 300:1 range. Thus, the retina must compute the mean level of illumination at any given time and adjust the circuitry to respond to about a 300:1 range around that operating point. Key experiments on retinal light adaptation have demonstrated that there are two major processes involved: one subtractive and one divisive (see Hood 1998). The direct path through the retina involves photoreceptors activating bipolar cells, which in turn activate ganglion cells. In addition, the retina incorporates two major feedback loops: (a) from photoreceptors to horizontal cells then back to photoreceptors, and (b) from bipolars to amacrine cells back to bipolars. A recent retinal model has tentatively identified each of these loops with one of the light adaptation processes (Wilson 1997). In particular, loop (a) was conjectured to provide subtractive feedback, while loop (b) was suggested to be the site of divisive feedback. In a subtractive loop, the feedback signal is subtracted from the loop input, while in a divisive loop, the feedback signal divides the loop input signal, thus reducing its gain. (More likely, each feedback loop incorporates both subtraction and division, suggesting a somewhat more complex formulation.) It is significant that both the horizontal cells and the amacrine cells, the key units in these two feedback loops, have a relatively broad horizontal spread across the retinal surface and thus provide feedback that is averaged over an extended area. Subtractive feedback is well understood in the context of linear systems theory, but divisive feedback moves into the realm of ‘nonlinear dynamics’.
Designating the bipolar cell response as B, the amacrine feedback response as A, and the light-derived signal received from the photoreceptors as L, the simplest divisive feedback equations are of the form (Wilson 1997):
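One pair of equations consistent with the limiting behavior described in this section is sketched below. It is offered as an assumption, not as a transcription of Wilson (1997): the time constants τ_B and τ_A are illustrative symbols, and the feedback term B^N is chosen only because it yields the (N + 1)th-root compression discussed in the text.

```latex
\[
\begin{aligned}
\tau_B \frac{dB}{dt} &= -B + \frac{L}{1 + A}, \\
\tau_A \frac{dA}{dt} &= -A + B^{N}.
\end{aligned}
\tag{1}
\]
```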
These equations may be analyzed using standard techniques of nonlinear dynamical systems theory (Wilson 1999). In particular, one can solve for the steady state or equilibrium value at which dB/dt = 0 and dA/dt = 0. The results for very small and very large values of L are approximately:

$$B \approx L \ \ \text{(very small } L\text{)}, \qquad B \approx L^{1/(N+1)} \ \ \text{(very large } L\text{)} \tag{2}$$

Thus, at very low light intensity levels the bipolar response B is approximately equal to its input, so there is no attenuation of weak signals. At high light intensity levels, however, B is compressed to approximately the (N + 1)th root of L. For N = 2, a physiologically plausible figure, the divisive feedback loop in Eqn. (1) will compress an L = 10^6 input into a B ≈ 100 output, easily compatible with the response range of the ganglion cells. A further property of Eqn. (1) is that the temporal response becomes more transient as L increases (Wilson 1999), and this change is well documented in the literature on flicker sensitivity and light adaptation (Wilson 1997).
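The compression can be checked numerically. The sketch below assumes that the equilibrium of the divisive loop satisfies A = B^N and B = L/(1 + A), i.e., B(1 + B^N) = L. That condition is an assumption chosen to match the limits quoted above, and the function name `steady_state_bipolar` is purely illustrative.

```python
def steady_state_bipolar(L, N=2, iterations=200):
    """Solve B * (1 + B**N) = L for B >= 0 by bisection.

    Assumed equilibrium of the divisive loop: A = B**N and B = L / (1 + A).
    """
    lo, hi = 0.0, max(L, 1.0)  # the left-hand side is increasing, root is in [lo, hi]
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid * (1.0 + mid ** N) < L:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for L in (1e-3, 1.0, 1e3, 1e6):
        print(f"L = {L:g}  ->  B = {steady_state_bipolar(L):.4g}")
```

Under this assumption, weak inputs pass almost unchanged, while an input of 10^6 is compressed to roughly 100 for N = 2.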
As is well known, retinal receptive fields have an excitatory center and inhibitory surround (or vice versa) spatial profile. Direct bipolar input to the ganglion cells produces the receptive field center, while the lateral spread of both horizontal and amacrine cells creates the antagonistic surround. Thus, the neurons engaged in light adaptation via feedback also produce the spatial surround of ganglion cell receptive fields. As is evident from the discussion above, divisive feedback has almost no effect at low light levels (Eqn. (2)), so one would expect receptive field surrounds to be very weak at low luminances but much stronger at higher luminances, and this has been demonstrated experimentally. For some simulation purposes the spatial characteristics of retinal receptive fields may be approximated as a circularly symmetric difference of Gaussians:
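A difference of Gaussians matching the symbols used in this section (center gain A, luminance-dependent surround gain B(L), space constants σc and σs) can be written as follows. The Gaussian convention exp(−R²/σ²), without a factor of 2 in the exponent, is an assumption.

```latex
\[
G(R) = A \exp\!\left(-\frac{R^{2}}{\sigma_c^{2}}\right)
     - B(L) \exp\!\left(-\frac{R^{2}}{\sigma_s^{2}}\right)
\tag{3}
\]
```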
where R is the radius, and σc and σs are the space constants of the center and surround, respectively. The inhibitory gain B(L) would vary from B(0) ≈ 0 to B(L) ≈ Aσc/σs for L ≫ 0, which makes the integral of the receptive field ∫G dR = 0. For further discussion of retinal models see Wilson (1997) and Hood (1998).
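The balance condition can be verified numerically. This sketch assumes the difference-of-Gaussians form G(R) = A exp(−R²/σc²) − B exp(−R²/σs²) and treats ∫G dR as a one-dimensional integral along a line through the receptive field center, which is the reading under which B = Aσc/σs gives zero. The parameter values are arbitrary.

```python
import numpy as np


def dog(R, A=1.0, sigma_c=1.0, sigma_s=3.0, surround_gain=None):
    """Difference of Gaussians; the surround gain defaults to A * sigma_c / sigma_s."""
    if surround_gain is None:
        surround_gain = A * sigma_c / sigma_s
    center = A * np.exp(-(R / sigma_c) ** 2)
    surround = surround_gain * np.exp(-(R / sigma_s) ** 2)
    return center - surround


R = np.linspace(-40.0, 40.0, 8001)  # wide enough that both Gaussians have decayed
dR = R[1] - R[0]

balanced = np.sum(dog(R)) * dR                       # bright light: full surround
no_surround = np.sum(dog(R, surround_gain=0.0)) * dR  # dim light: B(0) = 0

print(f"integral with balanced surround: {balanced:.2e}")
print(f"integral with no surround:       {no_surround:.4f}")
```

With the surround switched off the filter simply integrates its center, whereas the balanced filter gives no net response to uniform illumination.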
3. Theories Of Striate Cortical Function
Hubel and Wiesel (1968) categorized striate (V1) cortical cells as simple or complex. Both of these cell types show a preference for a particular orientation and also for a range of spatial frequencies. The now classical model for a simple cell receptive field (RF) is just an oriented linear filter with parallel excitatory and inhibitory zones defined by horizontal (x) and vertical (y) coordinates:
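One filter of this kind is sketched below. It assumes a difference-of-Gaussians profile across x, reusing the space constants σc and σs, multiplied by a Gaussian envelope of space constant σy along y. The gains A and B are illustrative, and the profile used in the original model may differ, for example by including additional flanking lobes.

```latex
\[
\mathrm{RF}(x, y) =
\left[ A \exp\!\left(-\frac{x^{2}}{\sigma_c^{2}}\right)
     - B \exp\!\left(-\frac{x^{2}}{\sigma_s^{2}}\right) \right]
\exp\!\left(-\frac{y^{2}}{\sigma_y^{2}}\right)
\tag{4}
\]
```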
Equation (4) describes a vertically oriented receptive field, but other orientations (varying in about 15° increments between 0° and 180°) can be produced by coordinate rotation. Based on physiological data, σy ≈ 3.2σc produces appropriate orientation bandwidths. The activation R(x, y) of a cortical array of simple cell receptive fields is produced by convolution:

$$R(x, y) = \iint \mathrm{RF}(x - x', y - y')\, I(x', y')\, dx'\, dy' \tag{5}$$
where I(x', y') represents the distribution of light in the image. Simulated neural responses are then produced by passing the results of Eqn. (5) through a nonlinear function exhibiting both a threshold and saturation, such as the Naka–Rushton function

$$S(R) = \begin{cases} \dfrac{M R^{N}}{\beta^{N} + R^{N}}, & R \ge 0 \\[1ex] 0, & R < 0 \end{cases} \tag{6}$$
where M is the maximum response level and N ≈ 2–3 for cortical neurons. Because S(R) = M/2 when R = β, β is referred to as the semisaturation constant. Dynamics may now be added to this formulation as follows (Wilson 1999):

$$\tau \frac{dE}{dt} = -E + S(R) \tag{7}$$
where E is the excitatory firing rate and τ is the time constant for the rate of exponential approach to the asymptotic firing rate.
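A minimal numerical sketch of the nonlinearity and its dynamics follows. It assumes the standard Naka–Rushton form S(R) = M R^N / (β^N + R^N) for R ≥ 0 (zero otherwise) and first-order dynamics τ dE/dt = −E + S(R), integrated with a simple Euler step. The parameter values (M = 100, β = 10, τ = 10 ms) are illustrative and are not taken from the source.

```python
def naka_rushton(R, M=100.0, beta=10.0, N=2):
    """Threshold-and-saturation nonlinearity; returns 0 for non-positive input."""
    if R <= 0:
        return 0.0
    return M * R ** N / (beta ** N + R ** N)


def simulate_rate(R, tau=10.0, dt=0.1, t_end=200.0, **nr_params):
    """Euler integration of tau * dE/dt = -E + S(R), starting from E = 0."""
    target = naka_rushton(R, **nr_params)
    E = 0.0
    trace = []
    for _ in range(int(round(t_end / dt))):
        E += (dt / tau) * (-E + target)
        trace.append(E)
    return trace


if __name__ == "__main__":
    trace = simulate_rate(20.0)
    print(f"S(beta) = {naka_rushton(10.0)}")  # half of M by construction
    print(f"E after one time constant: {trace[99]:.2f}")
    print(f"E at end of run:           {trace[-1]:.2f}")
```

After one time constant the rate has covered about 63% of the distance to its asymptote S(R), as expected for exponential approach.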
The simple cell model in Eqns. (4)–(7) can be used to generate complex networks of interconnected model neurons for simulations of cortical function (Wilson 1999). However, contemporary research indicates that orientation selectivity is more complex than this. In particular, the linear filtering in Eqn. (4) suggests that orientation tuning originates entirely from a linear superposition of nonoriented lateral geniculate (LGN) filters similar to that in Eqn. (3). While this was Hubel and Wiesel’s (1968) original suggestion for orientation selectivity, the preponderance of evidence now supports additional factors in orientation tuning. First, convergence of as few as two nonoriented LGN filters conveys a modest orientation bias on cortical neurons. This very broad orientation bias is then sharpened considerably by two operations occurring in neural networks at the cortical level: contrast gain control and collinear facilitation (Sompolinsky and Shapley 1997).
Contrast gain controls were first modeled by Heeger (1991) as divisive feedback operations. He suggested that the summed activity of many different oriented units provided a divisive feedback signal to each of the units in turn. The divisive feedback loop in Eqn. (1) can be altered to function as a cortical gain control as follows:
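A form consistent with the description in this section is sketched below: each oriented unit E_θ is driven by its filter response R_θ divided by (1 + A), and the feedback signal A is driven by the summed activity of all orientations. The equal-weight sum and the time constant τ_A are assumptions, not details confirmed by the source.

```latex
\[
\begin{aligned}
\tau \frac{dE_\theta}{dt} &= -E_\theta + S\!\left(\frac{R_\theta}{1 + A}\right), \\
\tau_A \frac{dA}{dt} &= -A + \sum_{\theta'} E_{\theta'}.
\end{aligned}
\tag{8}
\]
```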
where θ designates the preferred orientation of each unit. Substitution of the Naka–Rushton function S from Eqn. (6) and rearrangement shows that this cortical divisive feedback circuit has the effect of scaling β by (1 + A). This network has been implemented and shown to sharpen cortical orientation tuning (Wilson 1993). Due to the dynamics of the network, this orientation sharpening evolves over time as the divisive feedback takes effect, and this has recently been shown to occur in most cortical neurons (Sompolinsky and Shapley 1997).
The final contribution to cortical orientation selectivity is provided by collinear facilitation (Sompolinsky and Shapley 1997). Recent evidence indicates that neurons with the same preferred orientation are interconnected if their receptive fields are situated on a common line defined by their preferred orientation. Such reciprocally linked receptive fields are typically separated by 1–2 receptive field lengths so that their mutual facilitation will only be triggered by stimulus edges of sufficient length. The excitatory connections are probably mediated by NMDA receptors so that both cells in a collinear pair must be independently activated by the stimulus for facilitation to occur. Obviously, collinear facilitation will sharpen orientation tuning when measured with extended lines or gratings. The function of collinear facilitation is to enhance the salience of extended contours, thus making them prominent candidates for further processing.
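The rearrangement can be checked numerically. The sketch below assumes the Naka–Rushton form S(R) = M R^N / (β^N + R^N) and assumes that the feedback acts by dividing the input by (1 + A). Under those assumptions, S(R/(1 + A)) with semisaturation β equals S(R) with semisaturation β(1 + A). The sample values are arbitrary.

```python
def naka_rushton(R, M, beta, N):
    """Naka-Rushton function with zero output for non-positive input."""
    if R <= 0:
        return 0.0
    return M * R ** N / (beta ** N + R ** N)


def max_discrepancy(M=100.0, beta=10.0, N=2):
    """Largest difference between 'divide the input' and 'scale beta' over a small grid."""
    worst = 0.0
    for R in (0.5, 5.0, 20.0, 200.0):
        for A in (0.0, 0.5, 3.0, 10.0):
            divided_input = naka_rushton(R / (1.0 + A), M, beta, N)
            scaled_beta = naka_rushton(R, M, beta * (1.0 + A), N)
            worst = max(worst, abs(divided_input - scaled_beta))
    return worst


if __name__ == "__main__":
    print(f"largest discrepancy: {max_discrepancy():.2e}")
```

Because raising β shifts the contrast response curve rightward without changing its maximum M, the feedback acts as a gain control, not a simple subtraction.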
4. Global Processes In Extrastriate Form Vision
The complex cells of Hubel and Wiesel can be readily modeled by spatial pooling of simple cells with the same preferred orientation. More interesting are end-stopped complex cells, which are sensitive to both contour length and curvature. As illustrated in Fig. 1(a), an end-stopped complex cell can be modeled by full-wave rectification of simple cell responses followed by second-stage pooling using a 2–3 times larger version of the oriented filter defined by Eqn. (4). Full-wave rectification is mathematically equivalent to summing responses from both on-center and off-center simple cells that have the same spatial location and preferred orientation. The first- and second-stage filters in this filter–rectify–filter sequence usually have different preferred orientations, with the orthogonal case being illustrated in Fig. 1(a). The optimal line stimulus for this end-stopped unit will obviously be a horizontal line with length equal to the width of the second filter’s excitatory zone, as shown superimposed on the diagram. In addition to lines of fixed length, end-stopped cells will respond well to curves that are tangent to the preferred orientation of the first-stage simple cells, and they respond well to the orientation of texture boundaries (Wilson 1993). Although Hubel and Wiesel (1968) reported the presence of end-stopped neurons in V1, subsequent work by von der Heydt and Peterhans (1989) indicates that they predominate in the second visual area, V2.
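The filter–rectify–filter sequence can be sketched in a few lines. This is an illustrative construction, not the published model: it assumes separable filters (a balanced difference of Gaussians across the preferred orientation, a Gaussian along it), a second stage 2.5 times larger and orthogonal to the first, and arbitrary sizes in pixels. All function names are hypothetical.

```python
import numpy as np


def gaussian(sigma):
    x = np.arange(-int(3 * sigma), int(3 * sigma) + 1)
    return np.exp(-(x / sigma) ** 2)


def balanced_dog(sigma_c, sigma_s):
    """1-D difference of Gaussians whose samples sum to zero."""
    x = np.arange(-int(3 * sigma_s), int(3 * sigma_s) + 1)
    center = np.exp(-(x / sigma_c) ** 2)
    surround = np.exp(-(x / sigma_s) ** 2)
    return center - (center.sum() / surround.sum()) * surround


def separable_filter(img, k_y, k_x):
    """Convolve with k_y along axis 0 (y), then with k_x along axis 1 (x)."""
    out = np.apply_along_axis(lambda v: np.convolve(v, k_y, mode="same"), 0, img)
    return np.apply_along_axis(lambda v: np.convolve(v, k_x, mode="same"), 1, out)


def end_stopped_response(line_length, size=128, sigma_c=2.0, scale=2.5):
    """Response of the model unit at the image center to a centered horizontal line."""
    c, half = size // 2, line_length // 2
    img = np.zeros((size, size))
    img[c, c - half:c + half + 1] = 1.0
    sigma_s = 2.0 * sigma_c
    # Stage 1: simple cells preferring horizontal contours (DoG across y).
    stage1 = separable_filter(img, balanced_dog(sigma_c, sigma_s), gaussian(3.2 * sigma_c))
    rectified = np.abs(stage1)  # full-wave rectification
    # Stage 2: larger filter with orthogonal preferred orientation (DoG across x).
    stage2 = separable_filter(
        rectified,
        gaussian(3.2 * scale * sigma_c),
        balanced_dog(scale * sigma_c, scale * sigma_s),
    )
    return stage2[c, c]


if __name__ == "__main__":
    for length in (3, 13, 41, 101):
        print(f"line length {length:3d}: response {end_stopped_response(length):.3f}")
```

In this sketch the response first grows with line length and then falls toward zero once the line extends into the inhibitory flanks of the second-stage filter, which is the length tuning that defines end-stopping.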
Following V1 and V2, the cortical form vision pathway projects to V4, and finally to inferior temporal cortex (IT). Mean receptive field dimensions increase by a factor of approximately 2.5–3 from area to area within this pathway, reaching diameters of 10°–30° in IT. It was noted above that the receptive fields of end-stopped neurons must be approximately 2–3 times larger in each dimension than the simple cell receptive fields that provide their inputs in order to function optimally. Thus, the increasing receptive field size throughout the form vision pathway indicates that pooling operations are combining information over increasingly large areas. The nature of pooling in V4 was first investigated by Gallant et al. (1993). They studied macaque V4 responses to concentric and radial gratings as well as to conventional sine wave gratings. One subclass of neurons responded best to concentric gratings, while a second subclass responded best to radial gratings. Although many neurons responded to conventional sine wave gratings, few were optimized for these patterns.
Humans are extremely good at detecting concentric structure in Glass patterns, and this results from concentric summation of orientation information over considerable regions (Wilson et al. 1997). A Glass pattern is produced by randomly positioning pairs of signal dots such that each pair is oriented tangentially to the desired concentric structure. This produces a random dot pattern appearing to contain concentric swirls of dots. Furthermore, fMRI studies have shown that human V4 is more sensitive to concentric or radial gratings than to conventional gratings (Wilkinson et al. 2000). A quantitative model for V4 concentric receptive fields is depicted in Fig. 1(b) (Wilson et al. 1997). The initial two stages of the model are oriented simple cells followed by rectification and orthogonal second-stage filtering. As mentioned above, these model end-stopped complex cells map onto V1 and V2 physiology, and they respond to curved contours that are locally tangential to the orientation of the simple cell filter. The final stage of the V4 model consists of linear summation (∑) of concentrically arranged curvature information from V2. This represents an example of configural pooling, as each curvature response is pooled only if it arises in a spatial subregion of the receptive field consistent with the concentric global pattern. The final V4 receptive field size is approximately three times the diameter of its V2 subunits, thus conforming to the observed receptive field size increase in the form vision pathway. Similar V4 configural pooling models may be constructed to extract radial and other structures from the stimulus, and one challenge to future research will be an enumeration of configural pooling mechanisms.
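The construction of a concentric Glass pattern described above can be sketched as follows. The number of pairs, the dot separation, and the square field are arbitrary choices for illustration. Each partner dot is displaced along the tangent to the circle passing through the first dot.

```python
import numpy as np


def concentric_glass_pattern(n_pairs=200, separation=0.05, seed=0):
    """Return two (n_pairs, 2) arrays of dot positions forming tangential pairs."""
    rng = np.random.default_rng(seed)
    first = rng.uniform(-1.0, 1.0, size=(n_pairs, 2))
    radius = np.linalg.norm(first, axis=1, keepdims=True)
    # Unit tangent to the circle through each dot: rotate the radius vector by 90 degrees.
    tangent = np.column_stack((-first[:, 1], first[:, 0])) / radius
    second = first + separation * tangent
    return first, second


first, second = concentric_glass_pattern()
print(first.shape, second.shape)
```

Plotting both arrays as dots of equal size (for example with a scatter plot) should give the impression of concentric swirls. Replacing the tangent with the unit radius vector would give a radial pattern instead.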
What might be the function of V4 concentric units? Several observations suggest that they may be involved in the perception of faces and other ellipsoidal shapes. First, computer simulations have shown that model V4 units can detect faces in complex scenes (Wilson et al. 2000). Second, fMRI studies have shown that the face-selective area in the fusiform gyrus responds not only to faces but also to circular or ellipsoidal contours (Wilkinson et al. 2000), thus supporting the hypothesis that analysis of head shape is a major ingredient in face perception.
5. Perceptual Oscillations
There are many instances of instabilities resulting in perceptual oscillations, two of the most common being the Necker cube and binocular rivalry. A third example is the Marroquin illusion in which a static, periodic dot pattern triggers the percept of flashing illusory circles (see Wilson et al. 2000). A Marroquin pattern is constructed by superimposing three square grids of dots, each rotated ±60° with respect to one another. It is natural to conjecture that competitive interactions among V4 concentric units might be the cause of the illusory circles triggered by this stimulus. Indeed, there is physiological evidence that competition among V4 neurons forms a basis for spatial and object based selective attention (Reynolds et al. 1999).
A neural network model incorporating spatially regional competitive interactions among V4 concentric units can explain the illusory flashing Marroquin circles (Wilson et al. 2000). In this simulation V4 neural responses were represented by a variant of Eqn. (7):
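The stimulus construction can be sketched directly from this description: one square grid of dots is copied at rotations of 0°, +60°, and −60° about its center and the copies are superimposed. The grid size and spacing below are arbitrary illustrative values.

```python
import numpy as np


def marroquin_dots(n=21, spacing=1.0, angles_deg=(0.0, 60.0, -60.0)):
    """Return an array of (x, y) dot positions, one n-by-n grid per rotation angle."""
    coords = (np.arange(n) - (n - 1) / 2.0) * spacing  # grid centered on the origin
    gx, gy = np.meshgrid(coords, coords)
    grid = np.column_stack((gx.ravel(), gy.ravel()))
    layers = []
    for angle in np.deg2rad(angles_deg):
        rotation = np.array([[np.cos(angle), -np.sin(angle)],
                             [np.sin(angle), np.cos(angle)]])
        layers.append(grid @ rotation.T)
    return np.vstack(layers)


dots = marroquin_dots()
print(dots.shape)
```

Rendering `dots` as small filled circles on a uniform background gives the static pattern; the illusory circles are a property of the viewer, not of the array.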
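A system of the kind described in the following paragraph is sketched below, as an assumption and not a transcription of Wilson et al. (2000). E, I, and H are the excitatory, inhibitory, and hyperpolarizing variables of one V4 unit, P_+ denotes P with negative values set to zero, d_j is the distance to neighboring unit j, and the time constants τ_I and τ_H are illustrative symbols. The index j is used for the neighbor sum to avoid a clash with the gain k.

```latex
\[
\begin{aligned}
\tau \frac{dE}{dt} &= -E + \frac{M P_{+}^{N}}{(\beta + H)^{N} + P_{+}^{N}},
\qquad
P = S_{\mathrm{Marroquin}} - k \sum_{j} \exp\!\left(-\frac{d_j^{2}}{\sigma^{2}}\right) I_j, \\
\tau_I \frac{dI}{dt} &= -I + E, \\
\tau_H \frac{dH}{dt} &= -H + g E.
\end{aligned}
\tag{9}
\]
```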
In these equations the input P to each neuron is the response of the V4 model in Fig. 1(b) to the Marroquin pattern, S_Marroquin, minus a spatially weighted sum of inhibitory activity I_k generated by neighboring V4 neurons, the inhibitory neurons being driven by the E cells as described by the second equation. Parameters k and σ are respectively the gain and space constant of the inhibitory region. Mathematical analysis of equations such as these shows that one further ingredient is necessary for the network to produce limit cycle oscillations: the firing rate of active excitatory neurons must decline over time (Wilson 1999). Excitatory cortical neurons are known to contain slow hyperpolarizing currents that cause the firing rate to drop about threefold over the course of several hundred milliseconds, and this is incorporated into the mathematical model via the H or hyperpolarizing variable (Wilson 1999). The effect of this variable, governed by g, is to increase the semisaturation constant of any active E neuron, thereby reducing the firing rate. Simulations of this network provide a good quantitative explanation of the illusory flashing circles generated by Marroquin patterns (Wilson et al. 2000). Extensions of this work suggest that models analogous to Eqn. (9) can also explain oscillations and traveling wave phenomena in binocular rivalry.
6. Conclusions And Future Directions
The low-level vision theories developed here are based on nonlinear differential equations describing firing rates (rather than individual spikes) of retinal and cortical neurons. Large-scale neural networks incorporating these principles can now be used to simulate visual processing from the retina up to V4. Although this approach certainly represents but a first-order approximation to the underlying neural activity, it nevertheless provides a solid foundation for further progress.
Many issues, of course, remain to be addressed. Key among these is the ubiquitous cortical feedback: IT back to V4, V4 back to V2, etc. One conjecture is that these connections mediate a rapid dynamical optimization of the processing circuitry based upon initial feed-forward estimates of the stimulus being analyzed: a sophisticated example of a self-organizing system. A second likelihood is that this feedback mediates both top-down influences on selective attention and figure-ground segmentation, which appears to involve extrastriate feedback to V1 (Lamme and Roelfsema 2000). The future promises a cooperative interaction between theory and experiment in elucidating these and many other exciting issues.
- Gallant J L, Braun J, Van Essen D C 1993 Selectivity for polar, hyperbolic, and Cartesian gratings in macaque visual cortex. Science 259: 100–3
- Heeger D J 1991 Nonlinear model of neural responses in cat visual cortex. In: Landy M, Movshon J A (eds.) Computational Models of Visual Processing. MIT Press, Cambridge, MA, pp. 119–33
- Hood D C 1998 Lower level visual processing and models of light adaptation. Annual Review of Psychology 49: 503–35
- Hubel D H, Wiesel T N 1968 Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology 195: 215–43
- Lamme V A F, Roelfsema P R 2000 The distinct modes of vision offered by feed-forward and recurrent processing. Trends in Neurosciences 23: 571–7
- Reynolds J H, Chelazzi L, Desimone R 1999 Competitive mechanisms subserve attention in macaque areas V2 and V4. Journal of Neuroscience 19: 1736–53
- Sompolinsky H, Shapley R 1997 New perspectives on the mechanisms for orientation selectivity. Current Biology 7: 514–22
- von der Heydt R, Peterhans E 1989 Mechanisms of contour perception in monkey visual cortex. I. Lines of pattern discontinuity. Journal of Neuroscience 9: 1731–48
- Wassle H, Boycott B B 1991 Functional architecture of the mammalian retina. Physiological Reviews 71: 447–80
- Wilkinson F, James T W, Wilson H R, Gati J S, Menon R S, Goodale M A 2000 An fMRI study of the selective activation of human extrastriate form vision areas by radial and concentric gratings. Current Biology 10: 1455–8
- Wilson H R 1993 Nonlinear processes in visual pattern discrimination. Proceedings of the National Academy of Sciences USA 90: 9785–90
- Wilson H R 1997 A neural model of foveal light adaptation and afterimage formation. Visual Neuroscience 14: 403–23
- Wilson H R 1999 Spikes, Decisions, and Actions: Dynamical Foundations of Neuroscience. Oxford University Press, Oxford, UK
- Wilson H R, Krupa B, Wilkinson F 2000 Dynamics of perceptual oscillations in form vision. Nature Neuroscience 3: 170–6
- Wilson H R, Wilkinson F, Asaad W 1997 Concentric orientation summation in human form vision. Vision Research 37: 2325–30