Low-Level Theory Of Vision Research Paper

1. Introduction

The primate visual system comprises over 30 cortical areas in addition to the retina and other subcortical structures. To model the lower levels of form vision or motion perception, it therefore becomes necessary to develop neural network theories of retinal, striate, and extrastriate cortical function that are consistent with known anatomy and physiology and that are capable of predicting relevant psychophysical data. The mathematical techniques necessary for this enterprise include convolution and ‘nonlinear dynamical systems’ theory. This research paper will review current theories of retinal and striate cortical function followed by a discussion of neural models related to areas ‘V2’ and ‘V4’ in the form vision hierarchy.

2. Theories Of Retinal Function

Retinal neurons transform the optical image in order to extract biologically relevant visual information concerning light intensity changes in space (i.e., contours and edges), changes in time, and chromaticity. Furthermore, the retina must shift its operating point so as to signal useful information over more than a 10^6 range of brightness, a process termed light adaptation. These transformations are carried out by networks comprising at least five different types of neurons: photoreceptors, bipolar cells, horizontal cells, amacrine cells, and ganglion cells (see Wassle and Boycott 1991). As the extraction of color information is covered elsewhere, let us focus here on light adaptation and the extraction of contours.

Light intensities in natural scenes cover more than a 10^6 range, yet the ganglion cells providing the retinal output to the brain can only respond over about a 300:1 range. Thus, the retina must compute the mean level of illumination at any given time and adjust the circuitry to respond to about a 300:1 range around that operating point. Key experiments on retinal light adaptation have demonstrated that there are two major processes involved: one subtractive and one divisive (see Hood 1998). The direct path through the retina involves photoreceptors activating bipolar cells, which in turn activate ganglion cells. In addition, the retina incorporates two major feedback loops: (a) from photoreceptors to horizontal cells then back to photoreceptors, and (b) from bipolars to amacrine cells back to bipolars. A recent retinal model has tentatively identified each of these loops with one of the light adaptation processes (Wilson 1997). In particular, loop (a) was conjectured to provide subtractive feedback, while loop (b) was suggested to be the site of divisive feedback. In a subtractive loop, the feedback signal is subtracted from the loop input, while in a divisive loop, the feedback signal divides the loop input signal, thus reducing its gain. (More likely, each feedback loop incorporates both subtraction and division, suggesting a somewhat more complex formulation.) It is significant that both the horizontal cells and the amacrine cells, the key units in these two feedback loops, have a relatively broad horizontal spread across the retinal surface and thus provide feedback that is averaged over an extended area. Subtractive feedback is well understood in the context of linear systems theory, but divisive feedback moves into the realm of ‘nonlinear dynamics’.
Designating the bipolar cell response as B, the amacrine feedback response as A, and the light-derived signal received from the photoreceptors as L, the simplest divisive feedback equations are of the form (Wilson 1997):

These equations may be analyzed using standard techniques (Wilson 1999) of nonlinear dynamical systems theory. In particular, one can solve for the steady state or equilibrium value at which dB/dt = 0 and dA/dt = 0. The results for very small and very large values of L are approximately:

B ≈ L (very small L), B ≈ L^(1/(N+1)) (very large L)    (2)

Thus, at very low light intensity levels the bipolar response B is approximately equal to its input, so there is no attenuation of weak signals. At high light intensity levels, however, B is compressed to approximately the (N + 1)th root of L. For N = 2, a physiologically plausible figure, the divisive feedback loop in Eqn. (1) will compress an L = 10^6 input into a B = 100 output, easily compatible with the response range of the ganglion cells. A further property of Eqn. (1) is that the temporal response becomes more transient as L increases (Wilson 1999), and this change is well documented in the literature on flicker sensitivity and light adaptation (Wilson 1997).
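The compression described above can be checked numerically. The sketch below assumes one divisive feedback system that reproduces the stated limits; it is not quoted from Wilson (1997), and the time constants tau_B and tau_A are hypothetical names that do not affect the equilibrium. With A = B at steady state, the equilibrium condition reduces to B(1 + B^N) = L, which is solved here by bisection.

```python
def steady_state_bipolar(L, N=2, tol=1e-12):
    """Solve B * (1 + B**N) = L for B >= 0 by bisection.

    Assumed model (an illustration, not the published Eqn. (1)):
        tau_B * dB/dt = -B + L / (1 + A**N)
        tau_A * dA/dt = -A + B
    At equilibrium A = B, which gives B * (1 + B**N) = L.
    """
    if L < 0:
        raise ValueError("L must be non-negative")
    lo, hi = 0.0, max(L, 1.0)
    # f(B) = B * (1 + B**N) - L increases monotonically for B >= 0,
    # with f(lo) <= 0 and f(hi) >= 0, so bisection brackets the root.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * (1.0 + mid ** N) < L:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * max(1.0, hi):
            break
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Weak inputs pass almost unchanged; strong inputs are compressed
    # toward the (N + 1)th root of L.
    for L in (1e-3, 1.0, 1e3, 1e6):
        print(L, steady_state_bipolar(L, N=2))
```

Under this assumed form, an input of L = 10^6 with N = 2 settles just below B = 100, in line with the figure quoted in the text.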

As is well known, retinal receptive fields have an excitatory center and inhibitory surround (or vice versa) spatial profile. Direct bipolar input to the ganglion cells produces the receptive field center, while the lateral spread of both horizontal and amacrine cells creates the antagonistic surround. Thus, the neurons engaged in light adaptation via feedback also produce the spatial surround of ganglion cell receptive fields. As is evident from the discussion above, divisive feedback has almost no effect at low light levels (Eqn. (2)), so one would expect receptive field surrounds to be very weak at low luminances but much stronger at higher luminances, and this has been demonstrated experimentally. For some simulation purposes the spatial characteristics of retinal receptive fields may be approximated as a circularly symmetric difference of Gaussians:

where R is radius, and σc and σs are the space constants of the center and surround, respectively. The inhibitory gain B(L) would vary from B(0) ≈ 0 to, at high L, the value fixed by A and the ratio of σc to σs that makes the integral of the receptive field ∫G dR = 0. For further discussion of retinal models see Wilson (1997) and Hood (1998).
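A short numerical sketch can make the balance between center and surround concrete. It assumes an unnormalized difference of Gaussians, G(R) = A exp(-R²/σc²) - B exp(-R²/σs²), and integration over the two-dimensional receptive field area; both choices are assumptions made for illustration, and the function names are hypothetical. Under those assumptions the surround exactly cancels the center when B = A(σc/σs)².

```python
import math


def dog(R, A=1.0, B=0.0, sigma_c=1.0, sigma_s=3.0):
    """Assumed circularly symmetric difference of Gaussians at radius R."""
    return A * math.exp(-(R / sigma_c) ** 2) - B * math.exp(-(R / sigma_s) ** 2)


def balanced_gain(A, sigma_c, sigma_s):
    """Surround gain that zeroes the areal integral under the assumed form."""
    return A * (sigma_c / sigma_s) ** 2


def areal_integral(A, B, sigma_c, sigma_s, r_max_factor=10.0, steps=20000):
    """Trapezoidal estimate of the integral of 2*pi*R*G(R) from 0 to r_max."""
    r_max = r_max_factor * max(sigma_c, sigma_s)
    h = r_max / steps
    total = 0.0
    for i in range(steps + 1):
        R = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * 2.0 * math.pi * R * dog(R, A, B, sigma_c, sigma_s)
    return total * h


if __name__ == "__main__":
    A, sigma_c, sigma_s = 1.0, 1.0, 3.0
    # No surround (low light): the integral equals the center volume.
    print(areal_integral(A, 0.0, sigma_c, sigma_s))
    # Balanced surround (high light): the integral is close to zero.
    print(areal_integral(A, balanced_gain(A, sigma_c, sigma_s), sigma_c, sigma_s))
```

With the surround gain at zero the field responds to uniform illumination; with the balanced gain it responds only to spatial contrast, which is the qualitative shift the text attributes to light adaptation.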

3. Theories Of Striate Cortical Function

Hubel and Wiesel (1968) categorized striate (V1) cortical cells as simple or complex. Both of these cell types show a preference for a particular orientation and also for a range of spatial frequencies. The now classical model for a simple cell receptive field (RF) is just an oriented linear filter with parallel excitatory and inhibitory zones defined by horizontal (x) and vertical (y) coordinates:

Equation (4) describes a vertically oriented receptive field, but other orientations (varying in about 15° increments between 0° and 180°) can be produced by coordinate rotation. Based on physiological data, σy ≈ 3.2σc produces appropriate orientation bandwidths. The activation R(x, y) of a cortical array of simple cell receptive fields is produced by convolution:

where I(x′, y′) represents the distribution of light in the image. Simulated neural responses are then produced by passing the results of Eqn. (5) through a nonlinear function exhibiting both a threshold and saturation such as the Naka–Rushton function

where M is the maximum response level and N ≈ 2–3 for cortical neurons. As S(R) = M/2 when R = β, β is referred to as the semisaturation constant. Dynamics may now be added to this formulation as follows (Wilson 1999):

where E is the excitatory firing rate and τ is the time constant for the rate of exponential approach to the asymptotic firing rate.
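The nonlinearity and its dynamics can be sketched in a few lines. The code assumes the standard Naka–Rushton form, S(R) = M R^N / (β^N + R^N) for R ≥ 0 and zero otherwise, and first-order dynamics τ dE/dt = -E + S(R) integrated with the forward Euler method. The parameter values and function names are illustrative assumptions, not values taken from the source.

```python
def naka_rushton(R, M=100.0, beta=10.0, N=2.0):
    """Standard Naka-Rushton function; zero for non-positive input."""
    if R <= 0.0:
        return 0.0
    return M * R ** N / (beta ** N + R ** N)


def simulate_rate(R, tau=0.02, dt=0.0002, t_end=0.2, **nr_params):
    """Forward-Euler integration of tau * dE/dt = -E + S(R), starting at E = 0.

    Returns the list of firing rates at each time step.
    """
    target = naka_rushton(R, **nr_params)
    E = 0.0
    trace = [E]
    steps = int(round(t_end / dt))
    for _ in range(steps):
        E += (dt / tau) * (-E + target)
        trace.append(E)
    return trace


if __name__ == "__main__":
    # At R = beta the response is half of the maximum M.
    print(naka_rushton(10.0))
    # The rate approaches S(R) exponentially with time constant tau.
    trace = simulate_rate(10.0)
    print(trace[100], trace[-1])
```

The step size dt is chosen well below τ so that the Euler update stays stable; a larger step would need a different integrator.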

The simple cell model in Eqns. (4)–(7) can be used to generate complex networks of interconnected model neurons for simulations of cortical function (Wilson 1999). However, contemporary research indicates that orientation selectivity is more complex than this. In particular, the linear filtering in Eqn. (4) suggests that orientation tuning originates entirely from a linear superposition of nonoriented lateral geniculate (LGN) filters similar to that in Eqn. (3). While this was Hubel and Wiesel’s (1968) original suggestion for orientation selectivity, the preponderance of evidence now supports additional factors in orientation tuning. First, convergence of as few as two nonoriented LGN filters confers a modest orientation bias on cortical neurons. This very broad orientation bias is then sharpened considerably by two operations occurring in neural networks at the cortical level: contrast gain control and collinear facilitation (Sompolinsky and Shapley 1997).

Contrast gain controls were first modeled by Heeger (1991) as divisive feedback operations. He suggested that the summed activity of many different oriented units provided a divisive feedback signal to each of the units in turn. The divisive feedback loop in Eqn. (1) can be altered to function as a cortical gain control as follows:

where θ designates the preferred orientation of each unit. Substitution of the Naka–Rushton function S from Eqn. (6) and rearrangement shows that this cortical divisive feedback circuit has the effect of scaling β by (1 + A). This network has been implemented and shown to sharpen cortical orientation tuning (Wilson 1993). Due to the dynamics of the network, this orientation sharpening evolves over time as the divisive feedback takes effect, and this has recently been shown to occur in most cortical neurons (Sompolinsky and Shapley 1997).
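The scaling of β can be verified directly. The derivation below assumes that the divisive feedback acts on the input to the Naka–Rushton function as R → R/(1 + A); this is an assumed form chosen to match the stated result, and it holds for R ≥ 0.

```latex
S\!\left(\frac{R}{1+A}\right)
  = \frac{M \left(\dfrac{R}{1+A}\right)^{N}}
         {\beta^{N} + \left(\dfrac{R}{1+A}\right)^{N}}
  = \frac{M R^{N}}{\bigl[\beta\,(1+A)\bigr]^{N} + R^{N}}
```

Multiplying numerator and denominator by (1 + A)^N gives the second equality, so the circuit behaves like an unchanged Naka–Rushton function whose semisaturation constant has grown from β to β(1 + A).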

The final contribution to cortical orientation selectivity is provided by collinear facilitation (Sompolinsky and Shapley 1997). Recent evidence indicates that neurons with the same preferred orientation are interconnected if their receptive fields are situated on a common line defined by their preferred orientation. Such reciprocally linked receptive fields are typically separated by 1–2 receptive field lengths so that their mutual facilitation will only be triggered by stimulus edges of sufficient length. The excitatory connections are probably mediated by NMDA receptors so that both cells in a collinear pair must be independently activated by the stimulus for facilitation to occur. Obviously, collinear facilitation will sharpen orientation tuning when measured with extended lines or gratings. The function of collinear facilitation is to enhance the salience of extended contours, thus making them prominent candidates for further processing.

4. Global Processes In Extrastriate Form Vision

The complex cells of Hubel and Wiesel can be readily modeled by spatial pooling of simple cells with the same preferred orientation. More interesting are end-stopped complex cells, which are sensitive to both contour length and curvature. As illustrated in Fig. 1(a), an end-stopped complex cell can be modeled by full-wave rectification of simple cell responses followed by second-stage pooling using a 2–3 times larger version of the oriented filter defined by Eqn. (4). Full-wave rectification is mathematically equivalent to summing responses from both on-center and off-center simple cells that have the same spatial location and preferred orientation. The first- and second-stage filters in this filter–rectify–filter sequence usually have different preferred orientations, with the orthogonal case being illustrated in Fig. 1(a). The optimal line stimulus for this end-stopped unit will be a horizontal line with length equal to the width of the second filter’s excitatory zone, as shown superimposed on the diagram. In addition to lines of fixed length, end-stopped cells will respond well to curves that are tangent to the preferred orientation of the first-stage simple cells, and they respond well to the orientation of texture boundaries (Wilson 1993). Although Hubel and Wiesel (1968) reported the presence of end-stopped neurons in V1, subsequent work by von der Heydt and Peterhans (1989) indicates that they predominate in the second visual area, V2.
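The filter–rectify–filter sequence described above can be sketched with small discrete kernels. The oriented filter here is a hypothetical stand-in for Eqn. (4): a difference of Gaussians across the preferred orientation with a Gaussian envelope along it. The kernel sizes, the 2.5-fold scale factor, and all function names are assumptions made for illustration.

```python
import math


def oriented_filter(size, sigma, elong=3.2, vertical=True):
    """Hypothetical oriented filter (stand-in for Eqn. (4)), odd `size`."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # u runs across the preferred orientation, v along it.
            u, v = (x, y) if vertical else (y, x)
            across = math.exp(-(u / sigma) ** 2) - 0.5 * math.exp(-(u / (2 * sigma)) ** 2)
            row.append(across * math.exp(-(v / (elong * sigma)) ** 2))
        kernel.append(row)
    return kernel


def convolve2d(img, k):
    """Same-size 2D convolution with zero padding (pure Python)."""
    h, w = len(img), len(img[0])
    kh, kw = len(k), len(k[0])
    oy, ox = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                yy = y + oy - j
                if 0 <= yy < h:
                    for i in range(kw):
                        xx = x + ox - i
                        if 0 <= xx < w:
                            acc += k[j][i] * img[yy][xx]
            out[y][x] = acc
    return out


def end_stopped_response(img, sigma=1.0, scale=2.5):
    """Horizontal first stage, full-wave rectification, vertical second stage."""
    first = convolve2d(img, oriented_filter(7, sigma, vertical=False))
    rectified = [[abs(v) for v in row] for row in first]
    return convolve2d(rectified, oriented_filter(17, scale * sigma, vertical=True))


def horizontal_line(size, length):
    """Square image with a centered one-pixel-thick horizontal line (odd length)."""
    img = [[0.0] * size for _ in range(size)]
    c = size // 2
    for x in range(c - length // 2, c + length // 2 + 1):
        img[c][x] = 1.0
    return img


if __name__ == "__main__":
    c = 15
    short = end_stopped_response(horizontal_line(31, 5))[c][c]
    long_ = end_stopped_response(horizontal_line(31, 25))[c][c]
    print("short line:", short, "long line:", long_)
```

In this sketch a short line that fits the excitatory zone of the second-stage filter drives the central unit more strongly than a long line that also covers its inhibitory flanks. Because of the rectification, reversing the contrast of the stimulus leaves the output unchanged.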

Figure 1. (a) Model of an end-stopped complex cell built from a filter–rectify–filter sequence. (b) Model of a V4 concentric unit.

Following V1 and V2 the cortical form vision pathway projects to V4, and finally to inferior temporal cortex (IT). Mean receptive field dimensions increase by a factor of approximately 2.5–3 from area to area within this pathway, reaching diameters of 10°–30° in IT. It was noted above that the receptive fields of end-stopped neurons must be approximately 2–3 times larger in each dimension than the simple cell receptive fields that provide their inputs in order to function optimally. Thus, the increasing receptive field size throughout the form vision pathway indicates that pooling operations are combining information over increasingly large areas. The nature of pooling in V4 was first investigated by Gallant et al. (1993). They studied macaque V4 responses to concentric and radial gratings as well as to conventional sine wave gratings. One subclass of neurons responded best to concentric gratings, while a second subclass responded best to radial gratings. Although many neurons responded to conventional sine wave gratings, few were optimized for these patterns.

Humans are extremely good at detecting concentric structure in Glass patterns, and this results from concentric summation of orientation information over considerable regions (Wilson et al. 1997). A Glass pattern is produced by randomly positioning pairs of signal dots such that each pair is oriented tangentially to the desired concentric structure. This produces a random dot pattern appearing to contain concentric swirls of dots. Furthermore, fMRI studies have shown that human V4 is more sensitive to concentric or radial gratings than to conventional gratings (Wilkinson et al. 2000). A quantitative model for V4 concentric receptive fields is depicted in Fig. 1(b) (Wilson et al. 1997). The initial two stages of the model are oriented simple cells followed by rectification and orthogonal second-stage filtering. As mentioned above, these model end-stopped complex cells map onto V1 and V2 physiology, and they respond to curved contours that are locally tangential to the orientation of the simple cell filter. The final stage of the V4 model consists of linear summation (∑) of concentrically arranged curvature information from V2. This represents an example of configural pooling, as each curvature response is pooled only if it arises in a spatial subregion of the receptive field consistent with the concentric global pattern. The final V4 receptive field size is approximately three times the diameter of its V2 subunits, thus conforming to the observed receptive field size increase in the form vision pathway. Similar V4 configural pooling models may be constructed to extract radial and other structures from the stimulus, and one challenge to future research will be an enumeration of configural pooling mechanisms.
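The construction of a concentric dot-pair pattern described above is simple to sketch. The code places each pair at a random location and orients it along the tangent to the circle through that location. The dot separation, field extent, and function name are illustrative assumptions.

```python
import math
import random


def concentric_glass_pattern(n_pairs=200, separation=0.05, extent=1.0, seed=0):
    """Return a list of dot pairs ((x1, y1), (x2, y2)).

    Each pair is centered on a random point and oriented tangentially to
    the circle, centered on the origin, that passes through that point.
    """
    rng = random.Random(seed)
    pairs = []
    half = 0.5 * separation
    while len(pairs) < n_pairs:
        x = rng.uniform(-extent, extent)
        y = rng.uniform(-extent, extent)
        r = math.hypot(x, y)
        if r < separation:
            continue  # tangent direction is poorly defined near the center
        tx, ty = -y / r, x / r  # unit tangent, perpendicular to the radius
        pairs.append(((x - half * tx, y - half * ty),
                      (x + half * tx, y + half * ty)))
    return pairs


if __name__ == "__main__":
    for pair in concentric_glass_pattern(n_pairs=3):
        print(pair)
```

Replacing the tangent direction with the radial direction (x / r, y / r) would give the radial variant of the pattern.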

What might be the function of V4 concentric units? Several observations suggest that they may be involved in the perception of faces and other ellipsoidal shapes. First, computer simulations have shown that model V4 units can detect faces in complex scenes (Wilson et al. 2000). Second, fMRI studies have shown that the face-selective area in the fusiform gyrus responds not only to faces but also to circular or ellipsoidal contours (Wilkinson et al. 2000), thus supporting the hypothesis that analysis of head shape is a major ingredient in face perception.

5. Perceptual Oscillations

There are many instances of instabilities resulting in perceptual oscillations, two of the most common being the Necker cube and binocular rivalry. A third example is the Marroquin illusion in which a static, periodic dot pattern triggers the percept of flashing illusory circles (see Wilson et al. 2000). A Marroquin pattern is constructed by superimposing three square grids of dots, each rotated ±60° with respect to one another. It is natural to conjecture that competitive interactions among V4 concentric units might be the cause of the illusory circles triggered by this stimulus. Indeed, there is physiological evidence that competition among V4 neurons forms a basis for spatial and object-based selective attention (Reynolds et al. 1999).
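The stimulus construction described above can be sketched as follows. The code builds one square grid of dot positions centered on the origin and superimposes copies rotated by +60° and -60°. The grid size, spacing, and function names are illustrative assumptions, and coincident dots (such as a shared central dot) are not merged.

```python
import math


def square_grid(n, spacing=1.0):
    """n-by-n square grid of points centered on the origin."""
    half = (n - 1) / 2.0
    return [((i - half) * spacing, (j - half) * spacing)
            for i in range(n) for j in range(n)]


def rotate(points, degrees):
    """Rotate points counterclockwise about the origin."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def marroquin_pattern(n=21, spacing=1.0):
    """Superimpose a square grid with copies rotated by +60 and -60 degrees."""
    base = square_grid(n, spacing)
    return base + rotate(base, 60.0) + rotate(base, -60.0)


if __name__ == "__main__":
    dots = marroquin_pattern(n=5)
    print(len(dots), "dots; first few:", dots[:3])
```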

A neural network model incorporating spatially regional competitive interactions among V4 concentric units can explain the illusory flashing Marroquin circles (Wilson et al. 2000). In this simulation V4 neural responses were represented by a variant of Eqn. (7):

In these equations the input P to each neuron is the response of the V4 model in Fig. 1(b) to the Marroquin pattern, SMarroquin, minus a spatially weighted sum of inhibitory activity Ik generated by neighboring V4 neurons, the inhibitory neurons being driven by the E cells as described by the second equation. Parameters k and σ are respectively the gain and space constant of the inhibitory region. Mathematical analysis of equations such as these shows that one further ingredient is necessary for the network to produce limit cycle oscillations: the firing rate of active excitatory neurons must decline over time (Wilson 1999). Excitatory cortical neurons are known to contain slow hyperpolarizing currents that cause the firing rate to drop about threefold over the course of several hundred milliseconds, and this is incorporated into the mathematical model via the H or hyperpolarizing variable (Wilson 1999). The effect of this variable, governed by g, is to increase the semisaturation constant of any active E neuron, thereby reducing the firing rate. Simulations of this network provide a good quantitative explanation of the illusory flashing circles generated by Marroquin patterns (Wilson et al. 2000). Extensions of this work suggest that models analogous to Eqn. (9) can also explain oscillations and traveling wave phenomena in binocular rivalry.

6. Conclusions And Future Directions

The low-level vision theories developed here are based on nonlinear differential equations describing firing rates (rather than individual spikes) of retinal and cortical neurons. Large-scale neural networks incorporating these principles can now be used to simulate visual processing from the retina up to V4. Although this approach certainly represents but a first-order approximation to the underlying neural activity, it nevertheless provides a solid foundation for further progress.

Many issues, of course, remain to be addressed. Key among these is the ubiquitous cortical feedback: IT back to V4, V4 back to V2, etc. One conjecture is that these connections mediate a rapid dynamical optimization of the processing circuitry based upon initial feed-forward estimates of the stimulus being analyzed: a sophisticated example of a self-organizing system. A second likelihood is that this feedback mediates both top-down influences on selective attention and figure-ground segmentation, which appears to involve extrastriate feedback to V1 (Lamme and Roelfsema 2000). The future promises a cooperative interaction between theory and experiment in elucidating these and many other exciting issues.


  1. Gallant J L, Braun J, Van Essen D C 1993 Selectivity for polar, hyperbolic, and Cartesian gratings in macaque visual cortex. Science 259: 100–3
  2. Heeger D J 1991 Nonlinear model of neural responses in cat visual cortex. In: Landy M, Movshon J A (eds.) Computational Models of Visual Processing. MIT Press, Cambridge, MA, pp. 119–33
  3. Hood D C 1998 Lower level visual processing and models of light adaptation. Annual Review of Psychology 49: 503–35
  4. Hubel D H, Wiesel T N 1968 Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology 195: 215–43
  5. Lamme V A F, Roelfsema P R 2000 The distinct modes of vision offered by feed-forward and recurrent processing. TINS 23: 571–7
  6. Reynolds J H, Chelazzi L, Desimone R 1999 Competitive mechanisms subserve attention in macaque areas V2 and V4. Journal of Neuroscience 19: 1736–53
  7. Sompolinsky H, Shapley R 1997 New perspectives on the mechanisms for orientation selectivity. Current Biology 7: 514–22
  8. von der Heydt R, Peterhans E 1989 Mechanisms of contour perception in monkey visual cortex. I. Lines of pattern discontinuity. Journal of Neuroscience 9: 1731–48
  9. Wassle H, Boycott B B 1991 Functional architecture of the mammalian retina. Physiological Reviews 71: 447–80
  10. Wilkinson F, James T W, Wilson H R, Gati J S, Menon R S, Goodale M A 2000 An fMRI study of the selective activation of human extrastriate form vision areas by radial and concentric gratings. Current Biology 10: 1455–8
  11. Wilson H R 1993 Nonlinear processes in visual pattern discrimination. Proceedings of the National Academy of Sciences USA 90: 9785–90
  12. Wilson H R 1997 A neural model of foveal light adaptation and afterimage formation. Visual Neuroscience 14: 403–23
  13. Wilson H R 1999 Spikes, Decisions, and Actions: Dynamical Foundations of Neuroscience. Oxford University Press, Oxford, UK
  14. Wilson H R, Krupa B, Wilkinson F 2000 Dynamics of perceptual oscillations in form vision. Nature Neuroscience 3: 170–6
  15. Wilson H R, Wilkinson F, Asaad W 1997 Concentric orientation summation in human form vision. Vision Research 37: 2325–30