Connectionist Approaches Research Paper


1. Definitions

To facilitate the following discussion, it will be helpful to first define some terms. A typical connectionist network comprises a (potentially large) number of simple processing units. The units are often called (artificial) neurons, but that terminology begs the question of their relation to biological neurons, so it will be avoided in this research paper. In the most common case, the units form a weighted sum of their (quantitative) inputs and pass the result through a simple, nonlinear activation function, which limits the range of possible outputs. The resulting value is considered the activity of the unit, which may be transmitted to other units (through outgoing connections). In some cases the activity of a unit is a combination of its inputs and previous activity, which provides a kind of ‘short-term memory’ residing in the collective activities of the units.



The weighted sum results from the fact that each connection in the network has an associated weight (analogous to synaptic efficacy in biological neural networks), which multiplies the quantity transmitted by that connection. Positive weights correspond to excitatory connections and negative weights to inhibitory; zero-valued weights correspond to the absence of a connection. Mathematically, connection weights are often treated as a weight matrix W, with element Wij being the weight of the connection to unit i from unit j. Learning and adaptation take place by modification of the weights according to some learning algorithm (Sect. 3); thus the connections constitute the network’s ‘long-term memory.’ ‘Connectionism’ derives its name from the fact that knowledge resides in the patterns and weights of the connections.
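
To make these conventions concrete, the following sketch in Python with NumPy (a minimal illustration; the choice of the logistic activation function and the particular numbers are assumptions, not details from the text) computes the activities of a small layer of units from an input vector and a weight matrix whose element W[i, j] is the weight of the connection to unit i from unit j.

```python
import numpy as np

def logistic(s):
    """A simple, bounded, nonlinear activation function."""
    return 1.0 / (1.0 + np.exp(-s))

# W[i, j] is the weight of the connection TO unit i FROM unit j.
# Positive entries are excitatory, negative entries inhibitory,
# and a zero entry corresponds to the absence of a connection.
W = np.array([[ 0.5, -1.2,  0.0],
              [ 2.0,  0.3, -0.7]])

x = np.array([1.0, 0.5, -1.0])   # activities of the three sending units

# Each receiving unit forms a weighted sum of its inputs and passes it
# through the activation function; the result is the unit's activity.
y = logistic(W @ x)
print(y)                          # two activities, each between 0 and 1
```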

Many connectionist networks are organized into layers, analogous to functional areas in the brain; information usually moves in lockstep from layer to layer. Although many networks are feed-forward, that is, the information moves through successive layers from input to output, other networks are recurrent, which means that there may be feedback connections from a layer to itself or to earlier layers. Recurrent networks are able to recognize and process temporally extended patterns, that is, sequences of related inputs.
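
As a sketch of what recurrence adds, the following fragment (an assumed Elman-style construction, not a network described in the text) updates a layer whose new activity depends both on the current input and on the layer's own previous activity, so that the final state reflects the whole input sequence rather than only its last element.

```python
import numpy as np

def logistic(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(0)
n_in, n_units = 3, 4
W_in  = rng.normal(scale=0.5, size=(n_units, n_in))     # input -> layer weights
W_rec = rng.normal(scale=0.5, size=(n_units, n_units))  # feedback (layer -> itself) weights

h = np.zeros(n_units)                      # initial activity of the layer
sequence = rng.normal(size=(5, n_in))      # a temporally extended input pattern

for x_t in sequence:
    # The new activity combines the current input with the previous
    # activity, providing a short-term memory of earlier inputs.
    h = logistic(W_in @ x_t + W_rec @ h)

print(h)   # the final activity depends on the entire sequence
```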




It must be stressed that there are exceptions to all of the preceding general statements about connectionist networks, and ‘connectionist approaches’ are best viewed as forming a Wittgensteinian ‘family resemblance.’

2. History

A short history of connectionist approaches will be presented: first in the narrower context of cognitive science and artificial intelligence; then in the broader context of epistemology, linguistics and the philosophy of mind. Although the size of this research paper does not permit detailed citations of the literature, many of the seminal articles are collected in Anderson and Rosenfeld (1988) and Anderson et al. (1990).

2.1 Narrower History

According to the second edition of the Oxford English Dictionary, the term ‘connectionism’ was first used by E. L. Thorndike, in his Fundamentals of Learning (1932), to refer to the reduction of mental processes to the connections between stimuli and responses (that is, to a form of associationism), and so connectionist theories have been set in opposition to cognitive theories. The term is used somewhat differently now (so that ‘connectionist cognitive science’ is not an oxymoron), but retains some similarities to associationism. However, to understand the relation it is better to look at connectionism from the perspective of neural network models of cognition.

In the early 1940s W. S. McCulloch and W. Pitts investigated the computation of logical functions by simple neuron-like elements; in effect they showed that these elements could compute logical ‘and,’ ‘or,’ ‘not,’ and so forth. In his Organization of Behavior (1949) D. O. Hebb suggested that learning takes place by the formation of cell assemblies and that this occurs through the strengthening of connections between simultaneously active neurons in neural networks, which are initially randomly connected. His description of this process inspired one of the simplest connectionist learning rules (Sect. 3.1).

In the late 1950s F. Rosenblatt began to investigate the application of simple neuron models called perceptrons to perceptual problems such as classifying printed letters. He developed a learning algorithm for simple (single-layer) perceptron networks, which iteratively adjusted the connection weights whenever the network made a mistake. He proved that, if the network were capable of solving the problem at all, then the algorithm would eventually find the connection weights to solve it. However, there are many problems that a single-layer network cannot solve, and Rosenblatt never succeeded in finding a multilayer learning algorithm.

A key event in the history of connectionism was the publication of M. Minsky and S. Papert’s Perceptrons (1969), which demonstrated limitations of simple perceptron networks. Specifically, they proved that single-layer perceptron nets could discriminate only those categories that are linearly separable. Their proof did not apply to multilayer nets (for which there was still no learning algorithm), but they suggested that similar limitations would be found for these too. Nevertheless, their book was widely interpreted as showing the impotence of neural networks in general, and is commonly blamed for discouraging research in the field for a decade. (The extent to which it actually did so is, perhaps, a topic for historians of science.)

It will be worthwhile to comment on the role of holography in the development of connectionist approaches. As early as 1929 K. S. Lashley had conducted experiments suggesting that individual memory traces were not localized in any one place, and that degradation of memory was proportional to the amount of cortical mass destroyed, thus implying that individual traces were distributed over large areas of cortex. In his well-known 1950 paper, he despaired of ever understanding how such a nonlocal memory could operate. The principles of holography had been described by D. Gabor in 1948, but it was not until the advent of optical holography in the early 1960s that it began to be seen as a solution to Lashley’s dilemma. Although an analogy between holography and memory had been suggested as early as 1963 by P. J. van Heerden, the ‘holographic hypothesis’ has been developed most extensively by K. H. Pribram and his colleagues since 1966 (see, e.g., Anderson et al. (1990), ch. 7). In the late 1960s and early 1970s holographic and holography-inspired models of associative memory were also investigated by H. C. Longuet-Higgins, D. J. Willshaw and others (Hinton and Anderson, 1989). Some connectionists were influenced by the critiques of traditional, rule-based AI by the phenomenologist philosopher H. L. Dreyfus (What Computers Can’t Do, 1972). Although he stressed the limitations of rule-based systems, he also suggested that some of these limitations would not apply to analog systems operating on principles similar to holography.

Although a number of investigators (including J. A. Anderson, S. Grossberg, and T. Kohonen) continued connectionist research through the 1970s, the field was rejuvenated by the work of D. E. Rumelhart, J. L. McClelland, and other members of the ‘PDP (Parallel Distributed Processing) Working Group,’ many of whose publications were collected in a widely-read two-volume set (Rumelhart et al., 1986). The credibility of connectionist approaches was also enhanced by J. Hopfield’s publication in 1982 of a simple recurrent net capable of associative memory and pattern completion.

2.2 Broader History

Although connectionism can be viewed as an approach to knowledge representation and inference that is of relevance only to cognitive science, in fact it has much broader implications, for it challenges assumptions about knowledge that have been largely unquestioned since ancient Greek philosophy. Already in the philosophies of Socrates, Plato, and Aristotle there is a preference for knowledge expressed as logical relations among discrete, language-like structures, and for a view of cognition as mechanized deduction. These ideas influenced many later philosophers, including Hobbes (who equated thinking with computation), Leibniz (who experimented with formalized systems of knowledge representation and mechanical deduction), and Boole (who invented mathematical logic).

These ideas were also influential in the development of logical positivism, which dominated the philosophy of science in the first half of the twentieth century. The idea persisted in the assumption that there must be a ‘language of thought,’ on the grounds that no alternative is imaginable and that it is ‘the only game in town.’ Similarly, most research in artificial intelligence (AI) took for granted that intelligence resides in the structures of a knowledge representation language, and in deduction-like formal rules for their manipulation. Throughout the 1970s (the ‘connectionist dark ages’), AI researchers concentrated their attention on expert systems, which depended on expertise represented symbolically. Disappointment with the performance of these systems was one of the motivations for the connectionist renaissance.

In summary, the Western tradition (with some exceptions) has displayed a kind of ‘linguistic chauvinism,’ which presumes that all knowledge and cognition can be expressed in language-like structures. Knowledge is expressed at a symbolic level, that is, in terms of atomic (indivisible), word-level categories related by sentence-like logical structures. On the other hand, most connectionist approaches represent knowledge at a subsymbolic level, that is, in terms of minute, quantitative features related by low-level, often statistical, connections. In other words, knowledge is more akin to an image than to a sentence. Therefore, some of connectionism’s advocates see it as a fundamentally new view of knowledge and cognition, which is leading to a paradigm shift in cognitive science and philosophy and is engendering a ‘new AI.’

3. Mechanisms Of Adaptation And Learning

Virtually all connectionist approaches incorporate adaptive mechanisms or learning algorithms, which allow the network to improve its performance; here a few will be discussed briefly.

3.1 Correlational Learning

Correlational learning (‘Hebb’s rule’), the simplest connectionist learning algorithm, takes its inspiration from Hebb’s hypothesis that the simultaneous activity of two neurons strengthens the connection between them. It makes the change in a connection weight proportional to the product of the activities of the units it connects. Thus, the change in the weight Wij of the connection to unit i from unit j is proportional to yixj, where yi is the activity of unit i and xj is the activity of unit j. This learning rule can be viewed as a highly simplified model of long-term potentiation.

The effect of this rule is that the weight becomes a correlation coefficient between the activities of the units it connects. That is, the connection will become stronger (more positive) to the extent that the units are simultaneously positive or simultaneously negative. The connection will become more inhibitory (more negative) to the extent that one unit is positive while the other is negative. If there is, on average, no systematic relation between the activities of the two units, then the weight will tend toward zero, effectively disconnecting the units.
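
A minimal sketch of the correlational update, with the learning-rate constant eta introduced here as an assumed free parameter: the change in each weight Wij is proportional to yi times xj, which for a whole weight matrix can be written as an outer product.

```python
import numpy as np

def hebb_update(W, x, y, eta=0.1):
    """Correlational (Hebbian) update: the change in W[i, j] is
    proportional to y[i] * x[j]."""
    return W + eta * np.outer(y, x)

W = np.zeros((2, 3))
x = np.array([1.0, -1.0, 0.0])   # activities of the input (sending) units
y = np.array([1.0,  0.0])        # activities of the output (receiving) units

for _ in range(10):               # repeated pairing of the two patterns
    W = hebb_update(W, x, y)

print(W)   # weights strengthen where the paired activities agree in sign,
           # become negative where they disagree, and stay zero otherwise
```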

The correlational learning rule is the basis of a simple associative memory known as a linear associator. In this connectionist network, there are two layers of linear units, an input layer and an output layer; each input unit is connected to every output unit, so that the output is a linear function of the input, y = Wx. A series of pairs of pattern vectors (y1, x1), …, (yp, xp) may be presented to the network, with each xk on the input layer and the corresponding yk on the output layer, and the weights adjusted according to the learning rule. The goal is that the network associate each xk with the corresponding yk. In fact it can be shown that if the set of input patterns is orthogonal, then the output Wxk will be proportional to the desired yk.
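
The following sketch stores two illustrative pattern pairs in a linear associator. The input patterns are chosen to be orthogonal, so recall by computing Wxk returns a vector proportional to the stored yk, as stated above; the particular vectors are assumptions made for the example.

```python
import numpy as np

# Two orthogonal input patterns and the outputs to be associated with them.
x1 = np.array([1.0,  1.0, 0.0,  0.0])
x2 = np.array([0.0,  0.0, 1.0, -1.0])   # note: x1 . x2 == 0
y1 = np.array([1.0, -1.0])
y2 = np.array([0.5,  0.5])

# Storing both pairs with the correlational rule amounts to summing
# the outer products of each output pattern with its input pattern.
W = np.outer(y1, x1) + np.outer(y2, x2)

# Recall: orthogonality eliminates cross-talk between the stored pairs,
# so each recalled output is proportional to the desired pattern.
print(W @ x1)   # equals 2 * y1, since |x1|^2 == 2
print(W @ x2)   # equals 2 * y2, since |x2|^2 == 2
```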

3.2 Delta Rule

An improvement on the linear associator, called the delta rule, illustrates a fundamental approach to connectionist learning. The idea is to define the error, as a function of the weight matrix, as the sum over the training pairs of the differences between the desired and actual outputs of the network, E(W) = ∑k D(yk, Wxk), where the sum runs over k = 1, …, p. The weight matrix is then changed by gradient descent, which means that the elements of W are changed in the relative proportions that cause a maximal incremental decrease of E(W).

If the difference between patterns is measured by the squared Euclidean distance, D(y, y′) = |y − y′|² (that is, the sum-of-squares error), then the delta rule is essentially equivalent to linear regression. If the input patterns are linearly independent, then the delta rule will converge to a weight matrix that associates perfectly, yk = Wxk. If they are not, then the weight matrix will be the one that minimizes the total error; that is, it will give the best linear prediction of the output patterns from the input patterns.
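
A minimal sketch of the delta rule with the sum-of-squares error; the learning rate, the number of passes through the data, and the random training pairs are all assumptions made for illustration. Each update moves W a small step down the gradient of E(W), and because the input patterns here are (almost surely) linearly independent, the residual error shrinks toward zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n_out, n_in, n_patterns = 2, 4, 3

X = rng.normal(size=(n_patterns, n_in))    # input patterns (rows)
Y = rng.normal(size=(n_patterns, n_out))   # desired output patterns (rows)

W = np.zeros((n_out, n_in))
eta = 0.05                                  # learning rate (assumed)

for epoch in range(2000):
    for x, y in zip(X, Y):
        error = y - W @ x                   # desired minus actual output
        W += eta * np.outer(error, x)       # delta-rule (gradient-descent) step

print(np.abs(Y - X @ W.T).max())            # residual error, close to zero
```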

The delta rule also illustrates an important characteristic of most connectionist networks: their ability to generalize to inputs other than those upon which they have been trained. The delta rule provides only linear generalization, but other algorithms, such as backpropagation (Sect. 3.3) can make nonlinear generalizations. Typically, connectionist categories are represented by concrete prototypes rather than by definitions in terms of necessary and sufficient conditions or other abstract symbolic structures. Network behavior then depends on similarity to the prototypes rather than on formal manipulation of symbolic structures.

3.3 Backpropagation

Gradient descent may be applied also to multilayer networks of nonlinear units, so long as the activation function is differentiable. The backpropagation algorithm (also called the generalized delta rule) efficiently computes the weight changes by starting with the last layer and working backward layer by layer. It has been rediscovered a number of times, perhaps first by P. Werbos in 1974, but its importance in connectionism began with its rediscovery in the early 1980s. There are also special adaptations of backpropagation for recurrent networks. Backpropagation has a number of limitations, including: (a) it may be quite slow, (b) it does not necessarily take the shortest path to an error minimum, and (c) it may get trapped in local minima. Nevertheless, it remains a fundamental learning algorithm and has been subject to many practical improvements.
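
Below is a minimal sketch of backpropagation for a network with one hidden layer of logistic units, trained on the XOR problem that a single-layer perceptron cannot solve (Sect. 2.1). The architecture, learning rate, initialization, and number of epochs are assumptions made for the example; consistent with limitation (c) above, a run of this kind can occasionally settle in a local minimum.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)      # XOR targets

W1 = rng.normal(size=(2, 4))    # input -> hidden weights
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))    # hidden -> output weights
b2 = np.zeros(1)
eta = 0.5                        # learning rate (assumed)

for epoch in range(20000):
    # Forward pass through both layers.
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)

    # Backward pass: error terms for the output layer, then propagated
    # back to the hidden layer (the logistic derivative is s * (1 - s)).
    d_out = (Y - T) * Y * (1 - Y)
    d_hid = (d_out @ W2.T) * H * (1 - H)

    # Gradient-descent updates, working backward from the last layer.
    W2 -= eta * H.T @ d_out
    b2 -= eta * d_out.sum(axis=0)
    W1 -= eta * X.T @ d_hid
    b1 -= eta * d_hid.sum(axis=0)

print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))  # should approach [0, 1, 1, 0]
```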

3.4 Other Learning Algorithms

The preceding are examples of supervised learning procedures, which means that the ‘correct answer’ yk is available for each training input xk. Although this is appropriate for modeling some cognitive processes and for many practical problems, for others unsupervised learning is preferable. In these procedures, the network is not trained to produce any specific outputs, but is allowed to group or categorize inputs according to standards inherent in its design. Thus unsupervised learning is often equivalent to some kind of statistical clustering. Between these two extremes is reinforcement learning, in which the algorithm is told whether or not the output is correct, but not what the correct output is. In unsupervised and reinforcement learning, as in supervised learning, the network is normally expected to generalize reasonably to novel inputs. There are now hundreds of connectionist learning algorithms, of greater and lesser relevance to cognitive science and neuroscience, but this must suffice for an introduction.

4. Example: Learning Past Tenses

Since the early 1980s, connectionist networks have been used for an enormous number of practical applications and for modeling many aspects of cognition. One notable example must suffice here; see Rumelhart et al. (1986) for additional examples.

Rumelhart et al. (1986, Vol. 2, Chap. 18) trained a connectionist network to produce the past tenses of English verbs. The inputs were vectors representing phonological features of the present tenses and the outputs were vectors representing the phonological features of the corresponding past tenses. Learning was observed to pass through three stages. In the first stage, the most common verbs were learned, essentially by rote as individual special cases. In the second stage, the network learned how to form regular past tenses (for it was able to generalize to novel regular verbs), but over-generalized by treating the (previously correctly processed) irregular verbs as though they were regular. In the third stage, the network (re)learned the correct formation of the irregular past tenses without losing its ability to form regular past tenses. It is interesting and highly suggestive that children pass through these same three stages; the model also exhibited errors of the same kinds made by children. Although this experiment can be (and has been) criticized on a number of grounds as a model of language learning, it is perhaps more valuable as a demonstration of how a connectionist network can exhibit apparently rule-like behavior, but not be following any explicit rules. In particular, exceptions to the rules are handled automatically without explicit accommodation. Thus it is paradigmatic of connectionist cognitive models.

5. Issues

5.1 Connectionist Representations

Connectionist approaches represent and process information in a way that is fundamentally different from symbolic approaches, in which knowledge is represented in discrete structures relating atomic lexical-level features (i.e., categories of the sort for which natural languages have words). In symbolic approaches, information is processed by formal logic-like rules, which rearrange these atomic units of meaning. In connectionist approaches, on the other hand, information is represented by patterns of activity distributed over large numbers of units, which individually have no lexical-level meaning. The latter are often termed microfeatures, but they are fundamentally different from features, which are supposed to be complete, context-free units of meaning. Microfeatures are just components of distributed representations and are usually individually uninterpretable.

Instead of rules, connectionist information processing is defined by quantitative connections between microfeatures and takes place at a subsymbolic level (Smolensky 1988). Cognition then is an emergent effect of large numbers of these interactions (which therefore constitute the microstructure of cognition). Indeed, according to this account, apparent symbolic processing is just such an emergent effect of these subsymbolic interactions.

Traditional (symbolic) models of knowledge have been criticized for their brittleness. For example, when a set of rules is formulated in an attempt to model the behavior of some expert, it is generally found that the rules do worse than the expert, since the expert applies rules flexibly and makes ad hoc exceptions to them as required by the particulars of the situation. Of course, additional rules can be formulated to cover the exceptions, but then these are likewise found to have exceptions, and so forth. The attempt to reduce flexible behavior to inflexible rules leads to a combinatorial explosion of possibilities which exceeds the capacities of brains and computers.

Connectionist approaches seem better able to account for the flexibility and context-sensitivity of natural intelligence. This is because the connection weights function as a large number of soft constraints, none of which is individually necessary or sufficient to produce a result. Therefore connectionist networks can accommodate inputs that are exceptional in various ways, either by ignoring the anomalous aspects or by adjusting their output correspondingly. As a consequence, their performance is also robust in the face of noise in the input or damage to the network. Further, to the extent that microfeatures of the environment are represented in the input, the network can process information in a way that is sensitive to the context.

5.2 Criticisms

Connectionist networks have been criticized for their opacity or uninterpretability. That is, when a connectionist network has been trained to perform some task, it is difficult to extract human-interpretable rules from the network; although the network performs correctly, one cannot understand the ‘rules’ it is apparently following. The reason, of course, is that it is not following rules, and the individual units and weights represent microfeatures and constraints that have no lexical-level meaning. There are mathematical procedures for extracting rule-like information from networks, but they give only approximations to the network’s behavior. This is analogous to what has been observed during ‘knowledge acquisition’ for expert systems: after the fact, human experts can account for their decision making in terms of rules, but the rules do not account adequately for the expert’s future decisions.

Quite naturally, some of the severest criticisms of connectionism have come from linguists and other cognitive scientists committed to a ‘language of thought,’ that is, to the hypothesis that cognition must be understood in terms of the manipulation of propositional or sentential symbolic structures. These criticisms have focused on the alleged inability of ‘flat’ connectionist representations to capture the rich hierarchical symbolic structure of human language and propositional attitudes, and the related sensitivity of cognition to their constituent structure. However, experiments in ‘connectionist symbol processing’ have shown that connectionist networks can be sensitive to the constituent structure of representations without explicit representation of that structure and without the use of explicit symbolic rules (e.g., Sect. 4). There is much more to the issue than this, however, and the early collection by Pinker and Mehler (1988) is still a good introduction to the anti-connectionist position.

5.3 Computability

If connectionism is viewed as a fundamentally new approach to information representation and processing, then the question arises of its power relative to conventional digital computation, as modeled by the Turing machine. At a basic level this question is easy to answer. On one hand, since connectionist networks are routinely simulated on digital computers, it is apparent that they have no greater power than a Turing machine. On the other hand, researchers have shown that various sorts of connectionist networks can simulate Turing machines, which therefore have no greater power than the networks. The conclusion would seem to be that connectionist networks are equivalent to Turing machines in computing power.

However, at a deeper level the question is problematic, for the Turing machine model is based on certain idealizing assumptions about what is significant and insignificant in models of computation, assumptions that are questionable when applied to connectionist models. In particular, the Turing machine model assumes that computation proceeds by the recognition of atomic tokens of definite type according to the discrete application of finite, definite rules; these processes are assumed to operate with complete reliability. These assumptions are a poor match to connectionist approaches, in which information is represented in distributed patterns of continuous activity, and in which recognition is a matter of degree. Questions of computability remain important in connectionism, but relevant answers may require the development of a new theory of computation whose idealizing assumptions are more appropriate to connectionist approaches.

5.4 Relation To Biological Neural Networks

Connectionist networks are often called ‘neural networks’ and described in terms of (artificial) neurons connected by (artificial) synapses, but is this more than a metaphor? Generally, connectionist models have reflected the contemporary understanding of neurons. For example, McCulloch and Pitts focused on the ‘all or nothing’ character of neuron firing, and modeled neurons as digital logic gates. Newer connectionist models have had a more analog focus, and so the activity level of a unit is often identified with the instantaneous firing rate of a neuron. However, these models still ignore many important properties of real neurons, which may be relevant to neural information processing (Rumelhart et al. 1986, Vol. 2, Chap. 20). As a consequence, neuroscientists have stressed the differences between biological neurons and the simple units in connectionist networks; the relation between the two remains an open problem. Nevertheless, it is much easier to envision neural implementations of connectionist networks than of symbol-processing architectures.

Bibliography:

  1. Anderson J A, Rosenfeld E (eds.) 1988 Neurocomputing: Foundations of Research. MIT Press, Cambridge, MA
  2. Anderson J A, Pellionisz A, Rosenfeld E (eds.) 1990 Neurocomputing 2: Directions for Research. MIT Press, Cambridge, MA
  3. Churchland P S 1986 Neurophilosophy: Toward a Unified Science of the Mind-Brain. MIT Press, Cambridge, MA
  4. Dreyfus H L 1972 What Computers Can’t Do. Harper & Row, New York
  5. Gabor D 1948 A new microscopic principle. Nature 161: 777–8
  6. Hebb D O 1949 The Organization of Behavior. Wiley, New York
  7. Hinton G E, Anderson J A (eds.) 1989 Parallel Models of Associative Memory, rev. edn. Lawrence Erlbaum Associates, Hillsdale, NJ
  8. Hopfield J J 1982 Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, USA 79: 2554–8
  9. Lashley K S 1950 In search of the engram. Symposia of the Society for Experimental Biology 4: 454–82
  10. Minsky M, Papert S 1969 Perceptrons. MIT Press, Cambridge, MA
  11. Pinker S, Mehler J (eds.) 1988 Connections and Symbols. MIT Press, Cambridge, MA
  12. Quinlan P 1991 Connectionism and Psychology: A Psychological Perspective on New Connectionist Research. University of Chicago Press, Chicago
  13. Rumelhart D E, McClelland J L, PDP Research Group 1986 Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press, Cambridge, MA
  14. Smolensky P 1988 On the proper treatment of connectionism. Behavioral and Brain Sciences 11: 1–74
  15. Thorndike E L 1932 Fundamentals of Learning. Teachers College, New York