Connectionist Models Of Language Processing Research

All connectionist models have in common that they consist of subunits that can, in one way or another, be likened to brain structures. The neuroanatomical structures the network elements stand for are either large brain regions and the pathways between them, or local clusters of neurons and their mutual connections, or individual neurons and the web of fibers in which they are embedded. There are at least three major meanings of the term ‘connectionist model’: (a) Classical connectionist models proposed by nineteenth-century neurologists specify centers and pathways that are analogous to cortical areas and fiber bundles most relevant to language, and related modern neurobiological approaches to language propose intensely connected cell assemblies with different cortical distributions as the brain basis of language; (b) Symbolic connectionist models suggest single artificial neurons corresponding to linguistic units (language sounds, words, etc.); and (c) Distributed connectionist models represent such linguistic entities by activity vectors involving numerous neuronal elements. This research paper will explain these three different notions by introducing the respective research areas (Sects. 1–3). The final section (Sect. 4) highlights recent trends in research on connectionist models of language.

1. Neurological And Neurobiological Models Of Language

In the second half of the nineteenth century, connectionist theories were proposed by neurologists to summarize and model the effect of brain lesions on cognitive functions (Caplan 1987). The underlying idea was that local processing systems, so-called centers, specialize in particular cognitive processes. These were thought to be autonomous processors. The centers were linked through so-called pathways which allowed for information exchange between them. The pathways had their analogue in fiber bundles in the white matter of the brain. The most famous classical neurological model of language goes back to Wernicke (1874) and Lichtheim (1885). This model explains basic features of organic language disturbances, the aphasias. The main proposal of the Wernicke–Lichtheim model was that there are two centers for language processing, one mainly involved in speech production and the other primarily contributing to language comprehension. This hypothesis has since been confirmed by a wealth of clinical studies. It is now well established that, in most right-handed subjects, the main or core language areas are located in the left inferior frontal lobe (Broca area, for speech production) and in the left superior temporal lobe (Wernicke area, for speech comprehension).

In the light of modern neuropsychological and neuroimaging research, early connectionist models turned out to be too crude to explain many patterns of brain activation induced by language processing, and the more fine-grained aspects of language disorders caused by brain lesions. It now appears more likely that the ‘language centers’ of Broca and Wernicke are not functionally independent but that they are mutually dependent when functioning properly. Furthermore, while these core language areas are certainly important for language, they are certainly not the only cortical areas contributing to and necessary for language processing. This is demonstrated by neuroimaging research (EEG, fMRI, MEG, PET) showing that, in addition to the core language areas, other areas light up when specific language stimuli are being processed, and by converging neuropsychological reports about patients with lesions outside their core language areas who showed category-specific linguistic deficits. These studies show that, in addition to the core language areas, there are additional or complementary language areas that are activated during language processing and whose lesion causes deterioration of aspects of language processing.

Mutual functional dependence of the core language areas and the category-specific role of complementary language areas are explained by a neurobiological model postulating that words and other language elements are cortically organized as strongly connected assemblies of neurons whose cortical distributions vary with word type (Pulvermüller 1999). Accordingly, concrete words referring to objects and actions are organized as widely distributed cell assemblies comprising neurons in sensory and motor areas involved in processing the words’ meanings. In contrast, highly abstract grammatical function words and grammatical affixes are proposed to be more focally represented in the left-hemispheric core language areas of Broca and Wernicke (Pulvermüller 1999). In summary, neuron sets in core language areas appear to be relevant for all types of language-related processes, and complementary neurons in areas related to actions and perceptions regularly involved in language use may contribute to category-specific language processes.

2. Symbolic Connectionist Models Of Language

According to modular approaches to cognitive psychology, the mental language processor consists of quasi-autonomous subprocessors or modules, and language processing is therefore considered the result of quasi-autonomous subprocesses. The subprocesses envisaged to be involved in language comprehension are, for example, input-feature analysis, letter or phoneme analysis, word form processing, and semantic analysis (Morton 1969). These processes are assumed to occur sequentially or in a cascaded manner. A similar but reversed cascade has been assumed for the putative subprocesses of language production, which finally results in movements of the articulators or the (writing) hand (Garrett 1984).

In symbolic connectionist models, the subprocessors of modular models have been replaced by layers of neuron-like elements, the assumption being that individual artificial neurons represent acoustic or visual features, phonemes or graphemes, word forms, and word meanings (see, for example, McClelland and Rumelhart 1981). Symbolic connectionist networks have been applied with some success to the modeling of word production (Levelt et al. 1999) and word recognition (Norris et al. 2000), as well as other language phenomena, for example speech errors observed in normal speakers and language-impaired neurological patients (see, for example, Dell 1986).
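
As a concrete illustration of the localist assumption, the following sketch (in Python; the three-word vocabulary, the unit names, and the weights are invented for illustration, and the code is not taken from any of the cited models) shows how single units standing for letters can pass activation to single units standing for words:

    # A toy localist network: one unit per letter, one unit per word, and fixed
    # excitatory connections from letters to the words that contain them.
    # (Invented vocabulary and weights; not the interactive-activation model itself.)
    letters = ["c", "a", "t", "r"]
    words = ["cat", "car", "rat"]

    weights = {(l, w): (1.0 if l in w else 0.0) for l in letters for w in words}

    def word_activations(letter_activations):
        """One feed-forward step from the letter layer to the word layer."""
        return {w: sum(letter_activations.get(l, 0.0) * weights[(l, w)]
                       for l in letters)
                for w in words}

    # Presenting the letters of 'cat' activates the word unit for 'cat' most strongly.
    print(word_activations({"c": 1.0, "a": 1.0, "t": 1.0}))
    # {'cat': 3.0, 'car': 2.0, 'rat': 2.0}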

A still open issue is whether information should be allowed to flow in only one direction between the subcomponents of the postulated networks, from input to semantics in comprehension and from semantics to output in production. As an alternative, reciprocal connections, and thus bidirectional information flow, have been proposed between the layers of the networks.

3. Distributed Connectionist Models Of Language

The distributed network type most commonly used to model language resembles symbolic networks, because both network types are made up of layers of neurons and connections between layers. An important difference is the following: Symbolic networks include local representations (usually single artificial neurons) that represent elementary features of the input and output as well as more complex entities such as letters, phonemes, and words. In contrast, distributed networks use activity vectors over large sets of neurons in a given layer to represent words and other more complex linguistic structures. The distributed networks most commonly used in language simulations do not include direct connections between the neurons of one layer. The active neurons defined by an activity vector are therefore not connected to each other and do not form a functionally coherent system. This distinguishes them from cell assemblies with strong internal links. However, as a consequence of associative learning, artificial neurons in different layers may strengthen their connections to neurons in other layers, indirectly linking the active neurons defined by individual activity vectors.
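
The difference between local and distributed codes can be made explicit in a few lines of Python (the vectors below are arbitrary illustrations, not learned representations from any published simulation):

    import numpy as np

    # Local versus distributed coding of three words (illustrative values only).
    words = ["cat", "dog", "run"]

    # Local (symbolic) code: one dedicated unit per word.
    local = {w: np.eye(len(words))[i] for i, w in enumerate(words)}

    # Distributed code: each word is a graded activity vector over many shared
    # units; here the vectors are random, in a real model they would be learned.
    rng = np.random.default_rng(0)
    distributed = {w: rng.uniform(0.0, 1.0, size=8) for w in words}

    print(local["cat"])         # [1. 0. 0.]  -- only the 'cat' unit is active
    print(distributed["cat"])   # graded activity spread over all eight units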

The classical type of distributed network, the perceptron, consists of two layers of neurons. Each neuron in the first layer is connected to each neuron in the second layer, though connection strengths (or weights) vary as a function of associative learning of pairs of input–output patterns. In the 1960s, Minsky and Papert (1969) proved that perceptrons can learn to solve only classification problems that are linearly separable. In the 1980s, Rumelhart and McClelland (1986a) showed that adding one ‘hidden’ layer of neural units between the input and output layers, together with an extended learning rule, overcomes this limitation. The networks were now able to learn to solve more complex classification problems, and this led to a strong interest in three-layer neural architectures. Three-layer perceptrons were used with some success to model language: there are models that classify speech signals (e.g., Waibel et al. 1995); others that mimic important aspects of the infant’s learning of language-specific information as described by elementary rules (e.g., Hare et al. 1995); and simulations of the effects of focal brain lesions on language functions (e.g., Plaut and Shallice 1993) and of the recovery of language functions after stroke.
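
The limitation identified by Minsky and Papert, and its resolution by a hidden layer, can be reproduced with the exclusive-or (XOR) problem, the textbook example of a task that is not linearly separable. The following sketch is a generic backpropagation demonstration written for this research paper, not code from the studies cited above:

    import numpy as np

    # XOR is not linearly separable: a two-layer perceptron cannot learn it,
    # but a network with one hidden layer trained by gradient descent can.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(1)
    W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # input -> hidden
    W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden -> output

    for _ in range(10000):                          # batch gradient descent on squared error
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        grad_out = (out - y) * out * (1 - out)      # error signal at the output layer
        grad_h = (grad_out @ W2.T) * h * (1 - h)    # error signal propagated back to the hidden layer
        W2 -= 0.5 * h.T @ grad_out
        b2 -= 0.5 * grad_out.sum(axis=0)
        W1 -= 0.5 * X.T @ grad_h
        b1 -= 0.5 * grad_h.sum(axis=0)

    print(np.round(out).ravel())                    # should converge to [0. 1. 1. 0.]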

For solving more complex problems posed by syntax, the network architecture had to be modified once again. Three-layer architectures do not allow information to be kept in memory over longer time spans, but this is necessary for assessing syntactic dependencies between temporally distant language units, such as those between the first and last words of a long sentence. To solve this problem, a ‘memory layer’ allowing for reverberation of activity and information storage has been added to the three-layer architecture (Elman 1990). Such networks including an additional memory layer can be shown to be more powerful than three-layer perceptrons in storing serial-order relationships. They are capable of learning subsets of syntactically complex sentence structures, for example aspects of so-called center-embedded constructions.
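
How a memory (or ‘context’) layer feeds the previous hidden state back into the network can be sketched as follows; this is a bare-bones illustration of the architecture assumed here, not Elman’s simulation itself:

    import numpy as np

    # A minimal sketch of an Elman-style network: the 'context' (memory) layer
    # holds a copy of the previous hidden state, so information about earlier
    # words can influence the processing of later ones. Weights are random and
    # untrained; the point is only the flow of activation over time.
    rng = np.random.default_rng(0)
    n_input, n_hidden = 5, 3
    W_in = rng.normal(size=(n_hidden, n_input))        # input layer  -> hidden layer
    W_context = rng.normal(size=(n_hidden, n_hidden))  # context layer -> hidden layer

    def step(word_vector, context):
        """One time step: combine the current input with the stored context."""
        hidden = np.tanh(W_in @ word_vector + W_context @ context)
        return hidden, hidden.copy()                   # the new context is a copy of the hidden layer

    context = np.zeros(n_hidden)
    sentence = [rng.normal(size=n_input) for _ in range(4)]   # four arbitrary 'word' vectors
    for word in sentence:
        hidden, context = step(word, context)
        print(np.round(hidden, 2))    # the hidden state now depends on all earlier words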

4. Current Trends

After introducing three types of connectionist models, this research paper now highlights selected topics in connectionist research on which the three approaches offer somewhat different views and where this divergence has led to productive research.

Recent trends in connectionist research on language include the more detailed modeling of syntactic mechanisms and attempts at mimicking more and more properties of the actual neuronal substrate in the artificial models (Elman et al. 1996). Multidisciplinary research across the computational and neurosciences is necessary here. The strategy of copying the brain’s mechanisms into artificial neural networks may be particularly fruitful for implementing those higher cognitive functions that, in the biological world, arise only from particular types of brains. The brain’s structure is thus itself information of relevance for neuronal modeling.

The modeling of rule-like verbal behavior is an illustrative example of successful multidisciplinary interaction in connectionist research on language. It is sometimes assumed that symbolic algorithms are necessary for explaining the behavior described by linguistic rules. To produce a past tense form in English, one would, accordingly, apply an abstract rule such as the following concatenation scheme:

Present stem + Past suffix = Past tense form

In particular, an algorithm of this kind could model the concatenation of the verb stem ‘link’ and the past suffix ‘ed’ to yield the past tense form ‘linked,’ and, in general, it could be used to derive any other regular past form of English. However, it is difficult to see how an irregular verb such as ‘think’ or ‘shrink’ could yield a past form based on a similar rule. In the extreme, one would need to assume rules for individual words to provide algorithms that generate, for example, ‘went’ out of ‘go.’ This would require stretching the rule concept, and linguists have therefore proposed that there are two distinct cognitive systems contributing to language processing, a symbolic system storing and applying rules and a second system storing relationships between irregular stems and past forms in an associative manner (Pinker 1997).
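
The dual-route proposal can be written out as a small symbolic program; the toy lexicon and the function below are invented for illustration and stand for the scheme in the abstract, not for any particular published model:

    # The symbolic dual-route scheme as code (toy lexicon for illustration).
    IRREGULARS = {"go": "went", "think": "thought", "shrink": "shrank"}

    def past_tense(stem: str) -> str:
        """Return a stored exception if there is one, else apply the default rule."""
        if stem in IRREGULARS:
            return IRREGULARS[stem]          # associative lookup for irregulars
        if stem.endswith("e"):
            return stem + "d"                # e.g. 'love' -> 'loved'
        return stem + "ed"                   # the rule: Present stem + Past suffix

    print(past_tense("link"))   # 'linked' (regular rule)
    print(past_tense("go"))     # 'went'   (stored exception)
    print(past_tense("dif"))    # 'difed'  -- spelling details such as consonant
                                #            doubling ('diffed') are not modeled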

From the perspective of neural networks, however, one may ask whether two separate systems, for rules and exceptions, are actually necessary to handle regular and irregular inflection. Rumelhart and McClelland (1986b) showed that an elementary two-layer perceptron can store and retrieve important aspects of both past tense rules and exceptions. It can even produce errors typical of children who are learning past tense formation, such as so-called overgeneralizations (e.g., ‘goed’ instead of ‘went’).

From a linguistic perspective, the two-layer model of past tense proposed by Rumelhart and McClelland has been criticized, for example because it does not appropriately model the fact that rule-conforming behavior is by far most likely to be generalized to novel forms. The past form of a newly introduced verb, such as ‘dif,’ will thus almost certainly receive an ‘ed’ ending if one intends to use it in the past tense (‘diffed’). This is even so in languages where most verbs have irregular past forms and only a minority of the verbs conform to the rule. The rule is nevertheless used as the default and generalized to novel forms and even rare irregular items. This is a problem for a subset of connectionist models, because the strongest driving forces in associative networks are the most common patterns in the input.

However, there are distributed three-layer networks that solve the problem of default generalization surprisingly well (Hare et al. 1995). An important determinant is that rule-conforming input patterns are maximally dissimilar, while the members of an irregular class resemble each other. Consider the mutually dissimilar regular forms to watch, talk, and jump in contrast to the similar members of an irregular class to sing, ring, and sting. Because the regulars are so heterogeneous, they occupy a wide area in input space. The representation in input space of a novel word is thus most likely to be closest to that of one of the many different regular forms, and this is one important reason why so many new items are treated as regular by the network. On the other hand, if a newly introduced item happens to strongly resemble many members of an irregular class, as the pseudo-word ‘pling’ resembles ‘sing,’ ‘ring,’ and ‘sting,’ it is, in many cases, treated as irregular. These observations may lead one to redefine one’s concept of regularity: A rule is not necessarily the pattern most frequently applied to existing forms, but it is always the pattern applied to the most heterogeneous set of linguistic entities. The heterogeneity of the regular classes may explain default generalization along with the great productivity of rules.
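
The similarity structure behind this argument can be illustrated with a deliberately crude string-similarity measure standing in for the phonological feature space used in the actual simulations (an illustrative sketch only):

    from difflib import SequenceMatcher
    from itertools import combinations

    # Crude string similarity as a stand-in for phonological similarity:
    # the regular verbs below are mutually dissimilar, the irregulars cluster.
    regulars = ["watch", "talk", "jump"]
    irregulars = ["sing", "ring", "sting"]

    def similarity(a, b):
        return SequenceMatcher(None, a, b).ratio()

    def mean_within_class_similarity(verbs):
        pairs = list(combinations(verbs, 2))
        return sum(similarity(a, b) for a, b in pairs) / len(pairs)

    print(mean_within_class_similarity(regulars))    # low: regulars spread out in the space
    print(mean_within_class_similarity(irregulars))  # high: irregulars form a tight cluster

    # A novel stem such as 'pling' lies closest to the irregular cluster.
    print(max(regulars + irregulars, key=lambda v: similarity("pling", v)))  # 'sing'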

The simulation studies of the acquisition of past tense and other inflection types by young infants suggest that neural networks consisting of a single system of layers of artificial neurons provide a reasonable model of the underlying cognitive and brain processes. In this realm, the single-system perspective appears to be as powerful as an approach favoring two systems, one specializing in rule storage and the other in elementary associative patterns.

Neuroscientific data and theories have recently shed new light on the issue of a single-system versus a double-system account of rule-like behavior. An important step was the discovery of patients with brain lesions who were differentially impaired in processing regular and irregular past tense forms. Patients suffering from Parkinson’s disease or Broca’s aphasia were found to have more difficulty processing regulars, whereas patients with global deterioration of cortical functions, as seen, for example, in Alzheimer’s disease or semantic dementia, showed impaired processing of irregulars (Ullman et al. 1997; Marslen-Wilson and Tyler 1997). This double dissociation is difficult to model using a single system of connected layers, but is easy to handle if different neural systems are used to model regular and irregular inflection.

Another argument in favor of a double-system account comes from neurobiological approaches proposing that words and inflectional affixes are represented in the cortex as distributed cell assemblies. In this case, past tense formation can involve two types of connections, local within-area connections in the core language areas and long-distance links between the language areas and areas outside them. It is known from neuroanatomy that two adjacent neurons are more likely to be linked through a local connection than two distant neurons are to be linked by way of a long-distance connection. This situation can be modeled by two pathways connecting the neuronal counterparts of present stems and past forms, for example a three-layer architecture with two pathways connecting input and output layers, one with higher and the other with lower connection probabilities between neurons in adjacent layers. If parameters are chosen appropriately, the two pathways or systems will differentially specialize in the storage of rules and irregular patterns. Similar to a two-layer perceptron, the low-probability system is best at storing the simple mapping between irregular present forms that resemble each other and their past forms. In contrast, the complex mapping between the heterogeneous regular stems and their past forms is best accomplished by the three-layer component with high connection probabilities. When the two components are differentially lesioned, the network produces the double dissociation between regular and irregular inflection seen in neuropsychological patients. This approach explains the neuropsychological double dissociation along with aspects of the acquisition of past tense formation by young infants (Pulvermüller 1998). This explanation is based on principles of cortical connectivity.
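
The two-pathway idea can be sketched structurally as follows; the layer sizes, connection probabilities, and weight values are arbitrary choices made for illustration and are not taken from Pulvermüller (1998):

    import numpy as np

    # Structural sketch of the two-pathway idea: two parallel routes between the
    # same input and output layers, one with low and one with high connection
    # probability. Untrained; all parameter values are illustrative assumptions.
    rng = np.random.default_rng(0)
    n_in, n_hidden, n_out = 20, 30, 20

    def sparse_weights(shape, p_connect):
        """Random weights in which only a fraction p_connect of connections exist."""
        mask = rng.random(shape) < p_connect
        return rng.normal(size=shape) * mask

    # Low-probability route: sparse connections, suited to the similarity-based
    # mapping of the irregulars (it behaves much like a simple two-layer mapping).
    W1_sparse = sparse_weights((n_hidden, n_in), 0.1)
    W2_sparse = sparse_weights((n_out, n_hidden), 0.1)

    # High-probability route: dense connections, suited to the heterogeneous
    # regular mapping.
    W1_dense = sparse_weights((n_hidden, n_in), 0.7)
    W2_dense = sparse_weights((n_out, n_hidden), 0.7)

    def forward(present_stem_vector):
        h_sparse = np.tanh(W1_sparse @ present_stem_vector)
        h_dense = np.tanh(W1_dense @ present_stem_vector)
        return np.tanh(W2_sparse @ h_sparse + W2_dense @ h_dense)   # the routes converge

    print(forward(rng.normal(size=n_in)).shape)   # (20,) -- one past-form activity vector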

Together, the neuropsychological double dissociation and the neurobiological considerations argue in favor of a two-system model of regular and irregular inflection. In contrast to the modular proposal that each of the two systems is exclusively concerned with regular or irregular processes, respectively, the neuroscientific variant suggests a gradual specialization caused by differential connection probabilities. The ongoing debate between cognitive neuroscientists favoring single- or double-system accounts of rule-like knowledge clearly demonstrates the importance of multidisciplinary interaction between the linguistic, cognitive, computational, and neurosciences.

Bibliography:

  1. Caplan D 1987 Neurolinguistics and Linguistic Aphasiology. An Introduction. Cambridge University Press, Cambridge
  2. Dell G S 1986 A spreading-activation theory of retrieval in sentence production. Psychological Review 93: 283–321
  3. Elman J L 1990 Finding structure in time. Cognitive Science 14: 179–211
  4. Elman J L, Bates E A, Johnson M H, Karmiloff-Smith A, Parisi D, Plunkett K 1996 Rethinking Innateness. A Connectionist Perspective on Development. MIT Press, Cambridge, MA
  5. Garrett M 1984 The organization of processing structures for language production. In: Caplan D, Lecours R, Smith A (eds.) Biological Perspectives On Language. MIT Press, Cambridge, MA, pp. 172–93
  6. Hare M, Elman J L, Daugherty K G 1995 Default generalisation in connectionist networks. Language and Cognitive Processes 10: 601–30
  7. Levelt W J M, Roelofs A, Meyer A 1999 A theory of lexical access in speech production. Behavioral and Brain Sciences 22: 1–75
  8. Lichtheim L 1885 Über Aphasie. Deutsches Archiv für klinische Medicin 36: 204–68
  9. Marslen-Wilson W, Tyler L 1997 Dissociating types of mental computation. Nature 387: 592–4
  10. McClelland J L, Rumelhart D E 1981 An interactive activation model of context effects in letter perception: part 1. Psychological Review 88: 375–407
  11. Minsky M, Papert S 1969 Perceptrons. MIT Press, Cambridge, MA
  12. Morton J 1969 The interaction of information in word recognition. Psychological Review 76: 165–78
  13. Norris D, McQueen J M, Cutler A 2000 Merging information in speech recognition: Feedback is never necessary. Behavioral and Brain Sciences 23: 299–370
  14. Pinker S 1997 Words and rules in the human brain. Nature 387: 547–8
  15. Plaut D C, Shallice T 1993 Deep dyslexia: a case study of connectionist neuropsychology. Cognitive Neuropsychology 10: 377–500
  16. Pulvermüller F 1998 On the matter of rules. Network: Computation in Neural Systems 9: R1–51
  17. Pulvermüller F 1999 Words in the brain’s language. Behavioral and Brain Sciences 22: 253–336
  18. Rumelhart D E, McClelland J L (eds.) 1986a Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press, Cambridge, MA
  19. Rumelhart D E, McClelland J L 1986b On learning the past tense of English verbs. In: McClelland J L, Rumelhart D E (eds.) Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press, Cambridge, MA
  20. Ullman M, Corkin S, Coppola M, Hickok G, Growdon J, Koroshetz W, Pinker S 1997 A neural dissociation within language: evidence that the mental dictionary is part of declarative memory, and that grammatical rules are processed by the procedural system. Journal of Cognitive Neuroscience 9: 266–76
  21. Waibel A, Hanazawa T, Hinton G, Shikano K, Lang K J 1995 Phoneme recognition using time-delay neural networks. In: Chauvin Y, Rumelhart D E (eds.) Backpropagation: Theory, Architectures, and Applications. Developments in Connectionist Theory. Lawrence Erlbaum, Hillsdale, NJ
  22. Wernicke C 1874 Der aphasische Symptomencomplex. Eine psychologische Studie auf anatomischer Basis. Kohn und Weigert, Breslau