The term ‘word recognition’ has two senses. Cognitive psychologists use it to refer to a component process of mind that transforms the printed or spoken features of a word into a linguistic representation. Because component processes must be inferred from measures of behavior, the term also refers to a measured task performance. Laboratory volunteers perform a task in which they discriminate words from nonwords (the lexical decision task), and measurements of their performance are used to test hypotheses about the nature of word recognition. At one time, performance in the lexical decision task was assumed to provide an unambiguous view of the word-recognition component in operation. For example, ‘there is no way to perform this task without [word recognition], we can estimate the time for this operation by measuring the subject’s reaction time to make the decision’ (Forster 1976, p. 260). As a performance phenomenon, word recognition may or may not imply a distinct component process. Nevertheless, it is intuitive that reading and listening include a component process of word recognition, as the examples that follow illustrate.
1. Component Processes And The General Linear Model
1.1 Intuitive Word Recognition
Nineteenth-century eye movement studies discovered that a reader’s eyes move across a line of text in a series of jumps, pausing about a quarter of a second at each fixation. Eye movements thus implied the ‘fixation’ (recognition) of individual printed words in natural reading. A skilled reader can recognize thousands of printed words with no noticeable effort. A dyslexic child, on the other hand, does not easily acquire this skill. Recognizing a printed word, as a particular word, can be effortful to the point of frustration. Dyslexia may plague an otherwise bright and articulate child, which appears to implicate word recognition as the crux of reading. Reading is not exclusively the recognition of printed words, but it is word recognition that distinguishes reading from other natural language skills. In effect, to become a reader is to master this special skill.
A different example illustrates word recognition in spoken language. When listening to speech in an unfamiliar language, people inevitably wonder why speakers ‘talk so fast.’ All languages are spoken at equivalent rates, but in a familiar language, we perceive breaks between words (although no physical breaks exist), which ‘slows down’ perception. Unfamiliar languages sound fast because the listener lacks the ability to segment continuous speech and to recognize the words.
In the examples of dyslexia and foreign speech, our intuition that a component of word recognition may exist derives from the absence of competent word recognition. However, the absence of a hypothetical competence tells us little or nothing about whether the hypothesis is valid. This requires scientiﬁc investigation, and a logic for the analysis of complex behavior.
1.2 Nearly Decomposable Systems
Herbert Simon proposed a logic of analysis for complex systems, such as cognitive systems. Complex systems can be partly decomposed, if the components interact linearly. The internal dynamics of cognitive components may be complex (nonlinear). If the interactions between components are, nevertheless, linear, then it is possible to identify these components. Cognitive components, described thus, work as a chain of single causes. Single causes entail the familiar notion of domino causality. Push the ﬁrst domino in a chain of standing dominoes and each will fall in its turn. Linear interactions, between the dominoes, add up to the total behavioral effect. If behavior is the sum of its parts, the total effect can be reduced to component causes. This formal logic justiﬁes behavioral studies to infer underlying cognitive components.
1.3 Additive Factors Method
The most well-known methodological tool to individuate cognitive components is the additive factors method, proposed by Saul Sternberg. Experiments that include several experimental conditions (i.e., factorial designs) allow simultaneous manipulation of several factors, which provides the opportunity for interaction. If the effects of two or more manipulations are strictly additive, the manipulated variables satisfy the assumption of selective influence—they selectively influence distinct components. For example, the total time from first to last domino, in the previous, idealized chain of falling dominoes, is the sum of each domino’s falling time. In this idealization, separate experimental manipulations that slow the falling times of individual dominoes, but do not change the falling times of dominoes that precede or follow an affected domino, satisfy the assumption of selective influence. A manipulation selectively affects a single domino’s falling time, without affecting other dominoes’ falling times. Such effects simply add time to the total falling time of the entire chain. Alternatively, when nonadditive interactions are observed, manipulations have not satisfied the assumption of selective influence. Factors that are not additive influence (at least) one common component. Thus, to Sternberg’s lasting credit, his method includes an empirical failure point: ubiquitous nonadditive interaction effects.
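The additivity test can be sketched in a few lines of code. The sketch below is illustrative only: the two-stage architecture, stage durations, and effect sizes are invented, and real response times would include noise that this idealization omits.

```python
# A sketch of additive factors logic (all durations invented): two serial
# stages, each selectively influenced by one experimental factor.

def response_time(stage1_ms, stage2_ms):
    """Total RT is the sum of serially arranged stage durations."""
    return stage1_ms + stage2_ms

base1, base2 = 200, 300        # baseline stage durations (ms)
effect_a, effect_b = 50, 40    # factor A slows stage 1; factor B slows stage 2

# A 2 x 2 factorial design: one cell per combination of factor levels
rt = {
    ("a0", "b0"): response_time(base1, base2),
    ("a1", "b0"): response_time(base1 + effect_a, base2),
    ("a0", "b1"): response_time(base1, base2 + effect_b),
    ("a1", "b1"): response_time(base1 + effect_a, base2 + effect_b),
}

# Selective influence predicts a zero interaction contrast:
interaction = ((rt[("a1", "b1")] - rt[("a0", "b1")])
               - (rt[("a1", "b0")] - rt[("a0", "b0")]))
print(interaction)  # 0 -> strictly additive effects, implicating distinct stages
```

Had factor A also slowed stage 2 in the combined condition, the contrast would be nonzero: a nonadditive interaction, the method’s empirical failure point.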
2. Word Recognition As Information Processing
The logic of the additive factors method also ﬁts the metaphor of cognition as information processing. For the chain of dominoes, substitute the guiding analogy of a ﬂow chart of information processing, like unidirectional ﬂow charts of computer programs. Information ﬂows from input (stimulus) to output (behavior) through a sequence of cognitive components. In word recognition, input representations from a sensory process—visual or auditory features of a word—are transformed into an output representation—the identity of the word—that, in turn, becomes the input representation for a component downstream (i.e., a component of response production or sentence processing). In this tradition, empirical studies of word recognition pertain to the structure and function of the lexicon. The lexicon is a memory component, a mental dictionary, containing representations of the meanings, spellings, pronunciations, and syntactic functions of words. Structure refers to how the word entries are organized, and function refers to how words are accessed in, or retrieved from, the lexicon. Two seminal ﬁndings illustrate the distinction: semantic priming effects and word frequency effects. Both effects are found in lexical decision performance.
2.1 The Lexical Decision Task
In the lexical decision task, a person is presented, on each trial, with a target string of letters, and must judge whether the target string is a correctly spelled word in English (or some other reference language). Some trials are catch trials, which present nonwords such as ‘glurp.’ (One may also present words and nonwords auditorily, to examine spoken word recognition.) The participant presses a ‘word’ key to indicate a word and a ‘nonword’ key otherwise. The experimenter records the response time, from when the target stimulus appears until the response key is pressed, and whether the response was correct. Response time and accuracy are the performance measures.
2.2 Semantic Priming And The Structure Of The Lexicon
Word pairs with related meanings, such as ‘doctor’ and ‘nurse’ or ‘bread’ and ‘butter,’ produce semantic priming effects. Semantic priming was discovered by David Meyer and Roger Schvaneveldt, working independently (they chose to report their ﬁndings together). Lexical decision performance to a word is improved by prior presentation of its semantically related word. Prior recognition of ‘doctor,’ as a word, facilitates subsequent recognition of ‘nurse’; lexical decisions to ‘nurse’ are faster and more accurate, compared with a control condition. This ﬁnding is commonly interpreted to mean that semantically related words are structurally connected in the lexicon, such that retrieval of one inevitably leads to retrieval of the other (in part or in whole).
2.3 Word Frequency And The Function Of Lexical Access
Word frequency is estimated from frequency counts: the occurrences of each word, per million words, are counted in large samples of text. Lexical decision performance is correlated with word frequency. Words that occur more often in text (or in speech) are recognized faster and more accurately than words that occur infrequently. This finding is interpreted in a variety of ways. The common theme is that lexical access functions in a manner that favors high-frequency words. In one classical account, proposed by John Morton, access to a lexical entry is via a threshold. Word features may activate a lexical entry sufficiently to cross its activation threshold, and thus make that entry available. Common, high-frequency words have lower threshold values than less common words. In a different classical account, proposed by Kenneth Forster, the lexicon is searched in order of word frequency, beginning with high-frequency words.
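The search account lends itself to a simple computational sketch. The version below is a bare caricature, with an invented five-word lexicon and an invented fixed cost per comparison; Forster’s actual model is considerably more elaborate (e.g., frequency-ordered bins reached through access codes).

```python
# A sketch of frequency-ordered lexical search (hypothetical lexicon and
# comparison cost): high-frequency entries are examined first, so they are
# found sooner.

lexicon = ["the", "time", "doctor", "nurse", "pint"]  # descending frequency
COMPARISON_MS = 10  # hypothetical cost of examining one entry

def search_time(target):
    """Simulated access time: entries are examined in frequency order."""
    for position, entry in enumerate(lexicon, start=1):
        if entry == target:
            return position * COMPARISON_MS
    return None  # search exhausted: grounds for a 'nonword' decision

print(search_time("the"))   # 10 (high frequency: found early)
print(search_time("pint"))  # 50 (low frequency: found late)
```

The same qualitative prediction, faster access to common words, falls out of Morton’s threshold account, but via lowered thresholds rather than search order.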
2.4 Challenges To The Information Processing Approach
Strictly additive effects are almost never observed in word recognition experiments, and, while it is not possible to manipulate all word factors simultaneously in one experiment, it is possible to trace chains of nonadditive interactions across published experiments that preclude the assignment of any factor to a distinct component. Moreover, all empirical phenomena of word recognition appear to be conditioned by task, task demands, and even the reference language, as the examples that follow illustrate.
The same set of words, which produces a large word frequency effect in the lexical decision task, produces a reduced or statistically unreliable word frequency effect in naming and semantic categorization tasks. All these tasks would seem to include word recognition, but they do not yield the same word recognition effects. Also, within the lexical decision task itself, it is possible to modulate the word frequency effect by making the nonwords more or less word-like (and, in turn, to modulate a nonadditive interaction effect between word frequency and semantic priming). Across languages, Hebrew produces a larger word frequency (familiarity) effect than English, and English than Serbo-Croatian.
Consider the previous examples together, within the guidelines of additive factors logic. Word recognition factors cannot be individuated from each other, and they cannot be individuated from the context of their occurrence (task, task demands, and language). The limitations of the additive factors method are well known. Because additivity is never consistently observed, we have no empirical basis for individuating cognitive components. The de facto practice in cognitive psychology is to assume that laboratory tasks and manipulations may differ from each other by the causal equivalent of one component (‘one domino’). But how does one know which tasks or manipulations differ by exactly one component? We require a priori knowledge of cognitive components, and of which components are involved in which laboratory tasks, to know reliably which or how many components task conditions entail. Notice the circularity, pointed out by Robert Pachella: the goal is to discover cognitive components in observed laboratory performance, but the method requires prior knowledge of the selfsame components.
Despite these problems, most theorists share the intuition that a hypothetical component of word recognition exists. When intuitions diverge, however, there may be no way to reconcile differences. Theorists who assume that reading is primarily an act of visual perception discover a visual component of word recognition; theorists who assume that reading is primarily a linguistic process discover a linguistic component of word recognition, in the same performance phenomena. Repeated contradictory discoveries, in the empirical literature, have led to a vast debate concerning which task conditions provide an unambiguous view of word recognition in operation. The debate hinges on exclusionary criteria that may correctly exclude task effects and bring word recognition into clearer focus. Otherwise, inevitably, one laboratory’s word recognition effect is another laboratory’s task artifact.
3. Context Effects And Connectionist Models
Context effects seem to occur at all scales at which words may be viewed. For example, just as semantically related prime words (and appropriate sentence contexts) produce benefits for word recognition, words themselves, as contexts, affect letter and phoneme identification. A briefly presented letter is more accurately identified if it is presented within a word than if the same letter is presented in a nonword, or presented alone. Also, an ambiguous initial consonant, which could be either a /d/ or a /t/, is more likely to be identified as /d/ in the context of /_ash/ (where /dash/ is a word and /tash/ is not), and as /t/ in the context of /_ask/ (where /task/ is a word and /dask/ is not).
3.1 Interactive-Activation Models
James McClelland and his colleagues constructed interactive activation models to simulate the previous context effects. The original interactive activation model simulated context effects on letter identification, and was extended in a later model to simulate context effects on phoneme identification. Interactive activation models are connectionist models. In a connectionist model, constraints on response options are implemented as excitatory and inhibitory connections among nodes that behave as idealized neurons. The original interactive activation model included a hierarchy of representations, implemented in three node levels: visual feature nodes, letter nodes, and word nodes. Most important, letter nodes and word nodes have reciprocal excitatory connections, which allow feedback from word representations to excite letter representations. As a consequence, letter nodes that are inadequately activated by feature nodes may benefit from word feedback. This is the hypothetical basis of word context effects on letter (and phoneme) identification.
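A toy simulation can convey the feedback principle. The update rule and every parameter value below are invented for illustration, and the sketch is far simpler than the published model (which also includes inhibitory connections and within-level competition); it shows only how word-to-letter feedback boosts a weakly activated letter node.

```python
# Toy interactive activation sketch (invented rule and parameters): a letter
# node receives weak bottom-up input; in a word context, the word node it
# excites feeds activation back to it.

def update(activation, net_input, rate=0.2, decay=0.1):
    """One step: excitatory growth toward a ceiling of 1.0, plus passive decay."""
    return activation + rate * net_input * (1.0 - activation) - decay * activation

letter_alone = 0.0     # letter node, letter presented in isolation
letter_in_word = 0.0   # letter node, letter presented inside a word
word = 0.0             # word node, excited by its letters
bottom_up = 0.3        # weak (degraded) feature-level evidence for the letter

for _ in range(20):
    letter_alone = update(letter_alone, bottom_up)
    word = update(word, letter_in_word)                               # letters excite the word
    letter_in_word = update(letter_in_word, bottom_up + 0.5 * word)   # word feedback

print(letter_in_word > letter_alone)  # True: word context boosts the letter
```

The asymmetry mirrors the word superiority effect described above: identical bottom-up evidence yields stronger letter activation inside a word.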
3.2 PDP Models
In interactive activation models, the connection strengths between nodes are preset by the modeler. Parallel distributed processing (PDP) models include learning algorithms capable of covariant learning. Covariant learning shapes a matrix of connection weights to reﬂect statistical relations between the input and output patterns of a training set. Systematic relations, at a variety of grain sizes, between English spelling and phonology (or phonology and semantics, etc.), may all be construed as statistical relations. For example, consonant spellings are more strongly correlated with consonant pronunciations than are vowel spellings with vowel pronunciations, but in both cases there are statistically dominant and subordinate relations. Regular words, with dominant relations, are named more quickly than exception words, with subordinate relations. Likewise, body-rime relations may be consistently correlated (the body _uck always indicates the same rime, in English, as in the word ‘duck’); but other body-rime relations are less strongly correlated (_int pronounced as in ‘mint’); and still others are only weakly correlated (_int pronounced as in ‘pint’)—a rank order that is also corroborated by readers’ naming times. The emphasis of PDP models, on learning the relation between spelling and phonology, coincides with the commonly observed failure of dyslexics to sound out words, and pronounceable pseudowords such as ‘glurp.’ Not all scientists agree, but the most common form of dyslexia appears to be a speciﬁc failure to learn the ﬁne-grain consonant and vowel relations between spelling and phonology.
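A minimal sketch of covariant learning follows, assuming simple Hebbian outer-product learning over invented +/-1 patterns. Real PDP models of reading used error-driven learning over distributed spelling and sound codes; the sketch shows only the core idea that weights come to absorb a statistical input-output relation from the training set.

```python
# Covariant (Hebbian) learning sketch over invented +/-1 patterns: the weight
# matrix accumulates input-output correlations from the training set.

def train(pairs, n_in, n_out):
    """Accumulate outer products: w[i][j] sums input[i] * output[j] over pairs."""
    w = [[0.0] * n_out for _ in range(n_in)]
    for x, y in pairs:
        for i in range(n_in):
            for j in range(n_out):
                w[i][j] += x[i] * y[j]
    return w

def recall(w, x):
    """Feedforward pass: threshold the weighted sums of the input pattern."""
    n_out = len(w[0])
    net = [sum(w[i][j] * x[i] for i in range(len(x))) for j in range(n_out)]
    return [1 if v >= 0 else -1 for v in net]

# A 'dominant' spelling-to-sound relation, seen three times in training:
spelling, sound = [1, -1, 1, -1], [1, 1, -1, -1]
w = train([(spelling, sound)] * 3, 4, 4)
print(recall(w, spelling) == sound)  # True: the trained relation is recovered
```

Because the weights simply sum correlations, frequently trained (dominant) relations come to outweigh rarely trained (subordinate) ones, which is the statistical basis of the regularity and consistency effects described above.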
Covariant learning tracks all grain sizes of covariation, simultaneously, in the connection weights of a PDP model. Even word frequency approximates a relative correlation among whole-word spellings and whole-word pronunciations, and high-frequency words are named faster than low-frequency words. The outcome of covariant learning is determined by statistical relations in the training set, and so a description of the training set is an integral part of the theory. In effect, covariant learning attunes a network to constraints inherent in a literate culture, for example, the relation between spelling and phonology. Consequently, the description of the cultural artifact, implicit in the training set, is as important for cognitive theory as hypothetical, internal, cognitive processes. Debra Jared used this aspect of PDP models to derive a nonintuitive empirical test using a naming study. The aggregate statistical relation of spelling to phonology, in body-rimes of high-frequency words, predicted a statistical advantage for some high-frequency words over others. Previously, word recognition of higher frequency words was assumed to be immune to such statistical relations. Nevertheless, Jared corroborated the prediction: faster naming times to high-frequency words with consistent body-rime relations. She did not observe this effect in the lexical decision task, however.
3.3 Feedback Consistency Effects
Body-rime consistency effects, in lexical decision performance, were predicted using a combination of interactive activation and covariant learning. Resonance models include learning algorithms that induce symmetrical covariant relations, consistent in both feedforward and feedback directions. Consequently, they predict that symmetrical, consistent relations between spelling and phonology (and phonology and spelling) imply faster and more accurate word recognition. For example, the body-rime (and rime-body) relation in ‘duck’ is a symmetrical, consistent relation. All words with the body _uck are pronounced to rhyme with ‘duck’; all words with the rime /_uk/ are spelled with the body _uck, as in ‘duck.’ Inconsistent relations, including inconsistent feedback relations from phonology to spelling, add time and variability to performance of word recognition. The predicted feedback consistency effect is highly nonintuitive. From an information processing view, processes should always flow forwards, as from spelling to phonology. It should not matter in visual word recognition that a pronunciation may have more than one spelling (or in spoken word recognition that a spelling may have more than one pronunciation).
Feedback consistency effects have been found in performance of English and French lexical decision by Greg Stone, Johannes Ziegler, and their colleagues. Words such as ‘hurt’ (in English), with phonological rimes (/_urt/) that could be spelled in multiple ways (_urt, _ert, _irt), yield slower lexical decision times and more errors than words with rimes spelled in only one way. Also, a symmetrical feedback consistency effect is found in performance of auditory lexical decision (in English and French). What is feedforward for visual presentation is feedback for auditory presentation, and vice versa—a parsimonious qualitative symmetry. Moreover, once feedback consistency is taken into account, reliable feedforward consistency effects emerge in visual and auditory lexical decision performance, effects previously thought to be unreliable.
4. Recurrent Network Models And Nonlinear Dynamical Systems Theory
Interactive activation models and PDP models were actually proposed as ﬁrst steps toward strongly nonlinear models that combined their features. This combination was realized in fully recurrent ‘neural’ networks (resonance models). Recurrent networks are attractor networks simulated as nonlinear iterative maps. An iterative map takes its output at one time-step as input on the next time-step, until it reaches a stable pattern of node activity (an attractor pattern).
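The iterative-map idea can be made concrete with a tiny Hopfield-style attractor network, a standard textbook construction rather than any of the word recognition models discussed here; the point is only that the output at one time-step becomes the input at the next, until the state stops changing.

```python
# A tiny Hopfield-style attractor network as a nonlinear iterative map
# (a generic textbook example, not one of the cited word models). The map is
# iterated until its output no longer changes: an attractor pattern.

pattern = [1, -1, 1, -1, 1]
n = len(pattern)
# Symmetric weights storing the pattern via an outer product (no self-connections)
w = [[0 if i == j else pattern[i] * pattern[j] for j in range(n)] for i in range(n)]

def step(state):
    """One iteration of the map: state(t+1) = sign(W . state(t))."""
    return [1 if sum(w[i][j] * state[j] for j in range(n)) >= 0 else -1
            for i in range(n)]

state = [1, 1, 1, -1, 1]  # degraded input: one element flipped
while True:
    nxt = step(state)
    if nxt == state:      # fixed point reached: a stable pattern of node activity
        break
    state = nxt

print(state == pattern)  # True: the degraded input settles into the stored pattern
```

Settling from a degraded input into a stored pattern is the dynamical analogue of recognizing a word from partial or noisy features.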
Nonlinear iterative maps approximate solutions of systems of nonlinear differential equations. Thus, recurrent network models, as dynamical systems, invoke the mathematical framework of nonlinear dynamical systems theory. A few recurrent network models of word recognition have been implemented, notably by Alan Kawamoto (and colleagues), Michael Masson, and Stephen Grossberg (and colleagues).
Nonlinear dynamical systems theory concerns the complex behavior of systems, produced by components that interact nonlinearly (nonadditively). A few pioneering studies have corroborated empirical signatures of nonlinearity, including multistability in performance of printed word naming, hysteresis in performance of spoken word identification, and 1/f noise in lexical decision.
4.1 Multistability
Multistability is a generic phenomenon of nonlinear dynamical systems. In this case, the same stimulus word reliably produces more than one naming response. Homograph words are an obvious example. Homographs, such as ‘wind,’ have two or more pronunciations, and are thus multistable by definition. A word like ‘pint,’ with a statistically subordinate pronunciation (consider ‘hint,’ ‘lint,’ and ‘mint’), is a more subtle example. Readers produce two systematic pronunciations to ‘pint’: the correct pronunciation and an error pronunciation that rhymes with ‘mint.’ Moreover, when readers are encouraged to produce rapid pronunciations, they are much more likely to produce the systematic mispronunciations. In the latter manipulation, a quantitative change in the time available for a naming response produces a qualitative change in the response itself. Qualitative changes consequent on quantitative manipulations are precisely the kind of phenomena that are addressed by nonlinear dynamical systems theory.
4.2 Hysteresis
Hysteresis effects are more elaborate signatures of nonlinearity, which extend the concept of multistability. Betty Tuller and her colleagues have demonstrated hysteresis effects in perception of artificial speech. As in the previous example of homograph words, an identical pattern of artificial speech may yield multiple (multistable) perceptions. A typical experiment manipulates the presentation order of speech stimuli. Stimuli are constructed to change quantitatively and incrementally in acoustic properties along a continuum, between the words ‘say’ and ‘stay,’ for example. Each run of an experiment presents the continuum, one stimulus at a time, running from ‘say’ to ‘stay,’ and back again (or vice versa). Hysteresis is observed when some intermediate range of stimuli is perceived as ‘say’—if preceded by ‘say’ stimuli—but the identical range is perceived as ‘stay’ if preceded by ‘stay’ stimuli. This intermediate range is multistable; and the context effect demonstrates hysteresis.
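The logic of such an experiment can be mimicked with an abstract bistable dynamical system. The cusp-like equation below is a generic textbook model, not Tuller’s actual model; the point is only that the settled state depends on the direction of the parameter sweep, so intermediate stimuli are categorized differently in the two directions.

```python
# Abstract hysteresis sketch (invented cusp-like dynamics, not Tuller's model):
# a bistable state variable is carried along a control-parameter sweep, and
# the switch between 'percepts' occurs at different parameter values in the
# two sweep directions.

def relax(x, k, steps=5000, dt=0.01):
    """Settle toward a nearby stable fixed point of dx/dt = -(x**3 - x - k)."""
    for _ in range(steps):
        x -= dt * (x**3 - x - k)
    return x

def sweep(ks):
    """Present parameter values in order, carrying the settled state along."""
    x = -1.0 if ks[0] < 0 else 1.0
    states = []
    for k in ks:
        x = relax(x, k)
        states.append(1 if x > 0 else -1)  # the currently perceived category
    return states

ks = [i / 10 for i in range(-10, 11)]              # control parameter, -1.0 .. 1.0
up = sweep(ks)                                     # e.g., 'say' toward 'stay'
down = list(reversed(sweep(list(reversed(ks)))))   # 'stay' back toward 'say'

# Intermediate stimuli are categorized differently in the two directions:
print(any(u != d for u, d in zip(up, down)))  # True: hysteresis
```

In the bistable middle range of the parameter, the system stays on whichever branch its history placed it, which is exactly the order-of-presentation effect observed with the ‘say’–‘stay’ continuum.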
The hysteresis pattern is a generic pattern that is observed widely in physical, chemical, biological, and cognitive systems. Prior to the mathematical developments of the twentieth century, however, it was considered a nuisance effect. As a nuisance effect, it motivated one of the first methods in psychology—Fechner’s method of limits in nineteenth-century psychophysics—effectively, a technique to get around the problem of hysteresis. Generic constructs such as hysteresis and multistability provide an analogy to understand ambiguity resolution in natural language. Words are pronounced differently, and have different meanings, in different contexts. Context sensitivity is a defining feature of multistable phenomena (as hysteresis demonstrates), and the mathematical framework of nonlinear systems yields a simplifying formal perspective on context sensitivity in natural systems.
4.3 1/f Noise
David Gilden and his colleagues have observed 1/f noise in the fluctuation of trial-by-trial response times from several cognitive tasks, including a word recognition task. 1/f noise is observed in the residual ‘error’ variance of individual participants’ trial-by-trial response times (the variability that remains after ‘treatment effects’ are removed). If we graph each residual time, in the trial order of the experiment, the data points fluctuate between fast and slow times. The connected data points form a complex waveform, which may be viewed as a composite of waves spanning a range of frequencies. 1/f noise is an inverse relation between the frequency of the composite waves and their power (amplitude): the lower the frequency, the greater the power.
The phenomenon of 1/f noise can be difficult to grasp, because it goes against the grain of a typical psychological analysis. Typically, error variance is discarded, rather than analyzed for structure. In Gilden’s data, the error variance is analyzed and found to resemble the mathematically generic pattern of 1/f noise (a construct from fractal geometry). 1/f noise is a signature of processes that have no characteristic measurement scale. It contradicts additive factors logic, which assumes that we may partition response times into additive, independent sources of variance. This practice strictly requires that the response time in each trial is independent of the response times in other trials. The assumption of independence is at the heart of the linear statistical models used to identify cognitive components. The presence of 1/f noise contradicts this assumption. Response time data do not have ‘joints’ that may reduce to component causes.
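The spectral analysis behind this claim can be sketched with NumPy (an assumption of convenience: any array library with an FFT would do). The series below is synthetic, built to have a 1/f spectrum, and merely stands in for a participant’s residual response-time series; the slope of log power against log frequency is then estimated from the series itself.

```python
# Sketch of what a 1/f spectrum means (synthetic series, not Gilden's data).
# A series is synthesized with power inversely proportional to frequency; a
# log-log regression of spectral power on frequency then recovers a slope
# near -1, the signature of 1/f noise.
import numpy as np

rng = np.random.default_rng(0)
n = 4096

freqs = np.fft.rfftfreq(n)[1:]            # positive frequencies (drop DC)
amplitudes = 1.0 / np.sqrt(freqs)         # power ~ amplitude**2 ~ 1/f
phases = rng.uniform(0, 2 * np.pi, freqs.size)
phases[-1] = 0.0                          # Nyquist component of a real signal is real
spectrum = np.concatenate(([0.0], amplitudes * np.exp(1j * phases)))
series = np.fft.irfft(spectrum, n)        # stand-in for trial-by-trial residuals

# Estimate the spectral slope from the series itself
power = np.abs(np.fft.rfft(series))[1:] ** 2
slope = np.polyfit(np.log(freqs), np.log(power), 1)[0]
print(round(slope, 2))                    # close to -1: power falls as frequency rises
```

A white-noise series of independent trials would instead yield a slope near zero; a reliably nonzero slope is what contradicts the independence assumption of the linear statistical models.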
4.4 Challenges For Connectionism
A laudable feature of the information processing approach was its explicit logic and method of analysis, derived from the general linear model, and implemented in additive factors method. No comparable logic has gained general acceptance within the connectionist approach (although connectionist models have been used productively in tests of nonintuitive predictions). In large part, connectionism has inherited the methodology of information processing psychology. However, empirical analyses that assume the general linear model, and theories implemented as strongly nonlinear dynamical systems, are incompatible at their root. They entail contradictory notions of cause and effect.
The domino causality of information processing analysis individuates single causes in additive effects— cause and effect relations that allow the morphological reduction of behavior to underlying component causes. In contrast, the circular causality of strongly nonlinear systems allows that all the components of a system may be present in qualitatively different behaviors of the system. Causal properties emerge in the interaction of components, which are not reducible to causal properties of the components themselves. Circular causality requires a strategic (not morphological) reduction. In a strategic reduction, the same, generic, nonlinear phenomena may be observed at multiple levels of a system. One must concede, however, that higher-level phenomena do not reduce to lower-level causes.
The previous contradiction between linear methods and nonlinear models raises questions, as yet unanswered, for the cognitive psychology of word recognition. For example, is it more productive, for scientiﬁc purposes, to view word recognition performance as a product of the nervous system (which also appears as a nonlinear dynamical system), a product of inscrutable components of mind (which have combined in nonlinear interaction), an emergent product of interaction between readers and texts, or some other possibility that is not articulated in the previous alternatives? Answers to such questions await a generally accepted and reliable logic of nonlinear analysis, appropriate to cognitive performance, connectionist models, and nonlinear dynamical systems theory.
- Carroll D W 1994 Psychology of Language, 2nd edn. Brooks/Cole, Pacific Grove, CA
- Farmer J D 1990 A Rosetta Stone for connectionism. Physica D 42: 153–87
- Forster K I 1976 Accessing the mental lexicon. In: Wales R J, Walker E (eds.) New Approaches to Language Mechanisms. North-Holland, Amsterdam
- Frost R, Katz L (eds.) 1992 Orthography, Phonology, Morphology, and Meaning. North Holland, Amsterdam
- Plaut D C, McClelland J L, Seidenberg M S, Patterson K 1996 Understanding normal and impaired word reading: computational principles in quasi-regular domains. Psychological Review 103: 56–115
- Port R F, van Gelder T (eds.) 1995 Mind as Motion: Explorations in the Dynamics of Cognition. MIT Press, Cambridge, MA
- Rayner K, Pollatsek A 1989 The Psychology of Reading. Prentice Hall, Englewood Cliffs, NJ
- Van Orden G C, Stone G O, Pennington B F 1990 Word identiﬁcation in reading and the promise of subsymbolic psycholinguistics. Psychological Review 97: 488–522
- Van Orden G C, Pennington B F, Stone G O 2001 What do double dissociations prove? Cognitive Science 25