
    Cognition, 48 (1993) 21-69. 0010-0277/93/$06.00 © 1993 Elsevier Science Publishers B.V. All rights reserved.

    From rote learning to system building: acquiring verb morphology in children and connectionist nets

    Kim Plunkett*
    Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford OX1 3UD, UK

    Virginia Marchman
    Department of Psychology, 1202 W. Johnson Street, Brogden, University of Wisconsin, Madison, WI 53706-1611, USA

    Received October 2, 1990, final version accepted April 16, 1993


    The traditional account of the acquisition of English verb morphology supposes that a dual architecture underlies the transition from early rote learning processes (in which past tense forms of verbs are correctly produced) to the systematic treatment of verbs (in which irregular verbs are prone to error). A connectionist account supposes that this transition can occur in a single mechanism (in the form of a neural network) driven by gradual quantitative changes in the size of the training set to which the network is exposed. In this paper, a series of simulations is reported in which a multi-layered perceptron learns to map verb stems to past tense forms analogous to the mappings found in the English past tense system. By expanding the training set in a gradual, incremental fashion and evaluating network performance on both trained and novel verbs at successive points in learning, it is demonstrated that the network undergoes reorganizations that result in a shift from a mode of rote learning to a systematic treatment of verbs. Furthermore, we show that this reorganizational transition is dependent upon the number of regular and irregular verbs in the training set and is sensitive to the phonological sub-regularities characterizing the irregular verbs. The pattern of errors observed is compared to that of children acquiring the English past tense, as well as children’s performance on experimental studies with nonsense verbs. It is concluded that a connectionist approach offers a viable alternative account of the acquisition of English verb

    *Corresponding author.


    underlying children’s ultimate knowledge of the English inflectional morphologi-

    cal system. One mechanism controls the default application of a general rule,

    responsible for the generativity of the regular paradigm in a given inflectional

    system. A separate mechanism identifies exceptions and prompts the child to

    consult its knowledge store when producing and comprehending past tense forms.

    The most recent incarnation of this dual-mechanism hypothesis explains the

    phenomenon of overgeneralization using the “blocking principle”, that is, it is

    when the knowledge store for irregular verbs fails to block the application of the

    regular rule that overregularizations are assumed to occur (Marcus et al., 1992).

    With repeated exposure, the strength of the lexical entry for the irregular verb

    increases and the tendency for overgeneralization errors correspondingly de-

    creases. The retrieval of irregular lexical items is assumed to involve mechanisms

    which derive from similar memory-based associative processes that guide the correct early usage of past tense verb forms.

    In summary, the onset of the production of erroneous overregularized forms is generally attributed to the transition from a stage in which learning primarily

    involves expanding the store of individual lexical items, that is, rote learning, to a

    stage of rule construction and refinement, that is, system building. Fleshing out this

    explanation and providing an adequate interpretation of the phenomenon of

    overregularization and U-shaped development requires an account of (at least)

    the following:

    (1) The factor(s) that trigger the transition from rote learning to system building.

    (2) The basis for determining when and how individual lexical items are susceptible (or resistant) to overgeneralization errors.

    (3) The mechanism(s) by which overgeneralization errors are eventually elimi-

    nated and appropriate performance is ultimately achieved.

    Clearly, the fact that children are able to utilize productively systems of

    inflectional morphology (not to mention other aspects of syntax and semantics) is

    of considerable theoretical interest. However, as discussed above, most of the

    evidence suggests that the ability to do so is not evident from the beginning of

    acquisition, emerging after at least some lexical acquisition has taken place. The

    factors that trigger this transition in the child have not yet been clearly identified;

    however, a requisite amount of linguistic experience is often assumed (Karmiloff-

    Smith, 1986). For example, the onset of usage of the English past tense regular

    rule is typically thought to depend upon the learning of a sufficient number of

    suffixed past tense forms. Clearly, without such exposure, it would be difficult for

    systematicities which define the regular rule to be extracted. However, note that

    sufficient exposure to non-rule-governed irregular forms is also required in order

    for appropriate blocking to occur.

    It has also been proposed that rule-based processes emerge relatively in-


    dependently of lexical development. For example, the maturation of an inflection-

    al system-building device might also determine the timing of the onset of a

    U-shaped profile of development (Bever, 1982; Pinker, 1991), in particular, one

    that is associated with the onset of the obligatory marking of tense (Marcus et al.,

    1992). Yet note that if maturational factors are found to play a role, it is likely

    that they interact with input factors to some extent in order to account for

    observed time lags in the onset of productive behavior in different linguistic

    domains, for example, the relatively early acquisition of the English plural system

    and the typical late acquisition of the past tense system (Brown, 1973; de Villiers

    & de Villiers, 1985). Other explanations of the time lag in the acquisition of

    inflectional systems incorporate children’s developing conceptual understanding of

    time and number (e.g., Carey, 1982), as well as the character (e.g., transparency

    of form-function mappings) of the inflectional system in the language to be acquired (e.g., Johnston & Slobin, 1979; Slobin, 1985).

    In interpreting the U-shaped developmental pattern, Plunkett and Marchman

    (1991) argued that it is important to distinguish between macro and micro patterns of errors when characterizing children’s acquisition of inflectional systems

    like the English past tense. Macro U-shaped development refers to a rapid and

    sudden transition into the second phase of system building, resulting in the

    indiscriminate application of the “add/-ed/” rule to whole classes or categories of

    verbs. In contrast, a micro U-shaped developmental pattern is characterized by

    selective suffixation of English irregular verbs, and results in a period of

    development in which some irregular verbs are treated as though they belong to

    the regular paradigm while others are still produced correctly. The basis for selective application of the suffix may be defined with respect to certain

    representational characteristics of the verb stem (phonological, semantic or

    otherwise), or may result from the operation of a probabilistic device which

    determines the likelihood that the suffix will be applied to a given irregular verb.

    While a macro view of overgeneralization phenomena has achieved textbook

    status, there appears to be little empirical evidence that children overgeneralize the /-ed/ suffix indiscriminately, that is, to all irregular verbs in their current

    vocabularies (e.g., Maratsos, 1983). Nor is there evidence to suggest the existence

    of a single well-defined stage of development in which erroneous behavior is

    observed (see also Derwing & Baker, 1986). Rather, children are likely to

    overgeneralize the suffix to only some irregular verbs (typically a small number),

    while at the same time, correct irregular past tense verb forms are also produced. Furthermore, errors may occur across a protracted period, with some irregular

    verbs recovering from erroneous treatment only to be overgeneralized again at a later point in development. Findings undermining the regular rule “imperialism”

    hypothesis also derive from studies of naturalistic past tense usage (e.g., Marcus

    et al., 1992), as well as those using elicitation procedures (Marchman, 1988;

    Marchman & Plunkett, 1991).


    In addition, both children and adults sometimes produce irregularization errors (e.g., flow → flew), in which stem-past tense mappings reflect the sub-regularities characteristic of identity mapping verbs (no change from the stem to the past tense form) and vowel change verbs (Bybee & Slobin, 1982; Marchman, 1988; Marchman & Plunkett, 1991; Plunkett & Marchman, 1991). These occur less frequently than the standard “add /-ed/” error, and are more likely to occur in older children and adults. Further, these errors typically (although not always) occur with verb stems that share phonological features of these irregular classes. While most studies report a fairly low frequency of past tense errors in general, errors of both sorts are nevertheless quite pervasive across individuals, as regularized and irregularized forms are observed across extended periods of development in an overwhelming majority of subjects.

    In general, then, it is fairly well established that the erroneous productions of children learning the English past tense reflect micro-level processes. U-shaped development operates on a stem by stem basis and is characterized by the selective, gradual, and protracted onset of, and recovery from, erroneous production. Within a dual mechanism approach, it is the operation of the lexical retrieval device (rather than the rule mechanism) that gives U-shaped development its micro character. That is, the onset of rule usage is, by definition, an all-or-none process applicable to any stem, regardless of phonological shape and frequency. Overregularization errors are the result of inappropriate rule application whenever the lexical-based mechanism fails to successfully identify and retrieve an irregular item. The probability of an irregular past tense form being successfully retrieved is dependent upon a variety of factors, including frequency and phonological similarity, that serve to strengthen the status of irregular verbs over the course of learning. In contrast to the absolute nature of the rule-based mechanism, the micro and selective nature of development derives from the associative processes embodied by the lexical retrieval mechanism.

    In contrast to the dual architecture assumptions characteristic of rule-based accounts, Rumelhart and McClelland (1986) (henceforth R&M) argued that a single mechanism system in the form of a connectionist network is capable of extracting a range of regularities that characterize the English past tense system and producing patterns of overgeneralizations analogous to the errors observed in children. In their model, the transition from rote learning to system building emerges from the capacity of connectionist networks to simultaneously:

    (1) Memorize individual patterns and their transformations when the number of pattern types is sufficiently small.

    (2) Generalize on the basis of regularities observed in the input when the number of patterns (types) is sufficiently large.

    R&M initially trained their network on a subset of the vocabulary to which it


    would eventually be exposed. During the first 10 epochs of training only 10 verbs (8 of which were irregular) were presented to the network. Given the learning and representational resources of their network architecture (a single-layered perceptron), the model succeeded in learning the 10 verbs by rote, that is, without discovering any regularities among the individual verbs in the training set. After 10 epochs of training, R&M increased the size of the training set by 410 verbs. Consistent with the frequency facts of English, most of these new verbs were regular. Not surprisingly, this sudden expansion in vocabulary size caused the learning algorithm (a probabilistic version of the perceptron convergence procedure (Rosenblatt, 1962)) to extract the “add /-ed/” regularity and to reorganize the mapping characteristics of the network to reflect the dominant suffixation process. As a result, many irregular verbs displayed a sudden decrement in performance that was eventually overcome with continued training. It is highly likely then that much of the success in modeling the classic U-shaped profile derived from the abrupt manipulation of the number and structure of mapping patterns, and the corresponding transition from item memorization to generalization that is inherent in these networks under such circumstances.

    Several critiques (e.g., Pinker & Prince, 1988) noted that the discontinuities introduced into the training regime by R&M do not reflect plausible discontinuities in the input to children. First, there is scant evidence for such an abrupt increase in the total number of verbs to which children are exposed. Second, the evidence from children’s productions (Brown, 1973) suggests that the relative proportions of regular and irregular verbs are less skewed than those represented in the R&M training set. For example, Pinker and Prince (1988) note that during early phases of acquisition, regular and irregular verbs are approximately evenly represented in children’s production vocabularies. In general, current consensus has targeted the implausibility of the abrupt changes in vocabulary size in the original simulations. Hence the theoretical significance of the U-shaped learning demonstrated by the single-mechanism R&M model has been undermined.

    More recently, Plunkett and Marchman (1991) demonstrated that several characteristics of micro U-shaped development can emerge in an artificial neural network trained to map verb stems to past tense forms in the absence of any discontinuities in the training regime. Here, the network was required to learn an entire set of 500 verbs concurrently. Rather than resulting from abrupt changes in the size of the input set, the patterns of errors observed in the Plunkett and Marchman (1991) simulations were shown to derive purely from the competition between the different types of mappings used in the simulations. In particular, overgeneralizations of suffixes to irregular stems resulted from the network’s attempt to simultaneously fulfill the constraints of regular and irregular mappings within the confines of a single learning mechanism. This work also showed that the capacity of these types of networks to learn inflectional verb morphology is highly sensitive to input parameters such as the type and token frequency of stems


    in the input set, as well as the degree to which the phonological shape of the stem

    is a predictor of mapping pattern.

    Importantly, the errors observed in the Plunkett and Marchman (1991) simulations were predictable in terms of the input factors, frequently bearing an

    uncanny resemblance to those well documented in the child language literature. However, the overall size of the vocabulary precluded the network from achieving

    complete mastery of the vocabulary early in training. Thus, the marked transition

    from an initial overall high performance to a performance decrement that was

    achieved in the original R&M model was not observed. Although it was

    important to demonstrate that competition between mapping types can result in

    overgeneralization errors in the absence of discontinuity in the input, it is

    nevertheless unlikely that children attempt to learn an entire lexicon all of a

    piece. Naturalistic (e.g., Dromi, 1987) as well as parental report measures (Bates

    et al., 1992; Marchman & Bates, in press) suggest that verb acquisition in children

    is a gradual process which follows an incremental learning trajectory.

    In this paper, we examine the effects of an incremental training regime on

    networks’ ability to learn mappings analogous to those comprising the past tense

    system of English. (See Elman, 1991, for an application of incremental training to

    the acquisition of simple and complex syntactic forms.) The general training plan

    is as follows. Early on, the network is exposed to a small number of high-

    frequency stems. In this respect, the simulation resembles the early stage of

    training in the R&M model. Subsequent to this initial phase, and in contrast to

    the R&M model, the size of vocabulary is incremented gradually, one lexical item

    at a time. The selection procedure for adding a particular stem to the training set ensures that medium-frequency stems are chosen during the middle epochs, while

    lower-frequency stems are added during the later stages of vocabulary growth.
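    The general training plan can be pictured with a short sketch. It is only an illustration under stated assumptions: the selection procedure is not spelled out at this point in the text, so the code simply orders stems by descending frequency, and the function name, stems and frequency values are all hypothetical.

```python
def incremental_vocabularies(stems_by_frequency, initial_size):
    """Yield successive training sets, growing by one stem at a time.

    stems_by_frequency: list of (stem, frequency) pairs (hypothetical data).
    The highest-frequency stems form the initial set. The remaining stems are
    added in descending order of frequency, so medium-frequency stems enter in
    the middle of training and low-frequency stems enter last.
    """
    ordered = sorted(stems_by_frequency, key=lambda pair: pair[1], reverse=True)
    training_set = [stem for stem, _ in ordered[:initial_size]]
    yield list(training_set)
    for stem, _ in ordered[initial_size:]:
        training_set.append(stem)
        yield list(training_set)


# Toy vocabulary with made-up frequencies (not taken from the paper).
toy = [("go", 90), ("hit", 70), ("walk", 40), ("sing", 35), ("jump", 10), ("mend", 5)]
stages = list(incremental_vocabularies(toy, initial_size=2))
```

    Each element of `stages` is the training set in force at one point in vocabulary growth; how long the network trains on each set depends on the expansion schedule.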

    The goal of the incremental training regime is to determine whether a

    continuous, linear growth in verb vocabulary is adequate to trigger a representational

    reorganization in the network, that is, the transition from a purely

    rote-learning device to a system that can capture the regularities of verb

    morphology as well as its exceptions. To this end, network performance is

    evaluated both with respect to the training set and a set of novel stems.

    Performance on novel verb stems is particularly important since it reflects the

    manner in which the network represents the problem domain. The systematic suffixation of novel stems would indicate, for example, that the network has abstracted a generalization beyond a simple memorization of the training set.

    We evaluate systematically several parameters associated with incremental

    training regimes. First, we compare two different incremental training schedules:

    criterial versus epoch-based expansion. In criterial expansion, new verbs are

    added to the training set only when all previous stem to past tense mappings have

    been mastered by the network. In epoch-based expansion, new verbs are added to

    the training set after a given amount of training, irrespective of performance on


    verbs in the training set. Second, we explore the role of final vocabulary size in determining the degree of generalization to novel stems. There is a natural confound between size of vocabulary and length of training when an incremental procedure is used. Thus, in order to tease these factors apart, we conduct a series of simulations in which vocabulary expansion is halted and training is continued in the absence of vocabulary growth. Third, we evaluate the structure of the training set in determining network performance. It is informative to ascertain the degree to which exceptions to regularities in the training set can block generalization. Performance is evaluated in networks that are trained on early vocabularies that range from consisting exclusively of irregular forms to consisting exclusively of regular forms. Finally, we manipulate the processing resources available to the network by varying the number of hidden units.¹ It is generally assumed within the connectionist literature (Rumelhart, Hinton, & Williams, 1986; Hinton, 1989) that increasing the number of hidden units in a network will assist the network in representing additional details of the input and target vectors. On the other hand, decreasing the number of hidden units forces the abstraction of any regularities in the mapping problem. We explore the manner in which these tendencies interact with a mapping problem that demands attention to detail (the irregular forms) as well as the opportunity to abstract generalizations (over the regular forms). All of these manipulations permit the assessment of the generality of our findings across different learning conditions (thereby affording a comparison with the circumstances of learning in children) and the evaluation of the determinants of network dynamics.

    In summary, our primary goal is to determine whether gradual quantitative and structural changes in the verb vocabulary can lead to qualitative shifts in the manner in which a network organizes the mapping relationship between verb stems and their past tense forms. To this end, we hope to demonstrate that an incremental training regime that encompasses both regular and irregular mappings can indeed lead to a representational shift from rote learning to system building within the confines of a single network. The results of these simulations will be of interest to acquisitionists only to the extent that they inform our understanding of the factors that might trigger such a representational shift in young children. Therefore, we provide an assessment of the degree to which the factors that predict the transition to system building in the network are also predictive of the construction of a systematic verb morphology in young children.

    ¹Hidden units receive no direct input from the environment but only from the input units. They provide the network with the capacity to construct internal representations of the input vectors. The similarity of input vector representations at the hidden unit level may differ substantially from the similarity between the input vectors themselves. Hidden units thus introduce a non-linear component into the mapping process.


    We will focus on the role of vocabulary size and structure as predictive factors in young children’s acquisition of the English past tense and the type of errors observed during the different stages of past tense acquisition.

    Method


    All simulations involve training a multi-layered perceptron to map phonologically represented verb stems to their corresponding past tense forms on the RLEARN simulator (Center for Research in Language, UCSD) using a backpropagation learning algorithm. Backpropagation involves the adjustment of weighted connections and unit biases when a discrepancy is detected between the actual output of the network and the desired output specified in a teacher signal. In multilayered perceptrons (containing hidden units), error is assigned to non-output units in proportion to the weighted sum of the errors computed on the output layer.

    The majority of networks used in this set of simulations contain 18 input units, 30 hidden units and 20 output units (although see Network resources for a description of control simulations where the number of hidden units is systematically varied from 15 to 50). All layers in the network are fully interconnected in a strictly feed-forward fashion.

    Training in the simulations follows a pattern update schedule, that is, a pattern is presented to the net, a signal propagates through the net, the error is calculated, and the weights are adjusted. Pattern update is preferred to batch update (in which error signals are averaged over a range of input patterns before the weights are adjusted) for this problem since children are unlikely to monitor an average error on their output, but are more likely to monitor the error associated with individual pattern tokens. Learning rate and momentum² are held constant throughout the simulation at values of 0.1 and 0.0, respectively. Verb stems are presented randomly to the network within each epoch³ of training. Performance after selected epochs of training was evaluated using the output analyses procedures described below. All simulations have been replicated with

    ²The learning rate is a constant term in the learning rule for updating connection weights. The size of the learning rate determines the amount a connection changes in response to a given error signal. The momentum parameter is an additional constant term in the learning rule which takes into account changes to the weights made on previous learning trials.

    ³An epoch consists of a sweep through the entire training vocabulary. Note that the training vocabulary may contain multiple repetitions of individual verb tokens.


    differing random seeds.⁴ However, we have not averaged the results of simulations

    as this would be inappropriate to the analysis of U-shaped errors.
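    The architecture and pattern update schedule described in this section can be sketched in plain Python. This is a minimal illustration, not the RLEARN implementation: the logistic activation function and squared-error measure are assumptions (the text does not name them), the function names are hypothetical, and the single training pattern is made up. The layer sizes, the learning rate of 0.1, the absence of momentum and the ±0.5 initial weight range come from the text.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 18, 30, 20   # layer sizes reported in the text
LEARNING_RATE = 0.1                         # momentum is 0.0, so it is omitted


def sigmoid(x):
    # Logistic activation (an assumption; the text does not specify it).
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    # One weight row per receiving unit; the last entry of each row is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layer, inputs):
    # Append a constant 1.0 so the final weight in each row acts as the bias.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs + [1.0]))) for row in layer]


def train_pattern(hidden_layer, output_layer, inputs, target):
    """One pattern update: forward pass, error computation, weight change."""
    hidden = forward(hidden_layer, inputs)
    output = forward(output_layer, hidden)
    # Output deltas: error times the derivative of the logistic function.
    out_delta = [(t - o) * o * (1.0 - o) for t, o in zip(target, output)]
    # Hidden deltas: weighted sum of the output deltas, computed before any
    # weights are changed.
    hid_delta = [
        h * (1.0 - h) * sum(out_delta[k] * output_layer[k][j] for k in range(len(out_delta)))
        for j, h in enumerate(hidden)
    ]
    for k, row in enumerate(output_layer):
        for j, x in enumerate(hidden + [1.0]):
            row[j] += LEARNING_RATE * out_delta[k] * x
    for j, row in enumerate(hidden_layer):
        for i, x in enumerate(inputs + [1.0]):
            row[i] += LEARNING_RATE * hid_delta[j] * x
    # Squared error measured before this update was applied.
    return sum((t - o) ** 2 for t, o in zip(target, output))


rng = random.Random(0)  # a different seed gives a different set of start weights
hidden_layer = init_layer(N_INPUT, N_HIDDEN, rng)
output_layer = init_layer(N_HIDDEN, N_OUTPUT, rng)
# A single made-up binary pattern, used only to show that the error decreases.
pattern_in = [float(rng.randint(0, 1)) for _ in range(N_INPUT)]
pattern_out = [float(rng.randint(0, 1)) for _ in range(N_OUTPUT)]
errors = [train_pattern(hidden_layer, output_layer, pattern_in, pattern_out) for _ in range(200)]
```

    Because weights change after every pattern rather than after an averaged batch, the order in which stems are presented within an epoch affects the trajectory, which is one reason replication with different random seeds is informative.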

    Vocabulary

    A vocabulary of 500 verb stems is constructed from a dictionary of approximately

    1,000 stems. Each verb in the dictionary consists of a consonant-vowel-consonant

    (CVC) string, a CCV string or a VCC string. Each string is phonologically well formed, even though it may not correspond to an actual English word.

    The dictionary itself is constructed from the 14,400 possible CVC, CCV and VCC

    combinations. However, this number is further reduced by the condition of

    phonological well-formedness. The final dictionary of 1,000 stems is then selected

    randomly from these remaining well-formed stems. Thus, the base dictionary

    itself is built from a random sampling of the initial phonological space (subject to

    well-formedness constraints), contrary to the claims of Prasada and Pinker (1993)

    that “training and generalization items were drawn from the same small region of

    the space of possible forms and hence were similar to one another” (p. 38).
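    The figure of 14,400 possible strings can be checked with a short enumeration. The sketch below rests on assumptions: Table 1 lists 12 vowels, and an inventory of 20 consonants is the size that reproduces the stated total, since 3 × 20 × 20 × 12 = 14,400. The segment labels are placeholders, and the well-formedness test is left to the caller because its details are not given here.

```python
import itertools
import random

# Assumed inventory sizes: 12 vowels are listed in Table 1; 20 consonants is
# the count that reproduces the 14,400 figure. The labels are placeholders.
CONSONANTS = [f"C{i}" for i in range(20)]
VOWELS = [f"V{i}" for i in range(12)]
TEMPLATES = ["CVC", "CCV", "VCC"]


def all_strings():
    """Enumerate every three-segment string licensed by the templates."""
    for template in TEMPLATES:
        pools = [CONSONANTS if slot == "C" else VOWELS for slot in template]
        for segments in itertools.product(*pools):
            yield segments


def sample_dictionary(is_well_formed, size, seed=0):
    """Filter by a caller-supplied well-formedness test, then sample at random."""
    candidates = [s for s in all_strings() if is_well_formed(s)]
    return random.Random(seed).sample(candidates, size)


total = sum(1 for _ in all_strings())   # 3 templates x 20 x 20 x 12
# Placeholder filter that accepts everything; a real filter would apply
# phonotactic constraints and shrink the candidate pool before sampling.
dictionary = sample_dictionary(lambda s: True, size=1000)
```

    Sampling without replacement from the filtered pool mirrors the description of the base dictionary as a random sample of the well-formed part of the phonological space.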

    Each vowel and consonant is represented by a set of phonological contrasts,

    such as voiced/unvoiced, front/center/back.⁵ Table 1 summarizes the phonological

    representations for all consonants and vowels used in the simulations.

    Verb stems are assigned to one of four classes. Each class corresponds to a type

    of transformation analogous to classes of past tense formation in English. The

    four classes of transformation are as follows:⁶

    Arbitrary mappings

    There is no apparent relationship between the stem and its past tense form, for

    example, go → went.

    Identity mapping

    Past tense forms are identical to their corresponding verb stems. Such

    mappings are contingent upon the verb stem ending in a dental consonant (/t/ or

    /d/), for example, hit → hit.

    ⁴At the start of a simulation all connections in the network are assigned values randomly, typically within the range ±0.5. Repeating a simulation but with a different random seed, that is, a different set of random start weights, permits an evaluation of the degree to which the start state of the network influences training.

    ⁵See Plunkett and Marchman (1991) for a more thorough discussion of the phonological representation used here.

    ⁶A more fine-grained classification of the past tense of English is provided by Bybee and Slobin (1982) and Pinker and Prince (1988). However, the current four-way distinction serves to capture many of the phenomena of interest.


    Table 1. Phonological representation

    Columns: phonological feature units 1-6, under the headings Con./vow., Voicing, Manner and Place.

    Consonant rows (as far as legible): /b/, /p/, /d/, /t/, /g/, /k/, /v/, /f/, /m/, /n/, /z/, /s/, /w/, /l/, /r/, /h/.

    Vowel rows: /i/ (eat), /I/ (bit), /o/ (boat), /ʌ/ (but), /u/ (boot), /U/ (book), /e/ (bait), /E/ (bet), /ai/ (bite), /æ/ (bat), /au/ (cow), /O/ (or).

    Vowel change

    Certain vowels can be changed under the condition that they precede particular

    consonants. The following four vowel-consonant cluster changes are permitted:

    (1) /iz/ → /ez/
    (2) /it/ → /Et/
    (3) /ais/ → /es/
    (4) /ail/ → /Ol/

    ‘%z -+

    k k ”

    s es ”

    r = 01 ”
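    The four permitted cluster changes amount to a lookup on the end of the stem. The sketch below is illustrative only: it treats stems as strings in the same ASCII-style phoneme labels as the listing, and the function name and example stems are hypothetical.

```python
# The four vowel-consonant cluster changes, written with the ASCII-style
# phoneme labels used in the listing above.
VOWEL_CHANGES = {"iz": "ez", "it": "Et", "ais": "es", "ail": "Ol"}


def vowel_change_past(stem):
    """Return the past tense form if the stem ends in a listed cluster, else None."""
    for ending, replacement in VOWEL_CHANGES.items():
        if stem.endswith(ending):
            return stem[: -len(ending)] + replacement
    return None


# Made-up stems for illustration; they are not items from the paper's vocabulary.
examples = {stem: vowel_change_past(stem) for stem in ["kit", "fiz", "rail", "pak"]}
```

    A stem that ends in none of the four clusters returns `None`, which reflects the condition that the vowel change applies only before particular consonants.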


    Regular mappings

    A suffix is appended to the verb stem. The form of the suffix follows the allomorphy of English, and hence depends upon the final vowel/consonant in the stem:

    (1) If the stem ends in a dental consonant (/t/ or /d/), then the suffix is /-id/, for

    example, pat → pat-id.

    (2) If the stem ends in a voiced consonant or vowel, then the suffix is voiced /d/,

    for example, dam → dam-d.

    (3) If the stem ending is unvoiced, then the suffix is unvoiced /t/, for example, pak → pak-t.

    The suffixes on the regular past tense forms are represented non-phonologically

    as three distinct patterns across two output units, that is, 0 1, 1 0, and 1 1. A

    fourth pattern (0 0) corresponds to the absence of a suffix, as is the case for stems

    in the irregular classes (i.e., arbitrary, identity and vowel change).
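    The allomorphy rule and the two-unit suffix code can be put together in a short sketch. Two things here are assumptions rather than details from the text: the assignment of particular bit patterns to particular suffixes (the text states only that three non-zero patterns and one all-zero pattern are used), and the small voicing table, which covers just the toy stems in the demo.

```python
DENTALS = {"t", "d"}
# Hypothetical voicing table covering only the segments used in the demo.
UNVOICED = {"p", "t", "k", "f", "s", "h"}

# Which bit pattern stands for which suffix is an assumption made here.
# None stands for the absence of a suffix (the irregular classes).
SUFFIX_UNITS = {"id": (0, 1), "d": (1, 0), "t": (1, 1), None: (0, 0)}


def regular_suffix(stem):
    """Pick the regular suffix from the final segment of the stem."""
    final = stem[-1]
    if final in DENTALS:      # checked first, because /t/ is also unvoiced
        return "id"
    if final in UNVOICED:
        return "t"
    return "d"                # voiced consonants and vowels


def suffix_code(stem, is_regular):
    """Return the two output-unit values that encode the suffix."""
    return SUFFIX_UNITS[regular_suffix(stem) if is_regular else None]


# Toy stems for illustration.
codes = {stem: suffix_code(stem, True) for stem in ["pat", "dam", "pak"]}
```

    Testing for a dental ending before testing voicing matters, since /t/ would otherwise be sent to the unvoiced branch and receive the wrong suffix.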

    Stems are assigned randomly from the dictionary to each of the four classes

    (with no replacement), with the constraint that stems possess the appropriate

    characteristics of a given class. The resulting 500-verb vocabulary contains 2 stems

    in the arbitrary class, 458 stems in the regular class, 20 stems in the identity class

    and 20 stems in the vowel change class. Each of the four vowel-consonant

    clusters defining the vowel change class contains 5 members. Stem assignments to

    the arbitrary and regular classes are not contingent upon any particular criteria, and these classes may contain stems which have phonological characteristics of

    identity mapping or vowel change stems. The total number of stems assigned to

    each verb class is designed to approximate roughly the verb vocabulary of a child

    who has already mastered the past tense of English; in particular, regulars greatly

    outnumber the combined irregular classes, and arbitrary mappings are an order of

    magnitude lower in number than the other irregular mappings.

    Appropriate past tense forms are constructed for each vocabulary item in each

    of the four classes. In the case of stems in the arbitrary class, a past tense form is

    chosen that does not share any consonants or vowels with the stem, nor

    corresponds to the stem or past tense form of any other verb in the training set.

    The past tense forms for members of the other three classes are constructed according to the criteria listed above.

    After 500 verbs have been assigned to the four class types, a subset of 20 verbs

    is randomly selected from the vocabulary for use in the initial phase of training. In

    the majority of simulations, the initial training set is comprised of 2 arbitrary

    stems, 10 regular stems, 4 identity stems and 4 vowel change stems. These initial

    vocabulary configurations reflect several aspects of what is known about children’s

    early verb vocabularies from naturalistic and parental report measures. For


    example, data from the MacArthur Communicative Development Inventory:

    Toddler Form (CDI) (Fenson et al., in press) indicate that of the 20 most

    frequently reported verbs by parents of children between the ages of 16 and 30

    months of age, 10 are regular and 10 are irregular.

    The token frequency (i.e., the frequency with which any given stem is likely to

    be repeated within a single training epoch) during this initial phase of learning is

    15 for the arbitrary stems. Regular, identity and vowel change stems have a token

    frequency of 5 (see footnote 7). It has been observed that verbs learned early by children tend to

    have a high token frequency in the language (Pinker & Prince, 1988). Furthermore,

    the enhanced token frequency of arbitrary forms reflects previous simulation

    results (Plunkett & Marchman, 1989) during which such verbs are only

    mastered by backpropagation networks (in the context of a large number of

    conflicting mappings) when exposure to individual stem-past tense form pairs is

    frequent.

    Criterial versus epoch-based vocabulary expansion schedules

    The network is trained on the initial vocabularies until all verb stems are

    mapped correctly to their appropriate past tense forms. Thus, by definition,

    vocabulary expansion begins at a point in training when performance on the initial

    set of verbs is perfect. Two general types of expansion schedules were tested. On

    the first schedule, criterial expansion, the vocabulary is expanded one verb stem at

    a time and trained until that new verb is successfully mapped by the network. In

    other words, the network must learn each new verb to criterion before having

    access to other members of the target vocabulary.

    On the second training schedule, epoch expansion, a new verb is introduced to

    the vocabulary and trained for a set number of epochs. Another verb is then

    introduced into the training set. Note that the expansion of the vocabulary occurs

    irrespective of the level of performance on the mapping of the previously

    introduced new verb. This process is repeated until the vocabulary reaches 500

    verbs. Early in training, a new verb is introduced every 5 epochs until vocabulary

    size reaches 100. Thereafter, training is reduced to 1 epoch per new verb. This

    increased rate of vocabulary growth is intended to model non-linearities in rate of

    Footnote 7. Pilot simulations indicated that the network would fail to learn all of the regular stems in the initial training set if a token frequency of less than 3 was used. Further increments in the token frequency of regular stems and non-arbitrary irregular stems in the initial training set accelerate their learning relative to the arbitrary stems. However, provided the number of high-frequency stems (regular or irregular) is kept small (approximately less than 20) and the number of arbitrary stems does not exceed around 4, the initial training set can be learned to criterion. This robustness in learning of the initial training set permits variability in the relative frequency of the high-frequency stems, suggesting some flexibility in the input conditions that support the acquisition of early inflectional verb morphology.


    vocabulary growth that are sometimes observed in longitudinal studies of young

    children (Dromi, 1987). That is, vocabulary expansion proceeds at a relatively

    slow pace early on, while later growth is more rapid.

    The order in which new verbs enter the vocabulary is determined by a

    weighted random selection process based on an 80% likelihood that the new verb

    is taken from the regular class and a 20% likelihood that the verb is taken from

    the identity or vowel change classes. (Recall that the two arbitrary verbs are

    members of the initial vocabulary.) Each new verb entered into the training set

    after the initial set of 20 is assigned a token frequency of 3, until the vocabulary size reaches a total of 100 verbs. Thereafter, verbs that are introduced (predominantly

    regulars) are trained using a token frequency of 1. This frequency profile

    was again chosen to accommodate the data set to the observation that children

    are more likely to hear, and thus have a greater opportunity to learn, verbs with a

    high token frequency.

    Table 2. Vocabulary structure by verb class

    [Table contents not recoverable from the source. Table note: arbitrary verbs have a token frequency of 15.]

    Figure 1. Proportion of regular and irregular verbs in training set as vocabulary size expands.

    [Plot not reproduced; x-axis: Vocabulary Size, 0 to 500.]

    A summary of the changing structure of the vocabulary by verb class is

    provided in Table 2, and Fig. 1 plots the relative proportion of regular and

    irregular verb tokens as vocabulary size is expanded. Note that Fig. 1 indicates a

    switch in early training from a predominance of irregular verbs (approximately 60%) to a predominance of regular verbs. This switch reflects a changing

    proportion of regular and irregular verbs that is observed in children’s early

    vocabularies as reported by Marchman and Bates (in press) and contradicts the

    claim by Pinker and Prince (1988) that the proportion of regular and irregular

    verbs in children’s early vocabularies is approximately equal throughout development.
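    The vocabulary structure and token frequencies described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code. The class sizes, token frequencies and the 80%/20% selection weights come from the text; the function names, the random seed, the handling of exhausted classes, the exact boundary at 100 verbs, and the assumption that the initial 20 verbs keep their original token frequencies throughout are all assumptions made for the sketch.

```python
import random

# Class sizes and token frequencies stated in the text.
INITIAL_COUNTS = {"arbitrary": 2, "regular": 10, "identity": 4, "vowel_change": 4}
TARGET_COUNTS = {"arbitrary": 2, "regular": 458, "identity": 20, "vowel_change": 20}
INITIAL_TOKEN_FREQ = {"arbitrary": 15, "regular": 5, "identity": 5, "vowel_change": 5}
IRREGULAR = ("arbitrary", "identity", "vowel_change")


def irregular_token_share(tokens):
    """Fraction of the tokens in one training epoch that are irregular verbs."""
    irregular = sum(tokens[c] for c in IRREGULAR)
    return irregular / sum(tokens.values())


def new_verb_token_freq(size_before):
    """Token frequency 3 for verbs 21 to 100, then 1, as described in the text."""
    return 3 if size_before < 100 else 1


def epochs_for_new_verb(size_before):
    """5 epochs per new verb early on, 1 later (boundary handling is assumed)."""
    return 5 if size_before < 100 else 1


def simulate_expansion(seed=0):
    """Grow the vocabulary from 20 to 500 verbs, tracking token proportions."""
    rng = random.Random(seed)
    counts = dict(INITIAL_COUNTS)
    # Assumption: the initial verbs keep their token frequencies throughout.
    tokens = {c: INITIAL_COUNTS[c] * INITIAL_TOKEN_FREQ[c] for c in counts}
    remaining = {c: TARGET_COUNTS[c] - INITIAL_COUNTS[c] for c in counts}
    history = [(sum(counts.values()), irregular_token_share(tokens))]
    while sum(counts.values()) < 500:
        size = sum(counts.values())
        open_irregulars = [c for c in ("identity", "vowel_change") if remaining[c]]
        wants_regular = rng.random() < 0.8  # 80% regular, 20% identity or vowel change
        # Assumption: once a class is exhausted, draw from whatever is left.
        if (wants_regular and remaining["regular"]) or not open_irregulars:
            cls = "regular"
        else:
            cls = rng.choice(open_irregulars)
        remaining[cls] -= 1
        counts[cls] += 1
        tokens[cls] += new_verb_token_freq(size)
        history.append((size + 1, irregular_token_share(tokens)))
    return counts, tokens, history
```

    Under these assumptions the initial training set contains 2 x 15 + 4 x 5 + 4 x 5 = 70 irregular tokens and 10 x 5 = 50 regular tokens per epoch, so the irregular share is 70/120, or about 58%, which agrees with the figure of approximately 60% quoted above.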

    Initial vocabulary structure

    While child language data for English suggest that early verb vocabularies are

    likely to undergo a shift from a predominance of irregular to regular verbs, it is

    also known that considerable individual variation exists in early vocabulary

    composition (e.g., Bates, Bretherton, & Snyder, 1988). Hence, we conducted

    several series of simulations that vary in terms of their initial proportion of regular

    to irregular verbs. Nine early vocabulary configurations are tested using an

    epoch-based expansion schedule. Each configuration differs in terms of the

    number of regular forms to which the network is exposed during the earliest

    phase of training. For example, the number of regular and irregular forms

    comprising the initial training set of 20 verbs in each of the nine conditions varied

    from 0 regulars-20 irregulars to 18 regulars-2 irregulars. In four of the conditions, the proportion of regulars to irregulars remains approximately constant

    until the 40-verb mark (0%, 25%, 50%, 75%). In the remaining five conditions,

    the proportion of regulars either decreased or increased as vocabulary expanded

    from 20 to 40 verbs. Thereafter, for all conditions, vocabulary expansion

    proceeded according to the weighted random selection procedure described

    above, and token frequency parameters were assigned in the standard fashion.

    (The exact numbers of items in each of the vocabularies are presented below

    accompanying the discussion of the results.)

    These control simulations permit an evaluation of (1) the degree to which the

    presence of regular verb forms early in training is a necessary condition for the

    subsequent onset of generalizations, and conversely, (2) whether an overabundance

    of irregular forms early in training can block the development of system

    building in the network and hence the generalization to new regular forms.

    Network resources

    Given the relationship between generalization abilities and number of hidden units in these networks, the role of representational resources was explored by


    manipulating the number of hidden units used to configure the architecture of the network. Two measures of performance were of interest here: (1) the ability to

    learn verbs in the training set, and (2) the networks’ ability to generalize to novel verb forms. A total of nine conditions were evaluated, ranging from using 10

    hidden units to 50 hidden units, in increments of 5 units. For each condition, the

    epoch-based expansion schedule was used; that is, after training on an initial set

    of 20 verbs (10 regular and 10 irregular), a new verb was added to the training set

    every 5 epochs. The same weighted random selection procedures and token

    frequency parameters as described above were used. Hence, the only factor

    varied across condition was the number of hidden units.

    Novel verbs

    A set of 100 legal stems which were not included in the training set were

    selected from the dictionary for testing the generalization properties of the network. Of these, 10 end in a dental consonant (/t/ or /d/): identity

    mapping; 10 stems possess the characteristics of each of the 4 clusters defining the

    vowel change class (a total of 40 stems): vowel change; and 50 stems do not

    possess any of the previously mentioned characteristics: indeterminates. It is worth

    emphasizing that the indeterminate novel stems do not form a well-partitioned

    group in phonological space. They have no more phonological features in

    common with each other than they do with the identity mapping or vowel change

    novel stems. These subclasses of novel stems permit an evaluation of the manner in which the network has tuned its response characteristics to the presence (or

    absence) of specific phonological features in the stems making up the training set.

    The network’s performance on the novel verbs was evaluated at regular intervals

    during training using the output analysis procedures described below.

    Output analysis

    The weight matrices were saved at regular intervals; first, when the net had just

    mastered the initial 20 verbs and then each time a new verb was introduced but

    before any training on the new verb had occurred. These weight matrices provide

    snapshots at various points in training that permit evaluations of the accuracy of

    the network in producing the correct past tense form for each unique stem at

    different points in development. For every given stem, the output of the network was evaluated in terms of the “closest fit” (in Euclidean space) to the set of

    phonemes that map the output space, defined by the teacher signal to the network

    (see Table 1). Error analysis provided an overall hit rate (i.e., percentage

    correct), as well as the proportion of stems in each class that were regularized

    (i.e., a suffix was added), as well as irregularized (i.e., incorrectly mapped as identity stems, vowel change stems, blends, etc.). The following error coding categories are used:

    SUF: The stem is regularized. For regular stems, this indicates that an inappropriate but otherwise legal suffix is affixed.

    ID: The stem and past tense have the same form.

    VC: The stem undergoes a vowel change. For vowel change stems, this indicates that an inappropriate vowel change occurs.

    BLD: The stem is blended; that is, it undergoes both vowel suppletion and suffixation.

    UNC: Unclassifiable responses (typically incorrect mapping of consonants).

    Novel verb stems were also tested on each of the saved weight matrices at each testing point. Using a similar output analysis procedure, the three different categories of novel verbs (indeterminate, identity, vowel change) were analyzed separately to determine their output tendencies; that is, whether they tend to be regularized, irregularized or handled in some other fashion by the net as a function of amount of training and vocabulary size.
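    The closest-fit evaluation used in the output analysis can be illustrated with a minimal Python sketch. The phoneme codes below are hypothetical three-feature vectors chosen only for illustration (the actual codes are those of Table 1), and the function names are likewise assumptions rather than the authors' implementation.

```python
import math

# Hypothetical 3-feature phoneme codes, for illustration only;
# the real codes are those given in Table 1.
PHONEME_CODES = {
    "t": (1.0, 0.0, 0.0),
    "d": (1.0, 1.0, 0.0),
    "a": (0.0, 0.0, 1.0),
    "i": (0.0, 1.0, 1.0),
}


def closest_phoneme(output, codes=PHONEME_CODES):
    """Return the phoneme whose code is nearest to `output` in Euclidean space."""
    return min(codes, key=lambda p: math.dist(output, codes[p]))


def decode(output_slots, codes=PHONEME_CODES):
    """Decode a sequence of per-phoneme output vectors into a list of phonemes."""
    return [closest_phoneme(slot, codes) for slot in output_slots]


def is_hit(output_slots, target, codes=PHONEME_CODES):
    """A response counts as correct only if every decoded phoneme matches the target."""
    return decode(output_slots, codes) == list(target)
```

    Because the nearest code is always returned, a decoder of this kind produces some response on every trial, however far the raw output lies from any legal phoneme.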

    Results

    Criterial expansion

    We begin our results presentation by outlining the overall ability of the networks to learn when vocabulary expansion is contingent upon previous performance (criterial learning). Recall that in this condition, vocabulary size was increased one verb at a time and training continued on each new verb (as well as the initial set) until that new verb was successfully mapped by the network. The results indicated that training on the initial set of 20 verbs required approximately 15-40 epochs to reach criterion, depending on the initial configuration of random weights. Further, successful training on subsequently added stem-past tense mappings consistently failed when vocabulary size reached approximately 27 verbs. In other words, criterial expansion appeared to fall considerably short in its ability to allow the network to structure its resources in such a way so as to master the entire set of 500 stems in the target vocabulary. In order to verify that subsequent learning was impossible, training was continued for a considerable number of epochs in each case. Analyses of the mean squared error on the output units clearly indicated that the error reaches asymptote at a non-zero level in all networks at around this vocabulary size.

    The inability of networks trained using the criterial expansion procedure to


    learn a large number of verb stem/past tense mappings reflects the propensity of networks of this type to be caught in “local minima”. Learning in networks of this

    type can be understood as the process of traversing a hilly multidimensional

    landscape where the regions of any part of the landscape are defined by the values

    of the weight matrix, and the height of the landscape is just the error that results

    from a given configuration of the weight matrix. For example, in a network with

    just two weights we can define an error surface in three dimensions where the

    error is plotted on the vertical axis, and the coordinates in the horizontal plane are just the corresponding values of the two weights. A local minimum is a point

    on the error surface where the gradient is zero but which does not correspond to

    the global minimum of the error function. Since the learning algorithm used in the

    network is sensitive to the slope of the error surface, certain configurations of the

    weight matrix can result in the network becoming entrenched in a given state where no further learning can occur. Alternatively, the weights feeding into a

    given unit may become very large (either positive or negative) as a result of

    repeated presentations of the same input. Other inputs subsequently presented to

    the network which use the same weight lines must fight an uphill battle to

    overcome the mapping characteristics of the previously presented input. Because

    early inputs to the network continue to be presented with a relatively high

    frequency, new inputs will have difficulty overcoming the early bias of the

    network. It is clear, therefore, that the mapping characteristics of the initial

    training set can have important implications for subsequent learning when a

    criterial incremental learning schedule is used in network architectures of this

    type. In order for networks to avoid entrenchment in inappropriate areas of weight space, training must ensure that a variety of weight changes occur. If the

    network is repeatedly trained on a limited and fixed number of patterns, where a

    series of similar weight changes occur, further training may fail to promote necessary reorganizations or may even enhance the network’s entrenchment in a

    particular region in weight space. This training schedule was therefore abandoned

    as a method of vocabulary expansion that is appropriate to the current task.
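    The notion of a local minimum can be made concrete with a toy example. The sketch below is not the network used in these simulations: it assumes a single weight and an arbitrary quartic error function, chosen only because it has one deep and one shallow basin. Gradient descent follows the slope of the error surface, so the basin in which it settles depends entirely on the starting weight.

```python
def error(w):
    """Toy one-weight error surface with a global and a local minimum."""
    return w**4 - 3 * w**2 + w


def gradient(w):
    """Slope of the toy error surface at weight w."""
    return 4 * w**3 - 6 * w + 1


def descend(w, learning_rate=0.01, steps=2000):
    """Plain gradient descent: repeatedly step downhill along the slope."""
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


w_right = descend(2.0)   # settles in the shallower basin (local minimum)
w_left = descend(-2.0)   # settles in the deeper basin (global minimum)
```

    Both runs end at a point where the gradient is zero, so neither can improve further by following the slope, yet the error at w_right remains higher than the error at w_left. This is the sense in which a network can become entrenched in a state where no further learning occurs.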

    The following sections report on results from simulations in which verbs are

    added to the vocabulary irrespective of the level of performance of the network on

    the previously added verb.

    Epoch expansion

    Overall performance

    Figures 2(a) and 2(b) summarize performance (percentage correct) on verbs in

    the irregular (arbitrary, identity and vowel change combined) and regular classes,

    respectively, as a function of vocabulary size.

    Figure 2. Hit rates for (a) irregular and (b) regular verbs.

    [Plots not reproduced; x-axis: Vocabulary Size, 0 to 500.]

    Recall that before vocabulary expansion is allowed to begin, the network is

    trained to 100% accuracy on the initial set of 20 verbs. There are several things to

    note in Figs. 2(a) and 2(b). First, overall performance (as measured by hit rate for

    all current trained forms) deteriorates substantially at several points in training.

    Nevertheless, the network eventually recovers from these setbacks. For example,

    by the time vocabulary reaches 500 verbs, irregular verbs have achieved 100% correct output, and between 95% and 100% of the regular verbs were produced

    correctly (see footnote 8). In general, then, the epoch-based incremental training schedules were

    considerably more successful at allowing the networks to master the entire

    vocabulary than criterial-based learning schedules. It should be stressed that the

    decrements in performance plotted in Fig. 2 are primarily the result of network

    inaccuracies in mapping new verbs that are entered into the training set. The decrements do not necessarily indicate the unlearning of verbs which had already

    been mastered by the system. Many of the verbs in the training set may continue

    to be mapped appropriately (cf. the criterial expansion schedule above), while

    others may indeed be “unlearned”, demonstrating a sort of U-shaped development.

    Analyses of the patterns of U-shaped learning in these simulations are

    discussed below.

    A closer look at Fig. 2 reveals two major periods of decrement. For both regular and irregular verbs, overall performance drops fairly early in learning,

    almost immediately after vocabulary size has begun to increase. Yet, recovery

    Footnote 8. Absolute final level of performance on regular verbs varied within this range as a result of variations in the initial weight matrix for different simulations.


    comes quickly, first manifesting itself for the regular verbs when total vocabulary

    size reaches approximately 44 verbs, and for the irregulars at approximately 31

    verbs. Given that verbs are introduced into the vocabulary at a constant rate both

    before and after these periods, these data suggest that beyond a vocabulary size of

    around 50 items, new verbs appear to be learned faster than verbs introduced

    during the early stages of expansion. Thus, there is preliminary evidence to

    suggest that the number of verbs in the current vocabulary may indeed be an

    important factor in determining the network’s ability to learn new lexical items

    and, presumably, to generalize to novel forms. We will continue to evaluate this

    hypothesis in subsequent sections.

    Figure 2(a) also indicates that irregular verbs undergo a substantial decrement

    in overall performance during the middle period of training, that is, when

    vocabulary size reaches approximately 125 verbs. During this period (125-210 verbs), the number of irregular verbs in the vocabularies has increased from

    approximately 23 items to approximately 37 items (including an extra 5 IDs and 9

    VCs). An analogous, but much less drastic, decrement in overall performance is

    observed for the regular verbs (Fig. 2b) around the same period (when vocabulary

    size ranges from approximately 90 to 260 verbs). In interpreting these data, it

    should be recalled (again) that these decrements in performance do not necessari-

    ly indicate the U-shaped “unlearning” of individual past tense mappings. These

    patterns of performance are also a reflection of the inability of the system to learn

    new items that are entered into the vocabulary. Indeed, this interpretation relates

    directly to the changes in the input frequency characteristics of verbs introduced

    after the 100 vocabulary mark. Recall that all verbs (both regular and irregular) that are introduced after the 100 vocabulary mark are trained with a token

    frequency of 1. Previous work with networks of this type (Plunkett & Marchman,

    1991) has shown that mappings with a low token frequency (in particular,

    irregular arbitrary and vowel change verbs) are difficult for a network to master in

    the context of a large number of conflicting mappings.

    In general, then, mastery of this set of mappings does not follow a straightforward learning function in the context of gradual and incremental increases in

    vocabulary size. In subsequent sections, analyses seek to isolate the degree to

    which decrements in overall correct performance reflect the inability of the system

    to learn new items entering the vocabulary, in contrast to the “unlearning” of

    forms that were previously successfully mapped. More specifically, we attempt to

    target whether changes in overall performance reflect qualitative changes in network organization deriving from these incremental changes in vocabulary size.
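    One simple way to separate the two sources of decrement is to keep, for each verb, a record of whether it was mapped correctly at each testing point. The sketch below assumes such a record is available; the data layout (a list per verb, with None before the verb enters the vocabulary), the function name and the verb labels are all hypothetical and are not taken from the simulations reported here.

```python
def classify_errors(history, checkpoint):
    """Split the errors at one testing point into two groups.

    `history` maps each verb to a list with one entry per testing point:
    True (correct), False (incorrect) or None (not yet in the vocabulary).
    Returns (not_yet_learned, unlearned) as two lists of verbs.
    """
    not_yet_learned, unlearned = [], []
    for verb, record in history.items():
        if record[checkpoint] is not False:
            continue  # correct, or not yet introduced
        if any(r is True for r in record[:checkpoint]):
            unlearned.append(verb)        # was correct earlier: U-shaped pattern
        else:
            not_yet_learned.append(verb)  # has never been mapped correctly
    return not_yet_learned, unlearned


# Hypothetical records for three verbs across four testing points.
example = {
    "verb_a": [True, True, False, True],   # correct, then wrong, then recovers
    "verb_b": [None, False, False, True],  # introduced late, slow to be learned
    "verb_c": [True, True, True, True],    # never in error
}
new_item_errors, unlearned_items = classify_errors(example, checkpoint=2)
```

    In this example the third testing point yields one error of each kind: verb_b has not yet been learned, whereas verb_a has been unlearned.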

    Errors as a function of vocabulary size

    Network output for every stem in the current vocabulary is determined using a

    closest-fit algorithm, and all incorrectly generated mappings are categorized by


    verb class. A verb stem is incorrectly mapped when the closest fit of the output (in Euclidean space) for each of the phonemes does not match that specified in

    the teacher signal. For these analyses, errors include both inappropriate output on

    verbs that had yet to be correctly mapped by the network, as well as incorrect

    mappings for verbs that had previously been successfully mastered by the

    network. Table 3 summarizes the frequency and timing of errors as a function of

    expanding verb vocabulary size (represented by successive rows in the table), as

    well as hit rates (percentage correct) for each verb class. Error scores indicate percentage of total errors for items in a given class. An analysis of the arbitrary

    class is not presented since these items perform at optimal level throughout the

    expansion schedules. We review evidence below (from Marcus et al., 1992) that

    indicates, contrary to popular belief, that errors of the kind “go-ed” and “went-ed”

    (overregularizations to the arbitrary class) are very infrequent in children’s spontaneous speech. Given the constraints on learning inherent in these networks,

    arbitrary stems would become more susceptible to error if their token

    frequency had been lower, for example, 5 instead of 15 (Plunkett & Marchman,

    1991). The robustness of the arbitrary class in the current set of simulations

    reflects the functional modularity that can be achieved within the confines of a

    single mechanism that must learn to perform multiple types of mapping via the

    manipulation of frequency characteristics of the input.

    Several comments should be made concerning the data presented in Table 3.

    First, note that the overall level of errors is low, as the average hit rate across the

    training period is 95.6% for regulars and 97.6% for irregulars. Second, when

    errors did occur, they were likely to be circumscribed to a limited range of error types for all classes. The error categories of SUF, ID, and VC account for the

    overwhelming majority of incorrect responses. Residual errors derive primarily

    from the incorrect mapping of consonants on regular verbs. These unclassifiable

    responses (UNC) were more common early in training, although a few did occur

    as late as a vocabulary size of 490 verbs. Recall that the network is forced to make

    a response on every trial, with the output determined by the closest fit in

    Euclidean space to legal phonemes. It is also possible to restrict the output of the

    network, and hence eliminate many of the unclassifiable responses, if the closest

    fit metric is supplemented by a proximity criterion (i.e., closest phoneme and

    within a specified distance). In this case, we would be evaluating only those

    responses for which there was some degree of certainty, analogous to a child using

    a particular past tense verb form only when he or she is relatively sure of how to

    pronounce all components of that form. Interestingly, however, current evidence from the child language literature suggests that eliminating unclassifiable forms in

    this fashion may not be entirely valid. For example, Plunkett (1993) reports the

    use of non-standard forms by children which involve the substitution of inapprop-

    riate vowels and consonants in target lexical forms. Admittedly, these non-

    standard usages are most prevalent in that period of language development prior

    to the vocabulary spurt. However, incorrect usage of phonemes in words did not


    disappear entirely from children’s later productions. (See Cottrell & Plunkett, 1991, for further discussion.)
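    The proximity criterion mentioned above (closest phoneme and within a specified distance) can be sketched as a small variation on closest-fit decoding. The two-feature phoneme codes and the threshold of 0.3 are hypothetical values chosen for illustration; no particular threshold is specified in the text.

```python
import math


def closest_within(output, codes, max_distance):
    """Nearest phoneme, or None if even the nearest code is too far away."""
    best = min(codes, key=lambda p: math.dist(output, codes[p]))
    return best if math.dist(output, codes[best]) <= max_distance else None


# Hypothetical 2-feature codes and threshold, for illustration only.
codes = {"t": (1.0, 0.0), "d": (1.0, 1.0)}
confident = closest_within((0.9, 0.1), codes, max_distance=0.3)  # close to "t"
uncertain = closest_within((0.5, 0.5), codes, max_distance=0.3)  # far from both
```

    Under such a criterion an output lying midway between legal phonemes is withheld rather than forced into a category, which is how many of the unclassifiable responses would be eliminated.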

    It should also be noted that although identity mapping errors can be unambiguously identified in the network’s performance, this is not the case for young

    children. In the network, every unchanged output is by definition an identity map,

    and it is an identity mapping error when the form in fact required a change. In

    contrast, in children, no change to the stem may result from a process of identity

    mapping or a failure to include tense marking in an obligatory past tense context.

    It is difficult to ascertain, in English-speaking adults or children, which of these

    two accounts applies to any individual no change error. We will follow other

    researchers (Brown, 1973; Bybee & Slobin, 1982; Marchman, 1988) in assuming

    that when children mark for past tense in over 90% of obligatory contexts, then

    responses which are identical in form to the stem can indeed be considered to be

    identity mapping errors. Our model takes no account of the role of children’s

    conceptual development regarding tense or pastness in explaining their past tense


    Each class of verbs was susceptible to different types of errors. Table 3

    indicates that items belonging to the identity mapping class were likely to be

    produced correctly throughout the training period (M = 98.4% correct), yet some

    errors did occur (maximum error rate = 16% of responses). Of the errors that

    were produced on stems in this class, the responses were primarily erroneous

    suffixations (M = 14%) and to a much lesser extent, vowel changes (M = 3%).

    Unclassifiable errors were observed on only one occasion. In addition, it is

    interesting to note that all of the identity stems that underwent an erroneous vowel change possessed the requisite vowel/consonant stem final combination,

    and identity stems never underwent blending (i.e., a simultaneous vowel change

    and suffixation).

    Vowel change stems were also mapped successfully across a large part of the

    training period (M = 96.7% correct), yet a range of errors occurred during a fairly

    circumscribed portion of learning (vocabulary sizes between 130 and 210 items). If

    an error was produced, it was most likely to be an identity mapping (M = 5%), inappropriate vowel change (M = 3.7%), or blend (M = 6.8%). Suffixation errors

    were observed on a few occasions, although these comprised less than 1% of the

    responses to vowel change stems. Unclassifiable responses were rare, but not

    completely absent.

    While regular verbs were incorrectly mapped only an average of 4.4% of the

    time (range = 0-20%), when an error did occur, it was likely to result from

    identity mapping (M = 18%), an inappropriate suffix (M = 18%), or a blend (M = 10%). Note that regular stems were susceptible to identity mapping and

    inappropriate suffix errors throughout the training period; however, blends were

    more likely to occur during the last part of training. Further analyses indicated

    that while most of the regular stems that were identity mapped ended in a dental


    consonant (M = 85%), a small proportion of non-dental final regular stems were

    also identity mapped (M = 15%).

    In summary, a gradual, epoch-based schedule of increases in vocabulary size allowed the network to master the entire set of mappings, yet distinct periods of

    erroneous performance were still observed. Interestingly, certain error types were likely to occur with verbs that possessed particular characteristics, suggesting that errors were partially conditioned by the phonological shape of the stem. This was true for stems which underwent an inappropriate identity mapping (i.e., ended in

    a dental consonant), as well as those undergoing vowel suppletion. Further,

    unclassifiable responses were more likely to be observed early in training, while

    error types that resulted from the merging of the response patterns (i.e., blends) were more likely to occur later. In the next section, we focus on analyses of the

    relationship between these error types and previous performance - that is, the nature and timing of “unlearning” in these networks.

    Regularizations, irregularizations, and U-shaped development

    We continue our analysis of network performance by outlining the nature and

    timing of errors produced by the network in relation to what is known about error production in children. In discussions of past tense acquisition in children, it is

    customary to focus on a particular type of error - regularizations - that is, errors in which an irregular stem is treated as though it were regular in its past tense form.

    Following Marcus et al. (1992), these include errors on irregular verbs which have

    previously been correctly produced by the child, as well as those that have not yet

    been produced correctly in their past tense form. Using the same criteria as Marcus et al. (1992, p. 29), we derive the rate of regularization using the

    following formula (see footnote 9):

    $$\text{overregularization rate} = \frac{\text{overregularization tokens}}{\text{overregularization tokens} + \text{correct irregular past tokens}}$$
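    As a worked illustration of this formula, the rate can be computed directly from two counts. The function name and the example counts below are hypothetical; they are not data from Adam or from the simulations.

```python
def overregularization_rate(overregularized, correct_irregular):
    """Overregularized forms as a share of all irregular past tense forms produced."""
    total = overregularized + correct_irregular
    if total == 0:
        raise ValueError("no irregular past tense forms were produced")
    return overregularized / total


# Hypothetical counts: 3 overregularized and 97 correct irregular past forms.
rate = overregularization_rate(3, 97)  # 3 / (3 + 97)
percent_correct = 100 * (1 - rate)     # the quantity plotted as (1 - rate)
```

    Note that the denominator counts only irregular past tense forms that were either overregularized or correct; other error types on irregular verbs do not enter the rate.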

    Figure 3(b) plots the frequency of overregularization of irregular verbs in the

    current set of simulations as a function of vocabulary size. For purposes of

    comparison, Fig. 3(a) presents data on the frequency of overregularization errors

    for one child, Adam, as reported in Marcus et al. (1992) (Fig. 3, p. 38).

    The important characteristics of the pattern of overregularization errors evident in both Adam (Fig. 3a) and network (Fig. 3b) include:

    (1) A generally high level of performance on the

    verbs across the period.

    past tense forms of irregular

    ‘Note that in the simulation data this measure reflects the number of irregular verb types (nottokens) that are overregularized. Because the network is asked to produce the past tense form of eachverb in the training set exactly once per testing, we have no way of assessing the frequency with whicha given individual stem will be produced erroneously.

  • 8/20/2019 Plunkett&Marchman 1993


    Adam Simulation

    100 1002 90 90

    80 : 80

    ; 70 5 70

    d 60 n_ 60

    50 ? 50

    E 40 ; 40

    g 30 g 30

    0” 20 2 20

    S 10 * 100 ,““,““,““,““,““,““, o,,,,3,8007,m.,‘,

    25 30 35 40 45 50 55 20 120 220 320

    Age in Months Vocabulary Size

    (a> (b)ig u r e 3. (1 - overregularization rate) for Adam (reproduced from Marcus et al., 1992) and the

    simulation. Data are expressed as percentages of irregular verbs produced. The overregulari-zation rate for Adam reflects the number of verb tokens whereas for the simulation it reflectsthe number of verb types (see also footnote 9).

    (2) An initial period of error-free performance on irregular past tense forms.

    (3) A prolonged period where a small minority of irregular verbs are overregular-

    ized (i.e., suffixed), including both irregular verbs that have been previouslymapped correctly by the network (i.e., that have undergone U-shaped

    unlearning), as well as irregular verbs that have been introduced recently to

    the training set.

    This general pattern is characteristic of data reported by Marcus et al. (1992) for other children in the Brown corpus (i.e., Eve and Sarah, their figures 4 and 5, pp. 38-39). However, this profile is different from that reported for Abe (see their figure 6, p. 39), who does not show an initial period of correct performance on irregular past tense forms. Hence, while the classic pattern of overregularizations following initial correct performance may be exemplified by many children for whom longitudinal data are available, it may not be characteristic of the learning pattern of all children and may not reflect the operation of a separable, rote-learning device.

    In order to investigate whether the general pattern reflects the U-shaped unlearning of individual stems, we isolated those stems that were correctly produced by the network, then incorrectly mapped at some subsequent point in training, and then, finally, correctly output once again. A total of 15% of the identity mapping and 30% of the vowel change verbs were classified as undergoing U-shaped acquisition according to these criteria. None of the arbitrary stems underwent U-shaped learning, as both forms were produced correctly across the period, but such errors are apparently very infrequent in the spontaneous productions of children (Marcus et al., 1992). For the dental final stems, U-shaped errors were most likely to result in the overregularization of the stem (66%); however, one of the dental final stems was irregularized, in particular, treated as if it were a member of the vowel change class. For vowel change stems, many of the U-shaped errors resulted from a blending of a vowel change and a suffix (50%, e.g., 1Zs + s t). Only one pure regularization error occurred on a vowel change stem throughout training. Another source of U-shaped errors for this class of stems was identity mapping (50%). This pattern of U-shaped errors for vowel change verbs does seem to diverge from the pattern of pure regularization errors on vowel change verbs in children; that is, examples of overgeneralizations often involve vowel change stems, for example, comed, seed, blowed, breaked, winned. While the source of this discrepancy is as yet unknown, it is important to point out that information regarding the relative frequencies of error types as a function of verb class in children is very sparse.

    U-shaped errors were also analyzed with respect to when they occurred, that is, at what point in training after the verb was correctly mapped was the first incorrect output observed. This analysis indicated that irregular verbs entering the lexicon both early and late in training were susceptible to U-shaped acquisition. The majority of U-shape onsets for irregular verbs (6 of 9, or 66%) occurred during the first half of training (i.e., vocabularies less than 250 verbs). However, only one irregular verb in the initial set of 20 was U-shaped. Given that all of the irregular verbs have been entered into the vocabulary by the 250-verb mark, these data indicate that verbs entering the vocabulary during the middle portion of training were most susceptible, and were overregularized fairly soon after they were first mastered. Indeed, the last U-shape on an irregular verb occurred at a vocabulary size of 375 verbs. The error data (see Table 3) also indicated that irregular stems were likely to be mapped correctly during the second half of training.
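    The U-shape criterion (correct, then incorrect, then correct again) and the onset measure (the first incorrect output after a correct one) can be sketched as follows. This is one illustrative reading of the criteria, not the authors' analysis code; the function name, the boolean history format and the checkpoint values are assumptions.

```python
def u_shape_onset(correct_by_checkpoint):
    """Index of the first error after a correct output, or None.

    A stem counts as U-shaped only if it is produced correctly, later
    produced incorrectly, and after that produced correctly again.
    """
    history = list(correct_by_checkpoint)
    try:
        first_correct = history.index(True)
        onset = history.index(False, first_correct + 1)
        history.index(True, onset + 1)  # raises ValueError if never recovered
    except ValueError:
        return None
    return onset


# Hypothetical testing points (vocabulary sizes) and outcomes for one stem.
checkpoints = [20, 60, 100, 140, 180]
history = [True, True, False, False, True]
onset = u_shape_onset(history)
print(checkpoints[onset])
```

    A stem that is wrong from the start and later mastered, or one that is lost and never recovered, returns None under this reading and so is not counted as U-shaped.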


    Regular stems were also susceptible to U-shaped learning, that is, produced correctly and then “unlearned” at a later point in training. However, as in children, these were likely to occur with only a small subset of the total number of regular verbs (17%). These errors were most likely to result from identity mapping (M = 35%), blending (M = 24%) or the addition of an inappropriate suffix (M = 22.5%). All inappropriate suffixes were, of course, legal suffixes (closest suffix in Euclidean space). To our knowledge, the extent to which children in this age range add the wrong allomorph of /ed/ to their past tense verb productions is not known. Yet, it has been reported that adults make similar non-standard voicing assimilation pronunciations in on-line tests of past tense production, for example, /spel-t/, /spil-t/ (Marchman & Plunkett, 1991), and children often produce inappropriate suffixes (of the /t/ and /d/ types) in spelling tasks (Peter Bryant, personal communication). U-shaped errors on regular verbs were observed throughout the training period. However, like the irregular verbs, approximately two-thirds occurred when vocabulary size was less than 250 verbs. As the vocabulary increased beyond this point, U-shaped errors were more likely to result from the incorrect mapping of consonants (18%).

    In summary, the erroneous performance in these networks is the result of errors on new forms entering the vocabulary, as well as the “unlearning” of previously learned verbs. The behavior of the network resembles what is known about children in several respects. In both networks and children, erroneous output is relatively infrequent (compared to correct performance), although both regularization and irregularization errors are observed across a large portion of the training period. The U-shaped overregularization of irregular forms is most likely to occur during the early and middle portions of training, while blends (e.g., ated) and other irregularization errors are likely to persist later in training.

    Figure 4. Network performance on novel verbs. Three types of novel stems are used: indeterminate stems which possess no defining characteristics, identity stems which possess a dental final consonant, and vowel change stems which possess a vowel-consonant cluster characteristic of vowel change verbs in the training set. For each novel stem type, the percentage which are regularized, identity mapped or vowel changed is plotted against vocabulary size. Panels (a)-(i) are arranged in columns headed Indeterminate, Dental Final and Vowel Change.
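    The parenthetical “closest suffix in Euclidean space”, given in the discussion of inappropriate suffixes on regular verbs, suggests a nearest-neighbour decoding of the suffix output. A minimal sketch under assumptions follows: the two-unit codes are hypothetical and are not the encoding used in the simulations; only the decoding rule is taken from the text.

```python
import math

# Hypothetical two-unit target codes for the three allomorphs. The actual
# suffix encoding used in the simulations is not given in this section.
SUFFIX_CODES = {"t": (0.0, 1.0), "d": (1.0, 0.0), "ed": (1.0, 1.0)}


def closest_suffix(activations, codes=SUFFIX_CODES):
    """Return the legal suffix whose target code is nearest to the output."""
    return min(codes, key=lambda name: math.dist(activations, codes[name]))


# A noisy output close to the hypothetical /t/ code decodes as "t".
print(closest_suffix((0.2, 0.9)))
```

    Under such a rule every suffix response is by construction a legal allomorph, which is why an inappropriate suffix can still only be one of the legal suffixes.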


    Responses to novel stems

    The preceding analyses evaluated changes in the network’s ability to produce

    the appropriate past tense forms of verbs that were members of the training set.

    Here, we investigate network performance when it is required to produce the past

    tense forms of stems that it has never seen, that is, novel verbs. As with children

    (e.g., Berko, 1958), the ability to generate reasonable past tense forms of novel

    stems provides a measure of the extent to which the network has abstracted useful

    information from the training set. Further, the way in which these tendencies

    change over the course of acquisition can be seen to reflect concomitant

    representational changes, that is, reorganizations, within the network.

    In Fig. 4, we graph the output of the network when presented with three classes of novel forms. Figures 4(a), (d), and (g) plot the tendency of the network to treat indeterminate novel stems as if they were members of the regular, identity, or vowel change classes, respectively. Figures 4(b), (e), and (h) plot the same tendencies for novel stems which possess a dental consonant in stem final position (identities). Finally, Fig. 4(c), (f), and (i) present responses for novel stems possessing the VC clusters characteristic of vowel change verbs (vowel changes). The extent to which the network produces systematic responses to verbs with each of these characteristics indicates its sensitivity to the presence (or absence) of phonological characteristics which are predictive of class membership in the training set.
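    The three classes of novel stems can be illustrated with a small sketch that reports which phonological cues a stem carries. This is a hedged illustration only: stems are assumed to be strings with one character per phoneme, the dental set is assumed to be /t/ and /d/, and the cluster set contains only /it/, the one vowel change cluster named in this section; the example stems are invented.

```python
# Assumption: the dental consonants relevant to identity mapping are /t/, /d/.
DENTALS = {"t", "d"}
# Only /it/ is named in this section; other clusters would be guesses.
VC_CLUSTERS = {"it"}


def phonological_cues(stem):
    """Return the set of class cues carried by a stem's final segments."""
    cues = set()
    if stem[-1:] in DENTALS:
        cues.add("identity")
    if stem[-2:] in VC_CLUSTERS:
        cues.add("vowel change")
    return cues or {"indeterminate"}


for stem in ("mad", "kit", "pum"):  # invented stems, for illustration only
    print(stem, sorted(phonological_cues(stem)))
```

    Returning a set rather than a single label makes the overlap between classes visible: a stem ending in /it/ carries both the dental-final cue and the vowel change cue.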

    Note the high level of regularization in response to indeterminate novel stems (92%) by the end of training as presented in Fig. 4(a). While suffixation is rare at the onset of training, it rapidly increases, yielding an average suffixation rate of 71% across the period. In contrast, the average rate of producing identity mapping forms in response to indeterminate stems is very low (3%) - see Fig. 4(d). In a similar fashion, virtually none of the indeterminate stems undergo vowel change mappings except for a temporary blip early in training - see Fig. 4(g).

    Identity novel stems were subjected to the entire range of generalization tendencies, resulting in an average of 18% suffixation, 27% identity mapping, and 26% vowel change responses. Interestingly, however, these responses were not equally probable across the course of training. For example, the tendency to add a suffix to an identity stem was quite strong early in training (i.e., when vocabularies are smaller than, say, 200 verbs) but then decreased, whereas the frequency of identity mapping and vowel change responses started off at modest levels and tended to increase over the same period.

    Like identity stems, vowel change novel stems were subjected to the entire range of response types, although the most predominant pattern was to treat these novel stems in accordance with their phonological shape, that is, as if they were members of the vowel change class (M = 38%, see Fig. 4i). Early in training, this tendency increased slightly when vocabularies reached between 100 and 200 verbs, but remained fairly steady across the rest of training. VC stems were also identity mapped to a certain extent (11%). Unlike the identities, there was not an early overwhelming preference for vowel change stems to be suffixed. Instead, this tendency remained fairly constant across the period at approximately 28%.

    These patterns suggest that the network is highly sensitive to the phonological properties of stems when generating the past tense forms of novel verbs. Novel stems which possess phonological properties characteristic of the sub-regularities of the irregular training stems are more likely to undergo a past tense mapping associated with the irregular class (i.e., irregularization) than any other type of mapping. Thus, identity novel stems are more likely to undergo identity mapping than suffixation or vowel change; vowel change novel stems are more likely to undergo vowel change than identity mapping or suffixation. Yet, note that these mapping tendencies are not absolute. Identity and vowel change novel stems also undergo past tense mappings characteristic of the other irregular classes, as well as the predominant regular class. This is partly because the characteristics defining class membership are not absolute. In the training set, the regular class possessed members that have characteristics associated with the irregular classes, that is, dental final stems or VC clusters. Similarly, irregular verbs in the training set overlapped in their phonological properties. For example, one of the phonological features of a VC cluster was /it/ → /Et/, as in kit → kEt. In this case, a VC cluster shares a property associated with the identity mapping class, that is, a dental final consonant.
    Nevertheless, the propensity to regularize indeterminate novel stems regardless of their phonological shape suggests that the trained network has effectively defined a default mapping strategy. Recall that the indeterminate stems used to test the network do not form a well-partitioned group in phonological space. They are no more similar to each other than they are to the other classes of novel stems. The net result is that novel stems which do not possess the phonological features characteristic of other mapping types will be treated in a similar fashion. Within the context of this artificial language that is structured to look very much like English, these stems are suffixed. It should be stressed, however, that the tendency to regularize indeterminate stems is not absolute and categorical. For example, an average of 3% of the indeterminate novel stems undergo identity mapping, even though these stems do not possess the requisite phonological features. This response tendency is another consequence of the fact that phonological shape is characteristic of, but does not necessarily define, each of the original irregular classes. Features of the various classes overlap, yielding some regular stems in the training set that end in a dental consonant and a few that possess the VC cluster typical of vowel change verbs. Nevertheless, the regularization process is very strong, especially during the early and middle periods of development - so strong, in fact, that more than one-fifth of the identity and vowel change novel stems undergo suffixation.

    In summary, these data suggest that the network has learned to regularize novel stems in much the same fashion as children regularize nouns and verbs in a “wug test” (Berko, 1958). In the absence of phonological cues, the network overwhelmingly chose to apply the regular pattern, and did so to a certain extent even when phonological cues suggested an alternative response. However, in many cases, phonological features of the novel stem can block application of the suffixation process, resulting in the irregularization of novel stems (e.g., flow → flew). Similar findings have also been reported for children (Bybee & Slobin, 1982; Marchman, 1988; Marchman & Plunkett, 1991).

    Generalization as a function of vocabulary size

    The tendency to regularize indeterminate novel stems alters dramatically between early and late training. Early in training, that is, immediately after the network has mastered the initial vocabulary of 20 verbs, indeterminate novel stems were treated in an unsystematic fashion. Network output was unclassifiable in terms of the mapping categories used in Table 3. During this early period of training, only one of the novel indeterminate stems was treated in a systematic fashion. It was regularized. The tendency to systematically treat identity and vowel change novel stems was greater, including 3 regularizations, 3 identity mappings and 11 vowel changes. These data suggest that although the network has not yet extracted a mapping from the limited training set that it will apply in a default fashion, it has already begun to develop a sensitivity to the phonological characteristics of the irregular verbs.

    It is difficult to evaluate these network findings against empirical data from children at an equivalent level of language development. Any such evaluation would involve testing early 2-year-olds on novel verb forms which are systematically manipulated with respect to phonological form. These experiments have not been performed. However, where such experiments have been carried out on slightly older children (Bybee & Slobin, 1982; Marchman, 1988) there is clear evidence of sensitivity to the sub-regularities that characterize irregular forms; for example, dental-final forms are less likely to be overregularized than the other subclasses of irregulars, especially early in development. Consistent with this view, one might postulate that the discovery of sub-regularities in the initial verbs learned by children would help consolidate the learning of these verbs and contribute to maintaining the early period of error-free performance.

    As vocabulary size expands (and training continues), the tendency for the network to add a suffix to indeterminate novel stems increases substantially (from

    2% to 92%). The most rapid increase occurs during the period of training when

    vocabulary size increases from 30 to 140 verbs. A major proportion (76%) of

    indeterminate novel stems are regularized by the 140 vocabulary mark. Note that

    this point in training corresponds to a period in which performance on trained

    irregular verbs also deteriorates (see Fig. 2). Thereafter, the rate of increase in

    regularization of novel indeterminate stems is seen to decelerate.

    The relatively sudden onset of the systematic treatment of novel stems suggests that abrupt reorganizational processes are occurring in the weight matrix of the network. However, it is unclear whether these changes are the result of prolonged training or the network’s exposure to an increasing number of different stems as vocabulary expansion continues. In order to tease apart these two factors, we trained the network on different levels of final vocabulary size. In these test simulations, training proceeded in precisely the same fashion as in previous simulations except that vocabulary expansion was halted at six different levels - 30, 40, 50, 60, 70 or 80 verbs. After vocabulary expansion was stopped, training continued until the 500-epoch mark. Figure 5 plots the frequency with which indeterminate novel stems were mapped as regular past tense forms in the different simulations.

    Figure 5. Generalization of indeterminate stems by vocabulary size. Six sets of simulations are plotted. Each set corresponds to a different final vocabulary size (30, 40, 50, 60, 70 or 80 verbs). In each set, however, training continues for 500 epochs. Weight matrices are saved at different points in training and regularization of indeterminate novel stems evaluated.
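    The logic of this control can be sketched as a schedule in which the vocabulary grows up to a cap and is then held fixed while training continues, so that the amount of training is the same across runs and only final vocabulary size differs. This is a hedged illustration, not the authors' procedure: the function name is invented, and the expansion rate (`epochs_per_new_verb`) is a placeholder, since the actual expansion regime is not restated in this section. The initial vocabulary of 20 verbs, the six caps and the 500-epoch limit are taken from the text.

```python
def vocabulary_schedule(final_size, total_epochs=500, initial_size=20,
                        epochs_per_new_verb=1):
    """Vocabulary size at each epoch when expansion halts at final_size.

    epochs_per_new_verb is a hypothetical expansion rate, not a value
    taken from the paper.
    """
    sizes = []
    size = initial_size
    for epoch in range(total_epochs):
        sizes.append(size)
        # Add one verb per interval until the cap is reached, then hold.
        if size < final_size and (epoch + 1) % epochs_per_new_verb == 0:
            size += 1
    return sizes


# One schedule per final vocabulary size used in the test simulations.
schedules = {cap: vocabulary_schedule(cap) for cap in (30, 40, 50, 60, 70, 80)}
for cap, sizes in schedules.items():
    print(cap, len(sizes), sizes[-1])
```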

    These generalization curves indicate that final vocabulary size, rather