Introduction
Since the emergence of what Fodor and Pylyshyn (1988) call 'new
connectionism', there can be little doubt that connectionist research
has become a significant topic for discussion in the Philosophy
of Cognitive Science and the Philosophy of Mind. In addition to
the numerous papers on the topic in philosophical journals, almost
every recent book in these areas contains at least a brief reference
to, or discussion of, the issues raised by connectionist research
(see Sterelny 1990, Searle, 1992, and O Nualláin, 1995,
for example). Other texts have focused almost exclusively upon
connectionist issues (see Clark, 1993, Bechtel and Abrahamsen,
1991, and Lloyd, 1989, for example). Regrettably, the discussions
of connectionism found in the philosophical literature suffer
from a number of deficiencies. My purpose in this paper is to
highlight one particular problem and attempt to take a few steps
to remedy the situation.
The problem I will be concerned with here arises from what might
be called (for want of a better term) the 'Myths of Connectionism'.
These myths are often repeated claims that have been made about
connectionist systems, which when closely scrutinized, fail to
be adequately justified, or properly qualified. In some instances,
such claims are simply false. Before discussing these claims directly
though, a word of caution is in order. It is important to be clear
that the intended audience here is not the technical
community of active connectionist researchers. Members of the
technical community are usually acutely aware of the limitations
and shortcomings of their systems. The purpose here is rather
to address the strictly philosophical community who are interested
in the issues raised by the research results from connectionist
systems. The reason for this focus is that it is in this latter
community that the myths of connectionism seem to have their strongest
hold.
Before proceeding to a discussion of particular myths, one further
remark is in order. It is not the case that every mythical claim
in the philosophical literature is made in exactly the same way.
Rather, the particular myths which I will discuss represent a
class of claims which bear a certain 'family resemblance' to one
another. My versions of the mythical claims are designed to get
at the core of the class of claims. I leave it up to the reader
to determine the applicability of the arguments I put forward
to particular instances of mythical claims.
Myth 1: "Connectionist systems are in some sense 'neural' or 'brain-like'"
One important series of related claims made about connectionist
systems is that, in a significant sense, they are brain-like,
or have brain-like properties. Clark (1989: p. 4), for example,
talks of "...the brain-like structure of connectionist architectures."
Similarly, Bechtel and Abrahamsen (1991: p. 17), in describing
the rise of the new connectionism, claim that "...network
models were attractive because they provided a neural-like architecture
for cognitive modeling". Perhaps the most explicit endorsement
of this myth though, is due to Paul Churchland. Churchland (1989:
p. 160) introduces connectionist networks as follows,
Churchland makes similar claims elsewhere too (see Churchland
1988: p. 156). Even Dennett (1991: p. 239) makes reference to
"...'connectionist' architectures of neuron-like elements...".
However, it is not just those who favor the connectionist approach
who make this kind of claim. For example, Sterelny (1990: p. 175)
and Cummins (1989: p. 155), though neither is a great fan
of connectionism, both appeal to this myth. Given these examples,
it is clear that the claim that connectionist systems are 'brain-like'
or have 'neuron-like' properties is well established in the philosophical
literature.
The source of this myth in the philosophical literature is not
hard to determine, as it is common to find fundamentally the same
claims made in the technical literature as well (see for example,
McClelland, Rumelhart and Hinton (1986), or Rumelhart (1989)).
However, in the technical literature, there is also often an open
acknowledgment that there are radical dissimilarities between
the biological systems and connectionist ones (see for example
Crick and Asanuma 1986). Unfortunately, these salutary remarks
seldom find their way into the philosophical literature.
There are, in fact, two particular related claims which are frequently
confused in the philosophical literature. These are,
(a) that connectionist systems are biologically plausible, that is to say, that they are 'brain-like' or have 'neuron-like' properties; and
(b) that connectionist systems are more biologically plausible, or brain-like, than non-connectionist architectures.
Although there may be some credibility to the second claim (b),
it is the first claim (a) which is, unequivocally, a myth. A careful
comparison of the various components of a connectionist system
with the supposedly analogous components of the brain shows that
there is only the most minimal similarity between the biological
and connectionist systems. As, to some degree, claim (b) rests
upon claim (a), these facts cast some doubt on the plausibility
of this claim too.
Processing Units
Let us begin by examining the claim that a connectionist processing
unit is in some sense similar to a biological neuron. For example,
Rumelhart (1989: p. 134) has claimed that a "...[connectionist]
processing unit [is] something close to an abstract neuron."
This claim should arouse immediate suspicion, given the fact that,
as Winlow (1990a: p. 1) notes "It has always been very clear
to neuroscientists that there is no such thing as a typical neurone
(sic),...". There are, as a matter of fact, many different
types of neurons (see Kolb and Whishaw (1990: p. 5) and deGroot
and Chusid (1988: p. 5) for illustrations of some of these types).
Indeed, according to Churchland and Sejnowski (1994: p. 43) there
are twelve different kinds of neurons in the neocortex alone.
Given these facts, it seems reasonable to ask just which kind
of neuron connectionist processing units are supposed to be an abstraction from.
Connectionists though, as a rule, have little if anything to say
on this matter.
If the 'abstract neurons' employed within connectionist systems
are supposed to capture the significant features of the class
of all neurons, then it is reasonable to ask how the set
of features selected was decided upon. Regrettably though, the
selection of features and functional properties employed in 'abstract
neurons' has yet to be justified or defended in any detail. Thus,
until some better account of the relationship between connectionist
processing units and actual biological neurons is forthcoming,
it seems reasonable to treat this claim about processing units
with some skepticism. A bold, unsubstantiated claim will not suffice
where argument is required.
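
To make the target of this complaint concrete, the following sketch
(a minimal illustration in Python, with names and values of my own
devising rather than anything drawn from a published model) shows
all that a standard connectionist processing unit typically amounts
to: a weighted sum of incoming activations plus a bias, passed
through a sigmoid squashing function.

    import math

    def unit_activation(inputs, weights, bias):
        # A standard connectionist processing unit: the whole 'abstract neuron'
        # is a weighted sum of incoming activations plus a bias, squashed by a
        # sigmoid. Nothing else about a biological neuron is represented.
        net = sum(x * w for x, w in zip(inputs, weights)) + bias
        return 1.0 / (1.0 + math.exp(-net))   # logistic activation in (0, 1)

    # A unit with three (arbitrarily chosen) incoming connections:
    print(unit_activation([0.2, 0.9, 0.1], [0.5, -1.2, 0.3], bias=0.1))
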
A related concern derives from the fact that many, if not most,
connectionist systems involve homogeneous processing units.1 This
homogeneity does not reflect the complexity of the biological
situation. Getting (1989: p. 187) remarks that "No longer
can [biological] neural networks be viewed as the interconnection
of many like elements....". In fact, Churchland and Sejnowski
(1994: p. 51) claim that, in the brain "[m]ost connections
are between, not within, cell classes." If connectionist
networks are to have genuine biological or neural plausibility,
it is reasonable to expect them to reflect these facts about biological
systems. The discrepancy between the state of affairs in connectionist
networks as compared to biological neural networks merely serves
to further undermine the tenability of the claim that connectionist
systems are biologically plausible.
Finally, it is common practice for connectionist models (which
undergo training employing a learning rule) to have what is known
as a 'bias' term, which is trained at the same time as the connection
weights are trained. However, there is little or no evidence that
threshold membrane potentials (the most natural biological equivalents
of bias) in biological systems can be modified in any analogous
way; indeed, there is no evidence that the thresholds of natural
neurons exhibit any plasticity at all. Connectionists whose systems
involve trainable biases thus standardly take it upon themselves
to add an extra degree of freedom into their networks.
However, this degree of freedom lacks any biological justification.
Again, this counts against the biological plausibility of such
systems.
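
To illustrate the point about the bias, the sketch below (again my
own, assuming a simple delta-rule style update for a single linear
unit rather than any particular published learning rule) shows the
bias being adjusted by exactly the same kind of step as the
connection weights; it functions as nothing more than one further
trainable parameter.

    def train_step(inputs, weights, bias, target, rate=0.1):
        # One delta-rule style update for a single linear unit (an illustrative
        # toy, not any particular published learning rule). The bias is adjusted
        # in just the same way as the connection weights: it is treated as a
        # weight on a constant input of 1, i.e. one more free parameter.
        output = sum(x * w for x, w in zip(inputs, weights)) + bias
        error = target - output
        new_weights = [w + rate * error * x for w, x in zip(weights, inputs)]
        new_bias = bias + rate * error
        return new_weights, new_bias
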
Connections
In biological nervous systems, neurons have two components which
are roughly equivalent to the weighted connections between processing
units in a connectionist network. These components are axons and
dendrites. Dendrites receive signals passing to neurons and axons
send signals from particular neurons to others. One immediate
(though relatively trivial) difference between connectionist systems
and biological ones is that, in biological systems, axons and
dendrites are components of neurons themselves, whereas in connectionist
systems the connections between units are distinct from the units
themselves. This however, is not the only difference.
It is standard practice for connectionists to make their networks
'massively parallel'. That is to say, each unit of a particular
layer is normally arranged so that it has connections to every
unit of both prior and subsequent layers in the network (See figure
1).
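
The following fragment (an illustration of my own, in Python) shows
what this standard arrangement amounts to in practice: the weights
between two layers form a complete matrix, with no incoming
connection omitted from any unit.

    import random

    def fully_connected_layer(n_inputs, n_units):
        # 'Massive parallelism' in the standard sense: every unit in this layer
        # receives a weighted connection from every unit in the previous layer,
        # so the weights form a complete n_units x n_inputs matrix.
        return [[random.gauss(0.0, 0.5) for _ in range(n_inputs)]
                for _ in range(n_units)]

    weights = fully_connected_layer(n_inputs=4, n_units=3)
    n_connections = sum(len(row) for row in weights)   # 4 * 3 = 12; nothing is omitted
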
However, there are no results which suggest that this is the situation
in biological systems (see Dawson and Shamanski 1993). Indeed,
what evidence there is suggests that this is not the case. Churchland
and Sejnowski (1994: p. 51), whilst discussing the patterns of
connectivity found in the cortex of the brain, note that,
Unfortunately, standard connectionist practice pays no heed to
this particular fact about biological neural systems.2 Moreover,
given the importance of connections to overall functioning in
both natural neural systems and connectionist networks, this difference
is far from trivial. Furthermore, it is also the case that in
standard small connectionist networks individual units from one
layer can have a significant impact on the activation level of
particular units at the next layer. In biological systems though,
the influence of one neuron upon the state of another is, in most
cases (there are important exceptions), relatively weak. Usually,
the influence of one neuron's activity upon another is of the order
of 1%-5% of the firing threshold (see Churchland and Sejnowski
1994: p. 52). In the connectionist literature, no attention is
paid to this particular subtlety.
Another sharp discrepancy which exists between standard connectionist
models and biological systems is in their differing ways of transmitting
signals between units or neurons. In connectionist networks, the
signals which are sent via the weighted connections take the form
of continuous numerical values. But in real neurological systems,
signals are sent in the form of spiked pulses (for an
illustration of this, see Churchland and Sejnowski (1994: p. 53)).
This would not be a decisive objection against connectionist models,
were it to be the case that continuous values could capture the
essential properties of the signals transmitted by the spiked
pulses. However, this is not the case. Firstly, different types
of neurons have different firing patterns. Secondly, some neurons'
firing patterns are a function of their recent firing history.
Thirdly, some neurons have oscillatory firing patterns. Fourthly,
most neurons spike randomly, even in the absence of input (Churchland
and Sejnowski 1994: pp. 52-53). Finally, it is also the case that
signals between neurons in biological systems are sent by more
than one medium. Synaptic transmission occurs by both electrical
and chemical means (Getting 1989: p. 191). Although it may be
possible to capture at least some aspects of these complexities
with the continuous values standardly employed in connectionist
networks, there is no reason to believe, absent an argument to this
effect, that they can be captured entirely. Connectionists have
yet to come up with such an argument. Indeed, there seem to be
good grounds to believe that the properties just mentioned will
be highly significant to the functioning of actual neural systems.
This being the case, there seem to be good grounds for doubting
the putative biological plausibility of connectionist networks
and the associated claims that such networks have neural-like
properties.
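
The contrast can be made vivid with a small sketch (my own, and
deliberately crude; the spiking function below is a toy
integrate-and-fire caricature, not a model of any actual neuron
type). A connectionist unit transmits a single continuous value,
whereas the biological 'signal' is a temporal pattern of discrete
pulses whose shape depends upon the unit's recent history.

    def continuous_signal(activation):
        # What a connectionist unit transmits: a single real number.
        return activation

    def spike_train(input_current, steps=20, threshold=1.0, leak=0.9):
        # A deliberately crude integrate-and-fire caricature (not a model of any
        # particular neuron type): the 'signal' is a temporal pattern of discrete
        # spikes, and the pattern depends on the unit's recent history via the
        # decaying membrane potential.
        potential, spikes = 0.0, []
        for _ in range(steps):
            potential = potential * leak + input_current
            if potential >= threshold:
                spikes.append(1)
                potential = 0.0        # reset after firing
            else:
                spikes.append(0)
        return spikes

    print(continuous_signal(0.73))     # 0.73
    print(spike_train(0.3))            # e.g. [0, 0, 0, 1, 0, 0, 0, 1, ...]
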
In most connectionist networks, the relationship between the signal
sent down a connection and the influence of that connection (with
the associated weighting) upon the receiving unit is fairly straightforward.
This is not the case in real neural systems though. Dreyfus (1992:
pp. 161-162) briefly describes work by Lettvin which suggests
that axon branches may serve to act as "low pass filters
with different cutoff frequencies", with the precise frequency
being dependent upon the physical diameter of the actual axon
branch. Given this fact, there will be a complex and functionally
significant relationship between the frequency and pattern of
neuronal firing, and the length and diameter of the connections
between neurons. This relationship will be functionally significant,
as it will have a direct effect upon the influence of one neuron
upon another. However, there is nothing in standardly described
connectionist systems which is even remotely similar to such a
mechanism. This being the case, there must be at least some functionally
significant properties of biological systems which are not captured
in connectionist systems. This, again, militates against the tenability
of connectionist claims to biological plausibility.
Hopefully the facts from neuroscience cited above are sufficient
to show that connectionist claims to biological plausibility are
not as straightforward as many of the proponents of the myth would
have us believe. Indeed, there are significant functional differences
between connectionist systems and biological ones. Given these
facts, it seems reasonable to conclude that the claim that connectionist
systems are biologically plausible, at the current time at least,
is in large part a myth.
As noted above though, even if claim (a), that connectionist systems
are biologically plausible, or neuron-like is not tenable, there
is still the weaker claim (b), to the effect that connectionist
systems are more biologically plausible/brain-like than other
non-connectionist architectures.
There is perhaps more plausibility to the weaker claim, although
it too has problematic aspects (for example, it is far from clear
what the appropriate metric should be for assessing comparative
biological plausibility). However, a prima facie case for
the plausibility of the weaker claim can be made.
Consider the case of two computational systems which both model
some cognitive capacity. Let us suppose further that one system
is connectionist and the other is a production system (production
systems are a fairly typical non-connectionist architecture).
If for some reason (perhaps a desire to develop a system which
was strongly equivalent to human beings) we wished to try to make
each system more biologically plausible, how would we fare?
In the case of the connectionist system, there are a number of
steps which might be taken. These range from introducing non-homogeneous
processing units, with activation functions similar to those of the
biological neurons of the relevant type, to utilizing more complex
mechanisms to mediate the transmission of signals between units. Shastri and
Ajjanagadde (1993), for example, have described a system which
mimics, in a rudimentary manner, the spiking of neural firing.3
How, on the other hand, might we go about making a production system
more biological? There does not seem to be any straightforward
manner of doing this. Adding more productions is very unlikely
to do the trick! So, in theory, connectionist systems could
be made more biologically plausible than their non-connectionist
cousins. This, though, is not the same as the claim that connectionist
systems actually are more biologically plausible at the
current time. Once again, this claim, if made in the present tense
(cf. the remark made by McClelland, Rumelhart and Hinton 1986:
p. 12), is little more than a myth.4
Myth 2: 'Connectionist Systems Are Consistent With Real Time Constraints Upon Processing'
There is another claim which is sometimes made on behalf of connectionist
systems, which is based upon comparing them with biological cognitive
entities. This claim too has a significant mythological component.
This claim has also found its way into the philosophical literature
(see Bechtel 1985, Sterelny 1990: pp. 172-173, or Clark, 1991:
pp. 119-122, for examples).
One of the astonishing things about biological cognitive systems
is the speed at which they are able to perform tasks which (apparently)
require many complex calculations. Somehow or other, the neurological
components of humans and animals are able to successfully perceive
the world, remember things and so on, despite the fact that individual
neurological components (such as neurons) operate slowly when
compared to the speed of a modern microprocessor. These facts
have led some connectionists (for example, Feldman and Ballard
1982, Rumelhart 1989, and Shastri 1991) to argue for the adoption
of their approach. Such arguments frequently appeal to the problems
which can arise with traditional symbolic systems, with respect
to real time constraints upon processing.
One of the best known versions of this type of argument is the
so-called "100 step" argument (this nomenclature originates
from Feldman and Ballard 1982). It is argued that, from what is
known about the speed of firing of neurons in the brain, many
basic human cognitive capacities (those which take under a second
for humans to process in real time) cannot involve more
than about one hundred processing steps. This is because actual
neurons cannot go through more than about one hundred states in
under a second. As standard non-connectionist architectures are
for the most part inherently serial in nature and usually require
considerably in excess of one hundred steps of processing for
most operations, connectionists argue that they cannot provide
a good model of actual cognitive function.
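
The arithmetic behind the argument can be set out in a couple of
lines (the figures below are the round numbers the argument itself
assumes, not established neurological constants):

    # Premises of the '100 step' argument, in round figures:
    neuron_step_time = 0.005   # seconds per neuronal 'operation', i.e. a few milliseconds
    task_time = 0.5            # seconds available for a fast cognitive task

    max_serial_steps = task_time / neuron_step_time
    print(max_serial_steps)    # 100.0, hence the '100 step' constraint
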
Rumelhart's (1989: p. 135) version of the argument goes like this:
Rumelhart then goes on to list several processes which occur in
a second or so. The processes listed are all significant for the
study of cognition and include linguistic capacities, perception
and memory retrieval. The claim is that the facts about the speed
of operation of neurons mean that realistic computational accounts
of these cognitive processes must either involve less than one
hundred or so operations, or some account must be given for how
it is that more than one hundred operations can occur in less
than a second.
Rumelhart (1989: p. 135) believes that the correct way to explain
phenomena of this type is as follows:
Devices such as Turing machines or von Neumann machines (usually)
have a single processor which performs operations one at a time,
one after another. This is often called 'serial' processing. One
of the features of connectionist systems, by contrast, is that
they are constructed from many simple processing devices which
operate at the same time as one another. This is often referred
to as 'parallel' processing. The parallel nature of connectionist
systems means that they can (theoretically) perform many operations
within each time step and thus, it is claimed, they do not (necessarily)
violate the 100 step constraint.
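
A brief sketch (again my own illustration, not anyone's published
model) may help to show why parallelism is thought to ease the
constraint: a single forward pass through one fully connected layer
performs a great many primitive multiply-and-add operations, yet on
the connectionist accounting it counts as only one time step.

    def layer_forward(inputs, weights, biases):
        # One parallel 'time step' on the connectionist picture: every unit
        # computes its weighted sum simultaneously, so a large number of
        # primitive multiply-and-add operations count as a single step.
        return [sum(x * w for x, w in zip(inputs, row)) + b
                for row, b in zip(weights, biases)]

    # A layer of 100 units fed by 100 inputs performs 10,000 multiply-adds,
    # yet on this accounting it uses up only one of the ~100 available steps.
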
The argument sketched above is pretty clearly just a variant
of the biological plausibility claims discussed earlier. This
is because the argument's plausibility depends crucially upon an
assumption that there is some functionally significant similarity
between connectionist processing units and biological neurons
(this point is noted by Fodor and Pylyshyn 1988: p. 55). As we
have seen in the previous section though, the claim that connectionist
units are neuron-like is largely a myth. This is not the only
reason why the claim that connectionist systems are consistent with
real time constraints upon processing is dubious, however.
The entire 100 step argument turns upon the premise that the individual
neuron is the computationally significant level, as far as speed
constraints go, in the brain. Should it turn out that there is
significant processing which occurs at the sub-neuronal level
(for example, at the level of synaptic clefts, see Kolb and Whishaw
1990: pp. 46-47), then this argument would lose much of its plausibility.
The additional processing steps which standard architectures seem
to require may be being done at this sub-neuronal level. In addition,
in the brain, there are many chemical processes of the dendrites
which take place over a wide range of time scales (Fodor and Pylyshyn,
1988: p. 55, n31). This being the case, there are grounds for
wondering why advocates of the 100 step argument choose the rate
of neuronal firing as the relevant time scale for their argument.
There is no defense of this choice in the connectionist literature.
As a matter of fact, what is known about the behavior of biological
systems tends to make the appeal to the speed of neurons implausible.
In particular, the claim that "neurons operate in the time
scale of milliseconds..." (Rumelhart 1989: p.135), which
is crucial to the 100 step argument, involves a considerable oversimplification
of the neurological facts. For example, cortical neurons have
a variety of different intrinsic firing patterns and rates (see
Churchland and Sejnowski 1994: p. 53). It is also the case that
the rate of firing of a particular neuron will be determined,
in part, by the kind of nerve fiber which makes connections with
it. deGroot and Chusid (1988: pp. 23-24) describe three distinct
types of nerve fiber which have differential rates of signal conductance.
Given these complexities, the simple temporal claim which is central
to the 100 step argument lacks plausibility without being defended
in detail. Once again though, such a defense has not been attempted
within the connectionist literature.
Even if these difficulties with the 100 step argument are overlooked,
the argument still fails to unambiguously establish the conclusion
its connectionist proponents propose. As Sterelny (1990: p. 172)
notes, there are two possible conclusions from the 100 step argument.
The weaker conclusion is that, however human brains actually work,
they do not run the same programs as computers do. Of course,
this conclusion is almost certainly (though somewhat trivially)
correct. The stronger conclusion is that the 100 step argument
shows that a certain class of theories about cognition are fundamentally
incorrect. The stronger conclusion is presumably the one which
connectionists wish to endorse. The stronger conclusion is highly
problematic, however.
The strong conclusion of the 100 step argument should persuade
us that explanations of cognitive phenomena which are rooted in
serial processing are defective. However, this alone is not sufficient
to justify the adoption of a connectionist approach. Connectionism
does not have a corner on the market when it comes to building
parallel processing models. Sterelny (1990: p. 172) mentions (although
he does not give a reference) that "...some version of the
'Marcus parser', which models sentence comprehension by incorporating
a Chomskian transformational grammar, use parallel processes."
Similar points are made elsewhere, in both Pylyshyn (1984) and
Fodor and Pylyshyn (1988).
Another difficulty with the strong conclusion is that it is far
from clear at what level the excess steps (i.e. those in excess of
100) are supposed to arise. Would a computer program
which involved more than 100 function calls be deemed unacceptable?
Are the '100 steps' supposed to be basic processor operations?
Without a clearer notion of what is to count as a step, it is
hard to tell how the 100 step constraint could even be met by
a serial processing system!
Given the problems just raised, it is reasonable to conclude that
the 100 step argument should not be taken as providing support
for the claim that connectionist systems are consistent with real
time constraints upon processing. This, at least as a general
claim about connectionist systems, is just another connectionist
myth.
Myth 3: 'Connectionist Systems Exhibit Graceful Degradation'
The connectionist myths discussed above have focused primarily
upon the supposed similarities between connectionist systems and
biological entities. There is however another species of myths
which concentrates upon attempting to show that connectionist
systems are in some way preferable to non-connectionist ones.
These two types of myth are not totally distinct though. The claim
about real time constraints discussed above, for example, involves
elements of both kinds of myth.
A cognitive system which has to interact with the real world is
often faced with imperfect input data. For example, humans by
and large are pretty good at reading one another's handwriting,
even though handwriting usually looks very different from the
block print upon which most people initially learn to read. Similarly,
we are also pretty good at understanding what is being said to
us by someone, even if the speaker has a heavy accent, or the
context of utterance is such that part of the utterance is obscured
by background noise. We still succeed in identifying everyday
objects even when they are viewed under unusual lighting conditions,
or when they are viewed from unfamiliar angles. This being the
case, a desirable property of computational models of cognitive
processes is that such models should also be able to deal with
degenerate input. Ideally, when a system is faced with incomplete,
corrupt or even inconsistent input, the system should be able
to make intelligent guesses about what the input should have been
and make appropriate responses accordingly. If one briefly glimpses
out of the corner of one's eye a bear charging towards one, waiting
for more information is not an especially helpful response! The
ability to handle incomplete, inconsistent or otherwise imperfect
input data is sometimes called 'graceful degradation' (See Clark
1991: p. 62, Sterelny 1990: p. 173).5
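
In practice, graceful degradation is usually probed by corrupting
test inputs and checking whether a system's responses remain roughly
appropriate. The fragment below (an illustration of my own, not any
particular published test) shows the sort of corruption involved.

    import random

    def corrupt(pattern, flip_prob=0.1):
        # Degrade a binary input pattern by randomly flipping a fraction of its
        # bits; this is the usual way of probing 'graceful degradation'.
        return [1 - bit if random.random() < flip_prob else bit for bit in pattern]

    # A system degrades gracefully if its response to corrupt(pattern) stays
    # close to its response to the intact pattern, rather than failing outright.
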
One advantage often claimed for connectionist systems over their
traditional counterparts is that connectionist systems exhibit
graceful degradation. Churchland (1988: p. 120) makes the point
thus,
Churchland (1988: p. 120) goes on to note that "even strongly
simplified recognitional problems" are very difficult indeed
for non-connectionist systems. Similar claims can be found in
McClelland, Rumelhart and Hinton (1986).
Now the mythological component here derives not so much from the
facts themselves as from the conclusion which is drawn from them.
From the fact that many non-connectionist systems, at the present
time, do not exhibit graceful degradation, it does not follow
that they cannot be made to exhibit this property. The facts amount
to nothing more persuasive than prima facie evidence. The
facts certainly cannot be used to support the more general conclusion
that connectionist systems are preferable to non-connectionist
ones. There are (at least) two reasons why this is the case.
First, non-connectionist systems which exhibit graceful degradation
to the same degree and under the same circumstances that connectionist
systems apparently do, may be developed at any time.6 Indeed, this
is an area of active research. For example, systems known as 'Truth
Maintenance Systems' (See Forbus and de Kleer 1992) have been
developed which are able to reason effectively on the basis of
incomplete information. Second, the fact that one type of system
is apparently superior to another type with respect to one set
of properties does not mean that such a system is superior with
respect to all relevant properties. It may well be the
case that there are difficulties with connectionist systems which
other systems can easily overcome (see for example the claims
made by Fodor and Pylyshyn, 1988, with respect to systematicity
and compositionality).
Connectionists who argue in favor of their approach on the basis
of the graceful degradation of their systems overlook these considerations.
The upshot of this is that the putative superiority of connectionist
systems over other systems is not established by the simple appeal
to one or two apparent properties of these systems. In order for
such arguments to be persuasive, it would be necessary to consider
all the relevant properties. Consequently, although the
factual claims about the graceful degradation of connectionist
systems may, at the current time, suggest that such systems may
have advantages over non-connectionist systems for certain types
of tasks, graceful degradation alone is not sufficient to support
the more general conclusion that connectionist systems are superior
to non-connectionist systems. It follows from this that the claimed superiority
of connectionist systems, based solely upon an appeal to graceful
degradation, is nothing more than a myth.
There are a variety of other allegedly desirable properties which
connectionist systems are claimed to have, which alternative systems
do not. These include being resistant to damage, being good pattern
recognizers, being good at retrieving information on the basis
of the content of the information and being able to handle multiple
constraint satisfaction problems (See McClelland, Rumelhart and
Hinton 1986, for example). All these claims, however, fail to adequately
support the conclusion that connectionist systems are intrinsically
superior to non-connectionist ones, for reasons very similar to
those described above for graceful degradation. For this reason,
I will not go through the arguments here. The important point
here is that general claims about the superiority of connectionist
systems over alternative architectures, which are made on the
basis of connectionist systems apparently having some desirable
property which other systems apparently lack, are (generally speaking)
not adequately supported. This being the case, such claims may
constitute nothing more than connectionist myths.
Myth 4: 'Connectionist Systems Are Good Generalizers'
The claim that connectionist networks exhibit graceful degradation
is sometimes made in conjunction with a claim that networks are
good at 'generalization' (See for example, McClelland, Rumelhart
and Hinton 1986: pp. 29-30). Once again this is a claim which
has become included in the philosophical literature on connectionism
(Bechtel and Abrahamsen, 1991: p. 31 and passim; Churchland,
1988: p. 161) with little or no qualification. This claim, too,
is problematic.
As is the case with the graceful degradation claim, a commonly
implied conclusion from the generalization claim is that connectionist
systems are to be preferred to competing systems as models of
cognitive function. This claim, though, is a little more interesting
than the graceful degradation claim (and related claims), and as such
it deserves separate treatment (this is not to say that the objections
outlined above do not also apply to it). However, like the
connectionist claims above, the generalization claim has a mythological
component.
As a rough first approximation, a system can be said to generalize
when it can produce outputs which are appropriate for a particular
input or class of inputs, which it has not been previously given
information about. The first difficulty with the claim about the
generalization of connectionist systems is that it is often the
case that generalization is specified in a manner which is only
appropriate for connectionist systems (or some sub-set of connectionist
systems). For example, Clark (1993: p. 21) describes generalization
thus,
This notion of generalization is inordinately narrow though. For
example, it would not be applicable to systems which do not undergo
training. The famous Jets and Sharks network (described in McClelland,
Rumelhart and Hinton 1986: pp. 26-31, McClelland and Rumelhart
1988: pp. 38-46, Clark 1991: pp. 86-92, and Bechtel and Abrahamsen
1991: pp. 21-34) is said to exhibit 'generalization' (albeit,
not very good generalization, in this case), yet does not undergo
training.7 If generalization is specified broadly though (for example,
as I do with the 'rough, first approximation' above), then many
manifestly non-connectionist systems seem to exhibit generalization
too. For example, Rips's (1983) ANDS system, which is a paradigm
example of a non-connectionist system, might plausibly be said
to generalize in this sense.8 This being the case, an appeal to
generalization cannot adequately support the contention that connectionist
systems are preferable to competing architectural approaches.
It is also the case that, even if a narrow conception of generalization
such as Clark's is employed, only some connectionist networks
exhibit this property. A common difficulty encountered by network
researchers is that, even with identical network architectures,
training regimes and similar starting parameters, different versions
of the same network will exhibit different degrees of generalization,
due to the practice of setting initial weight and bias values
randomly. Sometimes, if a network has too many hidden units and
is trained to too strict a convergence criterion, a network may
simply instantiate a 'look-up table' for the training set, and
produce responses upon generalization testing which are equal
to, or worse than, mere chance! Generalization (no matter which
conception is employed) is a property of only some networks, and
not a general property of all networks.
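
The methodological point can be illustrated with a sketch (my own;
the train and accuracy routines mentioned in the comments are
hypothetical placeholders): identical architectures, training sets
and training regimes, differing only in the random seed used to set
the initial weights, are routinely found to differ in their
performance on a held-out test set.

    import random

    def init_weights(n_weights, seed):
        # Identical architecture and training regime, but a different random seed
        # for the initial weights: this is the only source of the run-to-run
        # variation in generalization described in the text.
        random.seed(seed)
        return [random.uniform(-0.5, 0.5) for _ in range(n_weights)]

    # Sketch of the usual methodology (train and accuracy are hypothetical
    # placeholders, not defined here):
    #   for seed in range(10):
    #       w = train(init_weights(n_weights, seed), training_set)
    #       print(seed, accuracy(w, held_out_set))   # scores typically differ run to run
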
An additional complication which arises with the claim about generalization
is that it is very sensitive to the particular task being considered.
This fact is frequently not made explicit in the descriptions
of generalization in the connectionist literature though. This
is very nicely exemplified by another example from Clark. Clark
(1993: p. 21) briefly describes a connectionist network, originally
due to McClelland and Rumelhart (1986, Vol. 2: pp. 170-215), which
was trained to recognize dogs, using sets of dog
features which were supposed to correspond to the features of
individual dogs. Clark (1993: p. 21) cites as an example of generalization
(in accordance with the conception quoted above) the fact that,
Although the facts are correct, this example does not do justice
to the influence of the chosen task domain upon generalization.
Suppose that the network was trained not only to recognize dogs,
but was also trained to recognize common items of furniture. Now,
if the only three legged object in the entire training set were
to be a small stool, it is quite possible that the network would
classify the three legged dog as a non-barking object (i.e. like
a stool), rather than as a barking one. The performance of such
a network would be dependent upon the ratio of dogs to furniture
in the entire training set, as well as various other specific
details of the training regime.9
Given all the difficulties I have outlined above, it is not unreasonable
to conclude that the claim that connectionist systems are good
generalizers is so problematic that, in many instances, it may
amount to nothing but a myth. It is far from clear that there
is even a uniform notion of generalization which is used amongst
connectionists, let alone a conception which is common and applicable
to both connectionist and non-connectionist systems.10 Without detailed
clarification of the notion, it is not a suitable basis for comparison
at all. Moreover, the actual evidence for the claim that networks
are good generalizers is far from unequivocal. Thus, claims about
generalization cannot provide an adequate basis to adjudicate
between connectionist and other systems, or be used as a premise
in arguments for other conclusions. Furthermore, although some
connectionist systems may exhibit something which might plausibly
be termed 'generalization', it is certainly not universally the
case. Thus, claims about connectionist systems and generalization
which are not very carefully hedged (as they almost never are)
are going to be false of many connectionist systems. Hence, the
unqualified claim that connectionist systems are good generalizers
is largely a myth.
Conclusion
It should be apparent from the discussion above that a number
of the claims which have been made about connectionist systems
in the philosophical literature, and elsewhere, are not as straightforward
as they may initially appear. Without appropriate qualification,
the claims which have been termed 'myths' here have a great potential
to mislead. Whilst some of the mythical claims have a basis in
fact, the uncritical deployment of these claims as premises to
arguments is highly likely to lead to arguments which are valid, but not
sound. For this reason, it is important that members of the philosophical
community who think about the issues raised by connectionism,
do so in a manner which is more careful than has been the common
practice previously.
There is also a moral here for connectionist researchers. It may
be the case that some of the claims which have here been labeled
'myths' can be substantiated. However, substantiation requires
persuasive argumentation. So, connectionists might think of eschewing
the lab for a little while and spending some time constructing
arguments to support those claims which they consider to be axiomatic
to their research program. This would then assist philosophers
in determining which of their claims, and under what circumstances,
can be used as premises in theorizing about connectionist-related
matters.
It might be thought that the arguments offered here point to a
skeptical conclusion about connectionist research. However, such
an impression would be misleading. There are, I believe, philosophically
interesting and important conclusions which can be drawn on the
basis of connectionist research (see for example, Dawson, Medler
and Berkeley 1997). However, for those conclusions to become apparent,
it is necessary to clear away the connectionist mythology, in
order to get to the connectionist facts.
1 A notable exception to this general rule can be found in Dawson and Schopflocher (1992: pp. 25-26).
2 One further, though often overlooked, consequence of massive parallelism in connectionist systems is that modern connectionist networks usually violate Minsky and Papert's (1969) limited order constraint. This means that the criticisms of Minsky and Papert's conclusions, such as that offered by Bechtel and Abrahamsen (1991: p. 15), are not only misplaced, they are simply incorrect.
3 Not too much weight should be put on this system though - it is very far from being biologically plausible. See Dawson and Berkeley (1993).
4 For a further discussion of the claim that connectionist networks have some kind of biological plausibility, see Quinlan (1991: pp. 240-244). Quinlan's assessment of the current state of the art is similar to the one offered here.
5 Actually, the notion of 'graceful degradation' is somewhat more technical than this. Clark's (1991) characterization will suffice for current purposes though.
6 This objection is also raised by Sterelny (1990: pp. 173-175).
7 In fact, Clark (1991: p. 92) even makes a claim about the generalization abilities of the Jets and Sharks network.
8 Actually, the situation is somewhat more complex than this, in so much as it is unclear whether or not the inference rules within ANDS are to count as containing information about every inference of a particular syntactic type. This complication does not affect my main point though, so I will not discuss it further here.
9 Cf. the example of a network for assessing bank loan applications, discussed by Clark (1993: p. 71).
10 Indeed, the considerations discussed above might be taken as being indicative that the term 'generalization' exhibits what Waismann (1951) calls 'open texture'.
Barnden, J. and Pollack, J. (Eds.), (1991), Advances in Connectionist
and Neural Computation Theory (Vol. 1): High-Level Connectionist
Models, Ablex Pub. Co. (Norwood, NJ).
Bechtel, W. (1985), "Contemporary Connectionism: Are The
New Parallel Distributed Processing Models Cognitive Or Associationist?"
in Behaviourism, 13/1, pp. 53-61.
Bechtel, W. and Abrahamsen, A. (1991) Connectionism and the
Mind, Basil Blackwell (Cambridge, Mass.).
Churchland, P. M. (1988), Matter and Consciousness: A Contemporary
Introduction to the Philosophy of Mind, MIT Press (Cambridge,
Mass.).
Churchland, P. M. (1989), The Neurocomputational Perspective:
The Nature of Mind and the Structure of Science, MIT Press (Cambridge,
Mass.).
Churchland, P. S. and Sejnowski, T. (1994), The Computational
Brain, MIT Press (Cambridge, Mass.).
Clark, A. (1989), Microcognition: Philosophy, Cognitive Science
and Parallel Distributed Processing, MIT Press (Cambridge,
Mass.).
Clark, A. (1993), Associative Engines: Connectionism, Concepts,
and Representational Change, MIT Press (Cambridge, Mass.).
Crick, F. and Asanuma, C. (1986), "Certain Aspects of the
Anatomy and Physiology of the Cerebral Cortex", in Rumelhart,
McClelland et al. (1986: Vol. 2., pp. 333-371).
Cummins, R. (1989), Meaning and Mental Representation,
MIT Press (Cambridge, Mass.).
Dawson, M. and Berkeley, I. (1993), "Making a Middling Mousetrap",
commentary on Shastri, L. and Ajjanagadde, V. (1993) "From
Simple Associations to Systematic Reasoning", in Behavioral
and Brain Sciences, Vol. 16, No. 3, pp. 454-455.
Dawson, M., Medler, D. and Berkeley, I. (1997) "PDP Networks
Can Provide Models That Are Not Mere Implementations of Classical
Theories" in Philosophical Psychology, forthcoming.
Dawson, M. and Schopflocher, D. (1992), "Modifying the Generalized
Delta Rule to Train Networks of Non-monotonic Processors for Pattern
Classification", in Connection Science, 4/1, pp. 19-31.
deGroot, J. and Chusid, J. (1988), Correlative Neuroanatomy,
(12th Ed), Appleton and Lange (Connecticut).
Dennett, D. (1991), Consciousness Explained, Little, Brown
and Co. (Boston).
Dreyfus, H. (1992), What Computers Still Can't Do: A
Critique of Artificial Reason, MIT Press (Cambridge, MA).
Feldman, J. and Ballard, D. (1982), "Connectionist Models
and Their Properties", in Cognitive Science, 6, pp.
205-254.
Fodor, J. and Pylyshyn, Z. (1988), "Connectionism and Cognitive
Architecture: A Critical Analysis", in Cognition 28,
pp. 3-71.
Forbus, K. and de Kleer, J. (1992), Building Problem Solvers,
MIT Press (Cambridge, Mass.).
Getting, P. (1989), "Emerging Principles Governing the Operation
of Neural Networks", in Annual Review of Neuroscience,
12, pp. 184-204.
Kolb, B. and Whishaw, I. (1990), Fundamentals of Human Neuropsychology
(3rd Ed.), Freeman and Co. (New York).
Lloyd, D. (1989), Simple Minds, MIT Press (Cambridge, Mass.).
McClelland, J. and Rumelhart, D. (1986), "A Distributed Model
of Human Learning and Memory", in Rumelhart, McClelland
et al. (1986: Vol. 2, pp. 170-215).
McClelland, J., Rumelhart, D. and Hinton, G. (1986), "The
Appeal of Parallel Distributed Processing", in Rumelhart,
McClelland, et al. (1986: Vol. 1, pp. 3-44).
O Nualláin, S. (1995), The Search for Mind: A New Foundation
for Cognitive Science, Ablex Pub. Co. (Norwood, NJ).
Posner, M. (1989), Foundations of Cognitive Science, MIT
Press (Cambridge, Mass.).
Pylyshyn, Z. (1984), Computation and Cognition, MIT Press (Cambridge,
Mass.).
Quinlan, P. (1991), Connectionism and Psychology: A Psychological
Perspective on New Connectionist Research, U. of Chicago Press
(Chicago, IL).
Rips, L. J. (1983), "Cognitive Processes in Propositional
Reasoning" in Psychological Review, 90/1, pp. 38-71.
Rumelhart, D. (1989), "The Architecture of Mind: A Connectionist
Approach", in Posner (1989: pp. 133-159).
Rumelhart, D., McClelland, J. and The PDP Research Group (1986),
Parallel Distributed Processing: Explorations in the Microstructure
of Cognition, (2 Vols.), MIT Press (Cambridge, Mass.).
Searle, J. (1992), The Rediscovery of The Mind, MIT Press
(Cambridge, Mass.).
Shastri, L. (1991), "The Relevance of Connectionism to AI:
A Representation and Reasoning Perspective", in Barnden and
Pollack (1991: pp. 259-283).
Shastri, L. and Ajjanagadde, V. (1993) "From Simple
Associations to Systematic Reasoning", in Behavioral and
Brain Sciences, Vol. 16, No. 3 (Sept. 1993).
Sterelny, K. (1990), The Representational Theory of Mind,
Blackwell (Oxford).
Winlow, W. (1990a), "Prologue; The 'typical' neurone",
in Winlow (1990: pp. 1-4).
Winlow, W. (Ed.), (1990), Neuronal Communications, Manchester
U. P. (Manchester).