Introduction
Since the emergence of what Fodor and Pylyshyn (1988) call 'new
connectionism', there can be little doubt that connectionist research
has become a significant topic for discussion in the Philosophy
of Cognitive Science and the Philosophy of Mind. In addition to
the numerous papers on the topic in philosophical journals, almost
every recent book in these areas contains at least a brief reference
to, or discussion of, the issues raised by connectionist research
(see Sterelny 1990, Searle, 1992, and O Nualláin, 1995,
for example). Other texts have focused almost exclusively upon
connectionist issues (see Clark, 1993, Bechtel and Abrahamsen,
1991, and Lloyd, 1989, for example). Regrettably, the discussions
of connectionism found in the philosophical literature suffer
from a number of deficiencies. My purpose in this paper is to
highlight one particular problem and attempt to take a few steps
to remedy the situation.
The problem I will be concerned with here arises from what might
be called (for want of a better term) the 'Myths of Connectionism'.
These myths are often repeated claims that have been made about
connectionist systems, which when closely scrutinized, fail to
be adequately justified, or properly qualified. In some instances,
such claims are simply false. Before discussing these claims directly
though, a word of caution is in order. It is important to be clear
that the intended audience here is not the technical
community of active connectionist researchers. Members of the
technical community are usually acutely aware of the limitations
and shortcomings of their systems. The purpose here is rather
to address the strictly philosophical community who are interested
in the issues raised by the research results from connectionist
systems. The reason for this focus is that it is in this latter
community that the myths of connectionism seem to have their strongest
hold.
Before proceeding to a discussion of particular myths, one further
remark is in order. It is not the case that every mythical claim
in the philosophical literature is made in exactly the same way.
Rather, the particular myths which I will discuss represent a
class of claims which bear a certain 'family resemblance' to one
another. My versions of the mythical claims are designed to get
at the core of the class of claims. I leave it up to the reader
to determine the applicability of the arguments I put forward
to particular instances of mythical claims.
Myth 1: "Connectionist systems are in some sense 'neural' or 'brain-like'"
One important series of related claims made about connectionist
systems is that, in a significant sense, they are brain-like,
or have brain-like properties. Clark (1989: p. 4), for example,
talks of "...the brain-like structure of connectionist architectures."
Similarly, Bechtel and Abrahamsen (1991: p. 17), in describing
the rise of the new connectionism, claim that "...network
models were attractive because they provided a neural-like architecture
for cognitive modeling". Perhaps the most explicit endorsement
of this myth though, is due to Paul Churchland. Churchland (1989:
p. 160) introduces connectionist networks as follows,
Churchland makes similar claims elsewhere too (see Churchland
1988: p. 156). Even Dennett (1991: p. 239) makes reference to
"...'connectionist' architectures of neuron-like elements...".
However, it is not just those who favor the connectionist approach
who make this kind of claim. For example, Sterelny (1990: p. 175)
and Cummins (1989: p. 155), though neither is a great fan
of connectionism, both appeal to this myth. Given these examples,
it is clear that the claim that connectionist systems are 'brain-like'
or have 'neuron-like' properties is well established in the philosophical
literature.
The source of this myth in the philosophical literature is not
hard to determine, as it is common to find fundamentally the same
claims made in the technical literature as well (see for example,
McClelland, Rumelhart and Hinton (1986), or Rumelhart (1989)).
However, in the technical literature, there is also often an open
acknowledgment that there are radical dissimilarities between
the biological systems and connectionist ones (see for example
Crick and Asanuma 1986). Unfortunately, these salutary remarks
seldom find their way into the philosophical literature.
There are, in fact, two particular related claims which are frequently
confused in the philosophical literature. These are,
(a) that connectionist systems are biologically plausible, that is to say, that they are 'brain-like' or have 'neuron-like' properties; and
(b) that connectionist systems are more biologically plausible, or brain-like, than non-connectionist architectures.
Although there may be some credibility to the second claim (b),
it is the first claim (a) which is, unequivocally, a myth. A careful
comparison of the various components of a connectionist system
with the supposedly analogous components of the brain shows that
there is only the most minimal similarity between the biological
and connectionist systems. As, to some degree, claim (b) rests
upon claim (a), these facts cast some doubt on the plausibility
of this claim too.
Processing Units
Let us begin by examining the claim that a connectionist processing
unit is in some sense similar to a biological neuron. For example,
Rumelhart (1989: p. 134) has claimed that a "...[connectionist]
processing unit [is] something close to an abstract neuron."
This claim should arouse immediate suspicion, given the fact that,
as Winlow (1990a: p. 1) notes "It has always been very clear
to neuroscientists that there is no such thing as a typical neurone
(sic),...". There are, as a matter of fact, many different
types of neurons (see Kolb and Whishaw (1990: p. 5) and deGroot
and Chusid (1988: p. 5) for illustrations of some of these types).
Indeed, according to Churchland and Sejnowski (1994: p. 43) there
are twelve different kinds of neurons in the neocortex alone.
Given these facts, it seems reasonable to ask just which kind
of neuron connectionist processing units are supposed to be an abstraction from.
Connectionists though, as a rule, have little if anything to say
on this matter.
If the 'abstract neurons' employed within connectionist systems
are supposed to capture the significant features of the class
of all neurons, then it is reasonable to ask how the set
of features selected was decided upon. Regrettably though, the
selection of features and functional properties employed in 'abstract
neurons' has yet to be justified or defended in any detail. Thus,
until some better account of the relationship between connectionist
processing units and actual biological neurons is forthcoming,
it seems reasonable to treat this claim about processing units
with some skepticism. A bold, unsubstantiated claim will not suffice
where argument is required.
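
To make the target of this complaint concrete, the following sketch
(a minimal illustration in Python, with names and values of my own
devising rather than anything drawn from a published model) shows
all that a standard connectionist processing unit typically amounts
to: a weighted sum of incoming activations plus a bias, passed
through a sigmoid squashing function.

    import math

    def unit_activation(inputs, weights, bias):
        # A standard connectionist processing unit: the whole 'abstract neuron'
        # is a weighted sum of incoming activations plus a bias, squashed by a
        # sigmoid. Nothing else about a biological neuron is represented.
        net = sum(x * w for x, w in zip(inputs, weights)) + bias
        return 1.0 / (1.0 + math.exp(-net))   # logistic activation in (0, 1)

    # A unit with three (arbitrarily chosen) incoming connections:
    print(unit_activation([0.2, 0.9, 0.1], [0.5, -1.2, 0.3], bias=0.1))
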
A related concern derives from the fact that many, if not most,
connectionist systems involve homogeneous processing units.1 This
homogeneity does not reflect the complexity of the biological
situation. Getting (1989: p. 187) remarks that "No longer
can [biological] neural networks be viewed as the interconnection
of many like elements....". In fact, Churchland and Sejnowski
(1994: p. 51) claim that, in the brain "[m]ost connections
are between, not within, cell classes." If connectionist
networks are to have genuine biological or neural plausibility,
it is reasonable to expect them to reflect these facts about biological
systems. The discrepancy between the state of affairs in connectionist
networks as compared to biological neural networks merely serves
to further undermine the tenability of the claim that connectionist
systems are biologically plausible.
Finally, it is common practice for connectionist models (which
undergo training employing a learning rule) to have what is known
as a 'bias' term, which is trained at the same time as the connection
weights are trained. However, there is little or no evidence that
threshold membrane potentials (the most natural biological equivalents
of bias) in biological systems can be modified in any analogous
way; indeed, there is no evidence that the thresholds of natural
neurons exhibit any plasticity at all. Connectionists whose systems
involve trainable biases thus standardly take it upon themselves
to add an extra degree of freedom into their networks.
However, this degree of freedom lacks any biological justification.
Again, this counts against the biological plausibility of such
systems.
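
To illustrate the point about the bias, the sketch below (again my
own, assuming a simple delta-rule style update for a single linear
unit rather than any particular published learning rule) shows the
bias being adjusted by exactly the same kind of step as the
connection weights; it functions as nothing more than one further
trainable parameter.

    def train_step(inputs, weights, bias, target, rate=0.1):
        # One delta-rule style update for a single linear unit (an illustrative
        # toy, not any particular published learning rule). The bias is adjusted
        # in just the same way as the connection weights: it is treated as a
        # weight on a constant input of 1, i.e. one more free parameter.
        output = sum(x * w for x, w in zip(inputs, weights)) + bias
        error = target - output
        new_weights = [w + rate * error * x for w, x in zip(weights, inputs)]
        new_bias = bias + rate * error
        return new_weights, new_bias
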
Connections
In biological nervous systems, neurons have two components which
are roughly equivalent to the weighted connections between processing
units in a connectionist network. These components are axons and
dendrites. Dendrites receive signals passing to neurons and axons
send signals from particular neurons to others. One immediate
(though relatively trivial) difference between connectionist systems
and biological ones is that, in biological systems, axons and
dendrites are components of neurons themselves, whereas in connectionist
systems the connections between units are distinct from the units
themselves. This however, is not the only difference.
It is standard practice for connectionists to make their networks
'massively parallel'. That is to say, each unit of a particular
layer is normally arranged so that it has connections to every
unit of both prior and subsequent layers in the network (See figure
1).
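
The following fragment (an illustration of my own, in Python) shows
what this standard arrangement amounts to in practice: the weights
between two layers form a complete matrix, with no incoming
connection omitted from any unit.

    import random

    def fully_connected_layer(n_inputs, n_units):
        # 'Massive parallelism' in the standard sense: every unit in this layer
        # receives a weighted connection from every unit in the previous layer,
        # so the weights form a complete n_units x n_inputs matrix.
        return [[random.gauss(0.0, 0.5) for _ in range(n_inputs)]
                for _ in range(n_units)]

    weights = fully_connected_layer(n_inputs=4, n_units=3)
    n_connections = sum(len(row) for row in weights)   # 4 * 3 = 12; nothing is omitted
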
However, there are no results which suggest that this is the situation
in biological systems (see Dawson and Shamanski 1993). Indeed,
what evidence there is suggests that this is not the case. Churchland
and Sejnowski (1994: p. 51), whilst discussing the patterns of
connectivity found in the cortex of the brain, note that,
Unfortunately, standard connectionist practice pays no heed to
this particular fact about biological neural systems.2 Moreover,
given the importance of connections to overall functioning in
both natural neural systems and connectionist networks, this difference
is far from trivial. Furthermore, it is also the case that in
standard small connectionist networks individual units from one
layer can have a significant impact on the activation level of
particular units at the next layer. In biological systems though,
the influence of one neuron upon the state of another is, in most
cases (there are important exceptions), relatively weak. Usually,
the influence of one neuron's activity upon another is of the order
of 1%-5% of the firing threshold (see Churchland and Sejnowski
1994: p. 52). In the connectionist literature, no attention is
paid to this particular subtlety.
Another sharp discrepancy which exists between standard connectionist
models and biological systems is in their differing ways of transmitting
signals between units or neurons. In connectionist networks, the
signals which are sent via the weighted connections take the form
of continuous numerical values. But in real neurological systems,
signals are sent in the form of spiked pulses (for an
illustration of this, see Churchland and Sejnowski (1994: p. 53)).
This would not be a decisive objection against connectionist models,
were it to be the case that continuous values could capture the
essential properties of the signals transmitted by the spiked
pulses. However, this is not the case. Firstly, different types
of neurons have different firing patterns. Secondly, some neurons'
firing patterns are a function of their recent firing history.
Thirdly, some neurons have oscillatory firing patterns. Fourthly,
most neurons spike randomly, even in the absence of input (Churchland
and Sejnowski 1994: pp. 52-53). Finally, it is also the case that
signals between neurons in biological systems are sent by more
than one medium. Synaptic transmission occurs by both electrical
and chemical means (Getting 1989: p. 191). Although it may be
possible to capture at least some aspects of these complexities
with the continuous values standardly employed in connectionist
networks, there is no reason to believe, absent an argument to this
effect, that they can be captured entirely. Connectionists have
yet to come up with such an argument. Indeed, there seem to be
good grounds to believe that the properties just mentioned will
be highly significant to the functioning of actual neural systems.
This being the case, there seem to be good grounds for doubting
the putative biological plausibility of connectionist networks
and the associated claims that such networks have neural-like
properties.
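
The contrast can be made vivid with a small sketch (my own, and
deliberately crude; the spiking function below is a toy
integrate-and-fire caricature, not a model of any actual neuron
type). A connectionist unit transmits a single continuous value,
whereas the biological 'signal' is a temporal pattern of discrete
pulses whose shape depends upon the unit's recent history.

    def continuous_signal(activation):
        # What a connectionist unit transmits: a single real number.
        return activation

    def spike_train(input_current, steps=20, threshold=1.0, leak=0.9):
        # A deliberately crude integrate-and-fire caricature (not a model of any
        # particular neuron type): the 'signal' is a temporal pattern of discrete
        # spikes, and the pattern depends on the unit's recent history via the
        # decaying membrane potential.
        potential, spikes = 0.0, []
        for _ in range(steps):
            potential = potential * leak + input_current
            if potential >= threshold:
                spikes.append(1)
                potential = 0.0        # reset after firing
            else:
                spikes.append(0)
        return spikes

    print(continuous_signal(0.73))     # 0.73
    print(spike_train(0.3))            # e.g. [0, 0, 0, 1, 0, 0, 0, 1, ...]
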
In most connectionist networks, the relationship between the signal
sent down a connection and the influence of that connection (with
the associated weighting) upon the receiving unit is fairly straightforward.
This is not the case in real neural systems though. Dreyfus (1992:
pp. 161-162) briefly describes work by Lettvin which suggests
that axon branches may serve to act as "low pass filters
with different cutoff frequencies", with the precise frequency
being dependent upon the physical diameter of the actual axon
branch. Given this fact, there will be a complex and functionally
significant relationship between the frequency and pattern of
neuronal firing, and the length and diameter of the connections
between neurons. This relationship will be functionally significant,
as it will have a direct effect upon the influence of one neuron
upon another. However, there is nothing in standardly described
connectionist systems which is even remotely similar to such a
mechanism. This being the case, there must be at least some functionally
significant properties of biological systems which are not captured
in connectionist systems. This, again, militates against the tenability
of connectionist claims to biological plausibility.
Hopefully the facts from neuroscience cited above are sufficient
to show that connectionist claims to biological plausibility are
not as straightforward as many of the proponents of the myth would
have us believe. Indeed, there are significant functional differences
between connectionist systems and biological ones. Given these
facts, it seems reasonable to conclude that the claim that connectionist
systems are biologically plausible, at the current time at least,
is in large part a myth.
As noted above though, even if claim (a), that connectionist systems
are biologically plausible, or neuron-like is not tenable, there
is still the weaker claim (b), to the effect that connectionist
systems are more biologically plausible/brain-like than other
non-connectionist architectures.
There is perhaps more plausibility to the weaker claim, although
it too has problematic aspects (for example, it is far from clear
what the appropriate metric should be for assessing comparative
biological plausibility). However, a prima facie case for
the plausibility of the weaker claim can be made.
Consider the case of two computational systems which both model
some cognitive capacity. Let us suppose further that one system
is connectionist and the other is a production system (production
systems are a fairly typical non-connectionist architecture).
If for some reason (perhaps a desire to develop a system which
was strongly equivalent to human beings) we wished to try to make
each system more biologically plausible, how would we fare?
In the case of the connectionist system, there are a number of
steps which might be taken. These range from introducing non-homogeneous
processing units, with activation functions similar to those of the
biological neurons of the relevant type, to utilizing more complex
mechanisms to mediate the transmission of signals between units. Shastri and
Ajjanagadde (1993), for example, have described a system which
mimics, in a rudimentary manner, the spiking of neural firing.3
How, on the other hand, might we go about making a production system
more biological? There does not seem to be any straightforward
manner of doing this. Adding more productions is very unlikely
to do the trick! So, in theory, connectionist systems could
be made more biologically plausible than their non-connectionist
cousins. This, though, is not the same as the claim that connectionist
systems actually are more biologically plausible at the
current time. Once again, this claim, if made in the present tense
(cf. the remark made by McClelland, Rumelhart and Hinton 1986:
p. 12), is little more than a myth.4
Myth 2: 'Connectionist Systems Are Consistent With Real Time Constraints Upon Processing'
There is another claim which is sometimes made on behalf of connectionist
systems, which is based upon comparing them with biological cognitive
entities. This claim too has a significant mythological component.
This claim has also found its way into the philosophical literature
(see Bechtel 1985, Sterelny 1990: pp. 172-173, or Clark, 1991:
pp. 119-122, for examples).
One of the astonishing things about biological cognitive systems
is the speed at which they are able to perform tasks which (apparently)
require many complex calculations. Somehow or other, the neurological
components of humans and animals are able to successfully perceive
the world, remember things and so on, despite the fact that individual
neurological components (such as neurons) operate slowly when
compared to the speed of a modern microprocessor. These facts
have led some connectionists (for example, Feldman and Ballard
1982, Rumelhart 1989, and Shastri 1991) to argue for the adoption
of their approach. Such arguments frequently appeal to the problems
which can arise with traditional symbolic systems, with respect
to real time constraints upon processing.
One of the best known versions of this type of argument is the
so-called "100 step" argument (this nomenclature originates
from Feldman and Ballard 1982). It is argued that, from what is
known about the speed of firing of neurons in the brain, many
basic human cognitive capacities (those which take under a second
for humans to process in real time) cannot involve more
than about one hundred processing steps. This is because actual
neurons cannot go through more than about one hundred states in
under a second. As standard non-connectionist architectures are
for the most part inherently serial in nature and usually require
considerably in excess of one hundred steps of processing for
most operations, connectionists argue that they cannot provide
a good model of actual cognitive function.
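
The arithmetic behind the argument can be set out in a couple of
lines (the figures below are the round numbers the argument itself
assumes, not established neurological constants):

    # Premises of the '100 step' argument, in round figures:
    neuron_step_time = 0.005   # seconds per neuronal 'operation', i.e. a few milliseconds
    task_time = 0.5            # seconds available for a fast cognitive task

    max_serial_steps = task_time / neuron_step_time
    print(max_serial_steps)    # 100.0, hence the '100 step' constraint
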
Rumelhart's (1989: p. 135) version of the argument goes like this:
Rumelhart then goes on to list several processes which occur in
a second or so. The processes listed are all significant for the
study of cognition and include linguistic capacities, perception
and memory retrieval. The claim is that the facts about the speed
of operation of neurons mean that realistic computational accounts
of these cognitive processes must either involve less than one
hundred or so operations, or some account must be given for how
it is that more than one hundred operations can occur in less
than a second.
Rumelhart (1989: p. 135) believes that the correct way to explain
phenomena of this type is as follows:
Devices such as Turing machines or von Neumann machines (usually)
have a single processor which performs operations one at a time,
one after another. This is often called 'serial' processing. One
of the features of connectionist systems, by contrast, is that
they are constructed from many simple processing devices which
operate at the same time as one another. This is often referred
to as 'parallel' processing. The parallel nature of connectionist
systems means that they can (theoretically) perform many operations
within each time step and thus, it is claimed, they do not (necessarily)
violate the 100 step constraint.
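
A brief sketch (again my own illustration, not anyone's published
model) may help to show why parallelism is thought to ease the
constraint: a single forward pass through one fully connected layer
performs a great many primitive multiply-and-add operations, yet on
the connectionist accounting it counts as only one time step.

    def layer_forward(inputs, weights, biases):
        # One parallel 'time step' on the connectionist picture: every unit
        # computes its weighted sum simultaneously, so a large number of
        # primitive multiply-and-add operations count as a single step.
        return [sum(x * w for x, w in zip(inputs, row)) + b
                for row, b in zip(weights, biases)]

    # A layer of 100 units fed by 100 inputs performs 10,000 multiply-adds,
    # yet on this accounting it uses up only one of the ~100 available steps.
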
The argument sketched above is pretty clearly just a variant
of the biological plausibility claims discussed earlier. This
is because the argument's plausibility depends crucially upon an
assumption that there is some functionally significant similarity
between connectionist processing units and biological neurons
(this point is noted by Fodor and Pylyshyn 1988: p. 55). As we
have seen in the previous section though, the claim that connectionist
units are neuron-like is largely a myth. This is not the only
reason why the claim that connectionist systems are consistent with
real time constraints upon processing is dubious, however.
The entire 100 step argument turns upon the premise that the individual
neuron is the computationally significant level, as far as speed
constraints go, in the brain. Should it turn out that there is
significant processing which occurs at the sub-neuronal level
(for example, at the level of synaptic clefts, see Kolb and Whishaw
1990: pp. 46-47), then this argument would lose much of its plausibility.
The additional processing steps which standard architectures seem
to require may be being done at this sub-neuronal level. In addition,
in the brain, there are many chemical processes of the dendrites
which take place over a wide range of time scales (Fodor and Pylyshyn,
1988: p. 55, n31). This being the case, there are grounds for
wondering why advocates of the 100 step argument choose the rate
of neuronal firing as the relevant time scale for their argument.
There is no defense of this choice in the connectionist literature.
As a matter of fact, what is known about the behavior of biological
systems tends to make the appeal to the speed of neurons implausible.
In particular, the claim that "neurons operate in the time
scale of milliseconds..." (Rumelhart 1989: p.135), which
is crucial to the 100 step argument, involves a considerable oversimplification
of the neurological facts. For example, cortical neurons have
a variety of different intrinsic firing patterns and rates (see
Churchland and Sejnowski 1994: p. 53). It is also the case that
the rate of firing of a particular neuron will be determined,
in part, by the kind of nerve fiber which makes connections with
it. deGroot and Chusid (1988: pp. 23-24) describe three distinct
types of nerve fiber which have differential rates of signal conductance.
Given these complexities, the simple temporal claim which is central
to the 100 step argument lacks plausibility without being defended
in detail. Once again though, such a defense has not been attempted
within the connectionist literature.
Even if these difficulties with the 100 step argument are overlooked,
the argument still fails to unambiguously establish the conclusion
its connectionist proponents propose. As Sterelny (1990: p. 172)
notes, there are two possible conclusions from the 100 step argument.
The weaker conclusion is that, however human brains actually work,
they do not run the same programs as computers do. Of course,
this conclusion is almost certainly (though somewhat trivially)
correct. The stronger conclusion is that the 100 step argument
shows that a certain class of theories about cognition are fundamentally
incorrect. The stronger conclusion is presumably the one which
connectionists wish to endorse. The stronger conclusion is highly
problematic, however.
The strong conclusion of the 100 step argument should persuade
us that explanations of cognitive phenomena which are rooted in
serial processing are defective. However, this alone is not sufficient
to justify the adoption of a connectionist approach. Connectionism
does not have a corner on the market when it comes to building
parallel processing models. Sterelny (1990: p. 172) mentions (although
he does not give a reference) that "...some version of the
'Marcus parser', which models sentence comprehension by incorporating
a Chomskian transformational grammar, use parallel processes."
Similar points are made elsewhere, in both Pylyshyn (1984) and
Fodor and Pylyshyn (1988).
Another difficulty with the strong conclusion is that it is far
from clear at what level the excess steps (i.e. those in excess of
100) are supposed to arise. Would a computer program
which involved more than 100 function calls be deemed unacceptable?
Are the '100 steps' supposed to be basic processor operations?
Without a clearer notion of what is to count as a step, it is
hard to tell how the 100 step constraint could even be met by
a serial processing system!
Given the problems just raised, it is reasonable to conclude that
the 100 step argument should not be taken as providing support
for the claim that connectionist systems are consistent with real
time constraints upon processing. This, at least as a general
claim about connectionist systems, is just another connectionist
myth.
Myth 3: 'Connectionist Systems Exhibit Graceful Degradation'
The connectionist myths discussed above have focused primarily
upon the supposed similarities between connectionist systems and
biological entities. There is however another species of myths
which concentrates upon attempting to show that connectionist
systems are in some way preferable to non-connectionist ones.
These two types of myth are not totally distinct though. The claim
about real time constraints discussed above, for example, involves
elements of both kinds of myth.
A cognitive system which has to interact with the real world is
often faced with imperfect input data. For example, humans by
and large are pretty good at reading one another's handwriting,
even though handwriting usually looks very different from the
block print upon which most people initially learn to read. Similarly,
we are also pretty good at understanding what is being said to
us by someone, even if the speaker has a heavy accent, or the
context of utterance is such that part of the utterance is obscured
by background noise. We still succeed in identifying everyday
objects even when they are viewed under unusual lighting conditions,
or when they are viewed from unfamiliar angles. This being the
case, a desirable property of computational models of cognitive
processes is that such models should also be able to deal with
degenerate input. Ideally, when a system is faced with incomplete,
corrupt or even inconsistent input, the system should be able
to make intelligent guesses about what the input should have been
and make appropriate responses accordingly. If one briefly glimpses
out of the corner of one's eye a bear charging towards one, waiting
for more information is not an especially helpful response! The
ability to handle incomplete, inconsistent or otherwise imperfect
input data is sometimes called 'graceful degradation' (See Clark
1991: p. 62, Sterelny 1990: p. 173).5
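
In practice, graceful degradation is usually probed by corrupting
test inputs and checking whether a system's responses remain roughly
appropriate. The fragment below (an illustration of my own, not any
particular published test) shows the sort of corruption involved.

    import random

    def corrupt(pattern, flip_prob=0.1):
        # Degrade a binary input pattern by randomly flipping a fraction of its
        # bits; this is the usual way of probing 'graceful degradation'.
        return [1 - bit if random.random() < flip_prob else bit for bit in pattern]

    # A system degrades gracefully if its response to corrupt(pattern) stays
    # close to its response to the intact pattern, rather than failing outright.
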
One advantage often claimed for connectionist systems over their
traditional counterparts is that connectionist systems exhibit
graceful degradation. Churchland (1988: p. 120) makes the point
thus,
Churchland (1988: p. 120) goes on to note that "even strongly
simplified recognitional problems" are very difficult indeed
for non-connectionist systems. Similar claims can be found in
McClelland, Rumelhart and Hinton (1986).
Now the mythological component here derives not so much from the
facts themselves as from the conclusion which is drawn from them.
From the fact that many non-connectionist systems, at the present
time, do not exhibit graceful degradation, it does not follow
that they cannot be made to exhibit this property. The facts amount
to nothing more persuasive than prima facie evidence. The
facts certainly cannot be used to support the more general conclusion
that connectionist systems are preferable to non-connectionist
ones. There are (at least) two reasons why this is the case.
First, non-connectionist systems which exhibit graceful degradation
to the same degree and under the same circumstances that connectionist
systems apparently do, may be developed at any time.6 Indeed, this
is an area of active research. For example, systems known as 'Truth
Maintenance Systems' (See Forbus and de Kleer 1992) have been
developed which are able to reason effectively on the basis of
incomplete information. Second, the fact that one type of system
is apparently superior to another type with respect to one set
of properties does not mean that such a system is superior with
respect to all relevant properties. It may well be the
case that there are difficulties with connectionist systems which
other systems can easily overcome (see for example the claims
made by Fodor and Pylyshyn, 1988, with respect to systematicity
and compositionality).
Connectionists who argue in favor of their approach on the basis
of the graceful degradation of their systems overlook these considerations.
The upshot of this is that the putative superiority of connectionist
systems over other systems is not established by the simple appeal
to one or two apparent properties of these systems. In order for
such arguments to be persuasive, it would be necessary to consider
all the relevant properties. Consequently, although the
factual claims about the graceful degradation of connectionist
systems may, at the current time, suggest that such systems may
have advantages over non-connectionist systems for certain types
of tasks, graceful degradation alone is not sufficient to support
the more general conclusion that connectionist systems are superior
to non-connectionist systems. It follows from this that the claimed superiority
of connectionist systems, based solely upon an appeal to graceful
degradation, is nothing more than a myth.
There are a variety of other allegedly desirable properties which
connectionist systems are claimed to have, which alternative systems
do not. These include being resistant to damage, being good pattern
recognizers, being good at retrieving information on the basis
of the content of the information and being able to handle multiple
constraint satisfaction problems (See McClelland, Rumelhart and
Hinton 1986, for example). All these claims, however, fail to adequately
support the conclusion that connectionist systems are intrinsically
superior to non-connectionist ones, for reasons very similar to
those described above for graceful degradation. For this reason,
I will not go through the arguments here. The important point
here is that general claims about the superiority of connectionist
systems over alternative architectures, which are made on the
basis of connectionist systems apparently having some desirable
property which other systems apparently lack, are (generally speaking)
not adequately supported. This being the case, such claims may
constitute nothing more than connectionist myths.
Myth 4: 'Connectionist Systems Are Good Generalizers'
The claim that connectionist networks exhibit graceful degradation
is sometimes made in conjunction with a claim that networks are
good at 'generalization' (See for example, McClelland, Rumelhart
and Hinton 1986: pp. 29-30). Once again this is a claim which
has become included in the philosophical literature on connectionism
(Bechtel and Abrahamsen, 1991: p. 31 and passim; Churchland,
1988: p. 161) with little or no qualification. This claim, too,
is problematic.
As is the case with the graceful degradation claim, a commonly
implied conclusion from the generalization claim is that connectionist
systems are to be preferred to competing systems as models of
cognitive function. This claim, though, is a little more interesting
than the graceful degradation claim (and related claims), and as such
it deserves separate treatment (this is not to say that the objections
outlined above do not also apply to it). However, like the
connectionist claims above, the generalization claim has a mythological
component.
As a rough first approximation, a system can be said to generalize
when it can produce outputs which are appropriate for a particular
input or class of inputs, which it has not been previously given
information about. The first difficulty with the claim about the
generalization of connectionist systems is that it is often the
case that generalization is specified in a manner which is only
appropriate for connectionist systems (or some sub-set of connectionist
systems). For example, Clark (1993: p. 21) describes generalization
thus,
This notion of generalization is inordinately narrow though. For
example, it would not be applicable to systems which do not undergo
training. The famous Jets and Sharks network (described in McClelland,
Rumelhart and Hinton 1986: pp. 26-31, McClelland and Rumelhart
1988: pp. 38-46, Clark 1991: pp. 86-92, and Bechtel and Abrahamsen
1991: pp. 21-34) is said to exhibit 'generalization' (albeit,
not very good generalization, in this case), yet does not undergo
training.7 If generalization is specified broadly though (for example,
as I do with the 'rough, first approximation' above), then many
manifestly non-connectionist systems seem to exhibit generalization
too. For example, Rips's (1983) ANDS system, which is a paradigm
example of a non-connectionist system, might plausibly be said
to generalize in this sense.8 This being the case, an appeal to
generalization cannot adequately support the contention that connectionist
systems are preferable to competing architectural approaches.
It is also the case that, even if a narrow conception of generalization
such as Clark's is employed, only some connectionist networks
exhibit this property. A common difficulty encountered by network
researchers is that, even with identical network architectures,
training regimes and similar starting parameters, different versions
of the same network will exhibit different degrees of generalization,
due to the practice of setting initial weight and bias values
randomly. Sometimes, if a network has too many hidden units and
is trained to too strict a convergence criterion, a network may
simply instantiate a 'look-up table' for the training set, and
produce responses upon generalization testing which are equal
to, or worse than, mere chance! Generalization (no matter which
conception is employed) is a property of only some networks, and
not a general property of all networks.
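
The methodological point can be illustrated with a sketch (my own;
the train and accuracy routines mentioned in the comments are
hypothetical placeholders): identical architectures, training sets
and training regimes, differing only in the random seed used to set
the initial weights, are routinely found to differ in their
performance on a held-out test set.

    import random

    def init_weights(n_weights, seed):
        # Identical architecture and training regime, but a different random seed
        # for the initial weights: this is the only source of the run-to-run
        # variation in generalization described in the text.
        random.seed(seed)
        return [random.uniform(-0.5, 0.5) for _ in range(n_weights)]

    # Sketch of the usual methodology (train and accuracy are hypothetical
    # placeholders, not defined here):
    #   for seed in range(10):
    #       w = train(init_weights(n_weights, seed), training_set)
    #       print(seed, accuracy(w, held_out_set))   # scores typically differ run to run
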
An additional complication which arises with the claim about generalization
is that it is very sensitive to the particular task being considered.
This fact is frequently not made explicit in the descriptions
of generalization in the connectionist literature though. This
is very nicely exemplified by another example from Clark. Clark
(1993: p. 21) briefly describes a connectionist network, originally
due to McClelland and Rumelhart (1986, Vol. 2: pp. 170-215), which
was trained to recognize dogs, using sets of dog
features which were supposed to correspond to the features of
individual dogs. Clark (1993: p. 21) cites as an example of generalization
(in accordance with the conception quoted above) the fact that,
Although the facts are correct, this example does not do justice
to the influence of the chosen task domain upon generalization.
Suppose that the network was trained not only to recognize dogs,
but was also trained to recognize common items of furniture. Now,
if the only three legged object in the entire training set were
to be a small stool, it is quite possible that the network would
classify the three legged dog as a non-barking object (i.e. like
a stool), rather than as a barking one. The performance of such
a network would be dependent upon the ratio of dogs to furniture
in the entire training set, as well as various other specific
details of the training regime.9
Given all the difficulties I have outlined above, it is not unreasonable
to conclude that the claim that connectionist systems are good
generalizers is so problematic that, in many instances, it may
amount to nothing but a myth. It is far from clear that there
is even a uniform notion of generalization which is used amongst
connectionists, let alone a conception which is common and applicable
to both connectionist and non-connectionist systems.10 Without detailed
clarification of the notion, it is not a suitable basis for comparison
at all. Moreover, the actual evidence for the claim that networks
are good generalizers is far from unequivocal. Thus, claims about
generalization cannot provide an adequate basis to adjudicate
between connectionist and other systems, or be used as a premise
in arguments for other conclusions. Furthermore, although some
connectionist systems may exhibit something which might plausibly
be termed 'generalization', it is certainly not universally the
case. Thus, claims about connectionist systems and generalization
which are not very carefully hedged (as they almost never are)
are going to be false of many connectionist systems. Hence, the
unqualified claim that connectionist systems are good generalizers
is largely a myth.
Conclusion
It should be apparent from the discussion above that a number
of the claims which have been made about connectionist systems
in the philosophical literature, and elsewhere, are not as straightforward
as they may initially appear. Without appropriate qualification,
the claims which have been termed 'myths' here have a great potential
to mislead. Whilst some of the mythical claims have a basis in
fact, the uncritical deployment of these claims as premises to
arguments is highly likely to lead to arguments which are valid, but not
sound. For this reason, it is important that members of the philosophical
community who think about the issues raised by connectionism,
do so in a manner which is more careful than has been the common
practice previously.
There is also a moral here for connectionist researchers. It may
be the case that some of the claims which have here been labeled
'myths' can be substantiated. However, substantiation requires
persuasive argumentation. So, connectionists might think of eschewing
the lab for a little while and spending some time constructing
arguments to support those claims which they consider to be axiomatic
to their research program. This would then assist philosophers
in determining which of their claims, and under what circumstances,
can be used as premises in theorizing about connectionist-related
matters.
It might be thought that the arguments offered here point to a
skeptical conclusion about connectionist research. However, such
an impression would be misleading. There are, I believe, philosophically
interesting and important conclusions which can be drawn on the
basis of connectionist research (see for example, Dawson, Medler
and Berkeley 1997). However, for those conclusions to become apparent,
it is necessary to clear away the connectionist mythology, in
order to get to the connectionist facts.
1 A notable exception to this general rule can be found in Dawson and Schopflocher (1992: pp. 25-26).
2 One further, though often overlooked, consequence of massive parallelism in connectionist systems is that modern connectionist networks usually violate Minsky and Papert's (1969) limited order constraint. This means that the criticisms of Minsky and Papert's conclusions, such as that offered by Bechtel and Abrahamsen (1991: p. 15), are not only misplaced, they are simply incorrect.
3 Not too much weight should be put on this system though - it is very far from being biologically plausible. See Dawson and Berkeley (1993).
4 For a further discussion of the claim that connectionist networks have some kind of biological plausibility, see Quinlan (1991: pp. 240-244). Quinlan's assessment of the current state of the art is similar to the one offered here.
5 Actually, the notion of 'graceful degradation' is somewhat more technical than this. Clark's (1991) characterization will suffice for current purposes though.
6 This objection is also raised by Sterelny (1990: pp. 173-175).
7 In fact, Clark (1991: p. 92) even makes a claim about the generalization abilities of the Jets and Sharks network.
8 Actually, the situation is somewhat more complex than this, in so much as it is unclear whether or not the inference rules within ANDS are to count as containing information about every inference of a particular syntactic type. This complication does not affect my main point though, so I will not discuss it further here.
9 Cf. the example of a network for assessing bank loan applications, discussed by Clark (1993: p. 71).
10 Indeed, the considerations discussed above might be taken as being indicative that the term 'generalization' exhibits what Waismann (1951) calls 'open texture'.
Barnden, J. and Pollack, J. (Eds.), (1991), Advances in Connectionist
and Neural Computation Theory (Vol. 1): High-Level Connectionist
Models, Ablex Pub. Co. (Norwood, NJ).
Bechtel, W. (1985), "Contemporary Connectionism: Are The
New Parallel Distributed Processing Models Cognitive Or Associationist?"
in Behaviourism, 13/1, pp. 53-61.
Bechtel, W. and Abrahamsen, A. (1991) Connectionism and the
Mind, Basil Blackwell (Cambridge, Mass.).
Churchland, P. M. (1988), Matter and Consciousness: A Contemporary
Introduction to the Philosophy of Mind, MIT Press (Cambridge,
Mass.).
Churchland, P. M. (1989), The Neurocomputational Perspective:
The Nature of Mind and the Structure of Science, MIT Press (Cambridge,
Mass.).
Churchland, P. S. and Sejnowski, T. (1994), The Computational
Brain, MIT Press (Cambridge, Mass.).
Clark, A. (1989), Microcognition: Philosophy, Cognitive Science
and Parallel Distributed Processing, MIT Press (Cambridge,
Mass.).
Clark, A. (1993), Associative Engines: Connectionism, Concepts,
and Representational Change, MIT Press (Cambridge, Mass.).
Crick, F. and Asanuma, C. (1986), "Certain Aspects of the
Anatomy and Physiology of the Cerebral Cortex", in Rumelhart,
McClelland et al. (1986: Vol. 2., pp. 333-371).
Cummins, R. (1989), Meaning and Mental Representation,
MIT Press (Cambridge, Mass.).
Dawson, M. and Berkeley, I. (1993), "Making a Middling Mousetrap",
commentary on Shastri, L. and Ajjanagadde, V. (1993) "From
Simple Associations to Systematic Reasoning", in Behavioral
and Brain Sciences, Vol. 16, No. 3, pp. 454-455.
Dawson, M., Medler, D. and Berkeley, I. (1997) "PDP Networks
Can Provide Models That Are Not Mere Implementations of Classical
Theories" in Philosophical Psychology, forthcoming.
Dawson, M. and Schopflocher, D. (1992), "Modifying the Generalized
Delta Rule to Train Networks of Non-monotonic Processors for Pattern
Classification", in Connection Science, 4/1, pp. 19-31.
deGroot, J. and Chusid, J. (1988), Correlative Neuroanatomy,
(12th Ed), Appleton and Lange (Connecticut).
Dennett, D. (1991), Consciousness Explained, Little, Brown
and Co. (Boston).
Dreyfus, H. (1992), What Computers Still Can't Do: A
Critique of Artificial Reason, MIT Press (Cambridge, MA).
Feldman, J. and Ballard, D. (1982), "Connectionist Models
and Their Properties", in Cognitive Science, 6, pp.
205-254.
Fodor, J. and Pylyshyn, Z. (1988), "Connectionism and Cognitive
Architecture: A Critical Analysis", in Cognition 28,
pp. 3-71.
Forbus, K. and de Kleer, J. (1992), Building Problem Solvers,
MIT Press (Cambridge, Mass.).
Getting, P. (1989), "Emerging Principles Governing the Operation
of Neural Networks", in Annual Review of Neuroscience,
12, pp. 184-204.
Kolb, B. and Whishaw, I. (1990), Fundamentals of Human Neuropsychology
(3rd Ed.), Freeman and Co. (New York).
Lloyd, D. (1989), Simple Minds, MIT Press (Cambridge, Mass.).
McClelland, J. and Rumelhart, D. (1986), "A Distributed Model
of Human Learning and Memory", in Rumelhart, McClelland
et al. (1986: Vol. 2, pp. 170-215).
McClelland, J., Rumelhart, D. and Hinton, G. (1986), "The
Appeal of Parallel Distributed Processing", in Rumelhart,
McClelland, et al. (1986: Vol. 1, pp. 3-44).
O Nualláin, S. (1995), The Search for Mind: A New Foundation
for Cognitive Science, Ablex Pub. Co. (Norwood, NJ).
Posner, M. (1989), Foundations of Cognitive Science, MIT
Press (Cambridge, Mass.).
Pylyshyn, Z. (1984), Computation and Cognition, MIT Press (Cambridge,
Mass.).
Quinlan, P. (1991), Connectionism and Psychology: A Psychological
Perspective on New Connectionist Research, U. of Chicago Press
(Chicago, IL).
Rips, L. J. (1983), "Cognitive Processes in Propositional
Reasoning" in Psychological Review, 90/1, pp. 38-71.
Rumelhart, D. (1989), "The Architecture of Mind: A Connectionist
Approach", in Posner (1989: pp. 133-159).
Rumelhart, D., McClelland, J. and The PDP Research Group (1986),
Parallel Distributed Processing: Explorations in the Microstructure
of Cognition, (2 Vols.), MIT Press (Cambridge, Mass.).
Searle, J. (1992), The Rediscovery of The Mind, MIT Press
(Cambridge, Mass.).
Shastri, L. (1991), "The Relevance of Connectionism to AI:
A Representation and Reasoning Perspective", in Barnden and
Pollack (1991: pp. 259-283).
Shastri, L. and Ajjanagadde, V. (1993) "From Simple
Associations to Systematic Reasoning", in Behavioral and
Brain Sciences, Vol. 16, No. 3 (Sept. 1993).
Sterelny, K. (1990), The Representational Theory of Mind,
Blackwell (Oxford).
Winlow, W. (1990a), "Prologue; The 'typical' neurone",
in Winlow (1990: pp. 1-4).
Winlow, W. (Ed.), (1990), Neuronal Communications, Manchester
U. P. (Manchester).