According to the standard (recent) history of connectionism (see
for example the accounts offered by Hecht-Nielsen (1990: pp. 14-19)
and Dreyfus and Dreyfus (1988), or Papert's (1988: pp. 3-4) somewhat
whimsical description), in the early days of Classical Computational
Theory of Mind (CCTM) based AI research, there was also another
allegedly distinct approach, one based upon network models. The
work on network models seems to fall broadly within the scope
of the term 'connectionist' (see Aizawa 1992), although the term
had yet to be coined at the time. These two approaches were "two
daughter sciences" according to Papert (1988: p. 3). The
fundamental difference between these two 'daughters' lay (according
to Dreyfus and Dreyfus (1988: p. 16)) in what they took to be
the paradigm of intelligence. Whereas the early connectionists
took learning to be fundamental, the traditional school concentrated
upon problem solving.
Although research on network models initially flourished alongside
research inspired by the CCTM, network research fell into
a rapid decline in the late 1960s. Minsky (aided and abetted
by Papert) is often credited with having personally precipitated
the demise of research in network models, which marked the end
of the first phase of connectionist research. Hecht-Nielsen (1990:
pp. 16-17) describes the situation (as it is presented in standard
versions of the early history of connectionism) thus,
In Perceptrons, Minsky and Papert (1969) argued that there
were a number of fundamental problems with the network research
program. For example they argued that there were certain tasks,
such as the calculation of the topological function of connectedness
and the calculation of parity, which Rosenblatt's perceptrons [2]
could not solve. The inability to calculate parity proved to be
particularly significant, as this showed that a perceptron could
not learn to evaluate the logical function of exclusive-or (XOR).
The results of Minsky and Papert's (1969: pp. 231-232) analysis
led them to the conclusion that, despite the fact that perceptrons
were "interesting" to study, ultimately perceptrons
and their possible extensions were a "sterile" direction
of research.
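The XOR limitation can be illustrated with a small sketch. It assumes the simplest possible case, a single threshold unit wired directly to two binary inputs, which is a simplification of the perceptrons analysed in the book. The names threshold_unit, find_weights and GRID are hypothetical, and the grid search is only an illustration, not a proof.

```python
from itertools import product

def threshold_unit(w1, w2, bias, x1, x2):
    """Fire (return 1) when the weighted sum plus bias is positive."""
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0

def find_weights(truth_table, grid):
    """Return the first (w1, w2, bias) on the grid that reproduces the table, else None."""
    for w1, w2, bias in product(grid, repeat=3):
        if all(threshold_unit(w1, w2, bias, x1, x2) == target
               for (x1, x2), target in truth_table.items()):
            return (w1, w2, bias)
    return None

# Illustrative search grid (an assumption): -4.0 to 4.0 in steps of 0.5.
GRID = [i / 2 for i in range(-8, 9)]

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

and_weights = find_weights(AND, GRID)  # some weight setting exists
or_weights = find_weights(OR, GRID)    # some weight setting exists
xor_weights = find_weights(XOR, GRID)  # no setting on the grid works
```

The search finds weights for AND and OR but none for XOR. That no weights exist at all, on or off the grid, follows from a short argument: XOR requires bias <= 0, w1 + bias > 0, w2 + bias > 0 and w1 + w2 + bias <= 0, and adding the two middle inequalities gives w1 + w2 + bias > -bias >= 0, which contradicts the last one.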
The publication of Perceptrons was not the only factor
in the decline of network research in the late Sixties and early
Seventies, though. A number of apparently significant research
successes from the non-network approach also proved to be influential.
Systems such as Bobrow's (1969) STUDENT, Evans' (1969) Analogy
program and Quillian's (1969) semantic memory program called the
Teachable Language Comprehender, were demonstrated. These systems,
which had properties like those associated with the CCTM, did
not appear to suffer from the limitations that afflicted network
models. [3] Indeed, these systems seemed to show considerable promise
with respect to emulating aspects of human cognition. Bobrow's
STUDENT program, for example, was designed to solve algebra word
problems. In doing this, the program would accept input in (a
restricted subset of) English. This property of the system led
Minsky (1966: p. 257) to claim that "STUDENT...understands
English". Although this is now seen to be highly misleading
(see, for example, Dreyfus' (1993: pp. 130-145) critiques of all
the systems mentioned above), at the time it was a fairly impressive
claim which did broadly seem to be supported by Bobrow's program.
Network research, by comparison, had nothing as impressive to
offer. Given Minsky and Papert's unfavorable conclusions and the
apparent fruitfulness of non-network based approaches, it is not
surprising that research into network systems went into decline.
During the 1970s, there was very little work done on connectionist
style systems. Almost all the research done in AI concentrated
upon the other approach. This is not to say that there was no
network research done during this period. A few individuals, most
notably Anderson (1972), Kohonen (1972) and Grossberg (1976),
did continue to investigate connectionist systems; however, network
researchers were very much the exception rather than the rule.
After a ten-year hiatus, though, connectionism reappeared on the
scene as a significant force. One reason for this resurrection
was that a number of technical developments were made which seemed
to indicate that Minsky and Papert had been premature to write
off such systems.
Minsky and Papert only considered Rosenblatt's perceptrons in
their book of the same name. One of the significant limitations
to the network technology of the time was that learning rules
had only been developed for networks which consisted of two layers
of processing units (i.e. input and output layers), with one set
of connections between the two layers. However, Minsky and Papert
(1969: p. 232) had conjectured (based on what they termed an "intuitive"
judgment) that extensions of the perceptron architecture, for
example based upon additional layers of units and connections,
would be subject to limitations similar to those suffered by one-layer
perceptrons. By the early 1980s more powerful learning rules had
been developed which enabled multiple-layered networks to be trained.
The results that such multiple-layered networks yielded indicated
that Minsky and Papert's 'intuitive judgment' was too hasty (see
Rumelhart and McClelland 1987: pp. 110-113). [4]
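The extra representational power of an additional layer can be seen in a minimal sketch. It assumes threshold units throughout and sets the weights by hand, so it implements no learning rule and says nothing about how such weights might be found by training. The names step and xor_network are hypothetical.

```python
def step(total):
    """Threshold activation: 1 when the summed input is positive, else 0."""
    return 1 if total > 0 else 0

def xor_network(x1, x2):
    """Two hidden threshold units feed one output threshold unit."""
    h_or = step(x1 + x2 - 0.5)       # fires if at least one input is on
    h_and = step(x1 + x2 - 1.5)      # fires only if both inputs are on
    return step(h_or - h_and - 0.5)  # fires for "or, but not and"

# Evaluate the network on all four binary input patterns.
outputs = {(x1, x2): xor_network(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
```

The hidden units recode the input so that the output unit faces a linearly separable problem, something no single layer of modifiable connections can provide for XOR.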
Another important factor in the renaissance of network models,
according to the standard view, was a growing dissatisfaction
with the traditional approach. Arguably the most important event
in this renaissance was the publication of the two volume work
Parallel Distributed Processing by Rumelhart, McClelland
et al. (1987). [5] Dreyfus and Dreyfus (1988: pp. 34-35) describe
the situation thus,
Smolensky (1988) describes how "...recent meetings [i.e.
those circa 1988] of the Cognitive Science Society have begun
to look like connectionist pep rallies." Hecht-Nielsen (1990:
p. 19) explicitly describes those who came 'flocking' to the new connectionism
as 'converts'. The religious analogy is not insignificant here.
Just as it is often the case that religious converts seek to vilify
other belief systems, so the converts to connectionism often attempted
to emphasize what they believed to be the fundamental differences
between the connectionist and the CCTM based approach. Of course,
such an environment is highly conducive to the development of
myths.
So the history of connectionism, as commonly characterized, is
a history which, apart from the early years, has been marked by
a struggle with the approach which had roots in the assumptions
underlying the CCTM. Many recent descriptions of the relationship
between the approaches dwell almost exclusively upon the putative
differences between them. For example, Schneider (1987), Churchland
(1989), Smolensky (1991), Sterelny (1990), Cummins (1991), Tienson
(1991), Bechtel and Abrahamsen (1991), Fodor and Pylyshyn (1988)
and Hecht-Nielsen (1991) all portray the two approaches as being
in direct competition with one another. Given the standardly told
story of the history of connectionism, such an antagonistic relationship
between the two approaches is far from surprising. The standard
version of this history also suggests that certain episodes (such
as the publication and circulation of Perceptrons) were
marked by a certain guile and personal crusading on the part of
the anti-connectionist camp. Connectionism is usually portrayed
as a field of research which was unfairly retarded early on, but
which, due to the publication of The PDP Volumes and the
empirical inadequacies of the alternative, has only comparatively
recently begun to bloom. This kind of perspective fits well with
the view that connectionism provides the basis of some kind of
substantial alternative to the assumptions underlying the CCTM.
Unfortunately, this version of history is highly selective, partial
and, in certain respects, downright misleading.
As a matter of historical fact, in the early days of AI research,
a number of high-profile researchers in the field worked with
both approaches. Even Papert (1988: p. 10), for example,
did work on network models. Another example is von Neumann, who
worked with McCulloch-Pitts nets and showed that such nets could
be made reliable and (moderately) resistant to damage by introducing
redundancy (i.e. having several units do the job of one). In fact,
von Neumann published quite extensively on the topic of networks
(see von Neumann 1951, 1956 and 1966), although his name is most
often associated with classical systems.
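The benefit of redundancy can be made concrete with a simplified calculation. The sketch below is not von Neumann's own construction. It assumes that each redundant unit fails independently with the same probability, that the number of units is odd, and that a perfectly reliable majority vote combines their outputs. The function name majority_failure_probability is hypothetical.

```python
from math import comb

def majority_failure_probability(n_units, p_fail):
    """Probability that more than half of n independent units fail."""
    needed = n_units // 2 + 1  # failures required to corrupt the majority vote
    return sum(comb(n_units, k) * p_fail**k * (1 - p_fail)**(n_units - k)
               for k in range(needed, n_units + 1))

# Assumed per-unit failure probability of 0.1, chosen for illustration only.
single = majority_failure_probability(1, 0.1)  # one unit doing the job alone
triple = majority_failure_probability(3, 0.1)  # three units doing the job of one
nine = majority_failure_probability(9, 0.1)    # nine units doing the job of one
```

Under these assumptions, the chance of a wrong result falls from 0.1 for one unit to 0.028 for three, and falls further as more units share the job.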
There were a number of significant results which came to light
in the 1940s and 1950s, with respect to network models. Arguably
the most important of these was McCulloch and Pitts' (1943) demonstration
that networks of simple interconnected binary units (which they
called 'formal neurons'), when supplemented by indefinitely large
memory stores, were computationally equivalent to a Universal
Turing Machine. Later, Rosenblatt (1958) developed an improved
version of the units employed by McCulloch and Pitts. Both McCulloch
and Pitts' formal neurons and Rosenblatt's units had threshold
activation functions (by contrast, most modern connectionist units
have continuous activations). [6] The innovation which Rosenblatt
made was to develop modifiable continuously valued connections
(i.e. weights) between the units. This enabled networks of these
units to be effectively trained. In particular, Rosenblatt's training
procedure was supervised, and the system adjusted its weights only
when it made a 'mistake' with respect to the desired output for
a particular input pattern. Rosenblatt called networks of his
units 'Perceptrons'.
The significance of Rosenblatt's innovation became clear when
he demonstrated the Perceptron Convergence Theorem (Rosenblatt 1962). This
theorem holds that if there is a set of weighted connections of
a perceptron, such that the perceptron gives the desired responses
for a set of stimulus patterns, then after a finite number of
presentations of the stimulus-response pairs and applications
of the training procedure, the perceptron will converge upon a
set of weights which enables it to respond correctly to each
stimulus in the set. [7]
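A minimal sketch of an error-driven training procedure of this kind may help. It assumes a single threshold unit with a bias term, a learning rate of 1 and weights starting at zero. These choices, along with the names predict and train_perceptron, are illustrative assumptions rather than a reconstruction of Rosenblatt's own formulation.

```python
def predict(weights, bias, pattern):
    """Threshold unit: 1 when the weighted sum plus bias is positive."""
    return 1 if sum(w * x for w, x in zip(weights, pattern)) + bias > 0 else 0

def train_perceptron(pairs, max_epochs=100):
    """Error-driven training; returns (weights, bias, epochs_used) or None."""
    weights, bias = [0.0] * len(pairs[0][0]), 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for pattern, target in pairs:
            error = target - predict(weights, bias, pattern)
            if error != 0:  # weights change only after a mistake
                mistakes += 1
                weights = [w + error * x for w, x in zip(weights, pattern)]
                bias += error
        if mistakes == 0:  # a full pass with no mistakes: training has converged
            return weights, bias, epoch
    return None  # no error-free pass within max_epochs

OR_PAIRS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR_PAIRS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

or_result = train_perceptron(OR_PAIRS)    # a solution exists, so training converges
xor_result = train_perceptron(XOR_PAIRS)  # no solution exists, so it never settles
```

The theorem's condition matters. Weights that solve OR exist, so training halts with a correct set, although not necessarily any particular one. No weights solve XOR, so the theorem offers no guarantee, and the procedure keeps making mistakes until the epoch limit is reached.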
Marvin Minsky, so often portrayed as a villain in the standard
version of the history of connectionism, has also made significant
contributions to network research. In 1951 Minsky, in conjunction
with Dean Edmonds, constructed a machine known as the SNARC (Rumelhart
and Zipser 1987: pp. 152-154). The SNARC was the first 'learning'
machine and was constructed along what would now be thought of
as connectionist principles, according to Hecht-Nielsen (1990:
p. 15). Indeed, his work with the SNARC formed the basis of Minsky's
Ph.D. dissertation. Minsky (1954) even included the phrase 'neural
nets' in the title of his dissertation. According to Minsky (personal
communication, 1994) it wasn't until "...around 1955, largely
at the suggestion of my friend Ray Solomonoff....[that] I moved
toward the direction of heuristic serial problem solving."
That is to say, Minsky's interest in network-based systems in fact
predates his interest in CCTM based systems.
It is also the case that in the early phase of connectionist research,
there was relatively little antagonism between the two approaches.
The difference was rather one of attitude. Minsky (personal communication,
1994) characterizes the situation as follows,
These facts are perhaps somewhat surprising, given the malevolent
role ascribed to Minsky in the standard histories of connectionism.
Perhaps, it might be conjectured, the adversarial relationship
between the approaches derives from Minsky and Papert's critique
of networks in Perceptrons. If this is the case for some
though, this adversarial perspective does not seem to be shared
by Minsky himself. Even long after the publication of Perceptrons,
Minsky continued to do theoretical work upon network models. In
1972, for example, Minsky (1972: p. 55) published a proof that
showed that "Every finite state machine is equivalent to,
and can be 'simulated' by, some neural net". Indeed, Minsky
does not endorse the adversarial view of the relation between
the approaches even today. Consider the following remark by Minsky
(1990),
These facts serve to show that the supposed distinction between
the two approaches, at least in the early days of network research,
was not as sharp as some commentators would have us believe (cf.
Dreyfus and Dreyfus (1988)). Furthermore, there seem to be grounds
for wondering just who is responsible for the putative conflict
between the approaches. Although he is frequently 'demonised'
in the connectionist literature, it does not seem to be Minsky!
The responsibility for the antagonistic relation between the approaches,
and the consequently partial standard history, does not straightforwardly
lie with any one individual or group. It is rather the consequence
of a number of factors. It is certainly the case that the authors
of the PDP Volumes must take some of the responsibility.
For example, McClelland, Rumelhart and Hinton (1987: p. 11) remark
that
Such remarks are fairly clearly antagonistic to advocates of the
more traditional approach. There are many other similar examples
which can be found in the PDP Volumes.
It is also the case that the authors of the PDP Volumes
make a number of claims about the relationship between their systems
and the ones discussed by Minsky and Papert in Perceptrons
which are not entirely accurate. Examples of misleading claims
can be found in Rumelhart, Hinton and McClelland (1986: p. 65),
Rumelhart and McClelland (1986: p. 113) and Rumelhart, Hinton
and Williams (1986: p. 361). Minsky and Papert's
responses to these specific claims are in the epilogue of the
third edition of Perceptrons (1988). Of course, the authors
of the PDP Volumes were not alone in misunderstanding Minsky
and Papert's work. Minsky (personal communication, 1994) describes
the situation thus,
It is by no means the case though that the responsibility for
the adversarial relationship between connectionism and approaches
which share assumptions with the CCTM belongs just to the authors
of the PDP Volumes. In fact, Rumelhart (personal communication,
1994) still considers his work as part of the more general enterprise
of AI. He also believes that the 'AI is dead' talk which arose
just after the publication of the PDP Volumes, was mistaken.
Undoubtedly, the emergence of 'new' connectionism was accompanied
by a certain amount of jumping on the proverbial connectionist
bandwagon. It is almost certainly the case that a number of the
new 'converts' to connectionism made claims which were far too
strong and thereby engendered the wrath of some of the advocates
of the other approach. This too is likely to have encouraged an
antagonistic relation between the two approaches. It is also certainly
the case that some of the antagonism between the approaches can
be traced back to Fodor and Pylyshyn's (1988) paper.
Although it would be possible to pursue this theme in much greater
detail, I hope that the above is sufficient to make it clear that
this putative antagonism between CCTM and connectionist approaches
to studying the mind is, for the most part, a comparatively recent
phenomenon. It is interesting and (I believe) significant to note
that some of the major figures in the fields (e.g. Rumelhart and
Minsky) do not subscribe to this view of the relationship.
2) Perceptron-based systems were, arguably, the flagship variety of network systems at the time.
3) It is worth noting that all the systems mentioned here were developed by Minsky's own graduate students, according to Dreyfus (1993: p. 149). For a more detailed overview of each of these programs and the way they were evaluated, see Dreyfus (1993: pp. 130-145).
4) For a more detailed account of the work which underwrote the rebirth of connectionism, as well as a more detailed account of network research during the 1970s and early 1980s, see McClelland, Rumelhart and Hinton (1987: pp. 41-44).
5) It is now standard practice to refer to this work by the title The PDP Volumes.
6) See the discussion of activation functions in Dr. Ish's Introduction to Connectionism, for further details.
7) For a more detailed account of the history of network models, see Cowan and Sharp (1988).