Overview: This chapter is arranged in four major sections. The first presents the background to Instrumental Conditioning, covering the early work by Watson and Thorndike, the basic findings, what they thought was going on, and what some of the standard paradigms for Instrumental Conditioning are. The second presents many of the basic principles that determine when conditioning will occur, and whether it will be excitatory or inhibitory. The third discusses important exceptions to these principles, and examines the complex interactions that can arise. Finally, the fourth section briefly examines several alternative accounts of the types of associations that can form. Several additional accounts of learning are introduced here; most notably, Tolman's Cognitive Expectancy Approach and Skinner's Radical Behaviorism. This section closes by examining some of the interrelationships between Instrumental and Classical Conditioning.
Let's start with some historical background.
First, however, let us distinguish instrumental conditioning from classical conditioning. In instrumental conditioning, an animal makes one of a number of possible responses in the presence of some stimulus complex or context. That response may lead to some outcome. We typically define learning in this circumstance as an alteration in some observed characteristic of the response such as its frequency, latency, or amplitude. We will revisit this definition in more detail later, once we have examined several theories of what gets acquired, and why. For now, we can talk about instrumental conditioning as the type of learning involved in navigating a maze, choosing the correct one of several doors to run to, or even performing some response that will be successful in avoiding a future shock. In instrumental conditioning, new responses may be taught that differ from any reflexive response already in the animal's behavioral repertoire.
Give me a dozen healthy infants, well-formed, and my own specified world to bring them up in and I'll guarantee to take any one at random and train him to become any type of specialist I might select -- doctor, lawyer, artist, merchant-chief and, yes, even beggar-man and thief, regardless of his talents, penchants, tendencies, abilities, vocations, and race of his ancestors.

This was radical in at least two related ways. First, from a scientific perspective, it clearly denied the relevance of genetic or inherited influences on current behavior. And second, from a social perspective, it was about as different a position as one could expect to find from the then-prevailing attitudes about race and class.
In any case, Watson's primary idea was that an association could form between a stimulus and a response (in addition to the type of association found in classical conditioning). But he was a strict contiguity theorist on the issue of S-R associations: A response made in the presence of a stimulus might associate with it, and under certain circumstances, would be likely to be seen when that stimulus recurred. Those circumstances were defined by essentially two principles. The first, a principle of frequency, stated that the association strengthened each time the response was made to the stimulus, so that all things being equal, a frequent response was much more likely to be emitted by the animal than a less frequent response. In addition, however, there was a principle of recency: All things being equal, a recent response was more likely to be emitted than a less recent response.
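To make the frequency and recency principles concrete, here is a minimal sketch in Python (my own illustration, not Watson's formalism): responses made to a stimulus are simply tallied, and selection is driven only by how often and how recently each has occurred, with no reinforcer anywhere in the picture.

```python
# A minimal sketch of Watson-style response selection by frequency and recency.
# The class, its names, and the recency bonus are illustrative assumptions.

from collections import defaultdict

class WatsonianHabit:
    def __init__(self, recency_weight=0.5):
        self.counts = defaultdict(int)   # frequency of each response made to the stimulus
        self.last_response = None        # most recent response made to the stimulus
        self.recency_weight = recency_weight

    def record(self, response):
        """The animal makes a response in the presence of the stimulus."""
        self.counts[response] += 1
        self.last_response = response

    def most_likely(self):
        """Score each response by frequency, with a bonus for the most recent one."""
        def score(r):
            bonus = self.recency_weight if r == self.last_response else 0.0
            return self.counts[r] + bonus
        return max(self.counts, key=score)

habit = WatsonianHabit()
for r in ["sniff", "turn-left", "turn-left", "press-lever"]:
    habit.record(r)
print(habit.most_likely())   # "turn-left" wins on frequency; "press-lever" gets only the recency bonus
```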
What you should particularly note about the brief description of Watson's system above is the complete lack of any reference to a reinforcer, a perhaps surprising omission to students who have been introduced to the idea that instrumental/operant conditioning is in large part about the effects of rewards and punishments. That wasn't so for Watson, and it has not always been so for later theorists as widely divergent as Guthrie and Tolman (see below and in the next chapter). But on a preliminary and casual analysis of classical conditioning, the notion of a reward or punishment does not seem greatly relevant in discussing whether the association forms. (Nevertheless, some theorists refer to the UCS as a reinforcer on a broad definition that a reinforcer is what makes a response more likely; presence of the UCS, whether in excitatory-appetitive or excitatory-aversive conditioning, certainly accomplishes that!) Why, then, ought we to include it in instrumental conditioning?
And even though Watson talked about associations between stimuli and responses, he also allowed for the possibility of associations between responses themselves. Thus, in the case of animals learning to run a maze, the analysis of what is going on will involve a complex series of muscle movements or motor responses. (Much of our behavior is complex, rather than the execution of simple responses to individual stimulus triggers.) Rather than talk about external stimuli controlling each succeeding muscle movement that gets the animal from Point A to Point B in the maze, Watson claimed that a chain of responses could be linked together that would be initially set off by an external stimulus. Of course, to the extent that any response also involves internal stimulation, one could still analyze chains in terms of stimulus-response links, so that each muscle movement in the chain serves as the response to the previous movement, and the stimulus for the next.
We have already discussed in Chapter 1 Watson's insistence that thinking could be reduced to subvocal speech. He also conducted experiments in emotional conditioning. In a famous study with Rayner, Watson conditioned a young child, Little Albert, to be afraid of a white rat. Every time Albert played (apparently happily, at first) with the rat, an experimenter would creep up behind Albert and strike a metal bar, making a loud clanging noise that frightened Albert and caused him to cry. After several such occasions (six, in fact), Albert started to cry at the sight of the rat. Note how this could be analyzed from the point of view of classical conditioning: The noise caused the apparent emotional response of fear, whereas the rat served as the CS.
Given what you know of Watson's views on mentalism, you may be somewhat surprised to discover him talking about the topic of emotions. However, for Watson, emotions were not underlying mentalistic events, but rather, the behavioral components (the crying, the whimpering, the shaking, etc.) observed in reaction to certain stimuli. Thus, Watson maintained a perfect consistency with respect to his position that positivism required dealing strictly with a behavioral level. Perhaps that is why he did not talk about reinforcers. Thorndike, a contemporary of Watson's, was developing a theory of learning based on reinforcers, and although he defined them in a sufficiently behavioristic fashion, he was nevertheless attacked by others for apparently sneaking mentalistic terms back into hard-nosed scientific psychology.
To reiterate a point I made earlier, later behaviorists have on occasion adopted a strict contiguity approach to learning. Most notable among these as a successor to Watson was Guthrie, whose principle of conditioning stated (1952, p. 23):
A combination of stimuli which has accompanied a movement will on its recurrence tend to be followed by that movement. Note that nothing is here said about...reinforcement or pleasant effects.

As we will see later, such approaches were in part a reaction to work by Tolman and his colleagues suggesting that learning could occur in the absence of rewards or punishers. The question that faces such theorists then becomes one of explaining how and why rewards and punishers seem to influence the course of learning.
Thorndike asked a very simple question: Would escape from a puzzle box exhibit any signs of intelligence? Would it display evidence of insight, in which the animal would be able to glance about its environment, understand that the rope was attached to the door, and realize that it needed only to pull on the rope to get out? To answer this question, Thorndike repeatedly placed animals in the same puzzle box, and measured how long it took them to escape. And what he found was that the time to escape decreased only gradually. By the end of the experiment, after 20 or so trials, cats would easily leave the box by performing the appropriate response as soon as they were placed in it. But, their history clearly demonstrated that this had to have been a learned response. In particular, Thorndike pointed out that an animal making the correct response on a given trial early in training would not necessarily choose that same response as its first response on the next trial. So, rather than insight, he concluded that learning involved trial-and-error.
Trial-and-error refers to the gradual accumulation of correct responses through a slow process of trying out all sorts of possibilities, and slowly weeding out the ones that do not work. As did Watson, Thorndike thought animals were acquiring associations between stimulus configurations (such as the puzzle box) and certain responses. But unlike Watson, he claimed that an additional factor was important in the acquisition of these associations: They would depend on the outcome of the animal's actions. This involved a principle Thorndike termed the Law of Effect. Put briefly, this law claimed that an association between a stimulus and a response would strengthen if the response were followed by a satisfactory state of affairs, and would weaken if the response were followed by an unsatisfactory state of affairs. Thus, Thorndike deliberately included Bentham's notion of hedonistic value as a principle governing the formation of an association, in contrast to Watson. Rather than being a simple contiguity theory, this was a reinforcement theory: In modern terms, learning of an association will occur when there is a reinforcer following a response.
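As a rough illustration of how the Law of Effect differs from pure contiguity, here is a minimal sketch (my own rendering, not Thorndike's formalism): each S-R connection carries a strength that is stamped in by a satisfying outcome and stamped out by an unsatisfying one, and nothing about the outcome itself is stored in the association.

```python
# A minimal sketch of the Law of Effect: S-R connection strengths change with outcomes.
# The step size and the response names are illustrative assumptions.

from collections import defaultdict

strength = defaultdict(float)   # (stimulus, response) -> connection strength

def law_of_effect(stimulus, response, satisfying, step=0.1):
    if satisfying:
        strength[(stimulus, response)] += step     # stamped in by a satisfying state of affairs
    else:
        strength[(stimulus, response)] -= step     # stamped out by an unsatisfying one

# Trial-and-error in the puzzle box: only the rope-pull is followed by escape (satisfying).
for trial in range(20):
    for response in ["paw-bars", "push-ceiling", "pull-rope"]:
        law_of_effect("puzzle box", response, satisfying=(response == "pull-rope"))

print(max(strength, key=strength.get))   # ('puzzle box', 'pull-rope') dominates after training
```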
There are, of course, a number of interpretations available to account for how a reinforcer might operate according to the law of effect. One of the first to come to most people's minds is a teleological or purposive explanation: The animal performs a response because it desires the outcome. But of course, desiring an outcome is a mental state that involves an object not present at the time the animal is performing the response. That type of an explanation would violate the positivist program Watson insisted everyone follow. Thus, as an alternative, we might propose that a positive outcome has an automatic effect of strengthening the association: The animal does not perform the response because it wants the outcome, but rather because the response is strongly associated to the stimulus that is present.
Here is what Thorndike actually said regarding satisfying and unsatisfying states (1913, p. 2):
By a satisfying state of affairs is meant one which the animal does nothing to avoid, often doing things which maintain or renew it. By an annoying state of affairs is meant one which the animal does nothing to preserve, often doing things which put an end to it.

Although he was accused of using hopelessly mentalistic terms in describing learning as depending on satisfactory or unsatisfactory states, his actual definition provided a clear behavioral test for determining when one or the other state was present. In that sense, it ought to have troubled people no more than Watson's use of the term "emotional."
Note too that Thorndike did not include the outcome in the association. As we will see, other theorists have claimed that associations to the outcome may also form, so that we can have S-R associations, R-O associations, and even S-O associations. To anticipate how such a model might differ from Thorndike's, a strong S-R association may exist despite a highly unpleasant or unsatisfying outcome: The presence of an R-O association in that event may serve to inhibit the R excited by presence of an associated stimulus.
Thorndike also proposed another principle, the Law of Exercise (sometimes called the Law of Use). This was essentially a principle of practice, somewhat similar to Watson's notion of frequency: An association would strengthen if practiced. Both laws were revised in his later work: the Law of Effect was essentially restricted to satisfactory outcomes, and the Law of Use was modified to include outcomes rather than simple exercise.
Thorndike also spoke of the value of different satisfactory states, so that strong satisfiers would do a better job of strengthening an association than weak satisfiers. And as an interesting historical footnote, he actually contradicted one of the major principles of strict contiguity by proposing an early version of belongingness by which some things would be more likely to associate together than others.
In some sense, Skinner may be regarded as Thorndike's intellectual successor. Skinner proposed similar ideas involving the law of reinforcement and the law of punishment. According to Skinner, a reinforcer was any event that, following a response, made that response more likely, whereas a punisher was any event that had the opposite effect. To try to identify reinforcers and punishers in a way that wasn't completely circular (and also wasn't mentalistic), Skinner imposed a condition of transituationality: A reinforcer or punisher, once identified in terms of its effects on one response, also has to be shown capable of having a similar effect in other situations, on other responses. Otherwise, we find ourselves defining a response as that which, when followed by a reinforcer, increases in frequency. And that type of definition, of course, reciprocally defines responses and reinforcers in terms of one another in an uninteresting, circular fashion.
With this as background, let us look at some of the basic findings in instrumental conditioning.
The usual procedure for obtaining generalization involves pairing a response with an outcome in the presence of a specific stimulus, and then presenting other stimuli to see whether there is a similar response to them. As outcomes may be of two sorts (reinforcers and punishers), we may obtain two different types of generalization gradients. The gradient associated with use of a reinforcer is termed the gradient of excitation, whereas the gradient associated with use of a punisher is termed the gradient of inhibition. In an excitatory gradient, we look for responding to novel stimuli that is above the background or baseline or operant level; and in a gradient of inhibition, we look for responding below normal.
Typically, when a response has been reinforced in the presence of the stimulus, that stimulus is referred to as S+. Similarly, when the response has been punished, the stimulus is referred to as S-. Watson and Rayner used an S- with Little Albert. In their work, they also reported obtaining generalization: Albert developed fear reactions to other stimuli (such as rabbits and coats) involving the features white and fur. Although they had planned on reversing the fear conditioning, Albert's mother removed him from the daycare where they were doing their experiments.
A good example of a gradient of excitation may be found in the work of Guttman and Kalish. They took four different groups of pigeons and trained them to peck at a colored key. The key differed in color for the four groups (530, 550, 580, or 600 nanometers). Then, in a generalization test, Guttman and Kalish presented a series of 11 colors, one at a time, and simply counted the number of pecks per 6 minute period that each color received. These colors included the original (the S+), 5 colors above the S+ in wavelength, and 5 colors below. Their results appear in Figure 1.
Several features of these results should be noted. First, the stimulus that received the most pecks for each group was S+: That is where the peak of each generalization gradient may be found. (As you will see later, this need not always be the case. Certain experiences such as discrimination training may alter the peak and shape of a generalization gradient.) Second, there was a relatively smooth drop-off of responding as the wavelengths of the stimuli increasingly differed from the S+. And finally, the curves were symmetric: The left-hand side of each curve looked approximately like the right-hand side.
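A toy simulation can make the shape of such a gradient concrete. The sketch below (the peak rate and gradient width are illustrative assumptions, not Guttman and Kalish's data) generates responding that peaks at S+ and falls off smoothly and symmetrically with distance in wavelength.

```python
# A toy gradient of excitation: responding peaks at S+ and falls off smoothly and
# symmetrically with wavelength distance. Numbers are illustrative assumptions.
import math

def excitatory_gradient(wavelength, s_plus=550, peak_rate=300, width=25):
    """Pecks per test period as a Gaussian function of distance from S+ (in nm)."""
    return peak_rate * math.exp(-((wavelength - s_plus) ** 2) / (2 * width ** 2))

for wl in range(500, 601, 10):
    print(wl, round(excitatory_gradient(wl)))
# Output rises to 300 at 550 nm (the S+) and falls off symmetrically on both sides.
```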
Similar features may be found in a gradient of inhibition. Rather than look for a peak, however, we search for a valley representing the lowest level of responding. Here, as the stimuli increasingly differ, we ought to find increasing recovery of responding. Thus, a gradient of inhibition looks a bit like an upside down gradient of excitation. In each, the idea is that similarity of stimuli maps into similarity of responses.
As was true of classical conditioning, there will be occasions in which we want to train an animal to treat apparently similar stimuli as if they were different. As a parent, you might think there to be good reason to train a child to fear rats without desiring that such reactions extend also to rabbits or cats. The standard technique for teaching a discrimination in instrumental/operant conditioning will prove similar to that introduced in classical conditioning: We present the outcome whenever the organism makes the response in the presence of one stimulus, but not in the presence of another. To introduce technical terms, the stimulus that signals an effective response (effective in the sense of producing an outcome) is called the discriminative stimulus or discriminative cue (SD). The stimulus that should come to signal an ineffective response is normally represented with a delta symbol. As I am posting this to the web where delta symbols are a bit tricky to insert into normal text, I will adopt the practice of using S+ and S- in this situation, as well.
There are other techniques to train a discrimination, as you will see in a later chapter. Rather than associate one stimulus with no outcome, we can associate it with the need for a different response. Thus, perhaps the animal will need to turn to the left for food when a red light is present, but will need to turn to the right for food when an orange light appears. Such a technique is referred to as choice discrimination. Alternatively, we might slowly introduce the second stimulus into the animal's environment, presenting it initially at very low levels of intensity. If the intensity is slowly increased, we may find that our animal has never responded to it, thus foregoing generalization. (You should be wondering whether there is something like latent inhibition going on with this procedure!) This technique is referred to as errorless discrimination. Each technique appears to have different effects on the generalization gradient. In particular, the standard technique using S+ and S- seems to cause the peak to move away from S+, and to the side opposite S-, a phenomenon referred to as peak shift. Moreover, peak shift is typically associated with a gradient that is no longer symmetrical: The gradient appears to be 'bunched up' on the S- side.
Finally, we may also note the existence of contrast effects associated with these phenomena. There are several types of contrasts found in instrumental conditioning. One that accompanies peak shift is termed behavioral contrast. Hanson, for example, compared discrimination learning and non-discrimination learning groups. The discrimination-learning group displayed a peak shift. Their responding to the S+ dropped off considerably. But the responding to the stimulus that was the new peak increased dramatically. This group displayed about twice as many responses to this untrained stimulus compared to the control group that did not have discrimination training. Thus, in a behavioral contrast, responding occurs to a novel stimulus at a greater level than would be expected on the basis of simple generalization.
As was true in classical conditioning, extinction in instrumental conditioning is viewed as a type of inhibitory learning. Following extinction, we obtain similar patterns of spontaneous recovery and relearning that we did with classical conditioning: An extinguished response tends to recur after a while (spontaneous recovery), arguing against any claim that the association acquired during the acquisition phase had actually been destroyed or forgotten. Similarly, pairing the extinguished response with the outcome results in much faster acquisition (relearning), another argument suggesting extinction does not destroy the original learning.
Moreover, a stimulus associated with extinction appears to act as an aversive stimulus for the animal, suggesting some degree of inhibition. Daly, for example, found that rats would learn to escape from a place where they had earlier expected a reinforcer. When the reward was no longer available, the contextual cues associated with that location were sufficient to motivate the animal to avoid them by learning some new response getting it out of that situation.
Complicating the picture somewhat is the fact that an outcome may be a punisher. Punishment, of course, often suppresses a response. There has been an argument extending as far back as Thorndike concerning the effectiveness of punishment. Many people have reported that punishers seem to have, at best, temporary suppressive effects on on-going behavior. However, that issue appears to involve the intensity of the punisher. There is now plenty of evidence that highly aversive punishers may have long-lasting effects. According to Bolles, stimuli present when a punisher occurs may become conditioned danger signals that will tend to interfere with on-going behavior by activating the animal's instinctive defenses (SSDRs: species-specific defense reactions). Rats, for example, will run, freeze, or fight. So, in this case, a conditioned suppression-like reaction may occur because one of these responses will be incompatible with other excitatory responses such as pressing a lever for food.
In the case of a punished response, of course, extinction of that response by no longer associating it with an aversive outcome ought to inhibit the stimulus's ability to act as a danger signal triggering an SSDR. Inhibition of aversion in this case means seeing less aversion.
One more point while we are (briefly) on the subject of punishment: One of the difficulties theorists have had with the effects of punishment (and with positing a general principle that punished responses decrease in frequency) may be seen from a study by Brown, Martin, and Morrow. They taught rats to run an alleyway to escape shock. Basically, the alleyway was electrified, so the animals needed to run to the goal box (the only non-electrified, safe portion of the alleyway). When the shock was turned off, there was fairly quick extinction of running.
However, two other groups of rats were also put through an extinction procedure. For one of these groups, the shock was also turned off in the start box, so that they would actually be punishing themselves for venturing out of the start area. The other group had the final 2 feet (of a 10-foot alleyway) electrified, so that they would be punished by trying to get to the goal box. Curiously enough, these two groups did not extinguish anywhere near as rapidly: By the 6th day of extinction, they were still running to the goal box, giving themselves needless shocks. Thus, punishment sometimes can actually prolong the response being punished. This effect is called vicious circle behavior.
Within the framework of a model such as Bolles's theory, a finding like that of Brown et al. may be accounted for in terms of shock continuing to trigger the rat's running SSDR. It is also possible that vicious circle behavior may ensue because of multiple mechanisms. Thus, in another experiment, Badia and Culbertson set up a situation in which shock could be signaled or unsignaled. In signaled shock, a stimulus will come on slightly before the shock. In this study, they allowed rats to learn a response whose only reinforcer involved shocks being signaled. Their animals acquired the response. Moreover, Badia, Culbertson, and Harsh found that given a choice between unsignaled mild shocks of short duration and signaled shocks of longer duration and higher intensity, the animals still performed the response, thus apparently subjecting themselves to more punishment than was necessary. This type of vicious circle behavior seems different from that of Brown et al. Rather than involve danger signals triggering SSDRs, it seems to implicate a tradeoff between severity of the shock and predicting when it ought to occur. On the other hand, Bolles also talks about safety signals that indicate a period free from danger. In the unsignaled condition, there are no safety signals. Thus, this type of vicious circle behavior may well result from an organism's search for safety signals. That ought to remind you a bit of the work on compensatory or antagonistic conditioning, and its adaptive value: Being in a highly aroused and tense physiological state because of the continual presence of danger is physiologically stressful; safety signals help moderate the wear and tear.
Skinner may be credited with first making the distinction between primary and secondary reinforcers. The standard example of secondary reinforcers operating in human societies is the use of money. In our society, money includes round pieces of metal and rectangular pieces of paper that have an extraordinary power to motivate behavior. In other societies, different objects serve a similar function (tooled shells, for instance). These objects are not valuable in themselves (aside from aesthetic considerations of design, etc.), but supposedly take on their value by means of serving as a medium of exchange for intrinsically valuable goods such as food or drink. Presumably, they acquire their reinforcing properties by being associated with primary reinforcers.
Essentially, then, secondary reinforcers are believed to be conditioned through a process of classical conditioning involving the following set-up:
CS (neutral stimulus) & UCS (primary reinforcer such as food)
Once we have established a pairing between the CS and a primary reinforcer, we may then test for its value as a secondary reinforcer. Our experimental design would be as follows:
Group          Classical Conditioning     Instrumental Acquisition
experimental   CS & primary RF            R in presence of S+ followed by CS
control        (Nothing)                  R in presence of S+ followed by CS
If we see an increase in responding in the experimental group compared to the control group, then our CS has acquired reinforcing properties. This example should make clear why this is an instance of mediated learning: The effect of the CS in the experimental group occurs by virtue of its link with the primary reinforcer or UCS. When that link weakens, the value of the CS as a secondary reinforcer ought also to weaken. Thus, in times of inflation when more money is required to buy the same food, the reinforcing properties of a dollar or five dollars weaken.
On a classical conditioning analysis of secondary reinforcement, we would expect to obtain findings like those we've already seen in the previous chapters. Several examples of such findings might be mentioned. One involves a study by Egger and Miller. They trained pigeons using a design similar to this one:
Group   Classical Conditioning          Instrumental Acquisition
1       CS1 --> CS2 --> primary RF      R followed by CS1 or CS2
2       CS1 --> CS2 --> primary RF      R followed by CS1 or CS2
        CS1 --> ..... --> no RF
As you can see from this design, Group 2 had discrimination training in the sense that presence or absence of CS2 was relevant to predicting presence or absence of the UCS. Not surprisingly, given its better signal value, CS2 turned out to be the secondary reinforcer for the instrumental acquisition phase in this group. But what about Group 1? CS2 certainly has better contiguity with the UCS. However, in terms of signal value, it is not adding anything to what CS1 already predicts. Thus, it is redundant, and we would predict from models like Kamin's or Mackintosh's that CS2 would be blocked. Consistent with this prediction, the secondary reinforcer for instrumental acquisition in Group 1 is CS1, and not CS2.
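The blocking prediction can be made concrete with a minimal Rescorla-Wagner sketch. Treating CS1's earlier onset in Group 1 as prior training on CS1 alone is my own simplification of the serial compound, and the parameter values are assumptions; the point is only that a cue adding no new signal value gains little associative strength.

```python
# A minimal Rescorla-Wagner sketch of why a redundant cue gains little strength.
# Parameters (alpha, lambda, trial counts) are illustrative assumptions.

def rw_update(V, present, reinforced, alpha=0.3, lam=1.0):
    error = (lam if reinforced else 0.0) - sum(V[cs] for cs in present)  # surprise on this trial
    for cs in present:
        V[cs] += alpha * error          # shared error term: well-predicted outcomes teach little
    return V

V = {"CS1": 0.0, "CS2": 0.0}
for _ in range(30):                     # CS1 alone already predicts the reinforcer...
    rw_update(V, ("CS1",), True)
for _ in range(30):                     # ...so the added CS2 is redundant and gains little
    rw_update(V, ("CS1", "CS2"), True)
print(V)                                # V["CS1"] near 1.0, V["CS2"] near 0.0: blocking
```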
Another example is rather cute. It comes from the Brelands, former students of Skinner's who tried to train animals to perform in commercials using the principles they had learned. It is also cute because it involves the notion of money as a secondary reinforcer. In one instance, they attempted to train a pig to roll a (fake) coin into a piggy bank. During the training, the coin was paired with a primary reinforcer, since they wanted to use the coin as a secondary reinforcer for the responses involved in rolling. The procedure worked for a brief while, but then the pig started treating the coin as if it were similar to the food it had been paired with: It tried to root the coin just as it would have rooted real food. This result, termed instinctive drift, is perhaps one of the clearest demonstrations of the involvement of classical conditioning in secondary reinforcers, though it also serves to remind us that Watson's claim about reflexes quickly becoming overwhelmed by learned associations radically overstates the case.
Secondary reinforcers play an important role in certain aspects of therapy and classroom behavior. In clinical and educational settings, behavior modification techniques based on principles of conditioning are used to try to change unacceptable behavior. These techniques typically include a component of secondary reinforcement by which objects such as poker chips may be accumulated for making desired responses (or avoiding undesired responses), and later traded for privileges such as snacks, movies, pencils, etc. Use of such secondary reinforcers involves the construction of what is called a token economy.
Other findings relevant to the involvement of classical conditioning in secondary reinforcement include the intensity of the primary reinforcer (more intense primary reinforcers yield more effective secondary reinforcers), the number of times the putative secondary reinforcer is paired with the primary reinforcer, and the delay between these events. You should be able to figure out why models such as Rescorla-Wagner or Wagner's rehearsal model, for example, would support these findings.
One more important phenomenon while we are on the subject of mediated conditioning and secondary reinforcement: Most behavior involves a complex series of responses executed in a certain rapid and relatively smooth order. How is it that each single response can be reinforced? There hardly seems time for that. And how is it that organisms in real environments (rather than the laboratory where a researcher can control reinforcers and stimuli) acquire such complex organizations? The answer to these questions involves the concept of chaining, and will prove to rely heavily on secondary reinforcers.
We briefly introduced the notion of response chains earlier. An example will illustrate this concept. Let's set ourselves the task of teaching pigeons to Time Warp. The Time Warp is the dance from the Rocky Horror Picture Show. It (as is true of all dances) may be regarded as a series of steps in a chain. In the case of the time warp, there are 5 steps (The Rocky Horror Show, 1975):
It's just a jump to the left, and then a step to the right. With your hands on your hips, you bring your knees in tight. But it's the pelvic thrust, they really drive you insane. Let's do the Time Warp again.

Normally, we would try to teach a chain backwards. So, we will train the last step first. That involves teaching the pigeon a pelvic thrust. We have our response here, but we need a stimulus and a reinforcer. Let's use a red light for the stimulus (seems appropriate, huh?), and some drink for the reinforcer. Our design then is:
Phase   Stimulus (CS)   Response        Reinforcer (UCS)
1       red light       pelvic thrust   drink
Note particularly that I have also labeled the stimulus a CS, and the reinforcer a UCS. This is meant to suggest that classical conditioning will be going on simultaneously with instrumental conditioning: The stimulus is paired not only with the response, but also with the outcome. Thus, as a result of instrumental conditioning, the animal should do a pelvic thrust to the red light. But, as a result of classical conditioning, the red light ought to become a secondary reinforcer. And that should suggest to you the rest of the design. Here it is in full:
Phase   Stimulus (CS)   Response         Reinforcer (UCS)
1       red light       pelvic thrust    drink
2       blue light      knees in tight   red light
3       green light     wings on hips    blue light
4       yellow light    step to right    green light
5       white light     jump to left     yellow light
So, if you look for the moment just at Phase 2, notice that we will reward the pigeon for bringing its knees in tight by following that response with the red light. If the red light is a secondary reinforcer, then the animal will acquire the response. And note too that the red light also serves as the signal for the next step after knees in tight: the pelvic thrust. And finally, note that in Phase 2, we ought to obtain second-order conditioning: Two CSs (the blue and red lights) are being paired. If successful, this means that the blue light now also becomes a secondary reinforcer.
At the end of this, the sequence will be that a white light serves as the signal for a jump to the left; that's reinforced by the yellow light (thanks to fourth-order conditioning) which also signals Step 2 (a step to the right); that's reinforced by the green light (thanks to third-order conditioning), which signals Step 3 (wings on hips); that's reinforced by the blue light (thanks to second-order conditioning), which signals Step 4 (knees in tight); and that is reinforced by the red light (thanks to first-order conditioning), which finally signals the last step of the dance.
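One way to see the bookkeeping in this design is to lay the phases out as data, as in the sketch below (the representation is my own, purely for illustration): each phase's discriminative stimulus becomes the reinforcer of the phase trained after it, so that when the chain is run forwards, every response is reinforced by the light that also signals the next step of the dance.

```python
# A minimal sketch of the backward-chaining design above, laid out as data.
# (stimulus, response, reinforcer) for each training phase, trained in this order:
phases = [
    ("red light",    "pelvic thrust",  "drink"),        # Phase 1: primary reinforcer
    ("blue light",   "knees in tight", "red light"),    # Phase 2: red light is now a secondary RF
    ("green light",  "wings on hips",  "blue light"),
    ("yellow light", "step to right",  "green light"),
    ("white light",  "jump to left",   "yellow light"),
]

# Executed forwards, the chain runs from the last-trained phase back to the first:
for stimulus, response, reinforcer in reversed(phases):
    print(f"{stimulus} -> {response} (reinforced by {reinforcer})")
```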
We haven't talked about a control experiment for this, but our control would be something like the following:
Phase   Stimulus (CS)   Response         Reinforcer (UCS)
1       red light       pelvic thrust    drink
2       blue light      knees in tight   green light
3       yellow light    wings on hips    white light
4       orange light    step to right    purple light
5       white light     jump to left     green light
In this control experiment, only the first step ought to be acquired. The secondary reinforcer from the first phase is never used in the later phases, and none of these is ever paired with a primary reinforcer. Indeed, based on work in discrimination training, we might predict that the other colors would become somewhat inhibitory (since they tend to signal absence of UCS).
But in real-world chains, of course, such individual discriminative cues and reinforcers do not always appear to be present (although you could argue that they are present in a dance in terms of the auditory stimuli represented in the music!). And we can solve that mystery by going back and analyzing responses as having stimulus components. Responses are also being associated with a UCS, so that doing a response can act as a secondary reinforcer! Thus, jumping to the left may be reinforced by stepping to the right, eliminating the need for all of these intervening light stimuli. If you thought our pigeon was caught in a very awkward situation, you were right: By considering the stimulus components of a response, we find a way to make the concept of response chains a lot more realistic, and their execution smoother.
Response competition involves one response interfering with or competing with another. In fact, response competition is one of the theories regarding the process of extinction (see the chapter on partial reinforcement and extinction). The basic idea here is that the animal is being cued to perform incompatible responses.
An excellent example of response competition occurs in an experiment by Fowler and Miller. They trained rats to run to a goal box. During extinction, all of the rats were shocked on entering the goal box, but half of them were shocked on their front paws, and the other half were shocked on their rear paws. The animals shocked on their front paws jerked back, whereas the animals shocked on their rear paws jerked forwards. Moving forward is a response compatible with running into the goal box, but moving backwards is an incompatible response. Despite the fact that both groups received shock or punishment for entering the goal box, the front-paws group extinguished more rapidly. The new response caused by the shock in this case interfered with the old response.
Other examples of response competition come from work with humans in the verbal learning paradigm. Here, subjects are often asked to learn a list of word pairs, and tested on how successful they are at recalling the second word when presented with the first as a retrieval cue. So, if you studied a pair such as SHORT-LAKE, the experimenter might say SHORT, and you would need to reply with LAKE. As you may gather, we can identify the first word of a pair as the stimulus term, and the second as the response term. Numerous studies show interference when we ask people to learn several lists in which the same stimulus words are present, but there are different response words. Response competition will certainly not turn out to be the sole explanation of these findings (see, for example, Melton & Irwin, and Postman's review). But it assuredly handles some of what is going on, as we find intrusions of the earlier responses during learning of the later responses.
As for approach-avoidance conflicts, we may ask what happens when a response is associated with both an aversive and an appetitive outcome. That situation happens more frequently than you might think. In discrimination training, for example, we try to alter the excitatory generalization to the S- by associating it with lack of a reward. But, that means that the inhibition building up for S- may also generalize to the S+, canceling it out, to some extent (one of the explanations for peak shift). Thus, discrimination training involves two stimuli, each of which may be claimed to have some excitatory and some inhibitory components.
What ought to happen should thus reflect, in some sense, the summation of the excitation and inhibition, as was the case in use of the summation test in classical conditioning (see the discussion of algebraic summation theory in the chapter on attention and categorization for more details).
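To see how such summation can produce peak shift, consider the toy sketch below (the gradient shapes and numbers are my own assumptions): net responding is the excitation generalizing from S+ minus the inhibition generalizing from a nearby S-, and the resulting peak lands on the side of S+ away from S-.

```python
# A toy algebraic-summation sketch: net responding = excitation from S+ minus
# inhibition from S-. Gradient heights, widths, and wavelengths are illustrative assumptions.
import math

def gaussian(x, center, height, width):
    return height * math.exp(-((x - center) ** 2) / (2 * width ** 2))

def net_response(wavelength, s_plus=550, s_minus=560):
    excitation = gaussian(wavelength, s_plus, height=100, width=20)
    inhibition = gaussian(wavelength, s_minus, height=60, width=20)
    return excitation - inhibition

wavelengths = range(500, 601)
peak = max(wavelengths, key=net_response)
print(peak)   # falls below 550: the peak has shifted away from S+, on the side opposite S-
```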
As an interesting footnote, Dollard and Miller tried to combine aspects of Freudian psychoanalytic theory and learning theory to describe some of the conflicts humans might be subject to. They identified several different types of conflicts, but one they termed an approach-avoidance conflict. In this situation, there is a tradeoff between the positive and negative components of making a response. As an example, we might take a rat running down an alleyway to obtain some food. Suppose that the goal box is associated both with food and with a shock. What will the rat do? One analysis of this situation (culled from several different studies) appears in Figure 2.
In this figure, we look at some measure of strength of a response as a function of how far from the goal the animal is. There are actually two opposed tendencies graphed in this figure: The tendency to approach the goal for a reinforcer, and the tendency to go away from the goal due to punishment. The solid line represents a typical, idealized avoidance gradient: The closer an animal is to an aversive or noxious stimulus, the more vigorously it leaves. As it gets further and further away, its response (running, for example) gets weaker and weaker. In contrast, the approach gradient graphed by the dotted line demonstrates the reverse finding: the closer to a desired reinforcement, the faster or more vigorously the animal approaches it.
Several additional features of Figure 2 are important. One is that the avoidance gradient is typically steeper than the gradient of approach. And the other is that in this figure, the lines cross. And because they do, we obtain an approach-avoidance conflict, with the spot at which the lines cross representing the conflict point.
If you look to the right of the conflict point, you will see that approach is stronger than avoidance. Thus, right of this spot, the animal should tend to head towards the goal. But once it passes the conflict point and approaches, then avoidance becomes stronger, driving it back. So, the model predicts that an animal will waver around the conflict point, developing large amounts of frustration in the process. There will be some tendency here for the animal to simply escape this situation, if that is at all an option.
Finally, an increase or decrease in the amount of reinforcement or punishment in this model will essentially move the relevant gradient up or down. Increasing the punisher, for example, should move the solid line up, and that will result in the conflict point (the spot at which the lines cross) moving further away from the goal. In like fashion, increasing reinforcement moves the approach gradient up, causing the spot at which the lines cross to occur closer to the goal.
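A toy version of this geometry (the slopes and intercepts are my own assumptions, not Dollard and Miller's values) shows how the conflict point falls out of two straight-line gradients, and how raising the punisher pushes that point away from the goal.

```python
# A toy approach-avoidance sketch: two linear gradients of response strength as a
# function of distance from the goal. All numbers are illustrative assumptions.

def approach(distance, strength_at_goal=80, slope=0.5):
    return strength_at_goal - slope * distance      # shallower gradient

def avoidance(distance, strength_at_goal=100, slope=2.0):
    return strength_at_goal - slope * distance      # steeper gradient

# Conflict point: the distance from the goal at which the two tendencies are equal.
conflict = min(range(0, 161), key=lambda d: abs(approach(d) - avoidance(d)))
print(conflict)   # about 13 units from the goal with these numbers

# Raising the punisher raises the avoidance gradient, moving the crossing point away from the goal:
conflict_stronger_shock = min(range(0, 161),
                              key=lambda d: abs(approach(d) - avoidance(d, strength_at_goal=130)))
print(conflict_stronger_shock)   # farther from the goal than before
```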
This is by no means all that Dollard and Miller have to say about what approach-avoidance conflicts entail. You may be interested in reading their book on personality and psychotherapy for more information.
We will look at partial reinforcement in more detail in a later chapter. For now, let me mention that a number of variables interact with partial reinforcement. In particular, amount of reinforcement will prove to play a pivotal role. In studies such as those conducted by Roberts, we find that animals that have been continuously reinforced will display increased resistance to extinction with small reinforcers. Roberts looked at extinction of alleyway running in rats whose reinforcers ranged from 1 to 25 food pellets. Over 36 extinction trials, there was little evidence of a change in the 1-pellet group, whereas the 25-pellet group was performing at well less than half their rate prior to extinction. However, this effect appears to depend on how much training an animal has had during the acquisition phase; it assumes a fairly substantial amount of acquisition (see D'Amato). In contrast, animals that have been partially reinforced will display increased resistance with large reinforcers (see, for example, Ratliff and Ratliff). The first result, in particular, strikes many people as counterintuitive on first coming across it. After all, shouldn't large reinforcers result in better learning, and shouldn't better learning be longer-lasting learning?
There are in fact a number of explanations for the partial reinforcement effect. For the moment, however, I will mention one to help you remember the results. This is Amsel's Frustration Hypothesis, cited in the first chapter. According to Amsel, continuously reinforced animals will experience more frustration when they lose a large reinforcer. And since frustration acts as an aversive stimulus, these animals will avoid whatever it is that is causing the frustration. So, with more frustration, there is faster learning of avoidance. But in contrast, animals in partial reinforcement are being trained to tolerate frustration. With larger reinforcers, they are trained specifically to handle greater and greater frustration. Thus, when they are placed in the highly frustrating situation of extinction (in which the expected reward fails to materialize), they will be better able to adapt to this situation.
Number of trials during acquisition also has different effects for continuously and partially reinforced groups. For continuously reinforced groups, more training results in lesser resistance, whereas for partially reinforced groups, more training results in greater resistance. The increased training in partially reinforced groups translates into better training to tolerate frustration, but the increased training in continuously reinforced groups translates into higher expectation of a reward (and thus, a ruder awakening when it is no longer there).
In short, extinction is frustrating, because expected rewards don't occur. How much resistance to extinction you will have will thus depend partly on how much frustration you experience during extinction, and on how much frustration you have been trained to tolerate during acquisition. The amount of frustration experienced during extinction depends on the size of the reinforcer you expected. (Not getting an expected $50 is a lot more frustrating than not getting an expected $1.) In addition, continuously reinforced animals have not been trained to tolerate any frustration whatsoever.
In appetitive or approach learning, the animal makes a response that results in a desired reward. This is the type of learning involving reinforcement that we have implicitly and explicitly discussed so far. But it is not the only paradigm based on reinforcement. Another that deserves particular note is omission training, in which an animal has to suppress or withhold a response in order to get its reward. Sheffield, for example, trained dogs to salivate in the presence of a tone associated with food, and then shifted them to omission training. In this latter phase, the dogs had to avoid salivating to the tone for several seconds to get the food. Omission training is typically difficult at first, and displays a relatively slow learning curve. However, there are several studies suggesting that in the long run, it will be as effective as extinction in decreasing the frequency of a response. Omission training is sometimes referred to as negative punishment to indicate that making the response is associated with removal of a reinforcer (which thus acts as a punishment).
Another paradigm based on reinforcement is escape learning. In escape learning, the animal learns a response that gets it away from punishment, either by turning off the punisher, or by allowing the animal to leave the area where the punishment was administered. Escape learning is closely associated with another paradigm, avoidance learning. In avoidance learning, the punishment is intermittent rather than continuous. If the animal makes the proper response before the punishment comes on, it will succeed in canceling that punishment. In avoidance learning, animals typically start out by escaping the aversive stimulation (making a response during the punishment that stops it), and then come to make the response early enough that they subsequently successfully avoid the aversive stimulation.
Punishment training (or aversive learning), of course, involves the administration of an unpleasant, aversive outcome following a response. Thus, punishment training, omission training, and extinction all have in common reducing the level of a given response, whereas appetitive learning, escape learning, and avoidance learning attempt to increase response level. There are some obvious interplays in paradigms here, depending on which response you focus on. Often, aspects of several different paradigms combine: One response may be punished while another is reinforced.
We may also distinguish between signaled and unsignaled learning. A discrete, distinct stimulus is present in signaled learning, but not in unsignaled learning. Thus, for example, in unsignaled avoidance, shocks can occur at regular intervals that could be avoided if the animal responds shortly before the shock's onset. There is no physical stimulus signaling the shock; the animal in this case needs to rely on an internal sense of time. In unsignaled conditions, features such as time or the contextual cues presumably act as stimuli.
Another paradigm, transfer training will prove important, especially when we focus on discrimination in a later chapter. In transfer training, we look at the effects of learning one task on another. Transfer might be nonexistent (zero), positive (facilitation: the learning is faster), or negative (inhibition: there is interference). In addition, transfer effects might be proactive (in which we look at the effect of an earlier task on the learning or performance of a later task), or retroactive (in which we saw how the later task influences performance on the earlier one).
A final paradigm involves shaping. Normally, approach learning applies to responses that are not especially frequent to start with, since we want to track an increase in frequency as one of our measures of learning. Thus, we find ourselves in the following situation: We sit in the lab, watching our animal subject, waiting for it to make the desired response so that we can administer the reinforcer.
Such a procedure will obviously be inefficient. In some cases (such as a pig rolling a coin), the wait may be very long indeed! Hence, a technology has developed that involves increasing the probability of having the animal emit that response so that we can then train it further through reinforcement. This technology, called shaping, requires reinforcing successive approximations to the desired response.
Shaping works as follows. We start out by identifying a high-frequency component of the response we want, and we reinforce that. So, if we want our rat to press a bar on the left side of an experimental chamber, then a high-frequency component would involve having the rat be in the left half of the chamber. While it is exploring its environment, we reinforce it for crossing over to the left. Then, as it increases its time on the left, we drop the reinforcer. That will cause the behavior to become more variable. We await some response yet closer to what we want to train (such as being near the bar), and when that occurs we reintroduce the reinforcer. And then, of course, we cycle the process through again in order to obtain yet a closer approximation (such as touching the bar). Shaping is a very powerful technique, not only because of its ability to 'coax' low frequency responses out of an animal, but also -- and especially -- because of its ability to mold a response that is not normally part of the animal's repertoire! Thus, by combining shaping and chaining, instrumental conditioning allows us to train totally new responses, rather than just transfer stimulus control of an old response to a new stimulus.
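The logic of reinforcing successive approximations can be sketched as a simple loop (the behavior scale and its variability are illustrative assumptions, not a model of any particular experiment): behavior varies around the animal's current tendency, and only variants closer to the target are reinforced and so become the new baseline.

```python
# A minimal sketch of shaping as reinforcing successive approximations to a target response.
import random

random.seed(0)

target = 10.0      # the desired response, expressed as a point on an arbitrary behavior scale
tendency = 0.0     # where the animal's behavior starts out

while abs(tendency - target) > 0.5:
    behavior = tendency + random.gauss(0, 1.0)            # behavior varies around the current tendency
    if abs(behavior - target) < abs(tendency - target):   # a closer approximation than anything so far
        tendency = behavior                                # reinforce it: it becomes the new baseline

print(round(tendency, 2))   # the animal now reliably emits something close to the target response
```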
In operant conditioning, on the other hand, there are no discrete trials. A standard apparatus for operant conditioning involves a Skinner Box, a chamber with something that can be manipulated (a key to peck; a bar to press; a lever to move); various discriminative stimuli that may be turned on or off (lights; noises); and means to automatically administer reinforcements or punishments (food or shock dispensers connected to the bar, for example). Particularly with respect to such simple responses as pressing a bar to obtain food, the interest will be more in how rapidly those responses are executed. We don't stop the animal between responses in order to set up another trial. Rather, we typically look at characteristics of response rate over time.
This distinction between discrete and continuous trials might also be expressed in a slightly different manner. On a discrete trial, you can succeed only once (or perhaps 8 times if we use an apparatus like Olton's radial maze, discussed in the previous chapter), whereas on continuous trials you have the opportunity to obtain virtually unlimited reinforcements. So, the difference between instrumental and operant conditioning in part involves whether there is a constraint on how many reinforcing events an animal can seek out. That having been said, I will generally treat these as equivalent.
Before we do, however, we ought to note several features that make the situation a bit more interesting. First, of course, is the issue of partial reinforcement. We will delay fuller discussion of that to a later chapter. Second, there is the fact that in operant conditioning, an animal is effectively in charge of whether to emit the response or withhold it. Obviously, researchers in classical conditioning may easily arrange pairings of the CS and the UCS to achieve any desired contingency. But in operant conditioning, controlling how many times the reinforcer occurs when a response is emitted versus when a response is not emitted is clearly trickier. Third, because of the presence of three events (stimulus, response, outcome), there are three potential associations to worry about (S-R, S-O, and R-O). That means that we can ask about temporal contiguity (or contingency) not just of response and outcome, but also of stimulus and response, and of stimulus and outcome. The situation thus becomes significantly more complex.
Not all theorists believe that all three associations form. Thorndike, to remind you, accepted only an S-R association, as did Watson. But, researchers such as Rescorla have made a very strong case that the other associations are there, as well. Thus, Colwill and Rescorla used the devaluation paradigm on a reinforcer after the response had been acquired. If a reinforcer's only function is to stamp in the association (as claimed by Thorndike), devaluing the reinforcer ought not to influence the response the animal gives to the stimulus. In the abstract, the design for this type of experiment would be similar to the following:
Group          Acquisition Phase   Phase 2     Test Phase
experimental   R to S for RF       RF & LiCl   R to S?
control        R to S for RF       (Nothing)   R to S?
However, Colwill and Rescorla found a much less vigorous response following devaluation. The devaluation must therefore have had its effect through an R-O or S-O association.
What about an S-O association? From our discussion of chaining and higher-order conditioning, you already know that this association forms. Further evidence of this comes from the Rescorla study mentioned in the previous chapter, in which a stimulus that caused higher levels of responding during extinction became inhibitory, as measured by the summation and retardation tests. We had earlier read about a classical conditioning version of that study, but Rescorla also ran the same study with an instrumental conditioning set-up, and obtained the same results. Because the S-R association in these types of experiments is rapidly relearned following extinction while the S remains inhibitory, Rescorla claims that the inhibition doesn't involve the S-R link! And as a final example, consider a classic study by Seward and Levy on a phenomenon termed latent extinction. In their study, two groups of rats learned to run to a goal box for a reward. Following acquisition, one group had the experience of being placed directly in the goal without the reward. Then, both groups were put through extinction:
Group          Acquisition   Phase 2              Phase 3
experimental   run for RF    put in goal, no RF   extinction
control        run for RF    (Nothing)            extinction
In this experiment, the control group extinguished more slowly than the experimental group. Presumably, the stimulus elements of the goal box had now become associated with some inhibition for the experimental group, making their running to it less desirable.
Below, given the theoretical importance of reinforcement in operant conditioning, we will concentrate on principles having to do with its presence relative to the response.
A well-known experiment demonstrating the gradient of reinforcement was conducted by Grice. Grice used a choice discrimination paradigm in which rats had to enter one of two rooms or chambers. The rooms were different colors (black or white), and the rat was reinforced for entering one of these but not the other. However, there were several groups of rats who differed in terms of how long it took to get the reinforcer after choosing the correct color. All rats were immediately placed in a neutral-color room where the reinforcer was given, but one group received their reinforcer immediately, while others had to wait. The group with the longest wait was reinforced after 10 sec. Essentially, Grice found a very rapid fall-off of learning. After about 1 sec, there was no evidence that the discrimination had been learned.
Depending on the response and the circumstances (see the next major section below), longer delays in which learning still occurs have been reported. As an example, consider a study by Capaldi, in which two groups of rats were trained to run to a goal box. One group was rewarded as soon as it reached the goal box, but the other group had to wait 10 sec for its reward. Both indeed learned to run to the goal box, but the running speed (and the initial velocity out of the start box) was significantly depressed for the 10-sec delay group. Thus, their learning seems to have been affected by the delay.
Sometimes, a delay of 1 or 5 sec seems to result in no learning, and at other times (as in Capaldi's experiment), longer delays will be tolerated. Generally, however, the speed of learning as measured by vigor or probability of the response (or number of trials to acquire it) will be influenced by the response-reinforcer delay. Extrapolating from Skinner's claims, we may present one theory for why this is so: Namely, as the delay period increases, the odds increase that the animal will perform some other piece of behavior before the reinforcer is given. The association may then form between that response and the outcome, rather than between the effective response and the reward.
According to Skinner, temporal contiguity by itself is all that is needed for the formation of an association. Skinner cites the example of superstitious behavior to demonstrate this. In superstitious behavior, animals are reinforced at random, and need perform no response whatsoever. Yet, Skinner in one of his studies reported that pigeons in this circumstance were displaying apparently learned behaviors such as head shaking. He claimed that the reinforcer by dumb luck must have been presented just after the pigeon had tossed its head, so that head tossing was strengthened as a response in this situation. The increased possibility of acquiring superstitious behavior that interferes with other learning might thus partly explain why temporal contiguity is important.
From the perspective of a more cognitive, representational-level approach, we may posit a similar idea expressed in very different terms. Given the presence of a reinforcer, the animal's task is to determine which of a number of previous responses might be the one that worked. As the number of responses increases, the task becomes more difficult. Moreover, because causes normally result in relatively immediate effects (excepting, of course, situations such as illness or food poisoning: note the relevance to the taste aversions paradigm), organisms may be genetically predisposed to connect recent behavior with the current outcome (a principle of causal recency).
A similar principle, of course, applies to aversive situations. Fowler and Trapold in an experiment on escape learning varied how long it took for shock to turn off once the rat had run to a goal box. The best learning/performance occurred for a group of rats whose shock was turned off as soon as they entered the goal box. Animals that had to wait a bit for shock to turn off did worse.
Finally, Boe and Church found that the effectiveness of punishment decreased with delay. Unless punishment is administered very shortly after an animal's response, it will not prove very effective. Dog owners who come home and punish puppies for earlier 'accidents' are most likely to be associating themselves with the aversive outcome, and training fear of the owner and the spot where the dog was punished. That is certainly not the same thing as housebreaking a pet.
The other two types of outcomes are negative reinforcement and negative punishment. It will help you to keep these straight by recalling that anything that is labeled a reinforcer, positive or negative, should operate by the law of reinforcement: It ought to increase the response that it follows. Similarly, anything labeled a punisher, positive or negative, ought to work by the law of punishment: It ought to decrease the response that it follows. That having been said, a negative reinforcer takes on its reinforcing properties because some response the animal makes results in removal of aversive stimulation. Negative reinforcement, of course, is the basis for escape learning. And in similar fashion, a negative punisher acquires its punishing properties by virtue of the fact that the animal makes a response leading to removal of a reward or privilege. Thus, positive outcomes involve the presentation of stimulus events, and negative outcomes involve the removal of certain stimulus events.
With respect to each, there appears to be a general principle that higher levels of strength result in stronger or faster or more vigorous responding, consistent with a claim that outcome strength influences speed of learning. Concerning positive reinforcement, for example, Kraeling taught three groups of rats to run an alleyway for a drink reinforcer that varied in sucrose concentration (recall that rats have a sweet tooth, so higher sucrose concentrations act as more effective reinforcers). Each group was given one trial per day for 99 days. At the end, they had each reached asymptote as measured by how fast they ran. However, the asymptotes differed for the three groups: The group with the highest sucrose concentration had the fastest asymptotic running speed whereas the group with the lowest concentration had the slowest speed. Crespi found similar results (see below, Figure 3): Rats given large amounts of reinforcement on each trial (64 pellets) showed faster running than rats given small amounts of reinforcement (4 pellets).
An experiment by Trapold and Fowler can illustrate the operation of this principle with amount of negative reinforcement. They conducted an experiment in which rats had to run to escape shock. Five groups of animals were given 20 trials of escape learning. The groups differed in the intensity of the shock (varying from 120 volts up to 400 volts). Faster acquisition of the escape response occurred with the larger shocks.
Finally, a classic experiment by Boe and Church may be used to illustrate the principle with positive punishment. Boe and Church trained four groups of rats to press a bar for a reward, and put each through extinction. Prior to extinction, however, three of these groups were put through punishment training in which, for 15 minutes, a bar press gave the animal a shock. The groups differed in intensity of the shock (35, 75, or 220 volts). Thus, the design was as follows:
Group    Acquisition Phase        Punishment          Extinction Phase
1        RF for barpress (bp)     (None)              No RF for bp
2        RF for barpress          bp --> 35 Volts     No RF for bp
3        RF for barpress          bp --> 75 Volts     No RF for bp
4        RF for barpress          bp --> 220 Volts    No RF for bp
The question, of course, was how punishment of bar pressing would help speed up removal of that response. Over 9 sessions of extinction training, the group with the weak shock proved not all that different from the group with no punishment: Each engaged in a substantial number of responses during the course of extinction. However, quite different results occurred for the 75 and 220 volt groups: They showed a much lower level of responding during extinction. Indeed, the 220 volt group hardly responded at all! Thus, effectiveness of punishment in suppressing behavior will depend in part on severity of punishment. As the contrast between the control group and the 35 volt group demonstrates, weak punishers may have little permanent effect compared to extinction.
Of course, there are other variables that will influence the operation of an outcome. As you know from an earlier discussion, aversive stimulation can have the paradoxical effect of increasing the response it is meant to stamp out (vicious circle behavior). Also, the same amount of an outcome packaged in different ways may effectively act as different amounts. Thus, for example, Campbell, Batsche, and Batsche found that a reinforcer divided into smaller amounts worked better than a reinforcer presented as one large amount. And to remind you, those manipulations that seem to promote higher asymptotic levels during acquisition (in continuous reinforcement) generally also promote the fastest extinction.
There are also contrasts that may occur when an organism experiences several different levels of reinforcement. An experiment by Crespi will illustrate these. Crespi trained rats to run to a goal box for food (the apparatus here involved a straight alleyway in which rats are released at one end of a corridor or tunnel, and have to run to the other end). One group was given a large reward, a second group was given a medium reward, and a third group was given a small reward. In each case, their running speed was measured. Then, the large-reward and small-reward groups were shifted to the medium reward. Thus, the design was something like the following:
Group    Phase 1 (acquisition)    Phase 2 (maintenance)
1        64-pellet reward         16-pellet reward
2        16-pellet reward         16-pellet reward
3        4-pellet reward          16-pellet reward
You will note that I have labeled the two phases here acquisition and maintenance. The rats in the acquisition phase received 20 learning trials, and their average running speed at the end of training was measured. In a maintenance phase, on the other hand, we look at performance after learning has occurred (that is, presumably after the association has formed). In Crespi's study, there were 8 maintenance trials. Figure 3 presents the results after learning, and on the eighth maintenance trial. As you can see from this figure, Group 2 showed some slight increase; not surprising, since additional reinforced pairings ought to result in a stronger association according to most standard theories of instrumental conditioning. But notice what happened to the other two groups: A shift to a much smaller reward caused a negative contrast by which running speed slowed down considerably, whereas a shift to a much larger reward resulted in a corresponding increase (a positive contrast).
It is important to note that all groups received the same amount of learning in Phase 2 (in terms of number of trials and what the reinforcer was). Thus, we might have expected each group to display the same relative improvement. But, that did not happen. Because these contrasts occurred during a post-acquisition period involving identical additional training, they are generally interpreted as demonstrating an effect not on learning (or acquisition), but rather on performance.
That argument is particularly compelling for Group 1: They continued to receive additional reinforced training during Phase 2, yet they apparently got worse! Contrasts of this sort are termed incentive contrasts.
Such contrasts should suggest that perhaps outcome amount is related more to an animal's motivation to perform a response than whether that response gets learned in the first place.
Contrast effects may occur under a variety of conditions (see, for example, Flaherty's review). They do tend to be temporary, however. Flaherty suggests, in particular, that negative contrasts may reflect frustration at obtaining the less desired reward. Consistent with this, tranquilized animals generally do not exhibit contrasts.
To start, let us adapt the notion of a contingency space discussed in the previous chapter. Previously, we had looked at the relative probabilities of the UCS when the CS was present or absent. Now, we look at the relative probability of an outcome when the animal makes a response (Probability 1), or withholds it (i.e., does not make the response: Probability 2). Essentially, analogous to what happens in classical conditioning, many theorists will claim that when Probability 1 exceeds Probability 2, there ought to be excitation: The animal is more likely to make the response because the odds of getting a reward increase. (We are assuming reward outcomes rather than punishers here!) In contrast, when Probability 1 is below Probability 2, then it makes more sense for the animal not to respond: The response ought to be inhibited. Finally, when the two probabilities are equal (so that there is zero contingency), we would expect to find no evidence of acquisition.
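To make this contingency-space logic concrete, here is a minimal sketch in Python (purely illustrative: the function name and the example probabilities are assumptions of mine, not values from any study discussed here) showing how the two probabilities map onto the predicted direction of learning:

def classify_contingency(p_outcome_given_response, p_outcome_given_no_response):
    # Probability 1: P(outcome | response made); Probability 2: P(outcome | response withheld).
    # Assumes a reward outcome rather than a punisher.
    if p_outcome_given_response > p_outcome_given_no_response:
        return "excitatory: responding pays off, so the response should increase"
    if p_outcome_given_response < p_outcome_given_no_response:
        return "inhibitory: withholding the response pays off, so the response should decrease"
    return "zero contingency: no evidence of acquisition expected"

print(classify_contingency(0.50, 0.25))   # responding doubles the odds of reward -> excitatory
print(classify_contingency(0.25, 0.25))   # equal probabilities -> zero contingency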
How well does this notion of contingency hold up? One interesting study that attempts to assess this notion of an operant contingency space was performed by Hammond. Hammond set up different contingencies between bar pressing and a reinforcer for several groups of rats. Whenever the two probabilities were equal, the rats failed to display any change in bar pressing. In these circumstances, there were always some pairings of the response with the outcome that ought to have resulted in the association forming, if Skinner's claims made in the section on superstitious behavior were correct. Indeed, we might regard such a circumstance as similar to a partial reinforcement schedule. Nevertheless, no evidence of learning occurred. In contrast, rats for whom Probability 1 exceeded Probability 2 did increase their bar pressing.
Moreover, in a follow-up, Hammond found that rats that had already acquired bar pressing stopped when the two probabilities were made equal. This pattern of findings certainly goes against Skinner's claims that contiguity is sufficient. Instead, it strongly suggests an exquisite sensitivity to contingency on the rats' part.
Why, then, do we obtain superstitious behavior? According to many people, Skinner has probably failed to consider the fact that reinforcers such as food also act as UCSs that elicit certain responses, and that the contextual cues act as a CS that gets conditioned to the UCS. On such an account, superstitious behavior really isn't: It is fairly straight-forward classical conditioning of the sort we have been discussing in the last two chapters.
An example may serve to drive this point home. One claimed example of superstitious behavior has been autoshaping. In autoshaping with pigeons, some food is presented after a key lights up (Brown & Jenkins, 1968). After a while, pigeons peck the key, although they need not do so to obtain the food. Autoshaping is a very useful tool for training pigeons, because it seems that the pigeons will train themselves, and save you the work. But that description sidesteps an analysis of this situation as classical conditioning in which the lighted key serves as the CS. A nuanced discussion of how classical conditioning can partly explain some of the response characteristics (along with some of the problems for a simple stimulus substitution view of classical conditioning) may be found in Staddon and Simmelhag (1971). Consistent with their discussion, Jenkins and Moore (1973), for example, used either food pellets or drink as the reinforcer for different groups of pigeons in an autoshaping paradigm. Pigeons pecked at solid food with a closed beak, but opened the beak slightly to drink the liquid. Thus, what appears to be contiguity without contingency in operant conditioning is sometimes a classical conditioning situation in which there is both contiguity and contingency (but the contingency involves the CS and the UCS, rather than a response and an outcome). On the other hand, Staddon and Simmelhag do point out that such terminal responses differ from the interim responses that can be produced before them, and that do seem to better fit with Skinner's notion of superstitious behavior.
Another experiment done with pigeons was performed by Killeen. The pigeons faced a horizontal array of 3 keys, the middle one of which was lit. They were trained to peck at this middle key. About 5% of the time, one of their pecks at the key would cause its light to go off, and the lights of the two surrounding keys to come on. Another 5% of the time, a computer would automatically turn off the center key while turning on the others. The question Killeen asked was whether the pigeon was aware that it was responsible for this change.
How can we assess a pigeon's knowledge of the circumstances? Killeen reasoned that a pigeon aware of whether its action had turned off the light would easily be able to learn another response that would depend on that action. So, the experiment was arranged so that the pigeon would get rewarded for pecking at one of the surrounding lights when it was responsible for turning them on, but would have to peck at the other light when it was the computer that had turned them on. That would be a difficult task to accomplish without sensitivity to contingency, but Killeen's pigeons came through. They did indeed show they could discriminate events caused by their own behavior from events caused by some other external cause.
Such sensitivity is not reported in all studies, however. In a quite clever study by Thomas, rats obtained random free reinforcements, but could also make a response (bar pressing) that would give them a reinforcement on demand. But the catch was that the number of random reinforcers would drop somewhat after the rat made that response. In other words, more reinforcers were available if the rat did not respond. In contrast to the studies above, the rats in this experiment actually did learn to bar press, which resulted in their getting less food!
One more study on reinforcement-based contingency may be mentioned. This study, done by Watson (not the same Watson who redefined psychology as behaviorism!), used 3-month-old human infants. Watson set up a contingency between their turning their heads and a reinforcer of a mobile above their cribs turning for several seconds. A second group had the same experience with the mobile, but that group's mobile movements had nothing to do with any of their responses. Although both groups initially displayed a great deal of interest and pleasure in the mobile when it started moving, only the contingent group maintained this reaction. Thus, Watson argued that the contingent infants had some sense of mastery over the mobile that the non-contingent infants did not, some awareness that the mobile's movements were due to their own actions. They were sensitive to contingency.
Contingency will also prove important in escape and avoidance learning. In particular, Seligman and Maier have studied a phenomenon termed learned helplessness. The experimental set-up for learned helplessness typically involves something like the following design:
Group           Phase 1                         Phase 2
Experimental    inescapable, noncontingent P    escape learning
Control         (Nothing)                       escape learning
They find that unavoidable, non-contingent punishment results in the animals in the Experimental Group not learning to escape, once a contingency is set up between an escape response and avoidance of the shock. The Control Group, in contrast, readily acquires the response. According to Seligman's explanation of these results (the cognitive deficit hypothesis), the animals in the Experimental Group have acquired a mistaken belief. Based on the randomness of the shocks and their inability to escape them in Phase 1, they have mistakenly learned that there is no response that will be effective in avoiding or escaping shock. (You may want to compare this to various explanations for learned irrelevance in classical conditioning.) Thus, they cease trying to discover an effective response, so that learning is no longer attempted in Phase 2. Seligman has argued that some similar mechanism in humans may account for certain episodes of depression.
Leaving aside the taste aversions work, however, there are other studies suggesting relatively long delays are possible. Lieberman, Davidson, and Thomas, for example, presented a series of experiments in which pigeons had to peck the right or left side of a key. They found that some of their animal subjects were able to learn the response even with delays of 7 sec or longer (an extraordinarily long delay for a pigeon). The animals that were able to learn were the ones whose correct response was followed by an unusual event (a marker). In their experiment, the marker involved the key briefly turning a different color (from white to red on its left half and green on its right half) after it had been pecked. Other work by Lieberman and his colleagues has demonstrated that such marking can result in animals learning a discrimination even when the reinforcement is delayed a full minute. Since the marker involved a non-reinforced stimulus occurring after the relevant response but well before the reinforcement, the existence of a marking effect poses a challenge to the idea that temporal contiguity is always necessary for learning.
Indeed, this study ought to remind you a bit of some of the work we discussed regarding rehearsal and surprisingness (in particular, the work by Wagner, Rudy, and Whitlow and that of Hall and Pearce). Surprising events are apt to be rehearsed more. So, a distinctive surprising event following a response may result in that response being rehearsed for a longer period of time or becoming more distinct in memory (and thus more likely to be sampled as the cause of the reinforcement). Lieberman et al.'s take on this (the marking hypothesis) combines elements of both of the above (1985, p. 622):
[T]he effect of the marker proved to depend critically on what response preceded it: If a correct response was marked on food trials, then correct responding increased; if an incorrect response was marked, then incorrect responding increased. The most plausible explanation for this result, we believe, is that the marker triggered a memory search that focused attention on the preceding response, thereby increasing the likelihood that it would be remembered. [emphasis added]
Another example of long-delay learning concerns a study by D'Amato, Sarafin, and Salmon. They delayed reinforcers by at least 30 minutes in training trials with rats. In one experiment, animals were placed in one of the two goal boxes of a T-maze (an apparatus that looks like a T, in which the animal runs from the start box at the base of the T to one of the two arms at the top), then put in the start box and fed 30 minutes later. Despite this delay, the animals exhibited differential running to the arm in which they had been placed. Note, in particular, that no additional events or stimulus cues were present during the wait in the start box that may have become associated with the food: Once the animal was let out of the start box, the stimulus cues around the correct arm may have primed the memory of being in that arm.
Finally, note too that the work we have already mentioned by Olton using the 8-arm radial maze suggests that rats are quite adept at finding food in the maze without retracing their steps, and without generally revisiting an already-visited arm. As they visit arms at random, they would appear to maintain some information in short-term memory concerning which responses have already been made. Given the length of time it takes to visit all 8 arms, this clearly qualifies as a type of long-delay learning.
Numerous mechanisms for long-delay learning have been proposed. One that plays off of the notion of secondary reinforcers has been proposed by Spence. This involves the mechanism of an anticipatory fractional goal response (rg). Note that the response in this instance is written with a lower-case r rather than an upper-case R. The reason is that the r is treated as one component or fraction of a more complex response, the goal response (Rg), the animal makes on reaching the goal and obtaining its reward. There will be numerous fractions or component responses such as chewing, swallowing, salivating, etc. These get conditioned to the stimulus cues present shortly before the animal enters the goal box. So, the association involves:
SGoalCues ----------> rg
But since these components or fractions are associated with food, they also become secondary reinforcers through higher-order conditioning. Thus, the cues present as the animal enters the goal area act as a reinforcer before the animal has actually received any food on that trial.
In addition, these response components have stimulus properties associated with them, although these associations, of course, are unlearned: Chewing, swallowing, etc., all produce certain physical sensations. So, the association ought also to include these, as follows (the dots indicate an unlearned association):
SGoalCues ----------> rg ..... sg
And as was true of the response fraction, we indicate the stimulus fraction with a lower-case s. The rg ..... sg is termed a mediator because it is a unit that may come between a stimulus and a response in a chain of associations.
In Spence's theory, these mediators become anticipatory; that is, they start being conditioned to earlier and earlier spots in a sequence. Thus, in a complex maze, the stimulus cues right before the cues that led to the goal box also take on secondary reinforcing properties. If we regard the goal cues as being at spot X, we will take the cues before these as being at spot X-1. Then, through classical conditioning we have:
SX-1 ----------> SGoalCues ----------> rg ..... sg
Or, to represent this by a shortcut:
SX-1 ----------> rg ..... sg
And if at this point the animal needs to make a left turn to get to the area of the goal box, then the associations at this point involve:
SX-1 ----------> rg ..... sg ----------> RLeft
And of course, we may now carry the procedure through to spot X-2 (the cues present before the X-1 cues). Thus, fractional goal responses are effectively conditioned throughout a complex chain in a process that should remind you of our example of the Time Warp. Consistent with this theory, animals do tend to learn a complex maze backwards (although not all results support the theory: In particular, researchers have not found evidence of anticipatory drooling at the various spots or choice points of a maze).
Thus, in theory, the presence of secondary reinforcers may help to bridge what appears to be a long delay. That would mean that long delays are really much shorter, since we need to assess the delay in terms of the first reinforcer present after a response. In this case, that first reinforcer will be a short-delay secondary reinforcer.
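As a rough illustration of how this backward spread might work, here is a minimal sketch in Python. It is emphatically not Spence's own formalism: the spot labels, the transfer fraction, and the number of trials are illustrative assumptions only, chosen to show later spots acquiring secondary reinforcing strength before earlier ones (the backwards learning of the maze noted above).

spots = ["X-2", "X-1", "GoalCues"]    # earlier maze spots listed before the goal cues
strength = {spot: 0.0 for spot in spots}
strength["GoalCues"] = 1.0            # the goal cues are paired directly with food
TRANSFER = 0.5                        # assumed fraction of strength conditioned backward per trial

for trial in range(5):
    # On each run, the cues at each spot are followed by the cues one spot later,
    # so the rg ..... sg mediator gets conditioned to progressively earlier spots.
    for earlier, later in zip(spots, spots[1:]):
        strength[earlier] += TRANSFER * (strength[later] - strength[earlier])

for spot in spots:
    print(spot, round(strength[spot], 2))    # X-2 lags behind X-1, which lags behind GoalCues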
While secondary reinforcers and anticipatory fractional goal responses might account for some of the long-delay results, however, they cannot account for all of them. In particular, the study by D'Amato et al. would seem difficult to explain, since the animal is being fed in the start box, so that any secondary reinforcers ought first to be associated with it, rather than the arm the animal runs to. Similarly, Olton's results would not fall under this mechanism, because secondary reinforcers ought to become associated with an arm the animal has already visited, making it more likely the animal will revisit the same arm on the next trial. But that generally doesn't happen. And that it doesn't happen makes sense, according to foraging theory: An animal foraging in the wild for food is likely to deplete a food source, so there is adaptive value in searching for food in different locations. But, searching for food in this manner also requires a memory system that can keep track of where food was found previously, so as to avoid that spot.
Other factors that make long-delay learning possible include the presence of something to make the correct response distinct, or the occurrence of little intervening activity between the effective response and the reward. We have already discussed distinctiveness in terms of Lieberman's marking hypothesis: A distinct response is likely to be more salient, rehearsed more, and thus more readily available in memory when the occurrence of a reward triggers a search for events that might have been responsible for it. The notion of little intervening activity will similarly play off of a memory mechanism. With little or no intervening activity, the last response will still be the one most likely to be recovered from memory. But as activity increases, so do the possibilities of disruption (recall what Wagner, Rudy, and Whitlow found with post-trial episodes!), and choosing the wrong response as being the cause of the reward (response competition).
One of these studies, conducted by Shettleworth, involved an experiment with hamsters. Shettleworth identified six high frequency activities in hamsters that included face washing, digging, scent marking, hind leg scratching, rearing, and front paw scraping. When each of these was subsequently paired with a food reinforcer, only digging, rearing, and front paw scraping were affected. Such restriction of the operation of a reinforcer represents a violation of the requirement that reinforcers be transituational: Here, at least, are three responses that a reinforcer of food will not affect.
Another study illustrating belongingness comes out of the work of Premack. By allowing kids to play with gumball machines (dispensing candy) and pinball (game) machines, Premack identified kids who were players or eaters (based on the relative proportion of time they spent with each machine). He then set up an experiment using the following design:
Group    Subjects    Response & Reinforcer
1A       players     play to eat
1B       players     eat to play
2A       eaters      play to eat
2B       eaters      eat to play
Thus, there was now a contingency between responding on one machine, and responding on the other: In one case (play to eat), kids would have to increase their time on the pinball machine to get an opportunity to use the gumball machine; in the other (eat to play), the reverse was required: kids would have to increase responding to the gumball machine to get a shot at the pinball machine. Only groups 1B and 2A showed learning. Thus, what counts as an effective reinforcer for one child may be completely ineffective for another. (Similar results hold up for animals: see the next chapter).
Also as a potential illustration of belongingness, we might mention the fact that certain responses that are easy to acquire with positive reinforcement become very difficult to acquire with negative reinforcement. Key pecking in pigeons, for example, is difficult to train with negative reinforcement (e.g., MacPhail). The explanation for this latter result may have to do with Bolles's theory of safety and danger signals. Negative reinforcement involves the presence of danger signals that trigger SSDRs. Such responses may well interfere with the desired response, particularly if that desired response involves approaching the danger signal or aversive stimulus!
Such a notion is similar to a more general principle of preparedness posited by Seligman: Responses may be ordered on a continuum ranging from prepared responses at one extreme to contraprepared responses at the other. Prepared responses are quite similar to what an animal would naturally do in a given situation, whereas contraprepared responses are the exact opposite of what the animal would normally do (approach danger rather than flee from it, for example). According to Seligman's principle, the closer a response is to the prepared end of the continuum, the more easily it should be learned. So, the exact same response may be easy to acquire in one circumstance, but not in another.
Group    Treatment
1        no RF; removed when reach goal box
2        RF on each day when reach goal box
3        no RF until the 11th day
The question Tolman and Honzik asked was how the third group would perform on days 11 through 17. Since these days represented the first time this group had experienced reinforcement, a reinforcement-based account of learning would suggest that these animals started learning only on Day 11. But in fact, on the 12th day, these animals were performing as well as (in fact, slightly better than) the animals reinforced from the beginning (Group 2): They had learned to navigate the maze in the absence of a reinforcer. This finding, termed latent learning, suggests that reinforcement may be more important for performance (motivating an animal to show its knowledge) than for acquisition.
Another similar result involves a study by Butler in which monkeys learned a response whose consequence involved being given access to a window looking out on a parking lot. While curiosity might be called a reinforcer, it seems a bit of a stretch in this case. The problem is that we have no way of independently identifying when learning would be expected to occur in the absence of any other reinforcer such as food, and when it would not. When is the animal curious?
A third study involves the area of observational learning. In a famous experiment by Bandura, kids watched a tape of a clown playing with toys. Children in the vicarious reinforcement group saw the clown being rewarded, but children in the vicarious punishment group saw the clown being punished for the way he played. Later, when these kids were given a chance to play with the same toys, the kids in the vicarious reinforcement group displayed the same behaviors: evidence that they had learned by watching. The kids in the punishment group played in a very different manner. But they had acquired the responses as well: When the experimenter asked them to show what the clown had done, they were able to do so. Thus, we find from this study that reinforcement and punishment may have an effect at a distance: Watching others be reinforced can serve as a reinforcement. Such a notion takes us far afield from the original idea of an appetitive stimulus that follows an emitted response (note that the children had not made the response themselves, and note also that all children had learned the response, though some of them had suppressed it until given permission by the experimenter to play the way the clown had played).
A final study on observational learning illustrates that the notion is not restricted to humans. Kohn and Dennis taught one group of rats a choice discrimination. A second group that was able to watch the training of the first group learned the choice discrimination faster. Both the Kohn and Dennis and the Bandura studies suggest a point we will explore in the next subsection; namely, that responses do not need to first be emitted in order for learning to occur.
The first is a study by McNamara, Long, and Wike. They looked at how long it took to train two groups of rats to learn a maze. One group, however, was initially placed one-by-one in a 'wagon' and dragged through the maze. This group did not perform any of the running or turning responses, but they did get to observe their environments. And as you have probably guessed, this group learned to run to the goal box (where they had been dragged) faster than the group without that experience. Presumably, while being dragged through the maze, they had opportunities to observe the various stimulus cues and form a representation of their environments (a cognitive map: see the discussion below and in the next chapter on Tolman's theory of learning). Learning to navigate through those environments was thus speeded up for this group.
The second study involved the acquisition of cognitive maps by chimpanzees. In this study, Menzel had animal subjects watch while food was hidden in slightly under 20 different locations in a large field. The animals were brought along as Menzel hid the food over the field. Subsequently, they were released at the starting point, and observed while they collected the food. Two findings are relevant here. First, they knew the locations of the food, despite having made no response themselves. And second, they did not collect the food in the same order as it was hidden. Thus, we would not want to claim that a path through the field (consisting of a chain of locations to visit) was learned (as may have been the case in the McNamara et al. study). They clearly did not imitate Menzel's path or chain in this instance. The results again seem to suggest that animals can acquire representations that are map-like, and that their learning will show up as an enhanced ability to successfully find food (rather than as the performance of a given order of responses).
Our third study was introduced earlier in this chapter. This is the study on latent extinction by Seward and Levy. To remind you, animals placed directly in a goal box in which they have previously been fed more rapidly extinguished running to that goal box than a group without this initial experience. Insofar as extinction is normally defined as requiring the process of making a non-reinforced response, both groups should have extinguished at the same rate: The initial experience of the experimental group did not involve making a non-reinforced running response! But that didn't happen. Presumably, through classical conditioning, the goal box had become associated with food, so that absence of food may have aroused frustration or inhibition. Thus, a change in the value of the outcome (a type of devaluation) made extinction easier for the experimental group.
Consider a study by Macfarlane. In this study, rats were trained to run a T-maze. Once they had acquired this response, Macfarlane flooded the maze and put the rats into the start box. They swam to the goal box that had previously been reinforced. The point of this study, of course, is that swimming and running technically involve different muscle movements. So, if Watson's view of learning were correct, we ought not to find evidence of learning when a different response is executed. But consistent with our discussion of cognitive maps, these animals had learned where to go: How to get there was not all that important.
A quite similar point occurs in studies with the use of the Morris maze. The Morris maze is a pool of opaque, milky-white liquid that has a platform somewhere underneath the water. The platform is close enough to the surface to enable a rat to keep its head above water without having to swim. Morris and his colleagues have found that rats released into the pool from the same spot eventually discover the platform (not having to expend energy on swimming is the reinforcer), and then learn to swim straight towards it. What is important from our perspective, however, is that these animals still head towards the platform when they are released from a new location: They are able to adjust their angle of swimming relative to the landmarks in the room that tell them in what direction the platform ought to be. That technically involves a different response than the one these animals made during acquisition. That they can execute the proper novel response again illustrates the involvement of cognitive maps in learning. What drives performance here is where to get to, and how to get there as soon as possible.
And indeed, people who study acquisition often report that animals will perform a number of different physical responses that appear equally effective in obtaining reinforcement. Rats need not (and will not) always press the bar with the same paw.
Finally, although it takes us slightly off of the focus of this section, we may also mention studies that show animals do not always prefer to make a response that has just been reinforced (and that should therefore be relatively strong). On a T-maze, for example, rats have a tendency to visit alternate arms (e.g., Dember and Fowler). This should remind you of our discussion of foraging theory. Similarly, Harlow has demonstrated that primates can learn a win-shift lose-stay strategy in choice discrimination in which they have to select the non-reinforced stimulus on the next trial. Responding to the previous S- and avoiding a response to the previous S+ ought to be difficult under normal associationist assumptions. Under foraging theory assumptions, it ought not to be that difficult.
The answer may surprise you. (And it will perhaps startle you to find that the answer doesn't depend on species: College-level humans have displayed the same result!) A first-guess common-sense theory most people come up with is that the animal (or human) will spend all of its time on the key that has the best value -- the green key. But this often does not happen. Instead, the animal (and the human) will distribute its responses in proportion to the reinforcements available. That is, it will peck all of the keys over a period of time, but will peck the green key proportionately more often than the red key, and the red proportionately more often than the blue.
Herrnstein has called this the matching law. To provide a simple formula for this law, let us assume we have a series of possible responses corresponding to a series of stimuli. In that case,
Responses to S1/Total Responses = RFs from S1/Total Available RFs
So, to calculate the proportion of times our animal spends with each key in the example above, we would calculate the following:
Proportion of Responses to SGreen = 5/(5+2+1) = 5/8 = .625
Proportion of Responses to SRed = 2/(5+2+1) = 2/8 = .25
Proportion of Responses to SBlue = 1/(5+2+1) = 1/8 = .125
That is, it should distribute 62.5% of its responses to the green key, 25% to the red key, and only 12.5% to the blue key.
The matching law applies generally whenever there is a difference in value of the reinforcer. We know that temporal contiguity can affect the value of a reinforcer, and a version of the matching law has been formulated for this situation, as well. But in this case, value depends on the reciprocal of the delay. A reinforcer that is given after a short delay has more value than one given after a long delay. So, given the same reinforcer presented at 2 and at 8 sec delays, its value would be 1/2 (.5) and 1/8 (.125), respectively. In this case, the matching law would have the following formula:
Responses to S1/Total Responses = value of RF from S1/Total Available values
Let us take another example. We'll again use a pigeon trained to peck at a red, blue, or green key. This time, the pigeon gets the same reinforcer from each, but at different delays: 2 sec for pecking at the red key, 4 sec for pecking at the blue key, and 8 sec for pecking at the green key. Before even doing any of the calculations, you ought to correctly predict that the red key will be pecked most (fastest reward), and the green key least (slowest reward).
But let's do the calculations. First, we need to calculate the values based on the delays. Remember that these are reciprocals. So, the values are:
red: 1/2 = .5 blue: 1/4 = .25 green: 1/8 = .125
Plugging these into our formula will yield the following results:
Proportion of Responses to SGreen = .125/(.125+.25+.5) = .125/.875 = .143
Proportion of Responses to SRed = .5/(.125+.25+.5) = .5/.875 = .571
Proportion of Responses to SBlue = .25/(.125+.25+.5) = .25/.875 = .286
You can see that our predictions turn out to be correct. Specifically, our pigeon ought to distribute 57.1% of its responses to the red key, but only 14.3% to the green key. (As a check on your calculations, by the way, note that the proportions ought all to add up to 100%!)
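If you want to check these numbers, the following minimal Python sketch reproduces both worked examples; the function name is my own, and the values come straight from the text above.

def matching_proportions(values):
    # Matching law: distribute responses in proportion to each alternative's reinforcement value.
    total = sum(values.values())
    return {key: value / total for key, value in values.items()}

# Example 1: reinforcement rates of 5, 2, and 1 for the green, red, and blue keys.
print(matching_proportions({"green": 5, "red": 2, "blue": 1}))          # green .625, red .25, blue .125

# Example 2: value as the reciprocal of the delay (2, 4, and 8 sec delays).
print(matching_proportions({"red": 1/2, "blue": 1/4, "green": 1/8}))    # red .571, blue .286, green .143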
The matching law also applies to aversive outcomes. However, there is some evidence that it does not apply to all instrumental situations (see, for example, Allison). Some discrepancies have been found with fixed interval schedules, for example (see the chapter on extinction and partial reinforcement). Nevertheless, it illustrates complex processes that depend on comparing momentary values from multiple stimuli. Several theories have been proposed to explain the matching law. One of these, the melioration theory, claims that animals are assessing the momentary odds of a payoff. When one stimulus has paid off, then the animal works on the stimulus that is next most likely to pay off. Although the example differs somewhat, it is reminiscent of people playing multiple slot machines who shift to a different machine as soon as the machine they're on has scored.
The first is a study by Reynolds. Reynolds taught pigeons to peck at a key that included several different stimulus elements. Specifically, there was a white triangle against a red background on the reinforced key (as opposed to a circle against a green background on the non-reinforced key). In a later generalization test, Reynolds checked how many pecks the pigeons would give to a completely red key, a completely green key, a triangular key, and a circular key. If the pigeons were under the control of the total stimulus complex (triangle & red), then we would expect significant generalization to both red and triangle, since they contain elements of the original training complex. Instead, one pigeon pecked just at the red key, and the other pecked just at the triangular key. In this case, one element of the complex was the only effective or salient element; it completely overshadowed the other.
A second study that illustrates a similar phenomenon was conducted by Wagner, Logan, Haberlandt, and Price. They set up a design for two groups of animals that was something like the following:
Group    Compound Stimulus    RF Frequency
1        Light & Tone1        50%
         Light & Tone2        50%
2        Light & Tone1        100%
         Light & Tone2        0%
This study, of course, manipulates signal value. Thus, for the first group, the light has better signal value than either of the tones, because the light predicts more reinforcers (each tone by itself only predicts half the reinforcers the light does; you would have to attend to both tones to predict as many reinforcers as the light: two things to track rather than one). In contrast, the light is a worse predictor than Tone1 in the second group: paying attention to the light will work only half the time (as it does for Group 1), but paying attention to Tone1 will work all of the time. Wagner et al. find that the light does a more thorough job of controlling responding in the first group, despite the fact that it really predicts the same number of reinforcers in both (note the similarity to how we set up a blocking design in classical conditioning).
Can we get blocking in the traditional fashion? We ought to predict a blocking effect if instrumental conditioning operates the same way classical conditioning does. That is, given the following design:
Group    Phase 1           Phase 2
1        RF for R to S1    RF for R to S1&S2
2        (Nothing)         RF for R to S1&S2
we would predict that the animals in Group 1 should block to S2, whereas the animals in Group 2 might be expected to show some conditioning to both S1 and S2. Thomas, Mariner, and Sherry used this type of design with pigeons, and obtained evidence of blocking.
Finally, consider a study by Lawrence and DeRivera. They used stimuli involving different shades of grey. In this study, animals had to perform one response if the darker shade was on top of a lighter shade, and the opposite response if the lighter shade was on top of the darker. During training, the bottom shade in all cases was a neutral grey.
The question Lawrence and DeRivera asked was whether the animals were under control of just the top color, or were comparing the top color with the bottom color. To make a long story short, they found that animals were essentially comparing the top shade to the bottom shade. When two of the lighter shades were presented, for example, which response the animal performed depended on whether the lighter of these shades was on the top or the bottom, even though both shades had been associated with the same response during acquisition. In this case, responding was apparently based on a comparison rule such as darker on top or lighter on top. Such comparative sensitivity is referred to as relational learning. It is incompatible with the claim that an association forms between independent stimulus elements and the response. That assumption forms part of many associationist models such as Hull's or Spence's theories (see the next chapter), but relational learning also appears inconsistent with a model like the Rescorla-Wagner model, in which each stimulus is treated independently of the others on a conditioning trial. Under certain circumstances, stimulus configurations are not equal to the sum of their component parts.
For Skinner, there are essentially two broad categories of behavior: operants and respondents. Respondents cover behaviors that are reflexively coaxed out of the animal by the presence of specific stimuli. The presence of other stimuli at the same time enables them to acquire the ability to elicit these responses. Thus, he includes the work in classical conditioning under the category respondent. But operants are more characteristic of what most of us (and most higher animals) do: They are the pieces of behavior that an animal emits in a given situation.
The distinction between elicited and emitted behavior is an important one for Skinner. Emitted behavior is not triggered by any single stimulus linked to that behavior. Operants are presumably influenced by a whole complex of events including the contextual stimuli, genetic constraints, and motivational factors having to do with the animal's state of hunger or thirst at the time. This is such a complex set of determiners that for all practical purposes, Skinner refuses to talk about which stimuli connect with which responses. Thus, contrary to most behaviorist theorists, Skinner does not talk about an association forming between a given stimulus and a given response, nor about whether and when that association strengthens or weakens. Rather, stimuli for Skinner serve something of the same function as occasion setters: They signal times at which a response-outcome contingency occurs.
We ought to take a moment to note that the notion of a response-outcome contingency in Skinner's work does not mean the same thing as an operant contingency space. This notion merely means that the experimenter has set up a condition whereby a given response will occasionally be followed by an outcome. Contiguity of the response and the outcome constitutes the relevant learning mechanism, as we have seen in the previous discussion of superstitious behavior. But that is not to say that stimuli are irrelevant. When an outcome follows a response in the presence of one stimulus but not another, the animal learns a discrimination. The response comes under the control of the first stimulus (stimulus control), not in the sense that there is a triggering effect, but in the sense that the first stimulus becomes part of the entire stimulus situation in which responding alters.
We ought also to take a moment to discuss this notion of a response. Although Skinner did use the term, it is in some sense odd, given the de-emphasis on any specific cause of the response. Given his view, it will not surprise you to learn that he did not insist that learning require the same physical response to increase or decrease in frequency (unlike Watson and Hull, who both claimed an association formed with specific muscular movements). So, many of the objections we looked at above to the learning of specific movements do not apply to Skinner. Instead, he adopted a functional definition of a response: any responses that achieve the same function qualify as the same response. Thus, if the function is to get to the left side of a T-maze, running to it, swimming to it, backing up to it, and casually crawfishing to it all count as the same response, because they all accomplish the same function.
Moreover, through shaping and chaining, complex operants followed by a single reinforcer are strengthened as a group, so that behavior may be constituted into quite long sequences. The sequence of getting out your car key, inserting it into the lock, unlocking the door, taking the key out, opening the door, getting in, closing the door, putting the key in the ignition, and turning it may all be reinforced by the motor coming on, but will all be punished by an engine that refuses to turn. If this seems a strange example to bring up, it isn't, really. Above all, Skinner was always concerned with the practical aspects of modifying behavior in the real world, and exploring how real-world contingencies affected behavior. Thus, in part as a pragmatist, he was concerned with what worked, and not with elaborate theories of why or how.
Perhaps because he was a pragmatist, he and his followers also tended to avoid experimental designs involving large groups of animals or people. His focus was on the individual, and whatever changes could be observed in the individual. Control that individual's behavior to some extent, and you have demonstrated sufficient explanation for why it occurs (since you are now able reliably to predict the presence or absence of that behavior).
Among his contributions were the study of partial reinforcement schedules (see Ferster and Skinner), behavior modification techniques and token economies, the notion of superstitious behavior, behavioral-level definitions of reinforcers and punishers (as opposed to the theoretical definitions we will see in the next chapter), work on secondary reinforcers and punishers, teaching machines, and the notion of negative reinforcement (which involves a different definition than the one I have used earlier: For Skinner, negative reinforcers are aversive events that operate as reinforcers by being removed. This is a subtle difference, but recall that we have defined negative reinforcement in terms of the removal of the event, not whether the event itself is aversive). Skinner, unlike Watson, was also happy talking about the conditioning of private events. And shortly before his death in 1990, he lambasted the then-current emphasis in American Psychology on cognitive models and memory systems. Ironically, his approach had become isolated from mainstream research at the same time that his behavior modification procedures had become a normal part of educational and clinical management techniques.
I don't think Skinner would have been disturbed by any research finding whatsoever. Since he refused to build formal theories, no finding could really have been inconsistent with his approach. In some sense, I think of Skinner as engaged in a process of cataloging or categorizing behavior: What are the situations under which this response increases? What are the situations under which it displays resistance to extinction? What mechanisms are effective for altering a response? How can we set our environments up to provide maximal efficiency? These were the issues that occupied him.
As an example of work inspired by Skinner's approach, we may consider a famous series of experiments by people like Verplanck and Greenspoon on verbal conditioning. They used a reinforcer of agreement (e.g., "uh huh" or even a pencil tap) and showed that people increased whatever it was that the "uh huh" followed: plural nouns rather than singular nouns, affective rather than descriptive statements, etc. Following acquisition, Greenspoon put his subjects through extinction, and following that, questioned them to see if they had been aware of what was going on. He claimed they weren't. Thus, several theorists made the very strong claim that humans could easily be conditioned without their awareness (a claim Watson would have loved, of course).
Later theorists such as Dulany and Spielberger and DeNike challenged those studies. They pointed out a number of potential problems. One, for example, was that whatever people might have thought was going on would have been implicitly disconfirmed once extinction started, since they would now be collecting evidence against their hypothesis. Another was that a number of correlated hypotheses could have increased responding, but these weren't counted as awareness by the earlier experimenters. For instance, if you suspect you're being reinforced for mentioning species of dogs, then you may say "chihuahuas, collies, dachshunds, terriers, pugs," etc. Note that these are plurals. But when the experimenter asks you what you believed the purpose of the experiment involved, you report that you were being stroked for coming out with dogs, which gets you coded (unfairly) as having shown conditioning without awareness. And finally, Dulany demonstrated that the people who showed verbal conditioning were those who at the time had a correlated hypothesis (i.e., were aware that something was going on, and had a theory that would result in increasing responses that the experimenter would count as correct by administering reinforcement).
I think a true Skinnerian wouldn't have been much bothered by Dulany's or Spielberger and DeNike's results. Awareness for them could be defined operationally as a series of answers to questions on a survey (much as Watson defined emotions as nothing more than certain behaviors like crying or shaking). Those answers constitute verbal behavior, as well. All a true Skinnerian need do in this situation is talk about the conditions under which one type of verbal behavior (performance on a survey) accompanies another (conditioning).
I include this example because I want to give you a flavor of the extraordinarily different ways in which people interpret scientific research. To go back to the work by the philosopher Thomas Kuhn (mentioned in Chapter 1), Skinner's approach represents a completely different paradigm. And people in different paradigms can only rarely have useful discussions with one another about the foundational and philosophical assumptions that make science and the world meaningful for them. It is a bit like arguing religious beliefs.
Tolman, in contrast to Skinner, was a theorist. He built models around the notion of intervening variables that came between a stimulus and a response. These variables in large part involved cognitions: beliefs, expectancies, desires, and knowledge. The argument that he and others who have used intervening variables make is that theories with such variables are more successful in their predictions than theories without them. He didn't worry about Watson's dictum that private events were illegitimate in a science of psychology, since for Tolman the proof of a concept's legitimacy was its track record. Tolman was the precursor of people like Bandura, who built models around observational learning, and more generally, of the cognitive revolution that occurred in American psychology in the 1960s.
Several cognitions were particularly important in Tolman's work. One of these we have already met: the notion of a cognitive map. According to Tolman, animals observing, exploring, and experiencing their environments come to have representations of the layout of those environments. Thus, he performed experiments showing that animals would demonstrate they had learned an environment once properly motivated to do so (e.g., the Tolman and Honzik experiment on latent learning discussed earlier), and that they knew how to get around obstacles and take novel shortcuts when their normal routes were no longer available. We can discuss such learning in terms of stimulus-stimulus (S-S) associations, but in many of Tolman's studies it was clearly observational learning.
Another cognition that was quite important (and that prefigured many modern theories of learning) was the notion of an expectancy or expectation: a belief that some event ought to occur in some situation based on past experiences. In discussing the situations found in instrumental and classical conditioning, Tolman provided examples of two types of expectancies. One may be written as follows:
Ej: S1 -----> S2
This may be read as stating the content of expectancy Ej: That expectancy tells us that when S1 occurs, S2 may be expected to follow. If we substitute the CS for S1 and the UCS for S2, we see that we obtain a situation corresponding to classical conditioning. But this situation extends far beyond classical conditioning. It may also explain how we build cognitive maps: we learn that this part of the route is normally succeeded by this other part.
As for instrumental conditioning, the expectancies may be given as below:
Ek: S3 Ra -----> S4
El: S3 Rb -----> S5
These two expectancies, Ek and El (the subscripts are just to keep them separate; we have a huge number of expectancies in which the stimuli are specific rather than indicated through abstract mathematical variables), basically state that in the presence of stimulus S3, one response (Ra) will lead to stimulus S4, and the other response (Rb) will lead to a different stimulus (S5). If you view these latter stimuli as rewards or punishers, then you obtain the expectancies that account for approach or avoidance.
We can now add the notions of value and valence to account for what an animal will do. The value of an expectancy has to do with the strength of its terminal stimulus. If we temporarily regress to speaking of reinforcers and punishers, there are strong and weak reinforcers, just as there are strong and weak punishers. As you might expect, the strong ones are of greater motivational value than the weak ones. As for whether something is positive or negative, this involves the notion of its valence or sign.
Given this, we can now state some simple rules for what an animal will do in any given situation. One is that given a choice between two positive valence responses, the animal will choose the stronger. As an example, consider the following expectancies in an experiment with monkeys:
Ek: Tone - Lift White Cup -----> find banana chip
El: Tone - Lift Blue Cup -----> find piece of lettuce
Here, we presume the animal is faced with a choice involving two down-turned cups. Each has a reinforcer hidden beneath it, and the animal may choose the reinforcer underneath one of the cups. From past experience, it has learned that the white cup hides a banana chip, and the blue cup hides a lettuce leaf. Banana chips are stronger reinforcers: They are high-value positive-valence outcomes. Thus, our principle states the animal ought to choose the white cup. In common words, choose the better of two goods.
Our second rule will involve negative valences. We have a rat in a chamber that may leave by one of two doors. It is being shocked in the chamber, so there is every reason to leave. If it goes through the north door, the shock reduces by half, and if it goes through the south door, the shock reduces by a fourth. In this case, the principle is choose the weaker of two negative-valence outcomes. Or in plain English, if you have to, go for the lesser of two evils.
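To make these two choice rules concrete, here is a minimal sketch in Python of how expectancies carrying a value and a valence could be represented, and how the rules would then pick a response. This is only an illustration of the logic just described, not Tolman's own formalism; the Expectancy structure, the names, and the numbers are assumptions made for the example.

    # A minimal sketch (assumed names and numbers, not Tolman's formalism) of
    # expectancies carrying a value and a valence, plus the two choice rules above.
    from dataclasses import dataclass

    @dataclass
    class Expectancy:
        stimulus: str    # S3: the situation the animal is in
        response: str    # Ra or Rb: the response it might make
        outcome: str     # S4 or S5: the expected terminal stimulus
        value: float     # strength of that terminal stimulus (non-negative here)
        valence: int     # +1 for a positive outcome, -1 for a negative one

    def choose(expectancies):
        """Rule 1: among positive-valence options, take the strongest.
        Rule 2: if only negative-valence options remain, take the weakest."""
        positives = [e for e in expectancies if e.valence > 0]
        if positives:
            return max(positives, key=lambda e: e.value).response
        negatives = [e for e in expectancies if e.valence < 0]
        return min(negatives, key=lambda e: e.value).response

    # The monkey example: both outcomes are positive; the banana chip is stronger.
    monkey = [
        Expectancy("tone", "lift white cup", "banana chip", value=0.9, valence=+1),
        Expectancy("tone", "lift blue cup", "lettuce leaf", value=0.3, valence=+1),
    ]
    print(choose(monkey))   # -> lift white cup (the better of two goods)

    # The rat example: both doors still leave some shock, so both are negative.
    rat = [
        Expectancy("chamber", "north door", "shock cut in half", value=0.5, valence=-1),
        Expectancy("chamber", "south door", "shock cut by a fourth", value=0.75, valence=-1),
    ]
    print(choose(rat))      # -> north door (the lesser of two evils)

The particular numbers stand in only for ordinal claims like "banana chips are stronger reinforcers than lettuce"; nothing in Tolman's account requires these specific values.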
That gives you a bit of a taste of Tolman's theory. We will talk about several more relevant studies later. Tolman and Hull (see the next section), in particular, constantly chased one another's experiments and theories, arguing about whether the results suggested the need for cognitive factors. Interestingly enough, they both utilized intervening variables, although Hull's system was far more developed and organized than Tolman's. But Tolman had a gift for finding the weakness in one of Hull's claims and designing an experiment that would seem to demonstrate a result exactly opposite to what Hull predicted. That was in part the case with the latent learning study, since Hull's model allowed formation of an association only with a very special type of reinforcing event called a drive reduction, and in latent learning no such event occurred on the first 10 trials for the group running without a reinforcer.
It's hard to imagine two approaches more different, and yet in some respects similar, than Tolman's and Skinner's. They were both concerned with large-scale behavior rather than the minute muscle movements of Hull's system. But Tolman freely speculated on intervening variables while Skinner loathed them. One placed the cause of behavior squarely within a cognitive or representation-level approach, but the other kept as close to a behavioral-level approach as was possible. And the specific details of each theorist's approach were generally ignored by most people, although each had enormous influence on subsequent work and theoretical approaches. Indeed, one of Tolman's students, Krechevsky, developed the notion that animals during learning test hypotheses about which stimulus element they are supposed to notice. As we will see in a later chapter, this notion evolved into modern-day attentional theories of discrimination learning.
How are instrumental and classical conditioning related, and do the same principles govern both? Several theorists have tried to answer this question from different perspectives. One approach simply attempts to cut the Gordian knot by applying similar models to each. In a later chapter in which we examine theories of discrimination learning, we will come across attentional and rehearsal concepts that will remind you of the corresponding models in classical conditioning. There have been some attempts, for example, to modify the Rescorla-Wagner model to handle the strength of instrumental learning by treating the S and the R as CSs in compound conditioning, and the RF as the UCS (see, for example, Wasserman, Elek, Chatlosh, and Baker). We have already seen some of the predictions regarding blocking and overshadowing that would naturally arise from application of such a model.
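As a rough illustration of what such a modification amounts to, here is a minimal Python sketch of the standard Rescorla-Wagner updating rule applied to an "instrumental" trial in which the stimulus (S) and the response (R) are treated as two elements of a compound and the reinforcer (RF) plays the role of the UCS. The function name, learning-rate values, and trial structure are assumptions made for the example, not the parameterization used by Wasserman and colleagues.

    # A minimal sketch: Rescorla-Wagner applied to an "instrumental" trial, with
    # the stimulus (S) and the response (R) as elements of a compound CS and the
    # reinforcer (RF) as the UCS. Parameter values are illustrative assumptions.
    def rw_update(V, present, reinforced, alpha=0.3, beta=1.0, lam=1.0):
        """One trial: dV_x = alpha * beta * (lambda - sum of V over present elements)."""
        total = sum(V[x] for x in present)       # the compound's aggregate prediction
        target = lam if reinforced else 0.0      # lambda when the RF occurs, 0 otherwise
        for x in present:                        # every present element shares the error
            V[x] += alpha * beta * (target - total)
        return V

    # Associative strengths for the stimulus element and the response element.
    V = {"S": 0.0, "R": 0.0}
    for _ in range(20):                          # reinforce the S+R compound repeatedly
        rw_update(V, present=["S", "R"], reinforced=True)
    print(V)   # S and R end up splitting the strength the reinforcer can support

Because the stimulus element and the response element share the total associative strength the reinforcer will support, overshadowing- and blocking-style predictions fall out in the same way they do in compound classical conditioning.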
Another approach is more direct: If Skinner was correct in identifying these two types of learning as really involving different muscle systems (voluntary versus involuntary muscles), then we ought not to be able to instrumentally condition involuntary reflexes. However, there is now quite a lot of work on biofeedback in different species (including humans) demonstrating that involuntary responses can be modified through the operation of instrumental reinforcers (see, for example, Miller).
Whether the same models ultimately apply to these two areas or not, do note that whenever you arrange instrumental or classical conditioning, the other type of learning will be occurring as well. Reinforcers and punishers are also significant biological events that act as UCSs, so instrumental conditioning using outcomes should generally include some aspects of classical conditioning. Similarly, biologically significant UCSs in classical conditioning may act as outcomes that influence the responses the animal makes prior to their presentation.
Learning in the real world doesn't always occur in neat packets that allow one type of association to form, and not another.
Allison, J. (1989). The nature of reinforcement. In S.B. Klein & R.R. Mowrer (Eds.), Contemporary learning theories: Instrumental conditioning theory and the impact of biological constraints on learning (13-39). NJ: Erlbaum.
Badia, P., & Culbertson, S. (1972). The relative aversiveness of signalled vs. unsignalled escapable and inescapable shock. Journal of the Experimental Analysis of Behavior, 17, 463-471.
Badia, P., Culbertson, S., & Harsh, J. (1973). Choice of longer or stronger signalled shock vs. shorter or weaker unsignalled shock. Journal of the Experimental Analysis of Behavior, 19, 25-32.
Bandura, A. (1965). Influence of models' reinforcement contingencies on the acquisition of imitative responses. Journal of Personality and Social Psychology, 1, 589-595.
Boe, E.E., & Church, R.M. (1967). Permanent effects of punishment during extinction. Journal of Comparative and Physiological Psychology, 63, 486-492.
Bolles, R.C. (1970). Species-specific defense reactions and avoidance learning. Psychological Review, 77, 32-48.
Breland, K., & Breland, M. (1961). The misbehavior of organisms. American Psychologist, 16, 681-684.
Brown, J.S., Martin, R.C., & Morrow, M.W. (1964). Self-punitive behavior in the rat: Facilitative effects of punishment on resistance to extinction. Journal of Comparative and Physiological Psychology, 57, 127-133.
Brown, P.L., & Jenkins, H.M. (1968). Auto-shaping of the pigeon's key peck. Journal of the Experimental Analysis of Behavior, 11, 1-8.
Butler, R.A. (1953). Discrimination learning by rhesus monkeys to visual-exploration motivation. Journal of Comparative and Physiological Psychology, 46, 95-98.
Campbell, P.E., Batsche, C.J., & Batsche, G.M. (1972). Spaced-trials reward magnitude effects in the rat: Single versus multiple food pellets. Journal of Comparative and Physiological Psychology, 81, 360-364.
Capaldi, E.J. (1978). Effects of schedule and delay of reinforcement on acquisition speed. Animal Learning and Behavior, 6, 330-334.
Colwill, R.M., & Rescorla, R.A. (1985). Postconditioning devaluation of a reinforcer affects instrumental responding. Journal of Experimental Psychology: Animal Behavior Processes, 11, 120-132.
Crespi, L.P. (1942). Quantitative variation in incentive and performance in the white rat. American Journal of Psychology, 55, 467-517.
Daly, H.B. (1974). Reinforcing properties of escape from frustration aroused in various learning situations. In G.H. Bower (Ed.), The psychology of learning and motivation (Vol 8, 187-231). NY: Academic.
D'Amato, M.R. (1970). Experimental psychology: Methodology, psychophysics, and learning. NY: McGraw-Hill.
D'Amato, M.R., Sarafin, W.R., & Salmon, D. Long-delay conditioning and instrumental learning: Some new findings. In N.E. Spear & R.R. Miller (Eds.), Information processing in animals: Memory mechanisms (113-142). NJ: Erlbaum.
Dember, W.N., & Fowler, H. (1958). Spontaneous alternation behavior. Psychological Bulletin, 55, 412-428.
Dollard, J.C., & Miller, N.E. (1950). Personality and psychotherapy. NY: McGraw-Hill.
Dulany, D. E. (1968). Awareness, rules, and propositional control: A confrontation with S-R behavior theory. In T.R. Dixon & D.C. Horton (Eds.), Verbal behavior and general behavior theory. NJ: Prentice-Hall.
Dwyer, D.M., Mackintosh, N.J., & Boakes, R.A. (1998). Simultaneous activation of the representations of absent cues results in the formation of an excitatory association between them. Journal of Experimental Psychology: Animal Behavior Processes, 24, 163-171.
Egger, M.D., & Miller, N.E. (1963). When is a reward reinforcing? An experimental study of the information hypothesis. Journal of Comparative and Physiological Psychology, 56, 132-137.
Ferster, C.B., & Skinner, B.F. (1957). Schedules of reinforcement. NY: Appleton-Century-Crofts.
Flaherty, C.F. (1982). Incentive contrast: A review of behavioral changes following shifts in reward. Animal Learning and Behavior, 10, 409-440.
Fowler, H., & Miller, N.E. (1963). Facilitation and inhibition of runway performance by hind- and forepaw shock of various intensities. Journal of Comparative and Physiological Psychology, 56, 801-806.
Fowler, H., & Trapold, M.A. (1962). Escape performance as a function of delay of reinforcement. Journal of Experimental Psychology, 63, 464-467.
Garcia, J., Ervin, F.R., & Koelling, R.A. (1966). Learning with prolonged delay of reinforcement. Psychonomic Science, 5, 121-122.
Greenspoon, J. (1955). The reinforcing effect of two spoken sounds on the frequency of two responses. American Journal of Psychology, 68, 409-416.
Grice, G.R. (1948). The relation of secondary reinforcement to delayed reward in visual discrimination learning. Journal of Experimental Psychology, 38, 1-16.
Guthrie, E.R. (1952). The psychology of learning (Revised edition). NY: Harper & Row.
Guttman, N., & Kalish, H.I. (1956). Discriminability and stimulus generalization. Journal of Experimental Psychology, 51, 79-88.
Hammond, L.J. (1980). The effect of contingency upon the appetitive conditioning of free operant behavior. Journal of the Experimental Analysis of Behavior, 34, 297-304.
Hanson, H.M. (1959). Effects of discrimination training on stimulus generalization. Journal of Experimental Psychology, 58, 321-334.
Herrnstein, R.J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243-266.
Jenkins, H.M., & Moore, B.R. (1973). The form of the autoshaped response with food or water reinforcers. Journal of the Experimental Analysis of Behavior, 20, 163-181.
Killeen, P. (1978). Superstition: A matter of bias, not detectability. Science, 199, 88-90.
Kohn, B., & Dennis, M. (1972). Observation and discrimination learning in the rat: Specific and nonspecific effects. Journal of Comparative and Physiological Psychology, 78, 292-296.
Kraeling, D. (1961). Analysis of amount of reward as a variable in learning. Journal of Comparative and Physiological Psychology, 54, 560-565.
Krechevsky, I. (1932). "Hypotheses" in rats. Psychological Review, 39, 516-532.
Kuhn, T.S. (1970). The structure of scientific revolutions (second edition). Chicago: University of Chicago Press.
Lawrence, D.H., & DeRivera, J. (1954). Evidence for relational transposition. Journal of Comparative and Physiological Psychology, 47, 465-471.
Lieberman, D.A., Davidson, F.H., & Thomas, G.V. (1985). Marking in pigeons: The role of memory in delayed reinforcement. Journal of Experimental Psychology: Animal Behavior Processes, 11, 611-624.
Macfarlane, D.A. (1930). The role of kinesthesis in maze learning. University of California Publications in Psychology, 4, 277-305.
MacPhail, E.M. (1968). Avoidance responding in pigeons. Journal of the Experimental Analysis of Behavior, 11, 625-632.
McNamara, H.J., Long, J.B., & Wike, F.L. (1956). Learning without response under two conditions of external cues. Journal of Comparative and Physiological Psychology, 49, .
Melton, A.W., & Irwin, J.M. (1940). The influence of degree of interpolated learning on retroactive inhibition and overt transfer of specific responses. American Journal of Psychology, 53, 173-203.
Menzel, E.W. (1978). Cognitive mapping in chimpanzees. In S.H. Hulse, H. Fowler, & W.K. Honig (Eds.), Cognitive processes in animal behavior (375-422). NJ: Erlbaum.
Miller, N. E. (1978). Biofeedback and visceral learning. Annual Review of Psychology, 29, 373-404.
Morris, R.G.M., Garrud, P., Rawlins, J.N.P., & O'Keefe, J. (1982). Place navigation impaired in rats with hippocampal lesions. Nature, 297, 681-683.
Olton, D.S. (1978). Characteristics of spatial memory. In S.H. Hulse, H. Fowler, & W.K. Honig (Eds.), Cognitive processes in animal behavior (341-373). NJ: Erlbaum.
Postman, L. (1974). Transfer, interference, and forgetting. In J.W. Kling and L.A. Riggs (Eds.), Experimental psychology. NY: Holt, Rinehart, & Winston.
Premack, D. (1959). Toward empirical behavior laws: I. Positive reinforcement. Psychological Review, 66, 219-233.
Ratliff, R.G., & Ratliff, A.R. (1971). Runway acquisition and extinction as a joint function of magnitude of reward and percentage of rewarded acquisition trials. Learning and Motivation, 2, 289-295.
Rescorla, R.A. (1997). Response-inhibition in extinction. Quarterly Journal of Experimental Psychology B: Comparative and Physiological Psychology, 50B, 238-252.
Reynolds, G.S. (1961). Attention in the pigeon. Journal of the Experimental Analysis of Behavior, 4, 57-71.
Roberts, W.A. (1969). Resistance to extinction following partial and consistent reinforcement with varying magnitudes of reward. Journal of Comparative and Physiological Psychology, 67, 395-400.
Seligman, M.E.P. (1970). On the generality of laws of learning. Psychological Review, 77, 406-418.
Seligman, M.E.P., & Maier, S.F. (1967). Failure to escape traumatic shock. Journal of Experimental Psychology, 74, 1-9.
Seward, J.P., & Levy, N. (1949). Sign learning as a factor in extinction. Journal of Experimental Psychology, 39, 660-668.
Sheffield, F.D. (1965). Relation between classical conditioning and instrumental learning. In W.F. Prokasy (Ed.), Classical conditioning: A symposium (302-322). NY: Appleton-Century-Crofts.
Shettleworth, S.J. (1975). Reinforcement and the organization of behavior in golden hamsters: Hunger, environment, and food reinforcement. Journal of Experimental Psychology: Animal Behavior Processes, 1, 56-87.
Skinner, B.F. (1938). The behavior of organisms: An experimental analysis. NY: Appleton-Century-Crofts.
Skinner, B.F. (1964). Behaviorism at fifty. In T.W. Wann (Ed.), Behaviorism and phenomenology: Contrasting bases for modern psychology (79-108). Chicago: U. Chicago Press.
Spence, K.W. (1947). The role of secondary reinforcement in delayed reward learning. Psychological Review, 54, 1-8.
Spielberger, L.D., & DeNike, L. (1966). Descriptive behaviorism versus cognitive theory in verbal operant conditioning. Psychological Review, 73, 306-326.
Staddon, J.E.R., & Simmelhag, V.L. (1971). The "superstition" experiment: A reexamination of its implications for the principles of adaptive behavior. Psychological Review, 78, 3-43.
The Rocky Horror Show. (1975). Original Australian Cast Album. Elephant Records. (Yep: It's better Rock & Roll than the later American album based on the movie version, in my opinion...)
Thomas, G. (1981). Contiguity, reinforcement rate, and the law of effect. Quarterly Journal of Experimental Psychology, 33B, 33-43.
Thomas, D.R., Mariner, R.W., & Sherry, G. (1969). Role of pre-experimental experience in the development of stimulus control. Journal of Experimental Psychology, 79, 375-376.
Thorndike, E.L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Psychological Review Monograph Supplements 2, no. 8.
Thorndike, E.L. (1913). The psychology of learning. NY: Columbia University.
Tolman, E. C. (1932). Purposive behavior in animals and men. NY: Century.
Tolman, E.C., & Honzik, C.H. (1930). Introduction and removal of reward and maze performance in rats. University of California Publications in Psychology, 4, 257-275.
Trapold, M.A., & Fowler, H. (1960). Instrumental escape performance as a function of the intensity of noxious stimulation. Journal of Experimental Psychology, 60, 323-326.
Verplanck, W.S. (1955). The operant, from rat to man. Transactions of the New York Academy of Sciences, Series 11, 17 (8), 594-601.
Wagner, A.R., Logan, F.A., Haberlandt, K., & Price, T. (1968). Stimulus selection in animal discrimination learning. Journal of Experimental Psychology, 76, 171-180.
Wasserman, E.A., Elek, S.M., Chatlosh, D.C., & Baker, A.G. (1993). Rating causal relations: Role of probability in judgments of response-outcome contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 174-188.
Watson, J.B. (1913). Psychology as the behaviorist views it. Psychological Review, 20, 158-177.
Watson, J.B. (1926). Excerpts from "What the nursery has to say about instincts." In C. Murchison (Ed.), Psychologies of 1925. NY: Clark U. Press.
Watson, J.B. (1930). Behaviorism. NY: Norton.
Watson, J.B., & Rayner, R. (1920). Conditioned emotional reactions. Journal of Experimental Psychology, 3, 1-14.
Watson, J.S. (1967). Memory and "contingency analysis" in infant learning. Merrill-Palmer Quarterly, 12, 55-76.
Animal Cognition Page: (http://www.pigeon.psy.tufts.edu/psych26/history.htm)
(note particularly the links to Thorndike, Tolman, and Skinner's stuff. There's a graph of Thorndike's results with puzzle-box learning that you should look at.)
The Behaviorist Manifesto: (http://www.yorku.ca/dept/psych/classics/Watson/views.htm)
(Watson's classic paper)
Emotional Conditioning: (http://www.yorku.ca/dept/psych/classics/Watson/emotion.htm)
(The Watson & Rayner paper reporting on their study with Little Albert)
Cognitive Maps: (http://www.yorku.ca/dept/psych/classics/Tolman/Maps/maps.htm)
(A paper by Tolman reviewing some of the cognitive map studies)
(Note: The three papers above are from the Classics in the History of Psychology webpage; you have a link for it in Chapter 1)