Instrumental or Operant Conditioning
Early work began with Edward L. Thorndike and his Law of Effect
§Cats in a puzzle-box
§Trial-and-error learning
Thorndike’s Puzzle-Box
Thorndike's Description of a Cat in the Puzzle-Box
"When put into the box the cat would show
evident signs of discomfort and of an impulse to escape from confinement. . . . tries to squeeze through any
opening. . . claws and bites the bars or wire. . . claws at everything it
reaches. . . For eight or ten minutes it will claw and bite and squeeze
incessantly. . . . Whether the impulse to struggle be due to an instinctive
reaction to confinement or to an association, it is likely to succeed in
letting the cat out of the box. The cat that is clawing all over the box in her
impulsive struggle will probably claw the string or
loop . . . so as to open the door. And gradually all the other non-successful
impulses will be stamped out and the particular impulse leading to the
successful act will be stamped in by the resulting pleasure, until, after many
trials, the cat will, when put in the box, immediately claw the button or loop
in a definite way."
Thorndike's Law of Effect
"Of several responses made to the same
situation those which are accompanied or closely followed by satisfaction to
the animal will, other things being equal, be more firmly connected with the
situation, so that, when it recurs, they will be more likely to recur; those
which are accompanied or closely followed by discomfort to the animal will,
other things being equal, have their connections to the situation weakened, so
that, when it recurs, they will be less likely to occur. The
greater the satisfaction or discomfort, the greater the strengthening or
weakening of the bond."
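The 'stamping in' and 'stamping out' that Thorndike describes can be sketched as a simple simulation. This is only an illustrative toy, not Thorndike's own formalism: the response names, the starting strengths, and the parameters gain, loss, and floor are all assumptions chosen for the example. Each response has a strength; on each trial the cat emits responses in proportion to their strengths until the successful one occurs. The successful response is strengthened (satisfaction) and each unsuccessful one is weakened (discomfort of continued confinement).

```python
import random

def run_trial(strengths, successful, rng, gain=0.2, loss=0.1, floor=0.05):
    """Simulate one puzzle-box trial; return how many responses were emitted.

    gain, loss, and floor are hypothetical parameters, not values from Thorndike.
    """
    emitted = 0
    while True:
        options = list(strengths)
        weights = [strengths[r] for r in options]
        # Stronger S-R bonds make a response more likely to occur.
        choice = rng.choices(options, weights=weights)[0]
        emitted += 1
        if choice == successful:
            strengths[choice] += gain  # followed by satisfaction: "stamped in"
            return emitted
        # Not followed by escape: "stamped out" (kept above a small floor).
        strengths[choice] = max(floor, strengths[choice] - loss)

rng = random.Random(0)
strengths = {"squeeze": 1.0, "bite bars": 1.0, "claw bars": 1.0, "claw loop": 1.0}
responses_per_trial = [run_trial(strengths, "claw loop", rng) for _ in range(50)]
```

Under these assumptions the number of responses per trial tends to shrink over trials, which mirrors the cat that eventually claws the loop immediately. A larger gain or loss corresponds to Thorndike's remark that greater satisfaction or discomfort produces greater strengthening or weakening.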
What Was Being Learned?
§What the animal learned was an association between a
stimulus (total situation) and a response (motor reaction), i.e., an S-R bond
§Note that Thorndike inferred internal states
(drives, impulses) to account for the cat’s behavior
§The selection, narrowing down, ‘stamping in’ of
responses that were initially internally actuated, or perhaps even ‘wired in,’
was guided by external stimuli
Law of Effect and Survival of the Fittest
§Note the similarity between Thorndike’s Law of
Effect and the law of survival of the fittest in evolutionary theory
§Law of Effect: “Any act which in a given situation produces
satisfaction becomes associated with that situation, so that when the situation
recurs, the act is more likely than ever to recur also.”
§Only the fittest organisms survive
§Only the fittest responses shall survive, i.e.,
those leading to pleasure or satisfaction
Primary and Secondary Reinforcers
§According to Thorndike and others who followed his
lead, some reinforcers are learned
§Such secondary (or conditioned) reinforcers
acquire their reinforcing power by virtue of their association with primary
(unlearned) reinforcers (e.g., food, water,
termination of electric shock)
§Example: Chimps with the ‘Chimp-O-Mat’ worked for tokens they could exchange for food (Cowles, 1937)
Types of Reinforcement and Punishment
Time and the Law of Effect
§The cat performs many responses before the correct one,
so the reward follows all of them, including the incorrect ones. That is, all
responses are reinforced.
§According to the Law of Effect, all responses
should therefore be strengthened
§Addition to the Law of Effect
§A reinforcer becomes less
effective the longer it is delayed after the response, so the correct response,
which immediately precedes the reward, is strengthened most
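The delay principle above can be illustrated with a small calculation. The slides state only that a reinforcer loses effectiveness with delay; the exponential form of the decay, the time constant tau_s, and the response times below are assumptions made for this sketch.

```python
import math

def credit(delay_s, tau_s=2.0):
    """Relative effectiveness of a reinforcer delivered delay_s seconds
    after a response. Exponential decay and tau_s are assumed, not given."""
    return math.exp(-delay_s / tau_s)

# Hypothetical responses in one trial (name, time in seconds); the door
# opens and the reward arrives at t = 10 s.
responses = [("bite bars", 2.0), ("squeeze", 6.0), ("claw loop", 9.5)]
reward_time = 10.0
credits = {name: credit(reward_time - t) for name, t in responses}
```

Under this assumption the response made just before the reward receives nearly full credit, while the earlier, incorrect responses receive very little, so the correct response gains strength faster than the others.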
Conditioning and Behaviorism
§Out of Pavlov’s and Thorndike’s work arose behaviorism
or ‘behavior theory’
§B.F. Skinner called instrumental conditioning
‘operant conditioning’
§His form of behaviorism focused solely on observable
behavior (no internal states)
§Attempted to analyze factors that ‘control’
(determine) behavior
§Focused on ‘rate of responding’ and attempted to see
how this is influenced by environmental and reinforcement contingencies
Skinner’s Approach
§Skinner’s focus on purely observable behavior leads
him to see
§behavior as simply emitted (operant) rather than
elicited
§reinforcement as something that simply increases the
probability of a behavior rather than as something pleasurable or satisfying (unobservable)
Findings from Operant Conditioning Studies
§Main findings come from studies on rats and pigeons
placed in Skinner Boxes
§Animals are trained to perform behaviors by a
process called shaping
§Shaping
is the differential reinforcement of successive approximations to the desired
behavior
§Much research in operant conditioning looks at
effects of different schedules of reinforcement on rate of responding
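Shaping, as defined above, can be sketched as a loop in which only responses that approximate the target better than the animal's typical behavior are reinforced, and each reinforced response shifts typical behavior toward the target. This is a toy model: the one-dimensional 'closeness to target' scale, the Gaussian variability, and the parameters sd and rate are assumptions made for illustration.

```python
import random

def shape(target=1.0, start=0.0, sd=0.1, rate=0.5, trials=300, seed=0):
    """Return the animal's typical behavior after each trial.

    Behavior is a single number (0 = nothing like the target, 1 = target).
    All parameters are hypothetical.
    """
    rng = random.Random(seed)
    typical = start
    history = [typical]
    for _ in range(trials):
        response = rng.gauss(typical, sd)  # behavior varies from trial to trial
        if response > typical:
            # A closer approximation than usual: reinforce it, which moves
            # typical behavior toward that response (capped at the target).
            typical = min(target, typical + rate * (response - typical))
        # Responses no better than usual go unreinforced and change nothing.
        history.append(typical)
    return history

history = shape()
```

Because the criterion for reinforcement rises along with the animal's typical behavior, the behavior can only move toward the target, one successive approximation at a time.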
Skinner Box
Reinforcement Schedules
Four main types
§Fixed ratio (FR 2, FR 50, etc.): Reinforce every 2nd
or every 50th response, etc.
§Variable ratio: Reinforcement comes after a certain number of
responses, but the number varies irregularly around some mean, e.g., VR 50
§Fixed interval: Based on time, say, every 10 seconds. No amount of
responding before the 10 sec are up will lead to reinforcement (Rf), but
the 1st response after 10 sec is reinforced
§Variable interval: Time interval varies irregularly around some
average, say, VI 25
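The four schedules can be written as small decision rules that take the time of a response and report whether it is reinforced. This is a minimal sketch: the function names are hypothetical, and drawing the variable requirements from a uniform distribution around the mean is an assumption, since the schedules only require that the requirement vary irregularly around some mean.

```python
import random

def fixed_ratio(n):
    """FR n: reinforce every n-th response."""
    count = 0
    def respond(t):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False
    return respond

def variable_ratio(mean, rng):
    """VR mean: required count varies (uniformly, by assumption) around mean."""
    count, required = 0, rng.randint(1, 2 * mean - 1)
    def respond(t):
        nonlocal count, required
        count += 1
        if count >= required:
            count, required = 0, rng.randint(1, 2 * mean - 1)
            return True
        return False
    return respond

def fixed_interval(seconds):
    """FI: the first response after the interval has elapsed is reinforced."""
    ready_at = seconds
    def respond(t):
        nonlocal ready_at
        if t >= ready_at:
            ready_at = t + seconds  # next interval starts at this reinforcer
            return True
        return False  # responding early never pays off
    return respond

def variable_interval(mean, rng):
    """VI mean: like FI, but each interval varies (uniformly) around mean."""
    ready_at = rng.uniform(0, 2 * mean)
    def respond(t):
        nonlocal ready_at
        if t >= ready_at:
            ready_at = t + rng.uniform(0, 2 * mean)
            return True
        return False
    return respond

fr2 = fixed_ratio(2)
fr2_outcomes = [fr2(t) for t in range(6)]
fi10 = fixed_interval(10)
fi10_outcomes = [fi10(t) for t in (1, 5, 9, 10, 11, 20, 21)]
```

On the ratio schedules the outcome depends only on how many responses have been made, so faster responding earns reinforcers sooner; on the interval schedules it depends only on elapsed time, so extra responses before the interval is up earn nothing.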
Typical Reinforcement Schedule Patterns
Additional Reinforcement Effects
§Continuous reinforcement (= FR1) leads to fastest acquisition of learned
response, but response is easily extinguished
§Partial reinforcement (not every response is reinforced) leads to learned
responses highly resistant to extinction
§Variable schedules lead to learned responses more
resistant to extinction than fixed schedules
Applications of Classical and Instrumental Conditioning
§Behavior Therapy often uses classical conditioning to treat
phobias, anxiety, and neuroses associated with anxiety, e.g., test anxiety
§Behavior Modification usually uses principles of instrumental
conditioning to either eliminate undesirable behaviors or strengthen
desirable ones
Behavior Therapy and Classical Conditioning
Counterconditioning and systematic desensitization
§Test anxiety example
§Try to associate the old CS (test situation) with a
new response that’s incompatible with fear so that the new CR will displace fear.
Usually the new CR is muscular relaxation. Subjects are trained to relax on command, then an
anxiety hierarchy is developed and its items are gradually associated with relaxation
Avoidance Learning and Behavior Therapy
§Avoidance learning is a phenomenon therapists often
need to treat
§Associated with the persistence of many phobias and
anxiety disorders
Avoidance Learning and Two Factor Theory
§Avoidance learning usually understood to result from
a combination of two factors:
§classical conditioning
§instrumental conditioning
§Begin with analysis of a simpler form of learning:
Escape learning
Apparatus Used to Study Escape and Avoidance Learning
Escape Learning
§Animal quickly learns to escape shock by
jumping over the barrier after being shocked, thus terminating the shock
§What kind of conditioning is this?
Avoidance Learning
§When light or sound signal precedes shock, animal
learns to jump as soon as the signal occurs, thus entirely avoiding any shock
§This avoidance behavior is extremely
resistant to extinction
§How is it that even though animal is never
subsequently shocked it still continues to jump over the barrier when signal
appears?
Mowrer’s Two Factor Theory
§Asserts that both classical and instrumental
conditioning are at work
§How?
§Light (CS) -> Fear (CER, conditioned emotional response) (classical conditioning)
§Reduction of fear is the reinforcement (Rf)
for jumping (instrumental conditioning)
§How could persistence of phobias be treated?
Behavior Modification and Instrumental Conditioning
§Eliminating undesirable behaviors: Find out what’s
reinforcing the behavior and eliminate the reinforcement (or perhaps satiate
the individual with Rf so it’s no longer reinforcing)
§Strengthening desirable behaviors: Try to find reinforcers that will
influence behavior and make Rf’s contingent on the desired behavior (often with ‘token
economies’)
Cognitive Approaches to Learning
§According to behaviorist school of psychology,
learning is best described as a change in what organisms do, i.e., their
behavior
§Cognitive theorists want to describe learning as a
change in what an organism knows, i.e., cognitions
§Cognitive theorists point to inadequacies in the
behaviorist account of learning and try to go beyond what behaviorists can
successfully describe with their account of learning
Philosophical Influences
§Note the influence of philosophical presuppositions
in behaviorist (at least Skinner’s) approach
§Internal states, drives, knowledge, etc. are not
directly observable, hence we can’t really know about them or talk about them
convincingly
§Thus, a truly scientific account must rely only on
directly observable factors such as behavior (Positivism)
§Cognitive theorists generally reject this philosophy
of science and argue that although an adequate scientific theory must be tied
to observable events, a good theory may often make use of hypothetical (not
directly observable) structures and mechanisms that must be inferred to account
for observable behavior
Köhler’s Critique
§One of the earliest criticisms of behavior theory
came from Köhler’s work with chimpanzees
§He argued that Thorndike ‘loaded the dice’ in favor
of trial-and-error learning: There was no other way to solve the problem
§Köhler did a classic series of experiments showing
that chimps can create solutions to problems, e.g., build tools for
reaching food
§He denied that behavior theory could account for
this kind of learning
Köhler’s Experiments
§Köhler's basic experiment was to place a chimp in an
enclosed play area. Somewhere out of reach he placed a prize, such as a bunch
of bananas.
§To get the bananas, the chimp would have to use an
object as a tool. The objects in the play area included sticks of different
lengths and wooden boxes. He discovered that chimpanzees were very good at
using tools. They used sticks as rakes to pull in bananas placed out of reach.
§They also learned to use boxes as step ladders, dragging
them under the hanging bananas and even stacking several boxes on top of one
another.
§In one experiment, Köhler placed a bunch of bananas outside the cage of a chimp named Sultan
and two bamboo sticks inside the cage. However, neither of the sticks was
long enough to reach the bananas. Sultan pushed the thinner stick into the
hollow of the thicker one, creating a stick long enough to pull in the bananas.
Köhler’s Chimps
Köhler and Insight Learning
§Köhler described this kind of learning as ‘insight
learning,’ i.e., gaining insight into the relevant relations among elements of
a problem rather than acquiring a set of S-R bonds
§He argued that what is acquired is a new set of internal
representations of the situation, i.e., knowledge
§He also pointed to transfer of learning
to new situations
Tolman’s Work
§Evidence for a cognitive interpretation of learning,
i.e., that knowledge is what is acquired
§Organisms acquire ‘cognitive maps’
§Latent learning, i.e., learning without any obvious reinforcement
§Rats learned a maze just by exploring it, without Rf
§Learning without active responding
§Rats ferried around maze in wire basket learned as
well as those actively running the maze (McNamara, Long & Wike, 1956)
§Evidence for animals developing expectancies
about instrumental contingencies
§Rats trained with large rewards then switched to
smaller reward perform more poorly than those trained all along with smaller
reward. Contrast indicates animals developed expectancies.