How we learn #1: Operant conditioning.
A general introduction to the learning series.
Learning is greatly advantageous for the survival of both humans and other animals. By learning which behaviours produce favourable outcomes, we can repeat them and reap the benefits again and again. It has been described as:
“An adaptive process in which the tendency to perform a particular behaviour is changed by experience. As conditions change, we learn new behaviours and eliminate old ones.”
Martin, Carlson and Buskist, 2007.
A great many different learning theories have been proposed; the main three are habituation, classical conditioning and operant conditioning. Habituation is deemed the simplest form of learning: we learn to ignore a repeated stimulus. For example, prior to reading this sentence, you will have learnt to ignore the feeling of clothes against your skin. Of course, as soon as you read that, you are instantly aware of the feeling your clothes are producing (unless you’re reading this naked, which I’d rather not assume…). Classical conditioning is learning to associate two stimuli with one another. For example, your mouth might water in response to the scent of a cooking steak (or even the thought of one!). Both these learning theories will be explored in greater detail in subsequent learning posts.
It may seem wiser to begin my learning posts with habituation, as it is the most basic form of learning. Perhaps so, but I believe operant conditioning will be of more use to students and will prove more popular than a habituation post. There is much more to discuss, and it will (in my opinion) prove far more interesting to read.
So – operant conditioning. What is it?
Operant conditioning, also known as instrumental learning, is learning by ‘operating’ on the environment. In doing so, we are sometimes rewarded with good consequences, and at other times we produce less favoured outcomes. We are more likely to repeat a behaviour once we learn that it produces positive consequences.
Thorndike’s Law of Effect
Operant conditioning originates from the basement of an American psychologist by the name of Edward Thorndike. He observed the behaviour of hungry cats which he placed into homemade ‘puzzle boxes’. The box was designed so that once a latch was activated, the cat was able to escape and find food. As expected, the cats initially engaged in seemingly random behaviour, such as meowing, hissing and scratching, and took a long time to escape; eventually they would accidentally nudge the latch, which opened the puzzle box. This random, unintentional interaction with their environment led to a rewarding consequence. In each subsequent trial, the cat became more and more efficient at activating the latch, until eventually it did so with little or no hesitation. Thorndike described this as “learning by trial and accidental success“.
Thorndike explained that the cat repeated the latch-activation behaviour as a result of its favourable outcome. There was no other way the cat could escape and eat – and so the behaviour was learnt. When the cat escapes, the response that led to the escape is strengthened. He named this relation between response and outcome the “law of effect“. Thorndike’s work was possibly the most important advancement in learning theory, as it enabled us to understand learning as a response to the consequences of our actions on the environment, rather than merely an association between stimuli, as in classical conditioning. His work enabled another prominent behavioural psychologist to advance the theory further: B. F. Skinner.
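The law of effect can be illustrated with a toy simulation (entirely my own sketch, not Thorndike’s data): the cat tries actions at random, weighted by each action’s current “strength”, and every accidental success at the latch strengthens that response, so escapes tend to take fewer and fewer attempts.

```python
import random

# Toy model of Thorndike's law of effect. Each trial, the cat samples
# actions in proportion to their strength; only nudging the latch opens
# the box, and each success strengthens that response.
ACTIONS = ["meow", "hiss", "scratch", "nudge_latch"]

def run_trials(n_trials=20, seed=0):
    rng = random.Random(seed)
    strength = {action: 1.0 for action in ACTIONS}   # equal initial tendencies
    latencies = []                                   # attempts needed per escape
    for _ in range(n_trials):
        attempts = 0
        while True:
            attempts += 1
            action = rng.choices(ACTIONS, weights=[strength[a] for a in ACTIONS])[0]
            if action == "nudge_latch":   # accidental success: the box opens
                strength[action] += 1.0   # law of effect: strengthen the response
                break
        latencies.append(attempts)
    return latencies, strength

latencies, strength = run_trials()
print(latencies)   # early trials usually need many attempts, later ones few
```

The numbers here are invented; the point is only the shape of the learning curve, which echoes Thorndike’s cats becoming ever quicker at the latch.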
Skinner’s contribution to understanding operant conditioning
Burrhus Frederic Skinner was, no doubt, a very busy man. He took Thorndike’s findings and developed objective ways to study behaviour, writing several books for the general public suggesting ways the theory might be used to better society (Skinner, 1948).
Perhaps his most important contribution to the field was his invention known as the “Skinner box” or “operant chamber”. This allowed the behaviour of animals to be observed and manipulated with ease. One variation of his invention, for example, required rats to press a small lever, which in turn led to a pellet being dispensed by the box. An example of a typical Skinner box used for rats can be seen in figure 1 below. With humans, the apparatus is specially tailored to the needs of the experiment, but usually involves the allocation of “points” which can often be exchanged for money or other goods.
As before, animals are found to repeat behaviours which lead to favoured outcomes – in the case of the box above, a pellet being dispensed. Many variations are used, featuring signal lights, electric-shock generators and other apparatus.
Skinner’s work led him to devise something called the “Three Term Contingency”. He believed all human behaviours can be broken down into three separate parts:
- Discriminative stimulus
- Operant response
- Reinforcer
The discriminative stimulus is, essentially, the event that precedes a particular behaviour. A car alarm sounding outside, for example, could be a discriminative stimulus: you would discriminate between your own car’s alarm and those of other vehicles. Should you realise that it is your car, you would produce an operant response – the behaviour that occurs in the presence of the discriminative stimulus – which would be to check your car isn’t being stolen (and to turn the alarm off if not). Reinforcement would come from the relief that your car is safe, and that you no longer have to listen to the annoying alarm.

Our everyday behaviour is guided by discriminative stimuli; responding to our name being called, for example. Have you ever got up and replied “yes?” when nobody has actually called your name (without having misheard)? Of course not – why would you respond when nobody has called you? We only reply (the operant response) when our name has been called by someone (the discriminative stimulus), because that way we will probably end up talking to someone we like (the reinforcer).
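As a small sketch of my own (the class and field names are hypothetical, not Skinner’s terminology beyond the three terms themselves), the three-term contingency can be written down as a simple record, using the car-alarm and name-calling examples above:

```python
from dataclasses import dataclass

# Hypothetical illustration: Skinner's three-term contingency as a record.
@dataclass
class Contingency:
    discriminative_stimulus: str   # the event that precedes the behaviour
    operant_response: str          # the behaviour it occasions
    reinforcer: str                # the outcome that strengthens the response

car_alarm = Contingency(
    discriminative_stimulus="your own car alarm sounding outside",
    operant_response="check the car isn't being stolen, turn the alarm off",
    reinforcer="relief that the car is safe and the noise has stopped",
)

name_called = Contingency(
    discriminative_stimulus="someone calls your name",
    operant_response="turn around and reply 'yes?'",
    reinforcer="a chat with someone you probably like",
)

print(car_alarm.operant_response)
```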
More about reinforcement and punishment
When studying behaviour in more recent times, researchers tend to manipulate one of the three parts of the three-term contingency. Of the three, the consequence is the most commonly manipulated variable. There are typically five consequences that can follow operant behaviour:
- Positive reinforcement: This is when behaviour is repeated because a desirable outcome (an appetitive stimulus) reliably and frequently follows the response. For example, you may visit a shop again (the response) because the service was excellent, or the clothes are good (the appetitive stimulus).
- Negative reinforcement: This is often incorrectly mixed up with punishment. They are two completely separate concepts! Negative reinforcement is when behaviour is repeated in order to avoid an aversive stimulus (which is anything unpleasant/undesirable). For example, getting stuck in traffic is aversive for most of us. If you leave home early one day and miss the traffic, you will repeat the behaviour of leaving early to avoid the aversive stimulus (heavy traffic). Negative reinforcement leads to strengthening of behaviour.
- Punishment: Again, this is often mistaken for negative reinforcement. Punishment is when a behaviour becomes less frequent because it is followed by an aversive stimulus. For example, if you receive a painful sting from a bee after sticking your finger into a hive, you will not repeat the behaviour of prodding beehives. People often punish their children or pets to prevent unwanted behaviour. The problem with this, however, is that the person/pet does not learn which behaviours are desirable – only which are not. So, punishment leads to the weakening of a behaviour.
- Response cost: This is a form of punishment – whereby you avoid behaviour that leads to the termination of an appetitive stimulus. For example, if someone tells you that every time you swear, you must pay £1. You will avoid swearing in order to keep your money. This is commonly used with children with behavioural problems – points or tokens are removed every time they misbehave.
- Extinction: This refers to the reduction of a behaviour because the reinforcer no longer occurs. The rat that presses the lever for a food pellet will stop doing so if pellets are no longer dispensed. A human example is storytelling: once you realise people are bored of your story, you stop telling it. The reinforcement of seeing people enjoy your story is gone – there is no reason to continue with the behaviour.
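All five consequences above nudge the same quantity in different directions: the tendency to repeat the behaviour. A minimal sketch (my own toy model – the update rules and rates are invented purely for illustration) of how that tendency might rise under reinforcement and fall under punishment or extinction:

```python
def update_strength(strength, consequence, rate=0.2):
    """Return the new tendency (0 to 1) to repeat a behaviour after one outcome.

    "reinforced" covers positive and negative reinforcement (an appetitive
    stimulus arrives, or an aversive one is avoided): the tendency rises.
    "punished" covers punishment and response cost (an aversive stimulus
    arrives, or an appetitive one is lost): the tendency falls.
    "nothing" is extinction: no reinforcer at all, a slower drift downward.
    """
    if consequence == "reinforced":
        return strength + rate * (1 - strength)
    if consequence == "punished":
        return strength - rate * strength
    return strength - (rate / 4) * strength   # extinction: gentle decay

s = 0.5
for _ in range(10):                 # ten reinforced lever presses
    s = update_strength(s, "reinforced")
print(round(s, 3))                  # tendency approaches 1: well learned

for _ in range(10):                 # pellets stop dispensing: extinction
    s = update_strength(s, "nothing")
print(round(s, 3))                  # tendency gradually fades
```

Note how extinction merely lets the behaviour fade, whereas punishment actively drives it down – mirroring the distinction drawn in the list above.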
And finally, a note on shaping.
Skinner developed a technique he named “shaping”. This is a method of teaching behaviour to a subject (usually an animal). The concept is simple enough: you positively reinforce behaviours that approximate the one you want. Consider training a dog to give its paw for a treat. At first, even the dog coming towards you is rewarded with a tasty treat. Then, you only give the reward once the dog sits down. This begins to shape its behaviour towards the desired outcome. Finally, you provide treats only once the dog offers its paw.
This works with humans as well. At first, you might give a sticker to a child who forms poor, but correct, letters of the alphabet. Then you provide a sticker only when better-formed letters are produced. You could even shape my behaviour! Posts on here which receive more views will influence me to write on similar topics, or posts of a similar standard. If a post is ignored, I will probably avoid touching on that topic again or writing in that style.
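The dog and handwriting examples can be sketched as a criterion that creeps upward: each time the learner meets the current criterion it is reinforced, practice improves, and the bar is raised again. (This is entirely my own toy model, with made-up numbers.)

```python
def shape(target=100, step=10):
    """Toy model of shaping by successive approximation.

    'skill' is how close the behaviour is to the full trick, as a
    percentage (roughly: 10 = approaches you, 50 = sits, 100 = offers
    a paw). The reward criterion rises a little at a time, and each
    met criterion earns one reinforcer (treat).
    """
    skill = 0
    criterion = 0
    treats = 0
    while criterion < target:
        criterion = min(target, criterion + step)   # ask for a little more
        while skill < criterion:                    # practise up to the criterion
            skill += step // 2
        treats += 1                                 # reinforce the approximation
    return skill, treats

print(shape())   # the full trick is reached through small rewarded steps
```

The design point is that the learner is never asked to jump straight to the final behaviour; each reinforced step makes the next, slightly stricter criterion reachable.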
Those are the fundamental principles of operant conditioning. There is much more reading available, easily accessible in libraries or on the internet. This post is already very long, though, so I will refrain from writing any more. Thanks for reading, and be sure to look out for the other two posts on learning, coming soon!