The Wrong Continuum

It's probably fair to say that, when you start to study equine behaviour, one of the first things you learn is learning theory and, in particular, operant conditioning. To some people it appears delightfully straightforward; to others it is an alien science which seems far removed from our equine friends. Either way, it becomes a fairly crucial tool in our interactions with horses. Operant conditioning is employed every time we apply or release some form of pressure and, since the vast majority of instructions given to a horse by a human involve pressure, it is something we do all the time. Whether or not we are aware of it. Whether or not we like it.

Operant conditioning can be broken down into four "contingencies". We have reinforcement and punishment and each of these can be positive or negative. That's it. If a behaviour becomes more likely to reoccur in the future then we say it has been reinforced - positively if we have added something (e.g. given a reward) or negatively if we have taken something away (e.g. release of pressure). If a behaviour becomes less likely to reoccur in the future then we say it has been punished - positively if we have added something (e.g. a smack) or negatively if we have taken something away (e.g. stopped giving a scratch). Of course, all of these "somethings", otherwise known as "stimuli", need to be properly salient, or relevant, to the horse.
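For those who find a table of possibilities easier to hold in mind than a paragraph, the four contingencies can be sketched as a tiny lookup. This is only an illustration of the definitions above, not a training tool: the names `Contingency` and `classify` are made up for this example, and it assumes we already know whether a salient stimulus was added or removed and whether the behaviour later became more or less likely.

```python
from enum import Enum


class Contingency(Enum):
    POSITIVE_REINFORCEMENT = "positive reinforcement"
    NEGATIVE_REINFORCEMENT = "negative reinforcement"
    POSITIVE_PUNISHMENT = "positive punishment"
    NEGATIVE_PUNISHMENT = "negative punishment"


def classify(stimulus_added: bool, behaviour_more_likely: bool) -> Contingency:
    """Name the contingency from two observations.

    stimulus_added: True if something was added (a reward, a smack),
        False if something was taken away (pressure, a scratch).
    behaviour_more_likely: True if the behaviour became more likely
        to reoccur, False if it became less likely.
    """
    if behaviour_more_likely:
        # Behaviour increased, so by definition it was reinforced.
        if stimulus_added:
            return Contingency.POSITIVE_REINFORCEMENT
        return Contingency.NEGATIVE_REINFORCEMENT
    # Behaviour decreased, so by definition it was punished.
    if stimulus_added:
        return Contingency.POSITIVE_PUNISHMENT
    return Contingency.NEGATIVE_PUNISHMENT


# Release of pressure that makes the behaviour more likely in future:
print(classify(stimulus_added=False, behaviour_more_likely=True).value)
```

Note that the label depends only on what happened to the stimulus and to the behaviour, not on what the trainer intended or how anyone feels about it.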

Unofficially, and somewhat emotionally, we all place these contingencies on a continuum of what we perceive to be more or less ethical. It all starts to get a bit confused. At one extreme we feel that hitting horses is pretty bad but "using a little tap of the whip to back up the leg" is perhaps acceptable. Negative punishment doesn't seem quite as bad as positive punishment. Negative reinforcement is normally considered the most practical compromise; we apply mild pressure (or sometimes quite considerable pressure) and the horse works to remove that pressure. And then we have positive reinforcement, seen by some as the "holy grail" of ethical training and by others as too impractical to implement sensibly.

So we have our continuum of bad to good. We feel good about our training when we can be up in the reinforcement end of the continuum and we feel bad about our training when we are down at the punishment end. And we have long arguments on internet forums when we think we are using milder punishment than other people who use much more aversive training. Are we being more ethical if we use positive reinforcement and is it always actually positive? Or could the need for strict stimulus control border on the aversive? What about when we combine positive reinforcement with negative reinforcement? Does that get us the "best of both worlds" - the ethical combined with the practical? These topics have all been the subject of my previous articles, all listed on http://www.equinemindandbody.co.uk/articles.html

But are we missing something? Did we ever stop to consider whether there is any more to behaviourism than just this combination of scientific contingency and emotional hierarchy? Is the science of behaviour complete? Or has the research continued? Unsurprisingly, the research has indeed continued, with far-reaching consequences - and the results have been somewhat surprising to anyone who has been raised on a diet of Skinner.

For a start, monkeys were found to perform tasks involving a mechanical puzzle for no reinforcement. The monkeys were clearly interested in the task for its own sake, an observation contrary to the stimulus-response pairing predicted by behaviourism, and the inclusion of food rewards disrupted their ability to solve the puzzle (Harlow 1953). Experiments into the motivation of humans found that giving monetary rewards actually reduces the likelihood of repeating the task in the absence of rewards (Deci 1995). In fact, in a wide array of studies of human behaviour from the fields of education, parenting and business, rewards actually turn out to be punishing, to rupture our relationships, to ignore the reasons behind behaviours, to discourage risk taking and to reduce interest in the task (Kohn 1993). Kohn also points out that most of the experiments leading to the results of behaviourism made use of semi-starved rats in under-enriched cages, or institutionalised adults in care or children who were already dependent on others to provide for them. If you are already dependent on reward systems to motivate you then you do indeed tend to continue to need rewards for your ongoing "motivation".

In fact, to quote career analyst Dan Pink in his entertaining and enlightening 2009 TED talk: "This has been replicated over and over and over again, for nearly 40 years. These contingent motivators - if you do this, then you get that - work in some circumstances. But for a lot of tasks, they actually either don't work or, often, they do harm. This is one of the most robust findings in social science, and also one of the most ignored."

Alfie Kohn makes a similar point in his book "Punished By Rewards" (which I highly recommend to anyone who likes a detailed book with lots of references; alternatively, see his slightly lighter "Unconditional Parenting", or preferably both). It's not that rewards categorically don't work. They do work, as all those of us who like to use clicker training (myself included) will agree. But perhaps it is a little sobering to look at the conditions under which rewards work best:

- they tend to benefit the person who decides what gets rewarded, rather than the recipient
- they tend to work only for as long as the rewards keep coming (hence many horses' resistance to the notion of "fading" the clicker/reward)
- they tend to work best for very mindless tasks with little or no intrinsic interest

I'm sorry to say that these conditions describe an awful lot of clicker training sessions I have witnessed. And sure, we can work to mitigate the negative effects of the training by keeping rewards low-key and unexpected (i.e. avoiding bribery and fixation on the rewards). But ultimately when we try to use reward-based training we have no guarantee that we are truly providing a positive experience for the animal. Even if we are providing tasty and desirable treats in the short-term, what is an intensive conditioning process doing to a horse's motivation, and to the relationship we have with our horses in the longer-term?

But can't we do better? What if we want to benefit our horses, maintain behaviours in the absence of rewards and not restrict ourselves to mindless tasks? And still not resort to aversive training? Is there anything else? Motivation expert Ed Deci states in his book "Why We Do What We Do" that intrinsic motivation is associated with a richer experience, better conceptual understanding, greater creativity and improved problem-solving, compared with extrinsic rewards. This sounds more like it - can we tap into that instead?

Happily, yes: we can do better, there is something else, and we can tap into intrinsic motivation instead. Deci highlights autonomy as one of our key basic needs. We need to feel that we have control over our lives. When we are being "trained", be it via positive or negative approaches, we feel controlled and lacking in motivation. We are no longer able to make choices. But if we can retain our autonomy we avoid those pitfalls. Is there any reason to assume that horses don't feel similarly?

Horses are typically domesticated and even those who are feral often have their lives encroached on by humans. We keep horses in stables, in small paddocks, alone or with companions they would not necessarily choose, and we ride them to suit our own wishes. How can we even begin to pretend they have their autonomy? But I am reminded of an interview with a survivor of the Russian Gulags who, when asked how frightening it must have been to be imprisoned, said that in the camps you were more free than when you were outside: free to think and speak your own thoughts, rather than too frightened to express them, as he had been previously. Autonomy is not restricted to our geographical location but is much more linked to how we feel about our situation. It gives us the freedom to be who we really are. We don't even need to experience autonomy all the time - in the wonderful book "Dibs: In Search of Self", therapist Virginia Axline describes the phenomenal and life-changing personal growth of a six-year-old boy in just an hour-long session of autonomous play therapy each week, despite his somewhat aversive home life.

What would autonomy in horse training look like? Is it an unavoidable oxymoron or can we do something that approaches it for the benefit of our horses? In order to support the autonomy of humans in a therapeutic manner, Deci advises us to take the client's perspective, acknowledge the other's feelings, provide relevant information, give a rationale for any suggestions or requests that we make, offer choice and control to the client and minimise the use of controlling language. With our horses we can translate this into doing our best to see all of our requests from the point of view of the horse and acknowledging that the horse may have a very different feeling about what, to us, might appear an easy task. We can't use language to provide the relevant information or rationale behind any requests but we can take time for the horse to really understand what we would like and use shaping to make any task non-aversive and as easy to achieve as possible. We can reduce the use of controlling training by allowing the horse to make choices and decisions and we can pay attention to the answers we are given. We can make use of occasional free-shaping sessions to give the horse the opportunity to choose to perform behaviours to elicit rewards, without losing sight of the controlling nature of reward-based training. The goal here is for the horse to be making decisions, rather than merely performing behaviours of our choosing. We can couple all of this with meeting the horse's ethological needs as best we can. And when we do all of this, it is remarkable how much easier it is to make requests of our horses.

So what is the right continuum? I would put all forms of punishment, reinforcement, control and extrinsic motivation at the "bad end" and choice, empowerment, autonomy and intrinsic motivation at the "good end". And that is the case for myself, my children and my animals. We can't avoid the "bad end" entirely but by maximising the opportunities for autonomy, we can work towards returning some of that long-lost fundamental need to our animals.

References
Virginia Axline, 1964, "Dibs: In Search of Self"
Ed Deci, 1995, "Why We Do What We Do"
Harry Harlow, 1953, "Motivation as a factor in the acquisition of new responses", in "Current theory and research on motivation" (pp. 24-49), Lincoln, NB: University of Nebraska Press
Alfie Kohn, 1993, "Punished By Rewards"