
The (Clicker Trained) Elephant In the Room: Fading and Variable Schedules

The popularity of equine clicker training has soared in recent years, and it is wonderful to see so many people engaged in a more thoughtful and positive style of training. It is now relatively easy to find information on how to get started in clicker training, ideas about what to train and how to troubleshoot when things go wrong. But what about the long term? Do we want to be rewarding every last target-touch forevermore? Do we want to be rewarding every little step of a behaviour all the time? Or do we need to find a way out so that, once a behaviour is established, we can consider it "done" and move on to something new?

Phasing out the rewards can be tricky. If horses have become accustomed to receiving a reward every time, they can be fairly resistant to the notion that sometimes treats don't happen. It also feels a bit sneaky to break the behaviour-click-reward "agreement" that we have spent time establishing, manipulating the horse into giving us "something for nothing". It's hardly surprising that we elicit frustration behaviours when we try. In short, we need a way to teach clicker training so that the horse understands and trusts that treats are coming, but soon rather than NOW.

Clicker training (CT) is normally introduced via a continuous schedule of reinforcement, known more colloquially as the "one-click one-treat" (OCOT) approach. Every time the horse performs the correct behaviour, it is marked with a well-timed click at the moment the behaviour occurs, followed by a primary reinforcer. Typically the reinforcer is a food reward, although it can equally be a scratch if the horse enjoys it and finds it genuinely motivating. The click is known as the "bridge": a secondary reinforcer that bridges the gap between the behaviour and the reward. Through classical conditioning, some of the positive feeling associated with the treat also becomes associated with the click, so the horse learns that the click signals "good things". And so, our initial target-touching session may look like:

touch/click-treat, touch/click-treat <repeat a few more times to a max of about 10> touch/click-treat, touch/click-jackpot

where the jackpot is ideally a few treats fed on the ground as a clear marker of the end of the session. The trainer can strengthen the message that the session is over by removing the treat-bag and leaving the training area.

In an ideal world we would establish the behaviour and then, within a few sessions, start to introduce a variable schedule of reinforcement. In his experiments, Skinner found that rewarding a behaviour intermittently and unpredictably (a variable ratio schedule), rather than every time (a continuous schedule), produces behaviour that is more persistent and more resistant to extinction. We see something similar outside the laboratory, where horses can become a bit jaded by the repetition that a continuous schedule requires.

Fig. 1: Schedules of reinforcement (vertical marks indicate timing of rewards)

So why might this be the case? If we perform a behaviour and the expected pay-off is not forthcoming, we tend to escalate our behaviour. The classic example is the "broken vending machine": when the coin doesn't work, we may poke the slot, press any available button, shake the machine and ultimately kick it. If any of those strategies works, we may become very strongly attached to it, opting for the escalated version of the behaviour next time we use the machine. This surge of escalating behaviour is known as an "extinction burst"; if it peaks and the machine still does not work, we give up in disgust and walk away. Occasionally we may try the machine again to see if it is still broken. This is spontaneous recovery, and its outcome strongly dictates whether our attempts to use the machine extinguish fully. When a variable schedule of reinforcement is used correctly, we aim to harness that initial, incremental escalation of the behaviour, allowing it to be rewarded and become ingrained.

Fig. 2: Extinction Bursts and Spontaneous Recovery

In practice this may look like:

touch/click-treat, touch/click-treat, touch-pause-(oy stoopid human, I did it, which bit of touching didn't you understand?)-TOUCH/click-treat, touch/click-treat, touch/click-treat, touch-pause-(oh yeah, I remember this, it's ok, I do it again and still get the treat)-touch/click-treat, touch/click-treat, touch/click-jackpot

The unpredictable timing of the rewards is what strengthens the behaviour. In extreme cases it can lead to gambling addiction, for example, where winning just often enough maintains a very strong habit. The habit would fall apart if wins never came and the behaviour were allowed to extinguish. Occasional, unpredictable wins will maintain the behaviour, but only if the schedule is stretched gradually.

Ideally, the horse should feel comfortable repeating the behaviour without becoming upset at a missing treat, because the omission is an occasional one-off rather than a persistent, manipulative demand on the part of the human. In practice, this doesn't always happen, and the horse can become frustrated.

An alternative approach to help shape and ease the process is to use "treatless clicks", a method pioneered by Ben Hart in his book "The Art and Science of Clicker Training for Horses". The key point is to recognise that the click is not the final piece of information the horse receives to indicate that a treat is coming. After the click, the trainer's hand goes into the treat-bag, and horses are very perceptive of our body language. It is this extra signal that enables us to decouple the click from the delivery of the treat and build patience and calmness into the process.

We now recognise that - from the horse's perspective - our initial training is actually:

touch/click-handgoesintotreatbag-treat

Our click is an earlier, or intermediate, bridge, and the hand movement is the "normal", or terminal, bridge. In this model the click tells the horse that a treat is coming, and by moving the hand slowly we can lengthen the interval between click and treat. The horse therefore never expects the treat immediately, and patience can be built into the training from the start.

When we now try to introduce a variable schedule of reinforcement, our session above can be enacted as:

touch/click-handgoesintotreatbag-treat, touch/click-handgoesintotreatbag-treat, touch/click-pause-(oy stoopid human, I did it, which bit of touching didn't you understand?)-TOUCH/click-handgoesintotreatbag-treat, touch/click-handgoesintotreatbag-treat, touch/click-handgoesintotreatbag-treat, touch/click-pause-(oh yeah, I remember this, it's ok I do it again and still get the treat)-touch/click-handgoesintotreatbag-treat, touch/click-handgoesintotreatbag-treat, touch/click-handgoesintotreatbag-jackpot

The horse perceives this session as normal, just with some extra treatless clicks that reassure it that a treat is still imminent.

Fig. 3: Summary of Method Steps

With careful shaping, this session can be stretched over time to:

touch/click-touch/click-touch/click-handgoesintotreatbag-treat, touch/click-handgoesintotreatbag-treat, touch/click-touch/click-touch/click-touch/click-handgoesintotreatbag-treat, touch/click-handgoesintotreatbag-treat

Maybe then to:

touch/click-touch-touch/click-handgoesintotreatbag-treat <and various random numbers of repetitions>

and then to:

touch-touch-touch-handgoesintotreatbag-treat <and various random numbers of repetitions>

and ultimately to:

touch-touch-touch, asked for as a one-off one day, and then rewarded only periodically.

By this stage the behaviour is so well established that the horse doesn't feel the reward is urgent. Similarly, when we teach a foal to lift his/her legs, we reward every little step along the way. Over time we shape the behaviour and reward fewer stages, until ultimately a farrier might be able to trim the horse in our absence without giving any reward at all. We might still offer treats sometimes to maintain positive associations, but they aren't really needed.

These examples show how you might put the repetition of a behaviour onto a variable schedule of reinforcement and eventually fade out the clicks and treats altogether. Equally, it might be the duration of a behaviour that you want to extend instead, gradually using clicks to mark "yes, that's great, carry on, treat is coming soon" before the hand goes into the treat-bag to indicate that the behaviour can end now.

Using "treatless clicks" is often mistaken for a method distinct from OCOT, and as such is sometimes regarded as controversial. But whichever method you think you are using, the horse will still recognise that the terminal bridge is the hand moving towards the treats. As a result, the click becomes a "keep going signal" (KGS) or intermediate bridge (IB), whether you like it or not. It is then up to you whether you make use of the additional information that the clicks afford you.

Table 1: Summary of Pros and Cons of Different Approaches

Finally, these processes are not set in stone, and different horses will have different tolerances for how much behaviour we can extract from them in return for a treat. It is not about manipulating the horse into giving us more for nothing, and we certainly risk the horse feeling that way if we are not extremely cautious. Equally, we don't want to be rewarding the horse for very basic tasks forevermore. This is where the art of training comes in: the trainer "feels" the right balance of behaviour vs. reward for the individual horse. This is not easy to put into practice, and we will make mistakes. We need those mistakes to be small, so that the positive associations with reward-based training can compensate. If in doubt, stop.