Blending efficiency and resilience – Articles by Mark Ridley

At a basic level, resilience is the ability of something to withstand disruption. Efficiency is the quest for optimal output with minimum waste. Resilience and efficiency are not in direct opposition, but focusing too much on one can have a negative impact on the other.

For much of my career I’ve been tasked with designing ‘resilience’ into systems. This often meant identifying single points of failure — a hard disk, a server, a network link, a building or even a member of staff — and ensuring that the business could still deliver value if something failed or was suddenly absent.

For at least as many years I’ve been asked to deliver efficiency. Sometimes this was just making sensible procurement decisions and cutting costs appropriately, but in later years I spent more time looking at highly efficient systems, and because I work in technology these tended to be efficient, complex systems; the kind of systems that true ‘lean’ thinking excels at improving.

Efficiency and resilience are two important considerations for any business with long-term aspirations, and often there is a direct compromise to be made between the two. Sometimes they work in opposition to each other, but occasionally they can be combined for a best-of-both-worlds scenario.

The trade between efficiency and resilience

Let’s go back to the beginning of my career. In the distant days of the early noughties, if I was building hardware to support a business, I would use every trick in the book to ensure resilience through redundancy (a technical term which usually indicates that a system has a backup in the event of a failure). Not only did I have many, many hard drives, but also many physical servers. I might also have several data-centres (the physical buildings that servers live in). In some cases, those data-centres might even be in different countries.

That wasn’t cheap. In extreme examples, having the capability to keep a business running even after a disaster would cost more than double what it would cost to run if we didn’t plan for the worst (because we not only had two of everything, but also had to join them up). That’s not efficient. In fact, it’s extremely wasteful, and hedges against an extremely unlikely threat to the business.

In one business we considered the costs of providing this traditional type of resilience against disasters and decided to run without a backup at all. Instead, we’d save a substantial amount of money and reduce complexity, which we balanced against a clear business continuity plan and an awareness of the risk to the business. The compromise was thoughtful and considered.
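One rough way to frame this kind of decision is as an expected-cost comparison. The Python sketch below is illustrative only: the function names and every figure in it (running costs, disaster probability, losses) are hypothetical assumptions, not numbers from the business described above.

```python
def expected_annual_cost(running_cost, disaster_probability, disaster_loss):
    """Running cost plus the probability-weighted loss from a disaster."""
    return running_cost + disaster_probability * disaster_loss


def break_even_probability(extra_running_cost, loss_avoided):
    """Yearly disaster probability above which the backup pays for itself."""
    return extra_running_cost / loss_avoided


# Hypothetical figures, for illustration only.
BASE_RUNNING_COST = 1_000_000       # one site, no backup
REDUNDANT_RUNNING_COST = 2_200_000  # "more than double": two of everything, plus the links
DISASTER_PROBABILITY = 0.01         # assumed 1-in-100 chance per year
LOSS_WITHOUT_BACKUP = 20_000_000    # assumed loss if disaster strikes with no backup
LOSS_WITH_BACKUP = 500_000          # assumed loss when a backup takes over

no_backup = expected_annual_cost(
    BASE_RUNNING_COST, DISASTER_PROBABILITY, LOSS_WITHOUT_BACKUP
)
with_backup = expected_annual_cost(
    REDUNDANT_RUNNING_COST, DISASTER_PROBABILITY, LOSS_WITH_BACKUP
)
threshold = break_even_probability(
    REDUNDANT_RUNNING_COST - BASE_RUNNING_COST,
    LOSS_WITHOUT_BACKUP - LOSS_WITH_BACKUP,
)

print(no_backup < with_backup)  # True: with these assumptions, no backup is cheaper
print(round(threshold, 3))      # 0.062: backup only pays off above ~6% yearly risk
```

With these made-up numbers, running without a backup is the cheaper option on average. An expected-cost figure says nothing about whether the business could survive the rare bad year, which is why a continuity plan and a clear view of the risk still matter.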

Lean in action

Toyota is an organisation revered for its place in developing Lean thinking. Toyota’s staff are trained to always think about improving their processes, and Toyota is largely responsible for bringing the concept of ‘Just in Time’ production to manufacturing. This concept (sometimes shortened to JIT) reduces many different types of cost to a business by ensuring that value-creating processes are closely linked. The assembly stage of a car requires many sub-assemblies and components. JIT manufacturing requires that those dependencies, the components manufactured elsewhere, are produced as close to the point of need as possible. This means that Toyota’s inventory systems send ‘pull’ requests to their suppliers, ensuring that they receive a steady flow of parts, minimising the cost of holding a complete inventory. This makes for a much more efficient overall business*.

But what happens if the supply chain is disrupted? Well, we can now see the likely effect for real, thanks to the UK government and the prospect of a no-deal Brexit.

“We do not just have the 50 trucks, we have to have them in sequence, it is no good if we have 49 trucks and truck 17 is missing,” he said. “[Production] will then stop. So without the withdrawal agreement and withdrawing with a no deal, we would have stop-start production for weeks, possibly months. It would be very, very difficult for us to cope with.”

– Tony Walker, deputy managing director, Toyota UK

Toyota’s factory at Burnaston, near Derby in the UK, produces nearly 150,000 cars per year (with over 90% being exported to the EU). Parts for these cars arrive in trucks that are carefully managed and critical to the production process. The trucks arrive every 30–40 minutes, and any disruption to their deliveries will immediately, and potentially disastrously, affect the production lines.

Burnaston does not keep a meaningful stock of parts and so a disruption to the supply chain will result in the production line being stopped. Stopping the line is expensive, with the plant producing cars worth about £12m every day and employing thousands of workers.
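The point about sequence, and about running without buffer stock, can be illustrated with a small Python sketch. It is a toy model, not a description of Toyota’s actual systems: the function names, the 35-minute consumption rate (the midpoint of the 30–40 minute arrival interval, assuming the line uses one truckload per interval) and the two-hour delay are all assumptions made for illustration.

```python
def trucks_usable_in_sequence(arrived, total=50):
    """Count how many trucks can be unloaded, in order, before the first gap."""
    arrived = set(arrived)
    for number in range(1, total + 1):
        if number not in arrived:
            return number - 1  # the line stops at the first missing truck
    return total


def stoppage_minutes(delay_minutes, buffer_truckloads, minutes_per_truckload=35):
    """Minutes the line stands idle when deliveries are delayed.

    Assumes (hypothetically) that the line consumes one truckload every
    `minutes_per_truckload` minutes and runs on buffer stock until it is gone.
    """
    covered = buffer_truckloads * minutes_per_truckload
    return max(0, delay_minutes - covered)


# 49 of 50 trucks have arrived, but truck 17 is missing.
arrived = [n for n in range(1, 51) if n != 17]
print(trucks_usable_in_sequence(arrived))  # 16

# A two-hour delay with no buffer versus a hypothetical two-truckload buffer.
print(stoppage_minutes(120, buffer_truckloads=0))  # 120
print(stoppage_minutes(120, buffer_truckloads=2))  # 50
```

Holding buffer stock shortens the stoppage, but that stock is exactly the inventory cost that Just in Time is designed to remove.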

This is an extreme case, since Toyota has designed its factories to be uniquely efficient, but it highlights the fragility that can exist in systems designed for radical efficiency. This is certainly intentional design by Toyota, and they can adapt their process to deal with disruption. However, in doing so, they must tolerate lower efficiency, and that cost will invariably be passed on to the customer.

Efficiency in digital businesses

Perhaps it seems like this balance is more severe in traditional, physical industries? Let’s look at process engineering for digital businesses to understand how it might apply elsewhere.

There is increasing interest these days in automation, especially ‘business process automation’ (BPA) and ‘robotic process automation’ (RPA). These are great ways of reducing cost and improving quality in highly reproducible settings. Machines excel at very specific, rote tasks, something we have now exploited for centuries. Both BPA and RPA are suitable for automating non-mechanical tasks, like data entry or administration.

The risk with process automation, especially in young digital businesses, is often that processes are poorly defined. These businesses rely on humans to make processes work, and those people are well evolved to deal with chaotic environments.

The first step towards automation, and therefore efficiency improvement, is understanding the current process (as-is), and laying out a target state for the process (to-be). It’s somewhat pointless to consider automation without first considering the value of the process itself.

When considering the automation of tasks which are primarily undertaken by people, it’s important to understand what wonderful and magically resilient machines humans are. In a highly volatile business like a startup, there is a premium for having smart people at hand to make things happen. Work is creative, and processes still nascent. Even though the promise of efficiency might be high, consideration should be given to the resilience of the overall business. By replacing humans with more efficient automation we may lose some of our resilience in the face of change.

Automating, with all its efficiency gains, might require someone with sophisticated knowledge — an administrator or technically competent user — to alter the automation tools in the event that a process changes. Suddenly, the newly automated process would become more brittle and fragile than the old, manual process when people were solely responsible.

In addition, automation requires tools. Whether these tools are physical, like the robots Toyota uses in its factories, or software, like CRM software or phone diallers, they will require support and operational care. Does the tool need an internet connection? What happens if there isn’t one? Do you have a backup? Can you replicate the process on paper?
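Those questions translate into a simple design pattern: try the automated tool first, and fall back to a manual record when it cannot be reached. The Python sketch below is a minimal illustration under stated assumptions. The names `record_order`, `online_system` and `paper_log` are hypothetical, and a real tool might signal an outage with something other than a `ConnectionError`.

```python
def record_order(order, online_system, paper_log):
    """Try the automated tool first; fall back to a manual record if it is unreachable."""
    try:
        online_system(order)
        return "online"
    except ConnectionError:
        paper_log.append(order)  # manual backup, to be re-entered once the tool is back
        return "paper"


def broken_system(order):
    """Hypothetical stand-in for a tool whose internet connection is down."""
    raise ConnectionError("no internet connection")


paper_log = []
print(record_order("flat white", broken_system, paper_log))  # paper
print(paper_log)  # ['flat white']
```

The fallback only helps if someone still knows how to run the manual process, which is part of the resilience that people provide.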

Many older, more traditional businesses are by many measures more resilient than their newer, digital counterparts. A family restaurant, a small mechanic’s garage, a hairdresser or grocer may well be better able to deal with supply chain disruption than a larger, modern, efficient business precisely because they are less efficient and more reliant on adaptable humans.

Imagine what happens to those trendy, cashless coffee shops if their WiFi breaks. No coffee? That’s a future too bleak to contemplate.

When efficiency and resilience aren’t at odds

Efficiency is not fragility, nor is resilience wasteful. Rather, these are design choices that we need to be aware of when designing processes and businesses.

In the old days when I was building technology I needed to buy servers, and worry about how many hard drives they had. A wonderful example of how efficiency and resilience can work together is in the modern approach to building internet-grade systems, sometimes known as DevOps.

DevOps is a set of practices that often relies on cloud platforms like Amazon Web Services (AWS) and Microsoft Azure. These modern ways of building technology allow considered thinking to build beautiful solutions that at once manage to be more resilient and more efficient than I could have dreamed possible just a decade ago.

The core of the DevOps approach is to make architecture flexible, atomic and available on demand, as customers need it. Where in the past I needed to buy servers that would suit my needs three years into the future (requiring a scrying stone, some toe bones and a pot of tea), the availability of these services from Amazon, Google and Microsoft allows me to consume just a fraction of a server that I share with other people. In addition, I can replicate those fractional servers around the world, making sure that even if I lose a server, or an entire geographic region, I can keep my customers happy. I can even reduce these servers to lines of code, which will enable me to move them between Amazon and Microsoft in the event that the worst happens and an entire provider goes dark.
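To make the idea of servers as lines of code a little more concrete, here is a minimal Python sketch. It is not the format of any real cloud provider or tool: `ServerSpec`, `first_healthy_region`, `render_for` and the provider and region names are all hypothetical, chosen only to show a provider-neutral description that can be replicated across regions and re-targeted at a different provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSpec:
    """Provider-neutral description of a (fractional) server."""
    name: str
    cpu_cores: float  # a fraction of a shared physical machine
    memory_gb: float
    regions: tuple    # regions to replicate into, in order of preference


def first_healthy_region(spec, healthy):
    """Return the first preferred region that is currently healthy, or None."""
    for region in spec.regions:
        if healthy.get(region, False):
            return region
    return None


def render_for(provider, spec):
    """Turn the neutral spec into a request for one provider (hypothetical format)."""
    return {
        "provider": provider,
        "name": spec.name,
        "cpu": spec.cpu_cores,
        "memory_gb": spec.memory_gb,
        "regions": list(spec.regions),
    }


web = ServerSpec("web", cpu_cores=0.25, memory_gb=1.0, regions=("region-a", "region-b"))

# Losing a whole region: traffic moves to the next healthy one.
print(first_healthy_region(web, {"region-a": False, "region-b": True}))  # region-b

# Losing a whole provider: the same spec is rendered for another one.
print(render_for("provider-b", web)["provider"])  # provider-b
```

Because the description lives in code rather than in purchased hardware, the same definition serves both goals: it is cheap to change and cheap to duplicate.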

Because all of these services are shared by thousands of customers, they are also cheaper and more scalable than buying the servers of a decade ago.

What is significant is that it has taken both sufficient technological advancement and considered design to achieve this combination of resilience and efficiency. The process isn’t just capable of withstanding disruption; it is designed to be elastic and flexible, delivering many other benefits. This is as close as technology gets to Nassim Nicholas Taleb’s Antifragile.

Designing for efficiency and resilience

When you design (or inherit) processes at your business, you’ll no doubt want to design for both efficiency and resilience. When doing so, give consideration to the overall value that these processes create, and how necessary it is to embed these characteristics. In highly volatile, creative or adaptable organisations, be cautious that you don’t overemphasise efficiency. It’s possible that you’ll actually be making things more difficult for the business.

As processes mature and become well understood, think carefully about how repeatable they are. Good design is invariably more valuable than a narrow focus on optimising for efficiency, and often simply stopping a process from happening is the most efficient outcome.

At the end of the day, everything is a compromise. It’s better to give some thought to the overall value being produced before you rush to optimise it.

Fun fact, courtesy of Duolingo and backed up by Swedish friends: there is no distinction between efficiency and effectiveness in Swedish (effektivitet means both). I guess that, to a Swede, it just doesn’t make sense for something to be efficient without being useful. That’s a pub conversation for techies, right there.

* An interesting note: sometimes Lean processes like Just in Time don’t result in a single process being more efficient. In fact, frequent swapping between toolings (a Toyota practice) is less efficient than swapping less regularly when observed at a single-process level (a local optimum). However, Toyota is optimising for the whole business, or “optimising for global rather than local maxima”.