Risky Business

March 5, 2022

Lately, everyone seems to want to talk about technology risk. I am not referring only to cybersecurity risks; I am referring to the operational types of risk (those outside your information security program) that could have a sizable impact on your business. Having been a CIO for over 20 years, I want to share what keeps me up at night, and how internal technology risk management programs could be improved.

For this discussion, I'd like to set aside cybersecurity. Yes, it's a key concern, but in most cases it is well funded and senior management is well aware of the risks. On the operational side, however, many companies address technology risks only indirectly: via controls. Controls are certainly important, and the passage of the Sarbanes-Oxley Act in 2002 marked a major improvement in the way companies viewed technology controls (at least those related to financial systems).

With all the new SOX controls in place, did technology risk decrease overall? It may have, slightly, but I didn't notice it in practice. Controls are very effective at reducing some types of technology risk, but they may not directly address some of your biggest risks.

Controls indirectly reduce technology risk

Controls are part of how we think about technology “management” and/or “governance”. Good governance and good management are essential. However, control and management frameworks such as COBIT¹, or service management frameworks such as ITIL², don’t reduce all technology risks. They focus on “management of the technology function”, not directly on reducing risk.

For example, COBIT 5’s design is based on the following principles:

Principle 1: Meeting Stakeholder Needs
Principle 2: Covering the Enterprise End-to-End
Principle 3: Applying a Single Integrated Framework
Principle 4: Enabling a Holistic Approach
Principle 5: Separating Governance from Management

Do these principles speak directly to managing technology risk?

ITIL may do a better job addressing traditional technology risks because a large portion is focused on service delivery, which includes reliability and availability. However, ITIL doesn't address overall complexity in your technology environment, overall technology investment levels, sourcing strategy, culture, etc., all of which impact overall technology risk.

Controls and control frameworks tend to focus on tangible, measurable activities. For example, you can pass an audit by proving that you have a well-documented (and followed) software development process, even if that process is not very good at delivering what the business needs. The software development process is also a magnet for controls: it is a convenient framework to which regulators, risk managers, and internal auditors attach an ever-increasing burden of business and technology controls. This inevitably slows software development to a crawl, and it is a burden that digital companies had to shed to speed up delivery. Along came Agile development.

It is important to point out that controls add cost but do not automatically reduce risk. Some technology controls add cost, or time, without an offsetting benefit. Sometimes controls are ineffective, poorly designed, or poorly implemented, adding delay and cost and taking time away from higher-value work.

When I think of technology risk, and what keeps me up at night, I don’t immediately jump to weak “controls”.

What directly drives technology risk?

These are the things that keep me up at night: the areas of technology risk (outside of cyber risk) that I believe drive your largest technology risks. I think in terms of broad areas of technology risk that could manifest with a direct impact on your bottom line.

Culture: Companies with a strong culture of accountability and ownership experience less technology risk. Yes, that's obvious, but how do you measure and foster that type of culture? Culture is so multi-faceted that it requires a whole discussion of its own; in this context, however, we are mainly concerned with accountability, meaning the people in an organization who see their mission as keeping a specific system (or set of technologies) operational and secure at all times. Can you name the "system owners" of all your key systems? If you named them, would they agree with you? Do they have the support and resources they require? Do you have a culture of clear accountability? If not, your systems will be less reliable and less secure, simple as that.
Complexity: Imagine a system composed of one single component that works 99% of the time. System availability = 99%. Now imagine the system is made up of 100 components, each 99% reliable, all of which must be working for the system to work. If we raise 0.99 to the hundredth power (0.99 x 0.99 x 0.99 x 0.99 …), we end up with 0.366, or just 37% overall availability. When we think about resilient systems, the most resilient ones are those with the fewest moving parts. Complexity is also the enemy of security. Hackers “attack the seams.” A properly managed, patched, and configured server is very difficult to attack. Professional hackers look for the seams: the places where people and processes are imperfect and something is left exposed. The more complex your infrastructure and systems, the more likely they are to have "seams" with exposed vulnerabilities. Anti-complexity is the main driver of security, reliability, and availability. How do we measure our complexity? Is it increasing or decreasing?
Black Swans: Nearly all major outages I have seen in my career have been due to some “unique confluence of factors”. The post-incident review always says something to the effect of “we never thought about, or anticipated, this ever occurring”. AWS "Post-Event Summaries" are fun to read because they are masterful at implying that without ever directly stating it. Nearly all risk management programs focus on reducing the risks we know about or have experienced before. Humans simply can't anticipate black swans very well. The main thing that reduces black swans is reducing complexity (see Complexity, above).
Change: Not “change control” but rather the overall amount and magnitude of change occurring in your technology environment in any given period. This can be measured by the number and magnitude of projects, plus the volume of items sent through the change management process. What is our “change vector”? Most day-to-day outages are "self-inflicted wounds". One company I am familiar with let many of their network engineers go during a re-organization. That meant no one was making any changes to the network, and for a while it was more stable than it had ever been. Have you ever heard the saying that technology changes at the speed of Moore’s law? Well, humans change at the speed of Darwin’s law. Can your people cope with all the change?
Zombie Projects: You know what really puts an organization at risk? The zombie projects that live in plain sight. Zombies are projects that, for any number of reasons, fail to fulfill their promise and yet keep shuffling along, sucking up resources without any real hope of having a meaningful impact on the company’s strategy or revenue prospects. Many zombies are top executives' pet projects, and most companies' reward systems carry strong penalties for failing to meet commitments, so naturally people hesitate to raise their hands and say: “Our project is one of those.” It just looks smarter to find ways to stay alive. I have personally seen single zombie projects cost between 10 and 40 million dollars. Projects, particularly those related to innovation, should be funded using a "venture capital" model. Seed the project, and if it shows promise, make subsequent "investments" in it as it shows measurable benefits. If not, stop future funding.
Technology Supply Chain: During the pandemic, companies couldn't procure even basic items like laptops, monitors, or printer toner because of supply chain issues. Suppliers are a key risk to your business. For example, in February 2022, Toyota stopped production in Japan, which accounts for about a third of its global production, after a key supplier was hit by a cyber-attack. Toyota is well known for Just-In-Time manufacturing, in which parts arriving from suppliers go straight to the production line rather than being stockpiled. That makes Toyota a highly efficient manufacturer, but also one that is more exposed to supply chain risk and disruption. Primary targets are getting very good at preventing direct cyber-attacks; suppliers are often overlooked as an attack vector. With more and more technology work being sourced from numerous suppliers, any of these risks could be a major source of disruption:
- Third party service providers, from janitorial services to software engineering, with physical or virtual access to information systems, software code, or IP.
- Poor information security practices by any of your suppliers.
- Compromised software or hardware purchased from suppliers.
End-of-Life Technology: Companies are much better at introducing new technologies than at retiring them. The cost of running unsupported technology is higher than you think. Companies that don’t pay attention to deployed technology reaching obsolescence face more security risks and vulnerabilities than companies that keep a close eye on the lifecycle of every element in their IT landscape. With end-of-life technology, technology management must deal with challenges such as integration issues, limited functionality, low service levels, a lack of available skills, and missing vendor support. I often see internally supported software built on outdated technology and programming languages, which means the people supporting the system no longer have market-relevant skills, and you can’t hire for those skills. How many systems did you retire in 2021?
Orphan technologies: Companies today have far more systems and software titles than they have technology employees, which makes it a challenge to ensure that all of that technology is managed properly. Does every piece of software or hardware in your environment have a clear owner? Does the owner have the time, focus, and resources to maintain what they own? If not, those systems are orphans, and orphans expose you to technology risk. Did you pare down your orphan systems in 2021?
Business continuity: Strong business continuity programs reduce risk in obvious ways, but also in non-obvious ways. A great business continuity program can help insulate you from some of the risks listed here. Other non-obvious benefits come from the need to understand and catalog your most critical systems, understand how they operate, and consider what new threats exist (such as ransomware). Validating your business continuity plan keeps getting harder as complexity grows. Sometimes you can't validate business continuity directly, and you must trust your vendors and partners instead. With SaaS, availability is specified in the contract, but if one of your critical systems is SaaS-based, how do you really know how well the vendor has addressed business continuity?
Sourcing: Sourcing can save you money (reduced labor costs) or cost you money (longer time to deliver, errors, omissions, etc.). If your savings and costs are all internal, the right balance will eventually be achieved. However, liability is difficult, if not impossible, to outsource. If an error in your software exposes your organization to liability, you must consider reputational damage as well. The reputational damage from cybersecurity incidents has decreased over time (some now see it almost as a cost of doing business), but a sourcing failure is seen purely as the result of a management decision. See: "Boeing's 737 Max Software Outsourced to $9-an-Hour Engineers."
Adequate funding: Organizations that do not view technology as strategic, and/or that seek to minimize cost in the technology function, will suffer from less reliable technology and more technology risk. This is clear and unambiguous. However, the inverse is not as clear: organizations that are adequately funded can still experience unreliable technology, depending on how the money is spent. If it's not spent addressing the items above, you will probably end up spending it on "realized risks".
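
The availability arithmetic in the Complexity item above can be reproduced in a few lines. This is a hypothetical illustration, not something from the original post. It assumes the components sit in series (every one must be up for the system to be up) and fail independently, and the function name `series_availability` is invented for this sketch.

```python
# Hypothetical sketch: availability of a system whose n components are all
# required (in series), assuming each component fails independently.
def series_availability(component_availability: float, n: int) -> float:
    """Probability that all n components are up at the same time."""
    return component_availability ** n


# Show how quickly overall availability erodes as 99%-reliable parts are added.
for n in (1, 10, 50, 100):
    print(f"{n:>3} components at 99%: {series_availability(0.99, n):.1%}")
```

With these assumptions, the loop prints 99.0%, 90.4%, 60.5%, and 36.6%. The last figure matches the roughly 37% in the text. Real systems rarely fail fully independently, so treat this as an illustration of the trend rather than a model of any particular environment.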

Summary

If you want to reduce technology risk, you must be able to measure and reason about the factors that actually drive your largest and most critical technology risks. This is a good starting point for thinking about technology risks that may fall outside of control frameworks. Is your internal technology risk management function measuring these factors?

Footnotes

¹ COBIT is the acronym for Control Objectives for Information and Related Technologies. In the United States, COBIT is the most commonly used framework for achieving compliance with the Sarbanes-Oxley Act.
² ITIL stands for "Information Technology Infrastructure Library". The acronym was first used in the 1980s by the British government's Central Computer and Telecommunications Agency (CCTA) when it documented dozens of best practices in IT service management and printed them for distribution.
