Risky Equations
This is my second post detailing my exploration of risk and risk management. I’m still primarily focused on the paper from a DRAGOS researcher, Jason Christopher. The paper can be found here. In this post I’m going tackle risk equations because I think they are very useful (and can be misleading). I want to make it clear that I am analyzing this report from Dragos and maybe even being critical of it, but I think it is outstanding work! Ripping apart these topics and starting a dialogue is the way to advance understanding.
In this post I want to work through risk equations. I find the concept of a risk equation super beneficial. However, if something is written in an equation form it should have equation meaning. This definitely comes from my engineering background and I think I look at these in a more literal fashion than many people. Is that good or bad? I don’t know, but it is something to keep in mind as we look at this topic.
In the course I created for our new Cyber Science major we used \(ISC^2\)’s risk equation:
\[Risk = Threat * Vulnerability\]I chose to use it because \(ISC^2\) is an authority for such things in the industry and it is easy to understand, perfect for an introductory course. \(ISC^2\) defines the terms in its equation as:
- Threat - Any natural or man-made circumstance that could have an adverse impact on an organizational asset
- Vulnerability - The absence or weakness of a safeguard in an asset that makes a threat more likely to occur, or likely to occur more frequently
- Asset - An asset is a resource, process, product, or system that has some value to an organization and must, therefore, be protected
As you can tell from the term definitions, some of the things that appear to be missing in the equation are embedded in the terms. Impact was the one that jumped out at me while I was teaching, but it is at least mentioned in the threat definition. It just doesn’t explicitly capture the severity of the impact. The other term that appeared to be missing is the probability of such a threat occurring, but that is captured in the vulnerability term. \(ISC^2\) also has a total risk equation that accounts for impact a little more explicitly:
\[Total Risk = Threats \cdot Vulnerability \cdot AV\]Where AV is the asset value. This equation does a pretty good job of capturing everything you need. If you put a control in place that lowers the impact or probability of an incident, then you will lower the threat or vulnerability term of the equation. The math approach I use to analyze these equations is happy that a lower risk will be produced. However, the affect of a control put in place is hidden. You could also argue that I made a stretch to really include it correctly. Also, what about resilience? Would resiliency built into the system lower threat and vulnerability? Let’s explore a few other equations.
The Dragos report has a good treatment of risk equations. You may remember the Dragos report defines “industrial cyber risk” as:
“The potential loss of life, injury, damaged assets, financial loss, and other harm from the failure or mis-operation of digital technologies and communication networks used for operational capabilities.”
I talked about this definition in a good amount of detail in the previous post, but I’m restating it here so you can see it with the risk equation. Which should match, right? Dragos states a “more OT-centric risk equation” is:
\[Cyber Risk = Consequence \cdot Threat\cdot Vulnerability\]This equation is fairly similar to \(ISC^2\)’s, but impact is clearly a part of the calculus (Dragos calls it consequence). I like this addition to the equation because it makes it significantly more clear. Dragos added the consequence term to make the equation more ICS appropriate, but I think this applies to IT too. How it applies is different because of things like process hazard analysis (PHA) and failure mode and effects analysis (FMEA) that exist in ICS and not IT, as they correctly articulate. The important thing here is that if a control is added to a network that lowers the impact of an incident you can clearly tell which term of the equation is reduced to lower the total risk. If you lower the probability of an incident occurring it still isn’t as clear though. That is still less than ideal.
The Dragos report does something interesting next. It doesn’t address probability (like I wished it did), it switched to discuss disaster recovery. This is interesting to me and something I don’t know much about, but using this method you can write a disaster risk equation as:
\[Disaster Risk = Hazard \cdot Exposure \cdot \frac{Vulnerability}{Capacity}\]This is super interesting! Capacity or “capacity to cope” is sometime referred to as manageability and adds in the idea of resiliency. This doesn’t solve my problem with probability, but I really like this addition/idea. The Dragos author like this concept too and altered their ICS risk equation to:
\[Industrial Cyber Risk = Consequence \cdot \frac{(Threat \cdot Vulnerability)}{Resilience}\]This is a good equation and does not need to be ICS focused only. I like these ideas and as I think through the math and what the correct equation should be I’m going to borrow from all of them heavily. The two problems I have with this equation is that probability is not clearly articulated and resilience is mitigating threat, opposed to consequence.
I will attempt to develop what I think is a better risk equation later, but first I want to dive into “Threat”. Threats are important and tracking them is critical to breaking kill chains and ultimately thwart cyber attacks. There is no doubt in my mind that they should be a part of cybersecurity programs. However, should they explicitly be a part of the risk equation? I don’t think so, but first we have to standardize on the term. As mentioned earlier \(ISC^2\) defines threat as:
- Threat - Any natural or man-made circumstance that could have an adverse impact on an organizational asset
I don’t really see a problem with this definition, but is it a mulitipicand? When Dragos talks about threats they talk more about their Threat Activity Groups. Their report speaks specifically about Xenotime and shows a threat report for that actor. These threat reports are a gold mine for a defender, but they don’t add anything quantifiable for the risk equation. I think it makes the most sense to stick with \(ISC^2\)’s definition of threat and save the more detailed threat reports for helping to determine appropriate controls later in the risk management process. I alluded to this earlier, but I don’t think threat belongs in equation itself. It isn’t a quantifiable term, so it prevents quantitative analysis. Instead I think a risk equation should be written for each threat starting like this:
\[Cyber Risk_{Threat} = ...\]The threat can then be used as the thing you are quantifying the risk for and it can also affect the probability of an event occurring. I think a more practical cyber risk equation would look something like:
\[Cyber Risk_{Threat1} = Probability \cdot \frac{(Vulnerability \cdot Impact)}{Resilience}\]This is probably not perfect and I’m going to keep reading and learning, but I like this as a start. Every term in this equation is quantifiable. It may be something you can measure directly, from past incidents, or via some sort of monte carlo type simulation. This makes it possible to actually do quantitative risk management and not just use high, medium, or low (or some variation of that). A control can lower the probability of an event or its impact. If it remediates that the vulnerability, then that risk correctly goes to zero. If you design some resilience in the system to withstand an event, like offline back ups for ransomware, then you lower the risk via the division. The risk correctly doesn’t go to zero though.
I’ll keep working on this concept and sharing my thoughts. Please reach out and let me know what you think!