OT's Journey Towards Automation & Automated Risk Metrics

26 Jul 2020

I recently came across an interesting blog post by Dale Peterson titled “Automated Risk Metrics - The Next Battleground for ICS Security Products”. Dale argues that ICS security solutions should now focus on automated risk metrics as a key differentiator. This is the discussion we need to have as we head into the second half of 2020.

Automated risk metrics are a new arena in ICS security. But there’s more to it. The next generation of OT security solutions should fuse asset discovery with automated risk metrics that reflect operational context. This offers far more than an inventory of assets; it provides a map of all OT, IT and IoT assets and gives each one a process-level view of its risk, based on operational and business impact.

Understand the Business Impact

Perhaps the real question to ask is “what is the business impact?” Can we create a single gold standard to determine an absolute risk score and apply it across every industry and OT environment? Obviously, we can’t. Risk tolerances differ per vertical, per economic cycle, per operation, per compliance standard and especially per company-specific business impact and goals. The solution will need to combine a basic set of metrics with vertical-specific AND business-specific criteria applied on top.

The key is to approach each operational unit from the bottom up: from the single asset, to the operational process, to the site and corporate level. The risk (per asset, process and site) is assessed based on several factors, including the asset’s attributes, vulnerabilities and behavior, its relation to other assets and processes, network activity, security posture and segmentation analysis. To achieve this effectively, ICS security solutions must be capable of collecting data from all the data sources within the environment - OT, IT and IoT.

All the above needs to be considered within the operational context: 

The process criticality in terms of operations, safety, costs and revenue. What is the potential loss due to a faulty asset? The impact can be on the operational continuity of critical processes, on regulatory compliance (non-compliance may lead to fines), on the environment (e.g. air emissions), and even on human life.

Naturally, this dictates working closely with both the operational and the business teams in order to understand how critical every asset is to its respective process. This company-specific data is also factored into the risk model. In addition, we cannot look at each asset as an island and just aggregate the risks. Unlike in IT, downtime is not an option in a live industrial network, where production lines run 24x7. Mitigation steps should fit the constraints of the industrial world, and so should the risk calculation. For example, risk related to vulnerabilities can be reduced in alternative ways, such as proper segmentation.
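To make the bottom-up idea concrete, here is a minimal Python sketch of one way such a model could be wired together. It is an illustration under stated assumptions, not a description of any product's scoring method: the `Asset` fields, the 0..1 scales, the `SEGMENTATION_FACTOR` value, the criticality table and the noisy-OR combination are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Asset:
    name: str
    process: str
    vulnerability: float  # 0..1, hypothetical technical severity
    exposure: float       # 0..1, hypothetical reachability from other zones
    segmented: bool       # True if the asset sits in its own network segment


# Hypothetical input from operations and business teams: process criticality, 0..1.
PROCESS_CRITICALITY = {"boiler": 1.0, "packaging": 0.4}

# Assumed value: segmentation lowers exposure without patching or downtime.
SEGMENTATION_FACTOR = 0.25


def asset_likelihood(asset: Asset) -> float:
    """Likelihood of compromise for one asset, reduced if it is segmented."""
    factor = SEGMENTATION_FACTOR if asset.segmented else 1.0
    return asset.vulnerability * asset.exposure * factor


def process_risk(assets: list[Asset], process: str) -> float:
    """Combine asset likelihoods (noisy-OR), then weight by process criticality."""
    likelihoods = [asset_likelihood(a) for a in assets if a.process == process]
    any_compromised = 1.0 - prod(1.0 - p for p in likelihoods)
    return any_compromised * PROCESS_CRITICALITY[process]


def site_risk(assets: list[Asset]) -> float:
    """Site risk taken as the worst process risk (an assumed roll-up rule)."""
    return max(process_risk(assets, p) for p in {a.process for a in assets})
```

In this sketch, a fully exposed asset in a low-criticality process can score lower than a moderately exposed asset in a critical one, and setting `segmented=True` lowers the score without touching the asset itself. Both behaviors follow from the assumptions above and would need tuning per vertical and per business.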

Automation is a Must!

While we may debate which risk scoring or risk calculation methodology fits best, one thing we can all agree on is that the execution of risk assessment must be automated.

Current “manual” risk assessments - which in fact comprise technology, processes and people - have three major limitations:

  1. Risk assessments are only conducted periodically. So, while the threat map may change dramatically on a daily basis (introduction of new threats, device misconfiguration, or attack surface expansion), risk assessments only provide snapshots.
  2. Due to ever-limited resources, manual risk assessment processes tend to miss the “mechanics”, i.e. analyzing firewall rules, checking whether each system is hardened properly, etc. Very little attention is (or can be) given to “evidence”.
  3. Scale. More and more OT and IoT devices are being introduced into operational environments. It’s simply impossible to match this growth with manpower. Manually checking and assessing the risk of thousands of OT/IT/IoT parts and machinery is out of the question.

The only way to overcome these limitations is by automating the process. Just as IT moved from manual to automated processes over the past decade, OT is beginning its journey towards automation. The difference lies in OT's complexity and the repercussions of any one change or mishap.
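As one small illustration of automating the “mechanics”, the sketch below scans a set of firewall rules for entries that let another zone reach the OT zone on any port. The `FirewallRule` structure, the zone names and the `risky_rules` function are hypothetical and exist only for this example; real firewall exports differ per vendor and would need their own parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirewallRule:
    src_zone: str
    dst_zone: str
    port: str    # a port number as text, or "any"
    action: str  # "allow" or "deny"


def risky_rules(rules: list[FirewallRule], protected_zone: str = "ot") -> list[FirewallRule]:
    """Return allow rules that open the protected zone to another zone on any port."""
    return [
        rule
        for rule in rules
        if rule.action == "allow"
        and rule.dst_zone == protected_zone
        and rule.src_zone != protected_zone
        and rule.port == "any"
    ]
```

Run against every configuration change rather than once a year, a check like this produces the kind of evidence that a periodic manual review rarely has time to collect.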

ORM stands for Operational Resilience Management

Finally, it’s important to remember that risk assessment is just part of the solution. To really bring value, i.e. ensure operational resilience, we must also provide clear and simple mitigation steps that reduce the risk to a “risk acceptance” level. Simplified “playbooks” must be made available. Mitigation steps need to be comprehensible not only to experienced cyber analysts but to production engineers as well.

The combination of ongoing, automated risk assessment and simplified mitigation playbooks can significantly reduce risks, allow operational teams to understand and respond to risks in real time, and make way for ORM – operational resilience management.
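A playbook of this kind can be thought of as an ordered list of plain-language steps that stops once the residual risk is acceptable. The following Python sketch shows that idea only; the `Step` fields, the multiplicative risk-reduction model and the `build_playbook` function are assumptions made for illustration, and the reduction values would have to come from an actual risk model.

```python
from dataclasses import dataclass


@dataclass
class Step:
    instruction: str  # plain-language action a production engineer can follow
    reduction: float  # assumed fraction of the remaining risk removed, 0..1


def build_playbook(risk: float, acceptance: float, steps: list[Step]) -> tuple[list[str], float]:
    """Pick steps, most effective first, until risk is at or below acceptance."""
    chosen: list[str] = []
    for step in sorted(steps, key=lambda s: s.reduction, reverse=True):
        if risk <= acceptance:
            break
        risk *= 1.0 - step.reduction
        chosen.append(step.instruction)
    return chosen, risk
```

The function returns both the selected instructions and the estimated residual risk, so a team can see whether the available steps are enough to reach the acceptance level or whether further measures are needed.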

Yair Attar
CTO & Co-Founder, OTORIO

For more information contact us at [email protected].
