Data centres, like every critical infrastructure environment, come with a host of risks that need to be well managed and mitigated for sites to stay functioning, sustainable and safe. The Uptime Institute’s 2019 Data Centre Survey found that 60% of respondents claimed their data centre’s outages could have been prevented with better management processes. Had a range of potential outcomes been anticipated through comprehensive planning procedures paired with sufficient guidance documentation, perhaps some of these outages would have been avoided. However, building effective risk management takes quite a few steps, the very first being accurate risk identification…
Security
While most facilities think of a security threat as someone breaking and entering, data centre security threats are far more often a product of DDoS and social engineering attacks. In September 2020, Microsoft released its Digital Defence Report, concluding that ‘threat actors have rapidly increased in sophistication over the past year, using techniques that make them harder to spot and that threaten even the savviest targets’. It is a data centre’s responsibility to protect the precious data of its customers. It is therefore essential that the management team identifies and understands the security risks, ranging from sophisticated cyber-attacks to staff being manipulated into handing over confidential information.
So, how do data centre management teams mitigate security risks? One solution is to provide adequate staff training that teaches a team how best to avoid social engineering attacks. Regular training is necessary, and it is important to provide informative documentation that the team can use as a resource across their careers. It is also important to build a communicative and cooperative team that you trust within your site. Another essential element in mitigating data centre security risks is keeping a site, its servers and its systems as current as possible. Regular site audits to assess the infrastructure will be a big help in analysing where there may be weaknesses and where improvements can be made. Again, this information should be recorded in accessible documentation for management teams to refer back to.
With the rise of machine learning, smart data centres are becoming more secure than those protected only by a human defence force. AI can learn normal patterns and behaviours, which enables it to detect anomalies that could signify cyber-attacks. As Detlef Spang points out in his article for Data Centre Dynamics, some smart data centres are also replacing CCTV with sophisticated cameras that follow individuals around a site and detect suspicious behaviour. Utilising these smart technologies, alongside adequate and accessible documentation that acts as a helpful resource for management teams, is essential when protecting against security risks.
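As a rough illustration of the idea of learning a normal pattern and flagging departures from it, the sketch below uses a simple statistical baseline rather than machine learning. The function name, the sample figures (imagined requests per minute) and the threshold of three standard deviations are all assumptions made for the example, and a production system would be considerably more sophisticated.

```python
from statistics import mean, stdev


def find_anomalies(baseline, observed, threshold=3.0):
    """Return (index, value) pairs from `observed` that sit more than
    `threshold` standard deviations away from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value is unusual.
        return [(i, v) for i, v in enumerate(observed) if v != mu]
    return [
        (i, v)
        for i, v in enumerate(observed)
        if abs(v - mu) / sigma > threshold
    ]


# Hypothetical figures: requests per minute seen during normal operation.
baseline = [100, 104, 98, 101, 97, 103, 99, 102]
# Hypothetical new readings, one of which is a sudden spike.
observed = [101, 99, 480, 102]

anomalies = find_anomalies(baseline, observed)
print(anomalies)
```

Here only the spike is flagged. A flagged reading is a prompt for the team to investigate, not proof of an attack.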
Fire and flooding
Natural disasters may not be a risk usually associated with data centres, but they can be some of the most damaging. Flooding is becoming increasingly important to consider, especially in the era of a climate crisis, with extreme weather as a major consequence of global warming. Moisture can cause a lot of damage in a data centre, and avoiding the risk of flooding starts with planning before construction can even begin. A location needs to be reviewed, taking into account the likelihood of natural disasters in the area and the flood risk due to ground level and nearby rivers. It is also possible to protect pre-built data centres from extreme rainfall by using water-resistant cabling protection. Unfortunately, while climate change poses one of the biggest risks to critical infrastructure, only 14% of the 867 data centres surveyed in 2019 said they were taking this issue into consideration, while a meagre 11% were taking steps to mitigate flood risks.
Fires, whether caused by nature or otherwise, also need to be planned for, given the huge amount of electrical equipment on-site and the extreme damage that can be done. The usual precautions should be taken to minimise the risk of fire damage within a facility: smoke and fire detectors, fire walls, fire-retardant cables and automated fire suppression systems are all vital. It is also important for maintenance to be prioritised, as old and faulty systems can pose a risk. With regular site audits, updated O&M Manuals and strict procedures followed for maintaining assets, the threat of fire damage can be significantly reduced.
Equipment failure and loss of power
Perhaps the most commonplace and regularly anticipated threat in the data centre environment is equipment failure and/or loss of power. A complete system failure and data centre outage could result in the loss of important data, a reduction in productivity and sizeable financial repercussions. It is important to consider the issues power loss can cause customers, as one outage could result in a loss of trust in the network provider that may take a huge amount of investment in marketing to win back. Good preparation, documentation and well-planned recovery processes should focus on minimising data centre downtime and recovering a site’s power as quickly and efficiently as possible.
To reduce the risk of human error causing system failure, adequate training and accessible resources are again essential. Training is imperative for the management team to fully comprehend how to deal with an issue if it occurs and therefore how to reduce downtime across a site. HAZOPs and single point of failure analysis are elements to include within a data centre’s O&M Manual. This style of documentation should help a team to manage and mitigate the risks they encounter, especially with equipment failure, where it helps in discovering the cause as well as the solution.
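To make the idea of single point of failure analysis concrete, here is a minimal sketch that treats a power path as a small graph and reports every component whose loss would cut the load off from its supply. The component names and layout are hypothetical, and the approach is a simplified illustration only, not a substitute for a formal study of a real site.

```python
from collections import deque


def reachable(graph, start, goal, removed):
    """Breadth-first search from start to goal, skipping the removed component."""
    if removed in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, []):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(graph, source, load):
    """Return the components whose removal disconnects the load from the source."""
    components = set(graph) | {n for targets in graph.values() for n in targets}
    candidates = components - {source, load}
    return sorted(c for c in candidates if not reachable(graph, source, load, c))


# Hypothetical power path: each key feeds the components listed against it.
power_path = {
    "utility": ["switchgear"],
    "switchgear": ["ups_a", "ups_b"],
    "ups_a": ["pdu"],
    "ups_b": ["pdu"],
    "pdu": ["rack"],
}

spofs = single_points_of_failure(power_path, "utility", "rack")
print(spofs)
```

In this made-up layout the two UPS units back each other up, so neither is reported, while the switchgear and the PDU are each a single point of failure.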
Our unique JB eDocuments incorporate cause and effect, single point of failure analysis and HAZOP information, as well as manufacturers’ literature, an asset register and interactive site drawings. Start managing risks in your data centre more effectively than ever before.