Bridging IT & OT with the Environmental Data Agent

With the ECO-Qube architecture, we are providing the missing link between IT and physical infrastructure. Today, each side is optimized in isolation. The data center facility, with its cooling systems & electrical infrastructure, is optimized under the assumption that the IT load is static. This can be seen in the ‘1’ in PUE, which treats every kW going to IT as fixed and not optimizable; the only thing that can be optimized is the additional electricity the data center uses for cooling and providing redundancy. The same is true for the IT side. Due to virtualization, most IT applications are unaware of the physical infrastructure, and thus assume it is static (e.g. that the IT server is always on and that the cooling system is static).
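To make the point about the ‘1’ in PUE concrete, here is a minimal sketch. PUE is total facility power divided by IT power, which can be rewritten as 1 plus the overhead ratio. The function name and the kW figures are illustrative assumptions, not values from the project.

```python
def pue(it_kw: float, overhead_kw: float) -> float:
    """PUE = total facility power / IT power = 1 + overhead / IT."""
    if it_kw <= 0:
        raise ValueError("IT power must be positive")
    return (it_kw + overhead_kw) / it_kw


# Illustrative numbers only: the IT load drops by 40%, and cooling and
# redundancy overhead drop proportionally with it.
before = pue(it_kw=100.0, overhead_kw=50.0)  # 150 kW total
after = pue(it_kw=60.0, overhead_kw=30.0)    # 90 kW total

print(before, after)
```

Both calls give the same PUE even though total consumption fell from 150 kW to 90 kW, which is why savings on the IT side are invisible in this metric.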
Yet in terms of potential efficiency gains, there are plenty of opportunities. IT servers are rarely under full load, often operating at an average utilization rate of 10-15%. Most IT systems are over-provisioned or designed to capture peaks, which means they are underutilized most of the time. Combined with the common practice of turning off energy-saving features, this creates tremendous waste.
At the same time, cooling systems & data center infrastructure have an availability constraint imposed by the IT: they must never be turned off. The only signal that data center operations can optimize against is essentially the server or room temperature, which is not reported by the IT itself, but measured either in the room or near the server.
What is clearly missing between these two worlds - often referred to as IT (information technology) and OT (operational technology) - is an interface. With that interface, the two sides gain a way to exchange their requirements, demand profiles and forecasts from their respective environments, enabling each side to perform optimizations and find an optimal balance in terms of energy use, other environmental impacts, availability and costs.

Creating the common meeting place

There are many constraints hindering the effective collaboration of these two environments (IT and OT).

The network problem

The first one is that IT and OT run on different networks, and for good reason. The IT is often internet-accessible, exposing it to cybersecurity risks which are part of the business of IT.
OT on the other hand is not connected to the internet to avoid any possibility of a hostile takeover of the facility infrastructure, hacking or other intrusions.
Thus, simply connecting the OT network to the IT network will likely have unintended consequences: it exposes the facility itself to the internet (via the IT) and introduces new security risks, which in turn threaten the availability that the OT is committed to delivering to the IT.

Creating a secure meeting space

As in real life, when connecting both sides is difficult, it’s best to meet in a secure environment outside the existing environments. With ECO-Qube, we are introducing the idea of an Environmental Data Agent (EDA), a separate system that sits between IT and OT. It isolates the two networks from each other, while collecting the information required from each side to perform optimizations and passing through signals & optimization results.

More than an information exchange

The EDA is not simply an information exchange. It also enriches the data received from either side (e.g. adding CO2 emission factors to electricity data from the physical infrastructure or transforming computing resource consumption of the IT into electricity demands). It further enables the allocation of an individual server to a PDU or to a rack and therefore enables thermal optimizations and server-level control.
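As a rough illustration of this kind of enrichment, the sketch below attaches a CO2 emission factor to per-server energy readings and rolls them up per rack. All names here (`EnergyReading`, `SERVER_TO_RACK`, `enrich`), the inventory mapping and the emission factor are hypothetical and are not taken from the EDA API specification.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyReading:
    server_id: str
    energy_kwh: float


# Hypothetical inventory: which rack each server is allocated to.
SERVER_TO_RACK = {"srv-1": "rack-A", "srv-2": "rack-A", "srv-3": "rack-B"}


def enrich(readings, emission_factor_kg_per_kwh):
    """Aggregate energy per rack and add the resulting CO2 emissions."""
    per_rack = defaultdict(lambda: {"energy_kwh": 0.0, "co2_kg": 0.0})
    for reading in readings:
        rack = SERVER_TO_RACK[reading.server_id]
        per_rack[rack]["energy_kwh"] += reading.energy_kwh
        per_rack[rack]["co2_kg"] += reading.energy_kwh * emission_factor_kg_per_kwh
    return dict(per_rack)


readings = [
    EnergyReading("srv-1", 2.0),
    EnergyReading("srv-2", 1.0),
    EnergyReading("srv-3", 4.0),
]
# 0.5 kg CO2 per kWh is an illustrative factor, not a measured one.
result = enrich(readings, emission_factor_kg_per_kwh=0.5)
print(result)
```

In a real deployment the emission factor would typically vary over time with the grid mix, so it would be looked up per interval rather than passed in as a constant.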

A dedicated controller system or on a traditional server

The EDA is designed as a standalone IT system running on Kubernetes. This makes it versatile in its deployment: it can run on dedicated Linux-based controller hardware (provided it has at least two physical network interfaces) or it can run within the IT itself, creating multi-tenant capabilities, e.g. for co-location facilities in which many IT tenants rent space or compartments in the data center.

Any DCIM can become an EDA

Existing DCIM systems may take the role of an Environmental Data Agent by implementing the public API specifications for the EDA. The most important aspect of enabling widespread adoption is to make the EDA consistent across any infrastructure. The IT community often runs applications in a variety of heterogeneous data centers, ranging from a local server room to co-location facilities all the way to the cloud.
In all these environments, if an EDA is available, it should ‘look’ and ‘talk’ the same, removing the need for IT to adapt the applications & infrastructure for each environment and thus reducing the barrier to actually implementing the EDA.

Uniform abstraction of IT to OT and vice-versa

The EDA hence provides an abstract interface and representation of the IT to the facility & physical infrastructure (enabling communication & optimization that does not need to be tailored to different applications), while providing the IT with an abstract representation of the physical infrastructure that can receive signals & make optimization suggestions.

Enabling a market for IT & OT optimization

With its extension-friendly and open-source architecture as well as by bringing all the measurement data of physical infrastructure and IT together in one place, the EDA enables novel optimization strategies, such as those developed by the ECO-Qube project.
Three noteworthy examples from the project are:
  • Assignment optimization: Based on a zonal cooling system in the physical infrastructure, IT can concentrate & move containers or virtual machines onto different physical machines to optimize the overall cooling efficiency.
  • Energy availability: Using signals from the electricity grid and the energy system, IT can shift applications in time (e.g. via queuing), turn off individual machines to consume less electricity, or leverage energy storage effectively.
  • Cost optimization: High-intensity processing can be delayed until electricity prices are low (e.g. when renewable energy production is high), and overall IT can be optimized towards price signals from the physical infrastructure.
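The time-shifting idea behind the energy availability and cost examples can be sketched in a few lines. This is a simplified illustration, assuming a deferrable job of fixed length and a known price forecast per time slot; the function name and the prices are hypothetical, and the project's actual optimization strategies may work quite differently.

```python
def cheapest_start(prices, duration):
    """Return the start slot of the contiguous window with the lowest total price."""
    if duration <= 0 or duration > len(prices):
        raise ValueError("duration must be between 1 and len(prices)")
    candidate_starts = range(len(prices) - duration + 1)
    # min() returns the earliest start if several windows tie on cost.
    return min(candidate_starts, key=lambda s: sum(prices[s:s + duration]))


# Illustrative price forecast per slot (arbitrary units).
prices = [30, 28, 12, 10, 15, 40]
start = cheapest_start(prices, duration=2)
print(start)
```

The same function works with a carbon-intensity forecast instead of prices, which would turn it into a carbon-aware scheduler rather than a cost-aware one.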

A sustainability dashboard creates end-to-end reporting and insights on environmental & cost performance

Because the EDA combines all of the otherwise disparate information streams - IT utilization, IT resource consumption, energy system signals & carbon intensity, cooling performance, costs, etc. - it can provide a unique view of the overall end-to-end IT & OT performance, enabling the engineers on both sides to have a productive dialogue and develop novel optimization strategies jointly.
This dashboard can also help meet reporting requirements on environmental impact from the public, customers (e.g. OT to IT customers) and regulators. It can go beyond traditional efficiency metrics such as PUE and WUE, and report accurate numbers on the power usage of the IT and the efficiency of the physical infrastructure. Overall, it creates transparency about who is responsible for which optimization.

Finding a common language

IT communicates in resources: compute, memory, storage, etc. OT communicates in kWh, temperatures, and overall energy terminology. Ultimately, however, computation of any kind is the transformation of electricity into thermal energy, with computing resources (digital power) as the result.
Translating the IT resources into electricity consumption and thermal energy production is the first step toward a common language. The second is for the IT to provide a forward-looking load profile (> 15 min to 24 hours) which is divided into a maximum and minimum load. The minimum load represents the non-flexible IT applications running (e.g. ‘baseload’ applications such as database systems), whereas the maximum load additionally includes applications that can be shifted in time or scaled up & down in terms of resource usage.
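One possible way to represent such a load profile is sketched below. The class name, the field names and the 15-minute step are assumptions made for illustration; the text does not prescribe a concrete data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadInterval:
    start_minute: int  # offset from "now" in minutes
    min_kw: float      # non-flexible baseload
    max_kw: float      # baseload plus shiftable or scalable load

    def __post_init__(self):
        if not 0 <= self.min_kw <= self.max_kw:
            raise ValueError("expected 0 <= min_kw <= max_kw")

    @property
    def flexible_kw(self) -> float:
        """Power that the IT could shift in time or scale down."""
        return self.max_kw - self.min_kw


# A short forecast in 15-minute steps (illustrative values).
profile = [
    LoadInterval(start_minute=0, min_kw=40.0, max_kw=100.0),
    LoadInterval(start_minute=15, min_kw=40.0, max_kw=80.0),
    LoadInterval(start_minute=30, min_kw=50.0, max_kw=120.0),
]
flexible = [interval.flexible_kw for interval in profile]
print(flexible)
```

The difference between the two bounds is the room the OT has to work with: the wider the band, the more freedom there is to move load to better times.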
This load profile enables the OT (within the EDA) to simulate potential optimization strategies. From this, the OT can communicate an ‘optimal’ load profile, taking into account the optimization objectives (e.g. energy efficiency, performance, grid friendliness). The IT can then perform its own optimization and run the applications in a way that meets the agreed-upon load profile.
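A very simple version of this exchange can be sketched as follows. It assumes that the IT publishes a (minimum, maximum) power band per interval, that the OT proposes a preferred power level per interval, and that the agreed profile is the OT proposal clamped into the IT band. This clamping rule and the function name are illustrative assumptions, not the negotiation mechanism defined by ECO-Qube.

```python
def agree_profile(it_band_kw, ot_target_kw):
    """Clamp each OT target into the IT's (min, max) band for that interval."""
    if len(it_band_kw) != len(ot_target_kw):
        raise ValueError("band and target must cover the same intervals")
    return [
        min(max(target, low), high)
        for (low, high), target in zip(it_band_kw, ot_target_kw)
    ]


# Illustrative values in kW, one entry per interval.
it_band_kw = [(40, 100), (40, 80), (50, 120)]
ot_target_kw = [30, 70, 150]

agreed = agree_profile(it_band_kw, ot_target_kw)
print(agreed)
```

Clamping guarantees that the agreed profile never asks the IT to drop below its non-flexible baseload or to exceed what it could actually run.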
The common language thus is a load profile for resource usage expressed in electrical power consumption. Thermal energy can be considered at a later stage as a second data set (provided the IT is aware of its thermal energy generation). However, both conversions, from computing resources to electricity and to thermal energy, are done by the EDA, so that IT applications do not need to be aware of their hardware configuration or other physical constraints that might have to be factored into the calculation. Further, the EDA can utilize sensor data (if available) as well as power-usage data from power distribution units (if available) within the OT to verify its own calculations with real-world evidence.
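The conversion and verification steps might look like the sketch below. It assumes a linear power model between a server's idle and peak power, which is a common approximation but not necessarily the model the EDA uses; the function names, wattages and the PDU reading are hypothetical.

```python
def estimate_power_w(cpu_utilization: float, idle_w: float, peak_w: float) -> float:
    """Linear model (an assumption): power scales with utilization between idle and peak."""
    if not 0.0 <= cpu_utilization <= 1.0:
        raise ValueError("cpu_utilization must be between 0 and 1")
    return idle_w + (peak_w - idle_w) * cpu_utilization


def relative_error(estimated_w: float, measured_w: float) -> float:
    """Deviation of the estimate from a PDU measurement, as a fraction of the measurement."""
    if measured_w <= 0:
        raise ValueError("measured_w must be positive")
    return abs(estimated_w - measured_w) / measured_w


# Illustrative server: 100 W idle, 300 W at full load, currently 25% utilized.
estimated = estimate_power_w(0.25, idle_w=100.0, peak_w=300.0)
# Hypothetical PDU reading for the same server and interval.
error = relative_error(estimated, measured_w=160.0)
print(estimated, error)
```

If the error stays consistently high, the idle and peak parameters for that hardware type can be recalibrated from the PDU data.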

Conclusion

The EDA is the missing link between the physical infrastructure (OT) and information technology (IT). By receiving monitoring data from both sides, it establishes a common language between them, translating IT resources into electrical energy, which the OT can understand. Negotiating an optimal load profile is the main interface between the two sides.
This system-level optimization with granular control all the way down to the individual server can enable the next level of optimization & efficiency in data centers, IT infrastructure, and applications.