The Executive Overview
Imagine walking into your facility on a Monday morning to find water covering the floor. Water damage is everywhere – mechanical rooms, laboratories and office spaces. The laboratory experiments and the complex machinery used to run the experiments are damaged. The office computers, paperwork, chairs and carpets are wet. As an owner, operator, user or facility manager, this is your nightmare scenario. The costs and the liabilities from such an incident can be substantial. But who is to blame for such a crisis? Was anyone at fault or was it simply an unfortunate equipment failure? Your first words would likely be “Get me the Facilities Manager!”
Now imagine that all of the facilities management functions are actually performed by a contractor. Does your interpretation of ‘liability’ change?
Unfortunately, such an event actually happened. Luckily for the building owner, Cimetrics was already providing Analytika, an ongoing commissioning, fault detection and diagnostics service, to the owner. Analytika enabled Cimetrics to investigate the crime scene, dubbed “The Coils Incident”, to determine precisely what happened and who should be held accountable. Utilizing the Analytika system, Cimetrics was able to conduct ‘Operational Forensics’ to determine the sequential events that led up to The Coils Incident and to determine the root cause of the failure. Ultimately, the outsourced contractor admitted negligence and was held responsible for damages of more than $60,000.
Operational forensics is the practice of using building automation system data and data analytics to determine the sequence of events leading up to an incident and to determine root cause. It is a systematic approach to putting together pieces of an operational puzzle using temperatures, pressures, equipment signals and alarms, to better understand what was done (or not done) to bring about unintended consequences.
The Technical Details
The facility, located on the east coast of the US, is a mixed-use space comprising offices and laboratories. The office air handling units (AHUs) are mixed air units with modulating outside air dampers that are capable of economizer operation when outdoor air conditions are favorable. The laboratory AHUs are 100% outside air (or ‘single pass’) units. All of the AHUs and major equipment are controlled by a Building Automation System (BAS), and the facility has at least one engineer on duty 24 hours a day, seven days a week.
The chilled water coils in two of the four laboratory AHUs ruptured when the chilled water inside them froze. Since these AHUs are large capacity (greater than 35,000 CFM supply air and 500 GPM chilled water coils), and also 100% outside air units, they are highly susceptible to freezing during extremely cold outdoor air conditions if proper precautions are not taken. When the air passing over or around chilled water (CHW) coils drops below freezing (32°F) and there is little or no water flowing through the coils, the water freezes and expands, which causes the copper coils to burst. If the control valves that modulate the CHW flow are located on the exit side of the coils, then chilled water will flow out of the burst coils once the ambient air temperature around the coils rises above freezing. This scenario occurred in this facility; chilled water flooded the mechanical room and leaked through the floor to the occupied spaces.
Cimetrics collects and stores data 24/7/365 for all HVAC equipment at the site. Hence, we were able to use operational forensics after the fact to identify the root cause of the damaged coils. Not only was the building owner able to hold the outsourced facility management contractor accountable, but they were able to put measures in place to ensure that similar disruptions are not repeated.
A detailed operational forensic analysis performed by Cimetrics determined the chain of events that led up to the failure. The analysis revealed that the outsourced facility management engineering staff could have easily prevented the coils from freezing, and that the engineering staff ignored several warning signs.
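As a rough illustration of how a chain of events can be assembled from separate data streams, the sketch below merges time-ordered alarm records and trend observations into a single timeline. It is a minimal sketch and not the Analytika implementation: the `Event` record, the `build_timeline` helper and the stream names are hypothetical, the calendar date is a placeholder, and only the clock times are taken from the AHU-2 analysis described in this article.

```python
from dataclasses import dataclass
from datetime import datetime
from heapq import merge


@dataclass(frozen=True)
class Event:
    time: datetime
    source: str  # hypothetical stream name, e.g. "alarm" or "trend"
    detail: str


def build_timeline(*streams):
    """Merge event lists (each already sorted by time) into one chronological list."""
    return list(merge(*streams, key=lambda event: event.time))


# Placeholder date: the article does not state when the incident occurred.
def at(hour, minute):
    return datetime(2000, 1, 1, hour, minute)


alarms = [
    Event(at(18, 30), "alarm", "laboratory AHU freeze stat alarms begin"),
    Event(at(19, 0), "alarm", "AHU-2 shuts down on freeze alarm"),
]
trends = [
    Event(at(19, 30), "trend", "AHU-2 preheat temperature falls below 41 F"),
]

timeline = build_timeline(alarms, trends)
for event in timeline:
    print(event.time.strftime("%H:%M"), event.source, event.detail)
```

Keeping each stream sorted and merging on the timestamp means the combined timeline preserves the order in which things actually happened, which is the property a root cause investigation depends on.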
The BAS workstation has low temperature or ‘freeze stat’ alarms programmed into the sequences of operation. Cimetrics discovered that the engineering staff ignored the freeze stat alarms and took no action to investigate the cause of the alarms. It is important to note that Cimetrics had warned the client of the upcoming cold spell and that some of the AHU coils were susceptible to freezing if the alarms were ignored. These warnings were also disregarded by the facility operating contractor.
Cimetrics’ operational forensics also determined that the AHU freeze protection control logic was inadequate. Typically, for 100% outdoor air AHUs with chilled water cooling coils, the BAS control logic will command the AHU supply fan to shut down, the outdoor air dampers to shut and the preheat and CHW valves to be signaled full open when the air temperature entering the CHW coils approaches freezing (32°F). In addition to this software programming, hardware override switches automatically shut down the AHU when the inlet air temperature to the CHW coils approaches freezing.
Because the laboratory experiments are conducted around the clock, the laboratory AHUs are in operation 24 hours a day, seven days a week. The figure above shows a 24-hour period with outside air temperature (blue line) on the left axis and the number of alarms (bars) for each of the four laboratory AHUs on the right axis. The outside air temperature was below freezing (32°F) for the majority of the day and dropped rapidly during the evening hours. As shown in the figure, three of the four AHUs went into alarm starting around 6:30 PM and remained in alarm through midnight. The fact that the units stayed in alarm for extended periods of time shows that these alarms were not addressed. Cimetrics confirmed this after The Coils Incident, when the client visited the Engineering Office and noticed a note on the wall saying “Do not ignore alarms and do not operate the AHUs in operator mode”. As it turns out, the site personnel had a history of ignoring BAS alarms and running equipment in override.
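The observation that alarms stayed active for hours can be expressed as a simple check on logged alarm states. The following is a minimal sketch under stated assumptions: the `sustained_alarms` function, the 30-minute sampling interval, the one-hour threshold and the placeholder date are all hypothetical, and the sample data only mimics the pattern described above (alarm active from about 6:30 PM through midnight).

```python
from datetime import datetime, timedelta


def sustained_alarms(samples, min_duration):
    """Return (start, end) spans where the alarm stayed active for at least min_duration.

    samples: list of (timestamp, in_alarm) tuples sorted by timestamp.
    """
    spans = []
    start = None
    last_active = None
    for timestamp, in_alarm in samples:
        if in_alarm:
            if start is None:
                start = timestamp
            last_active = timestamp
        elif start is not None:
            # Alarm cleared: keep the span only if it lasted long enough.
            if last_active - start >= min_duration:
                spans.append((start, last_active))
            start = None
    # Handle an alarm that is still active at the end of the data.
    if start is not None and last_active - start >= min_duration:
        spans.append((start, last_active))
    return spans


# Illustrative data: one sample every 30 minutes from 6:00 PM to midnight,
# with the alarm active from 6:30 PM onward. The date is a placeholder.
base = datetime(2000, 1, 1, 18, 0)
samples = [(base + timedelta(minutes=30 * i), i >= 1) for i in range(13)]

spans = sustained_alarms(samples, min_duration=timedelta(hours=1))
for start, end in spans:
    print("alarm active from", start, "to", end)
```

A report built on a check like this would surface alarms that nobody acted on, rather than relying on someone noticing the pattern in a chart after the fact.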
The operational forensics revealed that neither the preheat valves nor the CHW valves were signaled full open when the preheat temperature dropped to critically low levels after the unit shut down. The following figure shows that the preheat discharge air temperature dropped below freezing (32°F) for extended periods of time.
The figure above shows data for AHU-2, which was one of the units whose chilled water coils froze. The AHU preheat discharge air temperature low-limit setpoint is 41°F (red). When the outside air temperature falls below 41°F, the preheat valve should modulate open to keep the preheat air temperature at or above the low-limit setpoint. The analysis of AHU-2 data revealed that the unit shut down on freeze alarm at around 7:00 PM and that the preheat temperature dropped below the 41°F low limit at around 7:30 PM. The preheat valve signal remained at 25% OPEN even as the preheat temperature dropped below 41°F.
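The AHU-2 finding amounts to a rule that can be checked against trend data: the preheat air temperature is below the low-limit setpoint while the preheat valve is not fully open. The sketch below is one possible way to express that rule and is not the analytic Cimetrics used. The `preheat_faults` function and the sample readings are hypothetical; only the 41°F setpoint and the 25% valve signal come from the analysis above.

```python
LOW_LIMIT_F = 41.0  # preheat discharge air low-limit setpoint cited in the article


def preheat_faults(samples, low_limit_f=LOW_LIMIT_F, full_open_pct=100.0):
    """Return samples where preheat air is below the low limit but the valve is not fully open.

    samples: iterable of (label, preheat_temp_f, valve_pct) tuples.
    """
    return [
        sample
        for sample in samples
        if sample[1] < low_limit_f and sample[2] < full_open_pct
    ]


# Hypothetical readings for illustration; the temperatures are not incident data.
samples = [
    ("19:00", 45.0, 25.0),   # above the low limit: no fault
    ("19:30", 40.0, 25.0),   # below the low limit, valve stuck at 25%: fault
    ("20:00", 30.0, 25.0),   # below freezing, valve still at 25%: fault
    ("20:30", 38.0, 100.0),  # below the low limit but valve fully open: no fault
]

faults = preheat_faults(samples)
for label, temp_f, valve_pct in faults:
    print(f"{label}: preheat {temp_f} F with valve at {valve_pct}% open")
```

Treating a fully open valve as "no fault" is a design choice in this sketch: at 100% the controller has no more heating capacity to add, so a low temperature then points to a capacity or supply problem rather than a control problem.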
During this entire time period the CHW valve was signaled full CLOSED (or 0%). Opening the chilled water valve fully during extremely low preheat air temperature conditions (when the unit goes into freeze ALARM) may have prevented the chilled water coils from freezing. The pressure in the chilled water loop would have forced enough chilled water through the coils to delay, if not prevent, the water in the CHW coils from freezing.
Following the incident, Cimetrics worked with the building engineers to help prevent the situation from happening again. In particular, several policy measures and BAS programming changes were enacted, including:
- The building engineers ensure that no equipment is left running in operator (override) mode for extended periods of time.
- All BAS alarms are to be addressed proactively and an alarm log is kept up to date.
Changes were made to the freeze protection control logic which included:
- Unit supply fan will be commanded OFF and the outdoor air dampers commanded full CLOSED when unit goes into freeze protection alarm.
- The preheat valve will be signaled full OPEN when the preheat air temperature drops below the freeze protection low-limit setpoint.
- The CHW valve and associated CHW loop pump(s) will be signaled full OPEN and ON when the preheat air temperature drops below the freeze protection low-limit setpoint.
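The three logic changes listed above can be summarized in a short sketch. This is an illustration of the described behavior only, not the site's actual BAS program: the `Overrides` record, the `freeze_protection` function and the field names are hypothetical, and the 41°F value passed in the example is a placeholder because the article does not state the freeze protection low-limit setpoint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Overrides:
    """Commands forced by freeze protection; None leaves a point under normal control."""
    supply_fan_on: Optional[bool] = None
    oa_damper_pct: Optional[float] = None
    preheat_valve_pct: Optional[float] = None
    chw_valve_pct: Optional[float] = None
    chw_pump_on: Optional[bool] = None


def freeze_protection(freeze_alarm: bool, preheat_temp_f: float, low_limit_f: float) -> Overrides:
    out = Overrides()
    if freeze_alarm:
        # Stop drawing cold outside air across the coils.
        out.supply_fan_on = False
        out.oa_damper_pct = 0.0
    if preheat_temp_f < low_limit_f:
        # Force full flow through the preheat and CHW coils.
        out.preheat_valve_pct = 100.0
        out.chw_valve_pct = 100.0
        out.chw_pump_on = True
    return out


# Placeholder setpoint of 41 F, used for illustration only.
result = freeze_protection(freeze_alarm=True, preheat_temp_f=30.0, low_limit_f=41.0)
print(result)
```

The two conditions are kept independent so that the valves are forced open whenever the preheat temperature is low, even if the freeze alarm has not yet tripped.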
While ongoing commissioning has significant energy and cost savings benefits, this episode shows that the benefits do not stop at cost savings. Unfortunately, sooner or later, a scenario like this is likely to happen at your facility. When it happens, how prepared will you be to determine the root cause? Will you be prepared to address it quickly or, better yet, prevent it from happening altogether? Rest assured, monitoring-based ongoing commissioning can provide the operational forensics and preventative capability you need, and it is always there when you need it.
A Happy Ending
A serious system failure can be a catalyst for improved operations. Following the incident described in this article, the building owner, the facility management contractor, and the ongoing commissioning services provider (Cimetrics) established a more effective working relationship. This led to improved operational procedures and prioritization of maintenance activities, resulting in a lower risk of serious system failures and additional energy savings.
Managing the Outsourced Facility Management
The ability to perform forensic investigations as described above is especially important in buildings with outsourced facility operations. In fact, the outsourcing of facility and real estate management is increasingly common across some industries, as shown in the following figure.
Source: KPMG 2013 Global Real Estate & Facility Management (REFM) Outsourcing Pulse Survey
While outsourcing of facility management functions may seem like an attractive option, it is important for facility owners and managers to consider the following:
- How to keep a pulse on facility operations and staff
- How to structure the supplier contract, given the possibility of a failure such as the one described in this article
- What is the impact of downtime (production disruptions, productivity loss, etc.), and what role/impact does the outsourced facility management provider have in preventing downtime
- The value of having an outside, unbiased third-party monitor facility and equipment operations and be the point of contact with facility owners and operators