The ongoing heatwave in the UK caused outages at Oracle Cloud and Google Cloud after cooling systems failed at both companies’ data centers. The record-breaking heatwave has gripped the UK for the past week, bringing oppressive temperatures.
Cooling systems at the data centers that Google and Oracle use to host their cloud infrastructure began to fail on July 19, when temperatures reached a record 40.2 degrees Celsius (104.4 degrees Fahrenheit). Both Google and Oracle shut down hardware to prevent irreparable damage to physical components and a lengthy outage, which interrupted their cloud services. Oracle was the first to be impacted; the company reported a cooling failure at around 11:30 AM EST on July 19, which led it to power down “non-critical hardware.”
“As a result of unseasonal temperatures in the region, a subset of cooling infrastructure within the UK South (London) Data Centre experienced an issue. This led to a subset of our service infrastructure needed to be powered down to prevent uncontrolled hardware failures,” reads an Oracle Cloud status message that appears to have been first spotted by The Register. “This step has been taken with the intention of limiting the potential for any long term impact to our customers.”
Oracle warns that customers in this region may be unable to access their Oracle Cloud Infrastructure resources, even though only non-critical hardware was shut off. Nearly two hours later, Google also reported cooling issues in one of its buildings, which hosts zone europe-west2-a of region europe-west2.
According to the Google Cloud incident report, a cooling-related failure occurred in one of the company’s buildings that hosts zone europe-west2-a for region europe-west2. This caused a partial loss of capacity in that zone, which led to VM terminations and a loss of machines for a small number of customers. Google says it is working to bring the cooling back online and restore capacity in the zone. It does not expect further impact in zone europe-west2-a, and currently running virtual machines should not be affected. A small percentage of replicated Persistent Disk devices are running in single redundant mode.
“In order to prevent damage to machines and an extended outage, we have powered down part of the zone and are limiting GCE preemptible launches. We are working to restore redundancy for any remaining impacted replicated Persistent Disk devices.”
Because of this cooling issue, Google Cloud customers are experiencing disruptions similar to those at Oracle, including terminated virtual machines, inaccessible machines, and Persistent Disk devices operating in single redundant mode. Both companies say they do not anticipate further impact as they work to bring their cooling systems back into service.