Active Incident


Data center cooling issues

Incident Status

Service Disruption

Components

Academic Services

Services

PACE



April 3, 2025 10:26PM EDT
[Monitoring] The controller for the system providing cooling to nodes in the Coda Research Hall has been restored and we have returned to the HTCP lineup and are in normal operation.

April 3, 2025 11:07AM EDT
[Identified] Some compute nodes on ICE were accidentally powered off last night, which may have impacted some running jobs. We have restored a subset of those nodes to service so that all hardware types are available. The scheduler was briefly paused this morning from 9:17am to 9:41am, which may have prevented jobs from starting during that time. Most ICE compute nodes are currently available for course usage.

April 3, 2025 9:56AM EDT
[Identified] Our vendors are working to restore cooling to the datacenter by fully replacing the cooling system controller, and they expect to complete the work by 7:00pm ET. We hope to return all systems to service by tomorrow (Friday) evening, provided that all repairs to the cooling system are complete and that stability testing following the shutdown has finished. Clusters will be released as testing is completed for each system.

April 2, 2025 9:47PM EDT
[Identified] It has been determined that our water pump controller will need to be replaced, and we are currently coordinating with the support vendor on this replacement process.

April 2, 2025 6:25PM EDT
[Identified] The water pump controller failed, affecting the cooling of the research hall. The support vendor has been engaged and is assessing the situation.

April 2, 2025 5:51PM EDT
[Investigating] Due to continued high temperatures, all Phoenix compute nodes have been turned off and all running jobs have been cancelled. Impacted jobs will be refunded at the end of April.

April 2, 2025 5:25PM EDT
[Investigating] The controller for the system providing cooling to nodes in the Coda Research Hall has failed. To avoid damage, PACE has urgently shut down many compute nodes to reduce heat.

April 2, 2025 5:12PM EDT
[Investigating] Status by cluster:
Hive: all nodes are powered off; all jobs failed.
Buzzard: all nodes are powered off; all jobs failed (though presumably requeued).
Phoenix: all new jobs are held; all idle nodes are being turned off.
Firebird: all nodes are powered off; all jobs failed.

April 2, 2025 5:08PM EDT
[Investigating] A cooling controller failed at the data center. PACE clusters are being shut down.

Incident Status

Operational

Components

Network

Services

Firewall, VPN



March 31, 2025 10:57AM EDT
[Monitoring] The cause has been identified and a fix has been implemented as of 10:55am. Users should now be able to access all resources protected by user-based policies as normal.

March 31, 2025 10:46AM EDT
[Investigating] OIT Network Engineering is aware of issues affecting a subset of GlobalProtect VPN users who are unable to access resources protected by user/group-based firewall policy (also known as User-ID). As a result, many users are not able to access various systems and applications. The cause of the issue is being actively investigated at this time. Further updates will be provided as progress is made.

Academic Services

Degraded Performance

Campus Services

Operational

Campus Services - ITG

Operational

Campus Audio Visual Services

Operational

Email and Calendaring

Operational

Identity

Operational

Network

Operational

Web Hosting

Operational

Enterprise Data Services

Operational

IT Service Management

Operational

VideoConferencing

Operational

GTPE

Operational

Cloud Storage

Operational

Other

Operational

External Services

Welcome to Georgia Tech's IT Service Status Page
Don't see your issue posted here? Let us know!
Administrative Services Center
Email: support@oit.gatech.edu
Phone: 404-385-1111
Location: Atlanta Campus, Clough Building Room 215