Active Incident

Incident Status

Degraded Performance

Components

Network

Services

Firewall Services



May 3, 2026 11:35PM EDT
IDENTIFIED

The firewall high availability (HA) service is currently operating in a degraded state due to intermittent network interface instability, which has resulted in periodic failover events. As a result, the CODA2 Palo Alto (PA) firewall remains in a degraded status. The environment continues to function, and we are actively monitoring stability.

Incident Status

Operational

Components

Other

Services

Other



May 1, 2026 7:48PM EDT
MONITORING

The Office of Information Technology is responding to a recently disclosed software vulnerability, Copyfail (CVE-2026-31431), that may affect certain server environments. As part of our response, affected servers have been successfully migrated or updated using automated remediation methods. These actions reduce exposure and help ensure the continued security and reliability of Georgia Tech services.

The Security Operations Center (SOC) continues to actively monitor for indicators related to this vulnerability and is working directly with system owners whose servers could not be remediated through automated methods to ensure appropriate updates or mitigations are applied. At this time there is no indication of active exploitation affecting Georgia Tech systems.

We will continue to monitor the situation and provide updates as appropriate. Thank you for your cooperation and continued attention to system security.

Academic Services

Operational

Access Control

Operational

Audio Visual (AV) Technology

Operational

BuzzCard

Operational

Campus Services

Operational

Chat and Remote Meetings

Operational

Cloud and File Storage

Operational

Data and Reporting

Operational

Data Center

Operational

Email and Calendaring

Operational

Endpoint Infrastructure

Operational

Generative AI

Operational

GTPE

Operational

Identity

Operational

Infrastructure Technologies

Operational

Network

Degraded Performance

Printing and Copying Services

Operational

ServiceNow

Operational

Student Information Systems

Operational

Other

Operational

Student Services

Operational

External Services

Scheduled Maintenance

Schedule

May 12, 2026 6:00AM - May 15, 2026 6:00PM EDT

Components

Academic Services

Services

PACE

Description

WHEN IS IT HAPPENING?

PACE's next Maintenance Period starts at 6:00AM on Monday, 1/12/2026, and is tentatively scheduled to conclude by 11:59PM on Thursday, 1/15/2026. PACE will release each cluster (Phoenix, Firebird, and ICE) as soon as maintenance work is complete.

WHAT DO YOU NEED TO DO?

As usual, jobs whose resource requests would run into the Maintenance Period will be held by the scheduler until after the maintenance. During this Maintenance Period, access to all PACE-managed computational and storage resources, including Phoenix, Firebird, and ICE, will be unavailable. Please plan accordingly for the projected downtime.

WHAT IS HAPPENING?

Updates will be applied to all production operating systems. In addition, multiple infrastructure upgrades and operational changes are planned across PACE systems, focused on improving stability, security, performance, and capacity across compute, storage, networking, and user-facing services.

Scheduler and Platform Services: The Slurm workload manager will be upgraded to the latest tested release across production clusters, including associated web-based and cloud-integrated components. This upgrade incorporates upstream fixes and improvements validated in development prior to deployment. Open OnDemand (OOD) will be migrated to the current major release on both Phoenix and ICE, aligning with supported versions and improving long-term maintainability.

Compute and Login Infrastructure: New hardware will be deployed for login nodes to expand capacity and improve reliability. A dedicated H100 GPU node within the AI Makerspace environment will be reassigned to support SPIN workloads. A Phoenix login node based on new GNR hardware will be promoted into production service, and Firebird head nodes will be relocated to new underlying GNR hardware to improve platform consistency.

Storage and Data Services: The DDN NVX scratch storage system will be physically relocated within the data center to support infrastructure reorganization. Exascaler and SFA software components will be upgraded to their latest supported versions, improving stability and incorporating vendor fixes and enhancements.

Networking and Fabric: Phoenix fabric subnet managers will be replaced to improve reliability and operational consistency. InfiniBand switches will be upgraded to a newer firmware release to incorporate fixes and maintain compatibility with updated components.

GPU and System Firmware: Firmware updates will be applied to DGX A100 systems to address known BMC and SBIOS security vulnerabilities. DGX H100 system firmware will be updated to align with current vendor recommendations.

WHY IS IT HAPPENING?

Regular maintenance periods are necessary to reduce unplanned downtime and maintain a secure and stable system.

WHO IS AFFECTED?

All users across all PACE clusters.

WHO SHOULD YOU CONTACT FOR QUESTIONS?

Please contact PACE at pace-support@oit.gatech.edu with questions or concerns. You may read this message on our blog.

Thank you,
-The PACE Team
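The scheduler hold described above, where a job is deferred when its requested walltime would overlap the maintenance window, amounts to a simple interval-overlap check. The following is a minimal illustrative sketch, not PACE's actual Slurm configuration; the window dates and the example walltimes are hypothetical:

```python
from datetime import datetime, timedelta

# Hypothetical maintenance window (modeled on the schedule above).
MAINT_START = datetime(2026, 5, 12, 6, 0)
MAINT_END = datetime(2026, 5, 15, 18, 0)

def job_is_held(start_time: datetime, walltime: timedelta) -> bool:
    """Return True if a job starting at start_time with the requested
    walltime would still be running when the maintenance window opens."""
    projected_end = start_time + walltime
    # The job overlaps the window if it starts before the window closes
    # and would not finish before the window opens.
    return start_time < MAINT_END and projected_end > MAINT_START

# A 48-hour job started the day before maintenance would be held:
print(job_is_held(datetime(2026, 5, 11, 8, 0), timedelta(hours=48)))  # True
# A 12-hour job started then finishes before the window opens, so it runs:
print(job_is_held(datetime(2026, 5, 11, 8, 0), timedelta(hours=12)))  # False
```

In practice Slurm expresses this through an advance reservation with the MAINT flag: jobs whose time limit would collide with the reservation remain pending until it ends.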
Welcome to Georgia Tech's IT Service Status Page

Don't see your issue posted here? Let us know!

Administrative Services Center
Email: support@oit.gatech.edu
Phone: 404-385-1111
Location: Atlanta Campus, Clough Building Room 215