RWTH High Performance Computing (HPC)

You can find more information about the service in our documentation portal.

[CLAIX-2025] Issues with respect to reachability

Partial Outage
Tue 05/12/2026 08:00 AM - Unknown

The CLAIX-2025 fabric is currently experiencing reachability issues. We are working to identify the root cause.

12.05.2026 11:01
Updates

Our hardware vendor has identified an issue that needs further analysis.

12.05.2026 17:10

[CLAIX-2025] Large CDU active - GPU restrictions will be removed soon

Notice
Tue 05/12/2026 03:00 PM - Mon 06/01/2026 12:00 AM

Due to limited cooling capacity, 12 GPU nodes were excluded from batch operation. Now that the large CDU has been installed and activated today, these restrictions can be removed soon.

12.05.2026 17:15

Recently expired reports

RegApp maintenance

Partial Maintenance
Mon 05/11/2026 10:00 AM - Mon 05/11/2026 10:10 AM

The RegApp will be briefly unavailable due to system changes.

During this time, login to the HPC and connected services will be unavailable.

06.05.2026 12:46

[CLAIX-2025] Gateway Reconfiguration

Partial Maintenance
Thu 05/07/2026 12:00 PM - Fri 05/08/2026 08:00 AM

To increase bandwidth and redundancy, the gateways to the CLAIX-2025 HPC fabric must be reconfigured.
Since the IP addresses will change, the reachability of all fabric-only nodes will be interrupted. Batch operation will therefore be interrupted as well.
The dialog nodes remain reachable via Ethernet.

07.05.2026 11:48
Updates

We are updating the configuration now. Please be aware that a disruption will occur at short notice.

07.05.2026 14:17

The reconfiguration was fixed overnight, and operation resumed immediately.

08.05.2026 10:56

[CLAIX-2025] Installation of the Large CDU

Partial Maintenance
Thu 05/07/2026 09:00 AM - Fri 05/08/2026 02:37 PM

The large CDU will be installed to replace the temporary small CDUs. During the installation, the load on the cluster must be reduced. Disruptions to test operation cannot be ruled out.

07.05.2026 16:22
Updates

The large CDU cannot be connected at the moment. The maintenance has to be postponed.

08.05.2026 14:37

[CLAIX-2025] Fabric Reboot

Partial Maintenance
Wed 05/06/2026 11:50 AM - Wed 05/06/2026 04:00 PM

Attention
The test operation of CLAIX-2025 must be temporarily suspended.

ALL NODES WILL BE SET TO DOWN DUE TO A REQUIRED FABRIC REBOOT

Due to a malfunction, the affected switch must be removed from the CLAIX-2025 fabric and the fabric rebooted to restore stable operation. A replacement is not possible at the moment. Once the switch has been removed, the following nodes cannot be used in the batch system until further notice:

i25s[0011-0022],n25l[0001-0040],n25t[0001-0004]
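
To check whether a particular node is on this list, the compact range notation can be expanded into individual node names. The following is a minimal sketch, assuming each bracket group holds exactly one zero-padded `start-end` range, as in the list above; the function name `expand_hostlist` is hypothetical and not part of any cluster tooling.

```python
import re


def expand_hostlist(expr: str) -> list[str]:
    """Expand 'a[0001-0003],b[01-02]' into individual host names.

    Assumption: every bracket group is a single 'start-end' range
    (no comma-separated lists inside the brackets).
    """
    hosts = []
    for part in expr.split(","):
        part = part.strip()
        match = re.fullmatch(r"([^\[\]]+)\[(\d+)-(\d+)\]", part)
        if match is None:
            # Plain host name without a range: keep it unchanged.
            hosts.append(part)
            continue
        prefix, start, end = match.groups()
        width = len(start)  # preserve the zero padding of the range start
        for number in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{number:0{width}d}")
    return hosts


nodes = expand_hostlist("i25s[0011-0022],n25l[0001-0040],n25t[0001-0004]")
print(len(nodes))            # number of affected nodes
print("n25l0017" in nodes)   # membership check for a single node
```

For the list above, the expansion yields 12 + 40 + 4 = 56 node names.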

06.05.2026 11:49
Updates

The cluster is available again for test operation. The aforementioned nodes remain unavailable until further notice.

06.05.2026 16:09

[CLAIX-2025] Switch replacement

Partial Maintenance
Tue 05/05/2026 08:45 AM - Tue 05/05/2026 04:45 PM

One replacement switch turned out to be faulty and must be replaced again. At least the nodes n25s0001..0064 are expected to be affected and cannot be used during the maintenance.

05.05.2026 08:48
Updates

The maintenance has to be extended to all batch nodes, since all switches need to be rebooted and a diagnosis is required after the replacement. The test operation of CLAIX-2025 will be suspended until the end of the maintenance.

05.05.2026 09:15

The switch replacement is finished, and the fabric seems to be stable.

05.05.2026 16:45

[Jupyterhub] Reboot required

Notice
Fri 05/08/2026 08:45 AM - Fri 05/08/2026 09:00 AM

To mitigate a security issue, the Jupyterhub will be rebooted at 08:45 CEST.

08.05.2026 08:37