RWTH High Performance Computing (HPC)

More information about this service is available in our documentation portal.

[CLAIX-2025] Maintenance to resolve fabric issues

Partial maintenance
Fri, 29.05.2026 18:00 - Unknown

CLAIX-2025 is temporarily under maintenance to fix remaining issues in the fabric. Test operation is expected to resume afterwards.

02.06.2026 11:23
Updates

The fabric issues were resolved last week. However, we may still encounter issues that need to be resolved at short notice and that can lead to temporary unavailability.

08.06.2026 11:45

Due to new issues, the nodes need to be power-cycled. All running jobs will fail and need to be resubmitted.

09.06.2026 18:13

All nodes were power-cycled. The manufacturer is working on debugging the issues.

09.06.2026 18:33

The fabric has once again entered an unclean state and must be rebooted again to resolve the issues. All running jobs will be affected.

11.06.2026 17:34

CIFS disabled

Notice
Thu, 28.05.2026 09:58 - Wed, 01.07.2026 09:58

Due to an unresolved security issue, we have temporarily disabled CIFS on all systems.

28.05.2026 09:58

Recently expired notices

No new HPC accounts being set up

Partial disruption
Mon, 15.06.2026 13:00 - Tue, 16.06.2026 11:17

Due to a malfunction, HPC accounts newly created in the Regapp are currently not being set up on the system.

15.06.2026 13:09
Updates

The partial disruption has been resolved.

16.06.2026 11:18

Nodes unavailable due to filesystem issues

Partial disruption
Wed, 10.06.2026 12:00 - Wed, 10.06.2026 17:30

We are currently experiencing global filesystem issues on some CLAIX-2023 nodes.
These nodes have been drained, and no new jobs can start on them.
Waiting times for pending jobs will therefore increase.
Jobs running on these nodes might also crash due to the failing filesystem.

We are working on a solution.

10.06.2026 15:39
Updates

The filesystem disruption has been resolved. The affected nodes are back in normal operation.

10.06.2026 17:56

[CLAIX-2025] Downtime due to CDU maintenance work

Partial maintenance
Fri, 12.06.2026 11:25 - Fri, 12.06.2026 13:45

The cluster and all switches must be powered off for safety reasons during the CDU maintenance work, since the maintenance is expected to impair the cooling capacity.

12.06.2026 11:24