RWTH High Performance Computing (HPC)
More information about the service is available in our documentation portal.
[CLAIX-2025] Maintenance to resolve fabric issues
CLAIX-2025 is temporarily under maintenance to fix the remaining issues in the fabric. Test operation is expected to resume afterwards.
The fabric issues were resolved last week. However, we may still encounter issues that need to be resolved at short notice and that can lead to temporary unavailability.
Due to new issues, the nodes need to be power-cycled. All running jobs will fail and need to be resubmitted.
All nodes were power-cycled. The manufacturer is working on debugging the issues.
The fabric has once again entered an unclean state and must be rebooted again to resolve the issues. All running jobs will be affected.
CIFS disabled
Due to an unresolved security issue, we have temporarily disabled CIFS capabilities on all systems.
Recently expired notices
No setup of new HPC accounts
Currently, HPC accounts that are newly created in the Regapp are not being set up on the system due to a malfunction.
The partial disruption has been resolved.
Nodes unavailable due to filesystem issues.
We are currently experiencing global filesystem issues on some CLAIX-2023 nodes.
These nodes have been drained, and no new jobs can start on them.
Waiting times for pending jobs will therefore increase.
Jobs running on these nodes might also crash due to the failing filesystem.
We are working on a solution.
The filesystem disruption has been resolved. The affected nodes are back in normal operation.
[CLAIX-2025] Downtime due to CDU maintenance work
The cluster and all switches must be powered off for safety reasons during the CDU maintenance work, because the maintenance is expected to impair the cooling capacity.