
Compute Cluster - Full global maintenance of the CLAIX HPC systems

Tuesday 30.09.2025 15:35 - Friday 03.10.2025 19:25

Our global GPFS filesystem needs to be updated, which will make the entire CLAIX HPC system unavailable. Please note the following:
- User access to the HPC system through login nodes, HPC JupyterHub or any other connection will not be possible during the maintenance.
- No Slurm jobs or other filesystem-dependent tasks will be able to run during the maintenance.
- Before the maintenance, Slurm will only start jobs that are guaranteed to finish before the maintenance begins; any running jobs must finish by then or may be terminated.
- Nodes may therefore remain idle in the run-up to the maintenance, as Slurm clears them of user jobs.
- Waiting times before and after the maintenance may be longer than usual, as nodes are emptied beforehand and the queue of waiting jobs grows in the meantime.
- Files in your personal or project directories will not be available during the maintenance.
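As a rough illustration of the scheduling rule above: Slurm decides whether a job fits before a maintenance window based on the job's requested time limit (`--time`), not on how long it actually runs. The hypothetical helper `max_walltime` below computes the longest `--time` value (in Slurm's `days-hours:minutes:seconds` format) that would still end before the maintenance start. It assumes the reservation begins at the start time listed in this notice and subtracts an arbitrary 5-minute safety margin; it is a sketch, not an official tool.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: the maintenance reservation begins at the start time listed in
# this notice (30.09.2025 15:35); the actual reservation time may differ.
MAINTENANCE_START = datetime(2025, 9, 30, 15, 35)


def max_walltime(now: datetime,
                 start: datetime = MAINTENANCE_START,
                 margin: timedelta = timedelta(minutes=5)) -> Optional[str]:
    """Hypothetical helper: return the longest Slurm --time value
    (D-HH:MM:SS) that still ends `margin` before `start`, or None if
    no usable time is left."""
    remaining = start - now - margin
    if remaining <= timedelta(0):
        return None  # a job started now could not finish in time
    total = int(remaining.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    # Example: a job submitted at 12:00 on 29.09.2025
    print(max_walltime(datetime(2025, 9, 29, 12, 0)))
```

For a job submitted at 12:00 on 29.09.2025, the sketch returns `1-03:30:00`, which could be passed as `sbatch --time=1-03:30:00 job.sh` (where `job.sh` is a placeholder job script).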

Tue 16.09.2025 15:31

Updates

Unfortunately, the maintenance work has to be extended. We hope to be done as soon as possible. We apologize for the inconvenience.

Tue 30.09.2025 15:08

Unfortunately, we have to postpone the release of the HPC system for normal use until Wednesday. We apologize for the delays.

Tue 30.09.2025 20:12

As part of the maintenance, a pending system update addressing security issues is also being applied. Due to the large number of nodes, however, this update still requires some time. The cluster will be available again as soon as possible. Unfortunately, we cannot give an exact estimate of when the updates will be finished.

Wed 01.10.2025 17:09

All updates should be completed later this evening. We aim to make the cluster available tomorrow by 10:00 a.m.: the frontend nodes should become available first, ahead of the batch service, which is expected to resume by 11:00 a.m. We apologize once again for the unforeseen inconvenience.

Wed 01.10.2025 18:18

The updates are still not complete and require additional time. We expect to finish this afternoon. The frontends are already available again.

Thu 02.10.2025 10:25

The global maintenance tasks have been completed, and we are now putting the cluster back into operation. However, several nodes will temporarily remain in maintenance due to issues that have not yet been resolved.

Thu 02.10.2025 15:39

Operation has been restored on most nodes. The few remaining nodes will be handled soon.

Fri 03.10.2025 19:26

Compute Cluster - Migration to Rocky Linux 9

Wednesday 01.10.2025 08:00 - Wednesday 01.10.2025 15:30

The CLAIX-2023 copy nodes copy23-1 and copy23-2 will be reinstalled with Rocky Linux 9. During the reinstallation, the nodes will not be available.

Tue 30.09.2025 17:45

Compute Cluster - NFS outage of the GPFS servers

Wednesday 17.09.2025 21:35 - Thursday 18.09.2025 10:06

All nodes are currently draining due to an outage. We are working with the vendor to resolve it.

Thu 18.09.2025 09:16

Updates

The problem has been resolved; the cluster is back in operation.

Thu 18.09.2025 10:06