Compute Cluster - c23i partition is DOWN for the HPC JupyterHub

Thursday 18.07.2024 15:15 - unknown

The c23i partition is DOWN because of an unforeseen side effect of our monitoring system, which automatically marks the only node in the partition as down. A solution is not yet known and is being investigated. The HPC JupyterHub cannot use the partition until the problem is resolved.

Thu 18.07.2024 15:29

Compute Cluster - MPI jobs may crash

Tuesday 16.07.2024 16:12 - unknown

Since the cluster maintenance, MPI jobs have been crashing at random. We are investigating the issue and working on a solution.

Mon 22.07.2024 09:37

Updates

We have identified the issue and are currently testing workarounds with the affected users.

Wed 24.07.2024 12:41

Compute Cluster - Old HPC JupyterHub GPU profiles might run slower on the new c23g nodes

Friday 24.05.2024 11:00 - unknown

Please migrate your notebooks to the newer c23 GPU profiles. Since the GPU profiles were migrated to Claix 2023 and the new c23g nodes, the Python packages in the old profiles run with non-optimal settings on the new GPUs. The old profiles need to be redeployed, which will take some time.

Fri 24.05.2024 11:15

Compute Cluster - Temporary Deactivation of User Namespaces

Monday 08.07.2024 14:15 - Thursday 18.07.2024 13:00

Due to a security vulnerability in the Linux kernel, user namespaces are temporarily deactivated. Once the kernel has been updated, user namespaces can be used again.

Mon 08.07.2024 14:32

Updates

User namespaces are available again.

Thu 18.07.2024 13:00

Compute Cluster - Quotas on HPCWORK may not work correctly

Thursday 27.06.2024 14:30 - Thursday 18.07.2024 12:30

The quota system on HPCWORK may not work correctly. Creating files may fail with the error "Disk quota exceeded" even though the r_quota command reports that enough quota is available. The supplier of the filesystem has been informed and is working on a solution.

Thu 27.06.2024 14:40

Updates

File quotas for all hpcwork directories were increased to one million.

Thu 18.07.2024 12:39

Compute Cluster - Reconfiguration of File Systems and Kernel Update

Monday 15.07.2024 07:00 - Tuesday 16.07.2024 16:11

During the maintenance, $HPCWORK will be reconfigured so that the CLAIX23 nodes can access it via RDMA over InfiniBand (IB) instead of over Ethernet. The kernel will be updated at the same time. After the kernel update, the previously deactivated user namespaces will be re-activated.

Wed 10.07.2024 09:43

Updates

The maintenance had to be extended for final filesystem tasks.

Mon 15.07.2024 15:24

Due to unforeseen problems, the maintenance has to be extended until tomorrow, 16.07.2024, 18:00. We do not expect the filesystem manufacturer to need that long, and we expect to reopen the cluster earlier.

Mon 15.07.2024 17:24

The maintenance was completed successfully. Once again, we apologize for the long delay.

Tue 16.07.2024 16:12