Maintenance
Upcoming maintenance on the VACC cluster
January 7-8, 2026
The cluster will be down for scheduled maintenance to upgrade the operating system and the job scheduler. The upgrades are:
- RHEL 9.4 to RHEL 9.6, which includes many bug fixes.
- Slurm 25.05 to Slurm 25.11.
Jobs whose requested run time would extend past 7:00 AM on Wednesday, January 7th will not be permitted to start, and will wait in the queue until the maintenance is complete. We plan to have the cluster back online and running jobs by 5:00 PM on Thursday, January 8th.
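As a rough illustration of the cutoff described above, the following Python sketch checks whether a job with a given requested walltime would finish before the maintenance begins. The function names are hypothetical and the times are assumed to be cluster-local. Treating a job that ends exactly at 7:00 AM as acceptable is also an assumption. The scheduler's own decision is authoritative.

```python
from datetime import datetime, timedelta

# Start of the maintenance window, taken from the announcement.
# Assumed to be in the cluster's local time zone.
MAINTENANCE_START = datetime(2026, 1, 7, 7, 0)


def fits_before_maintenance(start: datetime, walltime: timedelta) -> bool:
    """Return True if a job starting at `start` ends no later than the window start."""
    return start + walltime <= MAINTENANCE_START


def max_walltime(start: datetime) -> timedelta:
    """Longest walltime that still ends before the window; zero if `start` is too late."""
    return max(MAINTENANCE_START - start, timedelta(0))


if __name__ == "__main__":
    submit_time = datetime(2026, 1, 6, 19, 0)
    print(fits_before_maintenance(submit_time, timedelta(hours=24)))
    print(max_walltime(submit_time))
```

If a job does not fit, requesting a shorter time limit that stays within the value returned by `max_walltime` is one way to let it start before the maintenance rather than wait.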
GPFS3 rebuild
All files on /gpfs3 will be deleted on January 7th so that we can rebuild the file system. A new policy of automatically deleting files that have not been accessed within 60 days will be implemented. To emphasize the new policy, /gpfs3 will be renamed /gpfs3tmp.
Details about /gpfs3tmp
To improve service to VACC users, we will rebuild the /gpfs3 filesystem on Jan 7, 2026. This filesystem was originally intended to be only for temporary files. After the rebuild, it will be renamed /gpfs3tmp, and automatic purging of files that are not being accessed will be implemented. Directories on it will only be created for each PI group. There are two main changes to be aware of:
- Files untouched for sixty (60) days will be automatically deleted. Since this is scratch (temporary storage), there is no backup. A warning email will be sent at the forty (40) day mark. No notifications will be sent about deletions on day sixty.
- No per-user directories are automatically created. Group members will be able to create subdirectories under their group's PI directory.
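The purge timeline above comes down to simple date arithmetic. The sketch below computes when the warning email and the deletion would be due for a file, given the date it was last touched. The function name is hypothetical. The announcement does not say which file timestamp the policy uses, so the last-touched date is passed in as an input rather than read from the filesystem.

```python
from datetime import date, timedelta

# Thresholds taken from the announcement.
WARNING_AFTER_DAYS = 40
PURGE_AFTER_DAYS = 60


def purge_schedule(last_touched: date) -> tuple[date, date]:
    """Return (warning_date, purge_date) for a file last touched on `last_touched`."""
    warning_date = last_touched + timedelta(days=WARNING_AFTER_DAYS)
    purge_date = last_touched + timedelta(days=PURGE_AFTER_DAYS)
    return warning_date, purge_date


if __name__ == "__main__":
    warning, purge = purge_schedule(date(2026, 1, 8))
    print(f"warning email expected around {warning}, deletion from {purge}")
```

For a file last touched on January 8, 2026, this gives a warning date of February 17 and a deletion date of March 9, leaving roughly twenty days between the warning and the purge.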
Regarding the existing /gpfs3: a snapshot of the filesystem will be taken before it is deleted and rebuilt. However, this backup (which we do not normally perform) will be held for only 60 days.
The new filesystem will be in place by 5:00 PM on January 8th.
January 2026 (specific dates TBA)
Two major projects at the datacenter will require planned downtime.
The pair of UPSes at the Tech Park Data Center needs to be replaced. Many compute nodes are covered by only a single UPS, so they must be powered down during the electrical work. We expect two (2) four-hour outages for each UPS. Each outage window will require the temporary shutdown of about half of the VACC compute nodes, and possibly of the entire cluster.
We expect the first UPS maintenance window to fall between January 19 and 31, and the second between February 1 and 13.
We are working to minimize and finalize these maintenance windows.
March 2026 (specific dates TBA)
We will be installing new cooling and power in the data center, in order to handle all of the new IceCore nodes. This will require shutdown of the secondary cooling loops, which may require the shutdown of IceCore and DeepGreen nodes.
After the installation of IceCore, the DeepGreen hardware (containing the NVIDIA V100 GPUs) will be retired.