The parallel file system (GPFS) providing the storage for $DATA and $WORK is again nearly full (93% full, with only 63 TB of free space remaining). To resolve this issue we have scheduled a downtime of the cluster so that we can implement some changes on the storage systems. Here are the details:
- Downtime: On Wednesday, April 24th, from 10:00 to 12:00 the cluster will be completely unavailable, i.e., the login nodes will be inaccessible and no jobs will be running on the cluster. A reservation is in place to ensure that no jobs are started that would run into the maintenance window (as the downtime approaches, you will notice jobs with long time limits remaining in the pending state).
- Storage Changes: After the downtime the $DATA directory will no longer be located on the cluster GPFS but on the central storage system of the IT services. Data will be migrated in the background and will remain accessible the whole time (except during the downtime, of course) and in the same way as before. During the downtime, the SMB share will not be available, and the address of the share may change afterwards (details will follow).
The most important impact of the change will be that the I/O performance of the $DATA directory will be lower compared to $WORK. Therefore, the general recommendation is to use $WORK for I/O at job runtime and to keep the data on $WORK as long as it is processed on the cluster. Final data files can be moved to $DATA. Please keep in mind that storage space is limited and that stored data should be reduced to a minimum (by keeping only the most important files in a compressed archive format).
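The recommended workflow above (process data on $WORK, then move only a compressed archive of the final results to $DATA) can be sketched as follows. This is an illustrative example, not a site-specific script: the directory name results_2024 is hypothetical, and $WORK/$DATA here fall back to temporary directories so the sketch runs anywhere.

```shell
#!/bin/sh
# Sketch only: WORK and DATA stand in for the cluster's $WORK and $DATA.
# If they are not set (e.g. outside the cluster), use temp dirs instead.
WORK=${WORK:-$(mktemp -d)}
DATA=${DATA:-$(mktemp -d)}

# Hypothetical finished job output sitting on $WORK.
mkdir -p "$WORK/results_2024"
echo "sample output" > "$WORK/results_2024/out.txt"

# Pack the finished results into a single compressed archive on $WORK ...
tar -C "$WORK" -czf "$WORK/results_2024.tar.gz" results_2024

# ... move only the archive to $DATA, then free the space on $WORK.
mv "$WORK/results_2024.tar.gz" "$DATA/"
rm -rf "$WORK/results_2024"
```

Storing one compressed archive instead of many small files also reduces load on the central storage system, which matters given the lower I/O performance of $DATA after the change.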
Further information will follow closer to the scheduled downtime.