File system issues on COSMOS-SENS
– Published 18 July 2024
We are currently experiencing file system issues on COSMOS-SENS. Occasionally, compute nodes drop their connection to the file system, and the affected job then terminates without writing any further output. The loss of connection appears to happen at random: some jobs succeed while others fail. The COSMOS system is not affected by the issue.
For urgent work, the job queues on COSMOS-SENS remain open. Failed jobs should be re-submitted. To increase the chances of success, we recommend limiting each job to a single compute node, that is, up to 48 cores. A larger number of short jobs seems preferable to fewer long-running jobs.
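As a rough illustration of the recommendation above, the following Python sketch splits a set of independent tasks into chunks that each fit on one 48-core node, so that every chunk can be submitted as its own short job. The function name, its parameters, and the assumption that the tasks are independent and use a fixed number of cores each are hypothetical and not part of this announcement; it is not an official LUNARC tool.

```python
CORES_PER_NODE = 48  # from the announcement: one compute node offers up to 48 cores


def plan_single_node_jobs(n_tasks, cores_per_task=1):
    """Return (start, end) task-index ranges, one per single-node job.

    Hypothetical helper: assumes n_tasks independent tasks, each using
    cores_per_task cores, and packs as many as fit on one node per job.
    """
    if not 1 <= cores_per_task <= CORES_PER_NODE:
        raise ValueError("cores_per_task must be between 1 and 48")
    if n_tasks < 0:
        raise ValueError("n_tasks must not be negative")

    # Number of tasks that can run side by side on a single node.
    tasks_per_job = CORES_PER_NODE // cores_per_task

    # Each range covers task indices start..end-1 and fits on one node.
    return [
        (start, min(start + tasks_per_job, n_tasks))
        for start in range(0, n_tasks, tasks_per_job)
    ]


if __name__ == "__main__":
    for start, end in plan_single_node_jobs(100):
        print(f"job covering tasks {start}..{end - 1}")
```

Each range can then be handed to a separate job script. If one job fails because its node lost the file system, only that range needs to be re-submitted.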
We are working on the issue and are still investigating its cause.
We would like to apologise for the issue and the inconvenience caused.
The LUNARC team