Default Fram script is buggy
#!/bin/bash
#SBATCH --job-name Test ## Name of the job
#SBATCH --output slurm-%j.out ## Name of the output-script (%j will be replaced with job number)
#SBATCH --account nn9999k ## The billed account
#SBATCH --time=00:01:00 ## Walltime of the job
#SBATCH --mem-per-cpu=2048 ## Memory allocated to each task
#SBATCH --ntasks=1 ## Number of tasks that will be allocated
set -o errexit ## Exit the script on any error
set -o nounset ## Treat any unset variables as an error
<Your command here> ## Command to be run
Submitting a script with these settings returns:
[mbjorgve@login-3.FRAM ~]$ sbatch job-script.sh
sbatch: error: Memory specification only allowed for bigmem jobs
sbatch: error: Batch job submission failed: Memory required by task is not available
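For comparison, a default script that should submit cleanly on Fram simply drops the memory line, since (as noted in the comments below) only "bigmem" jobs on Fram may specify memory. A minimal sketch, keeping the command placeholder as-is:

#!/bin/bash
#SBATCH --job-name Test ## Name of the job
#SBATCH --output slurm-%j.out ## Name of the output file (%j will be replaced with job number)
#SBATCH --account nn9999k ## The billed account
#SBATCH --time=00:01:00 ## Walltime of the job
#SBATCH --ntasks=1 ## Number of tasks that will be allocated

set -o errexit ## Exit the script on any error
set -o nounset ## Treat any unset variables as an error

<Your command here> ## Command to be run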
Bhm's comments
Here are a couple of things I believe should be fixed/changed in the generator. I realise that the list is quite long, so I will start by saying that I do think the generator is a good idea! :)
General changes
- Use the term "job type" or "type of job" instead of "partition".
- I suggest making "Modules to load" be checked by default.
- I suggest changing the pre-filled time limit to 00:30:00 (we try to discourage very short jobs).
- Remove the "Allow sharing of this node" option. There should be no need for --exclusive in job scripts on our clusters, and in some cases it creates problems for the job. I know that for instance Espen Tangen claims that it is needed for some jobs, but I am quite sure it is not. In any case, we should discourage its usage.
- The "Job is requeueable" option is a no-op with our setup, because --requeue is (in effect) the default. The user can turn this off by adding --no-requeue (see the snippet after this list).
- An enhancement could be to let the field "Command to run" be a multi-line field, and call it "Commands to run".
- The "Memory per task" option should be called "Memory per cpu".
- "Number of GPUs per node per task" should simply be called "Number of GPUs", and should default to 1. We want to avoid accel jobs that don't ask for GPUs (it might even be disallowed in the future).
Betzy
- Only the job types "accel" and "preproc" allow specifying memory (and they demand it, so if possible the memory option should be preselected for these job types); see the header sketch after this list.
- "Maximum nodes" should be
Fram
- Job type "course" was just a temporary solution and should not be listed here (I've just forgotten to remove it from the Slurm config).
- Only the job type "bigmem" allows specifying memory (and it demands it, so if possible the memory option should be preselected for this job type); see the sketch after this list.
- The "Cores per node" should be 32 also for the "devel" and "short" job types.
Saga
- The "Maximum nodes" field should be named "Max billing units". (The values in the field are correct, though.)
- For "normal" jobs, the "Cores per node" is either 40 or 52 (we have two types of nodes), so perhaps "40 / 52"?
- Similarly, for "bigmem" jobs, the "Cores per node" is either 40 or 64.
- The "devel" and "optimist" job types can be combined with "normal", "bigmem" or "accel", and will then share their limits wrt time limit, number of billing units and cores per node. Not sure how to solve this in practice... (one possible direction is sketched below).
- "Memory per cpu" should be preselected.